BF: Make Zarr validation/digest tests work with zarr-python 2.x and 3.x#1858
Merged
Conversation
zarr-python 3.x changed both the public API and on-disk format: * `zarr.errors.PathNotFoundError` was removed; missing-metadata errors are now `GroupNotFoundError` / `ArrayNotFoundError` / `NodeNotFoundError`. * `bool(group)` / `len(group)` eagerly iterate members and raise on non-Zarr files (e.g. an arbitrary `a/b/c/...` tree) or arrays with metadata the new library cannot parse (e.g. legacy `>u1` dtypes). * `zarr.open()` can partially succeed on malformed V3 stores where root metadata is valid but child metadata is not, which would bypass the tensorstore-based V3 validator. * Default serialisation is now V3 (`zarr.json`, `c/<chunk>` layout, different default compressor), so on-disk digests of "the same" data differ between V2 and V3 outputs. `dandi/files/zarr.py`: * Detect the Zarr format version from disk up front and always route V3 stores through `_ts_validate_zarr3`, regardless of zarr-python version. * Add `_is_empty_group` helper that swallows exceptions from zarr-python 3.x's eager member iteration — there is *some* content, even if it's not a valid Zarr member, so don't spuriously report `dandi_zarr.empty_group`. * Drop the explicit `zarr.errors.PathNotFoundError` clause (which only exists in 2.x) in favour of a single generic `except Exception` after the V3 fast-path. Tests: * `test_digest_zarr`, `test_digest_zarr_with_excluded_dotfiles` and `test_zarr_properties` now key their expected digests/sizes on the Zarr serialisation format actually produced (`get_zarr_format_version` on the written tree) rather than on `zarr.__version__`. This stays correct if zarr-python 3.x is later configured to write `zarr_format=2`. Verified against zarr 2.18.7 and zarr 3.1.5. Co-Authored-By: Claude Code 2.1.144 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1858 +/- ##
==========================================
+ Coverage 76.26% 76.38% +0.12%
==========================================
Files 87 87
Lines 12512 12543 +31
==========================================
+ Hits 9542 9581 +39
+ Misses 2970 2962 -8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Follow-up to eba4884. Two issues remained when running against zarr-python 3.x: 1. **typing**: `_ZARR_PROPERTIES_EXPECTED[fmt]` failed mypy because `get_zarr_format_version()` returns `Optional[str]`. 2. **upload/download tests**: hardcoded the V2 on-disk layout (``.zgroup`` / ``arr_X/.zarray`` / ``arr_X/0``), so they failed on zarr-python 3.x which writes V3 (``zarr.json`` / ``arr_X/zarr.json`` / ``arr_X/c/0``). `dandi/tests/test_files.py`: * Add a `zarr_format_of(path) -> str` test helper that wraps `get_zarr_format_version` and asserts the result is non-None (fixes the mypy index error in `test_zarr_properties`). * Add `_TWO_ARRAY_ZARR_LAYOUT` — a `TypedDict`-keyed mapping from format version ("2" / "3") to the expected on-disk layout of the canonical ``zarr.save(p, arr_0, arr_1)`` two-array sample. Used by all upload/download tests below. * Rewrite `test_upload_zarr`, `test_upload_zarr_with_excluded_dotfiles`, and `test_upload_zarr_entry_content_type` to derive expected entry paths and the root-metadata filename from the layout dict rather than hardcoding V2-only paths. `dandi/tests/test_download.py`: * Reuse `_TWO_ARRAY_ZARR_LAYOUT` / `zarr_format_of` for `test_download_different_zarr_onto_excluded_dotfiles` and `test_download_multiple_urls`, both of which check exact post-download trees of the two-array sample fixture. * In `test_download_different_zarr_delete_dir` gate the "no subdirs" sanity check on V2 — the ``np.eye(5)`` Zarr is flat in V2 but has a ``c/`` chunk directory in V3. The downstream `assert_dirtrees_eq` still exercises the actual download-replaces-tree behaviour in both. `dandi/cli/tests/test_digest.py`: same `zarr_format_of` helper applied to the digest tests so they share one source of truth for format detection. Verified against zarr 2.18.3 and 3.1.5 locally; mypy clean (no new errors); flake8 clean. Co-Authored-By: Claude Code 2.1.144 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
With zarr-python 3.x installed, mypy picks up the bundled type stubs and the `NDArrayLike` protocol rejects `numpy.ndarray` because of minor divergence in `ndarray.reshape` overloads. This surfaces as ~30 `arg-type` errors on `zarr.save(path, np.arange(...))` calls across the test suite (none of them new — the same calls have existed for years). zarr-python 2.x ships no stubs so the same calls are silently untyped, which is why CI typing (currently resolving to zarr 2.18) passes while running `tox -e typing` locally against zarr 3.x does not. Add a `[[tool.mypy.overrides]]` entry pinning `follow_imports = "skip"` for `zarr.*`, so zarr is treated as untyped regardless of which zarr-python version is installed. This matches the effective CI behaviour and removes the version-dependent typing noise. After this change, `mypy dandi` reports zero errors locally against zarr 3.1.5. Co-Authored-By: Claude Code 2.1.144 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent-Type
``UploadItem.from_entry`` only recognised the Zarr V2 metadata filenames
(``.zarray`` / ``.zattrs`` / ``.zgroup`` / ``.zmetadata``) as JSON and
set ``Content-Type: application/json`` on the upload accordingly. For
V3 stores written by zarr-python 3.x the per-node metadata lives in a
single ``zarr.json`` file at each group/array root; the previous code
left ``content_type = None`` for those, so the server stored them as
``binary/octet-stream`` — visible in ``test_upload_zarr_entry_content_type``
on zarr-python 3.x:
AssertionError: assert 'binary/octet-stream' == 'application/json'
Add ``"zarr.json"`` to the JSON-metadata set so V3 metadata files get
the correct Content-Type on upload, matching long-standing V2 behaviour.
Co-Authored-By: Claude Code 2.1.144 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code review (eba4884..5f87b6a) flagged a few rough edges. None change behaviour: * Move `zarr_format_of` and the `TWO_ARRAY_ZARR_LAYOUT` mapping from `dandi/tests/test_files.py` to `dandi/tests/test_helpers.py` and drop the leading underscore from the now genuinely cross-module name. The previous arrangement cross-imported between test modules (including across packages from `dandi/cli/tests/test_digest.py`), which would silently break on rename or pytest collection-order changes. The underscore-prefixed name was also misleading since the symbol was imported elsewhere. `test_helpers.py` is the existing pattern for shared test utilities — keeping the `test_*.py` filename so pytest preserves assertion rewriting on the `assert` inside `zarr_format_of`. * Collapse the two byte-identical `ValidationResult` arms in `ZarrAsset.get_validation_errors` into one append, with the message chosen via a conditional. The two arms differed only in the message string and risked drift over time. * Add a `lgr.debug` line in `_is_empty_group`'s broad-`except` so that a real bug (as opposed to the expected zarr-python 3.x eager-iteration raise) is at least discoverable in debug logs rather than silently swallowed. * Tighten `test_download_different_zarr_delete_dir` to assert the V3 invariant (`initial_subdirs == {"c"}`) instead of just skipping the no-subdirs check on V3. The test still exercises the contrast between the smaller single-array Zarr and the bigger two-array one that the download must reconcile. Verified mypy clean against zarr 3.1.5; pytest passes against zarr 2.18.7 and 3.1.5; pre-commit hooks all pass. Co-Authored-By: Claude Code 2.1.144 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yarikoptic-gitmate
approved these changes
May 21, 2026
|
🚀 PR was released in |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
We (me with @kabilar ) did raise upper version for zarr but it was not in effect due to hdmf-zarr blocking it via nwbexpector dependency for us. That was addressed and thus we started to upgrade into that zarr 3.
zarr-python 3.x changed both the public API and on-disk format:
zarr.errors.PathNotFoundErrorwas removed; missing-metadata errors are nowGroupNotFoundError/ArrayNotFoundError/NodeNotFoundError.bool(group)/len(group)eagerly iterate members and raise on non-Zarr files (e.g. an arbitrarya/b/c/...tree) or arrays with metadata the new library cannot parse (e.g. legacy>u1dtypes).zarr.open()can partially succeed on malformed V3 stores where root metadata is valid but child metadata is not, which would bypass the tensorstore-based V3 validator.zarr.json,c/<chunk>layout, different default compressor), so on-disk digests of "the same" data differ between V2 and V3 outputs.dandi/files/zarr.py:_ts_validate_zarr3, regardless of zarr-python version._is_empty_grouphelper that swallows exceptions from zarr-python 3.x's eager member iteration — there is some content, even if it's not a valid Zarr member, so don't spuriously reportdandi_zarr.empty_group.zarr.errors.PathNotFoundErrorclause (which only exists in 2.x) in favour of a single genericexcept Exceptionafter the V3 fast-path.Tests:
test_digest_zarr,test_digest_zarr_with_excluded_dotfilesandtest_zarr_propertiesnow key their expected digests/sizes on the Zarr serialisation format actually produced (get_zarr_format_versionon the written tree) rather than onzarr.__version__. This stays correct if zarr-python 3.x is later configured to writezarr_format=2.Verified against zarr 2.18.7 and zarr 3.1.5.