95 commits
4f63867
Add prospectus
maxrjones Mar 9, 2026
fb96207
Initial prospectus POC
maxrjones Mar 9, 2026
b3e72ec
V2 prospectus
maxrjones Mar 9, 2026
d6d551a
V3 prospectus
maxrjones Mar 9, 2026
8b8af74
Fastforward POC to V3
maxrjones Mar 9, 2026
784f4e7
Remove prospectus
maxrjones Mar 9, 2026
f1a1bc3
Fix sharding
maxrjones Mar 10, 2026
30fa867
Fix bugs
maxrjones Mar 10, 2026
1282c7b
Support sequence in array functions
maxrjones Mar 10, 2026
ea89f33
Add end-to-end tests
maxrjones Mar 10, 2026
2eba460
Collapse indexing paths
maxrjones Mar 10, 2026
fa07396
Add DimensionGrid protocol
maxrjones Mar 10, 2026
a0acd95
Remove the try/except escape hatch from ChunkGrid.chunk_shape
maxrjones Mar 10, 2026
ce0527d
Cache is_regular
maxrjones Mar 10, 2026
f433668
Produce RLE directly
maxrjones Mar 10, 2026
02fd7c5
Fix bugs
maxrjones Mar 10, 2026
42ef639
Separate chunk grid serialization
maxrjones Mar 10, 2026
55e720b
Retain comments
maxrjones Mar 10, 2026
1b7871d
Update block and coordinate indexing
maxrjones Mar 10, 2026
9c0f582
POC: TiledDimension
maxrjones Mar 10, 2026
1e2fa97
Support rectilinear shards
maxrjones Mar 10, 2026
1f424b0
Revert "POC: TiledDimension"
maxrjones Mar 11, 2026
0967e53
Fix __getitem__ for 1d chunk grids
maxrjones Mar 11, 2026
67a684d
Implement resize
maxrjones Mar 11, 2026
61c48a4
Fix spec compliance
maxrjones Mar 11, 2026
7d5ebb8
Fix .info
maxrjones Mar 11, 2026
e74586a
Fix typing
maxrjones Mar 11, 2026
8ab3ca8
Adopt joe's property testing strategy
maxrjones Mar 11, 2026
80d8280
Remove RegularChunkGrid
maxrjones Mar 11, 2026
ffc7805
Use none rather than sentinel value
maxrjones Mar 11, 2026
9beaee6
Remove regular chunk grid
maxrjones Mar 11, 2026
da2c08b
Fix boundary handling in VaryingDimension
maxrjones Mar 12, 2026
6d9de38
Add chunk_sizes property
maxrjones Mar 12, 2026
cc2999a
Add docs
maxrjones Mar 12, 2026
b47ddba
Improve polymorphism
maxrjones Mar 12, 2026
44d845f
Merge branch 'main' into poc/unified-chunk-grid
maxrjones Mar 12, 2026
2caa927
always return based on inner chunks
maxrjones Mar 12, 2026
6af91a6
Fix from_array
maxrjones Mar 12, 2026
e04d864
Add V3 of the prospectus
maxrjones Mar 12, 2026
8dcea81
Fastforward design docs
maxrjones Mar 12, 2026
4eb01c5
Require array extent
maxrjones Mar 12, 2026
d893d6f
Add overflow chunk tests
maxrjones Mar 12, 2026
308bb24
Design doc for chunk grid metadata separation
maxrjones Mar 12, 2026
0f52822
minor simplifications
maxrjones Mar 12, 2026
5823fbb
Gatekeep rectilinear chunks behind feature flag
maxrjones Mar 13, 2026
27f28e7
Fix off-by-one bug
maxrjones Mar 13, 2026
a35cf56
Fix chunk indexing boundary checks
maxrjones Mar 13, 2026
cbb28fe
Standardize docstrings
maxrjones Mar 13, 2026
280eb68
fix spec compliance
maxrjones Mar 13, 2026
e88c06b
Handle integer floats
maxrjones Mar 13, 2026
58bd336
More spec compliance
maxrjones Mar 13, 2026
e0fbab4
Fix block indexing error
maxrjones Mar 13, 2026
5277739
Add V2 regression tests
maxrjones Mar 13, 2026
c9858c0
Add comments
maxrjones Mar 14, 2026
9e4fa30
Consistent bounds checking between dimension types
maxrjones Mar 14, 2026
a21d587
use pre-computed extent
maxrjones Mar 14, 2026
e062580
Improve sharding validation logic
maxrjones Mar 14, 2026
3591734
Improve sharding validation logic
maxrjones Mar 14, 2026
4be96b0
Remove deferred design
maxrjones Mar 14, 2026
087382b
Update design doc
maxrjones Mar 14, 2026
38fd5aa
Remove unnecessary casts
maxrjones Mar 14, 2026
bbc0703
Improve typing
maxrjones Mar 14, 2026
460d683
Add another deferred item
maxrjones Mar 14, 2026
73164b6
Add to design doc
maxrjones Mar 14, 2026
aec0abd
Add design principles
maxrjones Mar 14, 2026
abb9d9d
Polish design doc
maxrjones Mar 14, 2026
aa002c8
Update migration sequence
maxrjones Mar 14, 2026
fffe4da
Remove stale sections
maxrjones Mar 14, 2026
6777ec5
Use TypeGuard
maxrjones Mar 14, 2026
adec422
Cache nchunks
maxrjones Mar 14, 2026
4903b09
Add cubed example
maxrjones Mar 14, 2026
67e540c
move chunk grid off metadata (#6)
d-v-b Mar 20, 2026
14370e6
Fixup after refactor
maxrjones Mar 20, 2026
bfc5d6b
Fixup
maxrjones Mar 20, 2026
0f78339
Remove duplicated code
maxrjones Mar 21, 2026
fa6980d
Add to experimental
maxrjones Mar 21, 2026
2360392
Avoid divide by zero
maxrjones Mar 21, 2026
21aa18b
Improve RLE validation
maxrjones Mar 21, 2026
90476b8
Improve RLE validation
maxrjones Mar 21, 2026
7e171f5
Raise error on unknown chunk grid
maxrjones Mar 21, 2026
b6b271f
Add utility function
maxrjones Mar 21, 2026
c19e9db
Minor improvements
maxrjones Mar 21, 2026
764eeaf
Update shorthand
maxrjones Mar 21, 2026
54b399d
Fix zero chunks
maxrjones Mar 21, 2026
becd392
Remove extraneous validation
maxrjones Mar 21, 2026
6764ba1
Improve tests
maxrjones Mar 21, 2026
9b36448
Improve docstrings
maxrjones Mar 21, 2026
11a47ff
Update design doc
maxrjones Mar 21, 2026
4d7c724
Update docs
maxrjones Mar 21, 2026
826e030
DRY
maxrjones Mar 21, 2026
6f51e1c
Add test
maxrjones Mar 21, 2026
879f20f
Simplify
maxrjones Mar 21, 2026
edbdb5d
Consistent .chunks and .shards
maxrjones Mar 21, 2026
4a940b1
Remove separators
maxrjones Mar 21, 2026
475de21
Polish
maxrjones Mar 21, 2026
620 changes: 620 additions & 0 deletions docs/design/chunk-grid.md

Large diffs are not rendered by default.

165 changes: 165 additions & 0 deletions docs/user-guide/arrays.md
@@ -599,6 +599,171 @@ In this example a shard shape of (1000, 1000) and a chunk shape of (100, 100) is
This means that `10*10` chunks are stored in each shard, and there are `10*10` shards in total.
Without the `shards` argument, there would be 10,000 chunks stored as individual files.

## Rectilinear (variable) chunk grids

!!! warning "Experimental"
    Rectilinear chunk grids are an experimental feature and may change in
    future releases. This feature is expected to stabilize in Zarr version 3.3.

Because the feature is still stabilizing, it is disabled by default and
must be explicitly enabled:

```python
import zarr
zarr.config.set({"array.rectilinear_chunks": True})
```

Or via the environment variable `ZARR_ARRAY__RECTILINEAR_CHUNKS=True`.
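
As a rough illustration of how the environment variable name relates to the config key, the sketch below assumes the donfig-style convention used by Zarr's config object: the `ZARR_` prefix is dropped, the rest is lowercased, and a double underscore becomes a dot. `env_to_config` is a hypothetical helper written for this example, not a Zarr function.

```python
import ast


def env_to_config(name, value, prefix="ZARR_"):
    """Hypothetical helper: map an environment variable to a (key, value) pair."""
    # Assumption: double underscores separate nested config levels.
    key = name[len(prefix):].lower().replace("__", ".")
    try:
        # Assumption: values are parsed as Python literals when possible.
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        parsed = value  # fall back to the raw string
    return key, parsed


print(env_to_config("ZARR_ARRAY__RECTILINEAR_CHUNKS", "True"))
# ('array.rectilinear_chunks', True)
```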

The examples below assume this config has been set.

By default, Zarr arrays use a regular chunk grid where every chunk along a
given dimension has the same size (except possibly the final boundary chunk).
Rectilinear chunk grids allow each chunk along a dimension to have a different
size. This is useful when the natural partitioning of the data is not uniform —
for example, satellite swaths of varying width, time series with irregular
intervals, or spatial tiles of different extents.
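
To make the idea concrete, here is a minimal sketch of how a list of per-dimension chunk sizes determines chunk boundaries and which chunk holds a given index. The helpers `chunk_offsets` and `chunk_index` are illustrative names for this example only and do not reflect Zarr's internal implementation.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_offsets(edges):
    """Return the start offset of every chunk, followed by the total extent."""
    return [0, *accumulate(edges)]


def chunk_index(edges, i):
    """Return the index of the chunk that contains array index ``i``."""
    offsets = chunk_offsets(edges)
    if not 0 <= i < offsets[-1]:
        raise IndexError(f"index {i} is outside the extent {offsets[-1]}")
    # The containing chunk is the last one whose start offset is <= i.
    return bisect_right(offsets, i) - 1


edges = [10, 20, 30]
print(chunk_offsets(edges))    # [0, 10, 30, 60]
print(chunk_index(edges, 9))   # 0
print(chunk_index(edges, 10))  # 1
print(chunk_index(edges, 59))  # 2
```

With a regular grid the lookup reduces to integer division; with varying sizes a binary search over the cumulative offsets is one straightforward alternative.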

### Creating arrays with rectilinear chunks

To create an array with rectilinear chunks, pass a nested list to the `chunks`
parameter where each inner list gives the chunk sizes along one dimension:

```python exec="true" session="arrays" source="above" result="ansi"
zarr.config.set({"array.rectilinear_chunks": True})
z = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(60, 100),
    chunks=[[10, 20, 30], [50, 50]],
    dtype='int32',
)
print(z.info)
```

In this example the first dimension is split into three chunks of sizes 10, 20,
and 30, while the second dimension is split into two equal chunks of size 50.
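
The resulting grid can be enumerated with plain Python. The sketch below is independent of Zarr and only shows how the nested list translates into a 3 x 2 grid of chunks with differing shapes; the variable names are chosen for this example.

```python
from itertools import product

chunks = [[10, 20, 30], [50, 50]]

# Number of chunks along each dimension.
grid_shape = tuple(len(edges) for edges in chunks)

# Shape of every chunk, keyed by its position in the grid.
chunk_shapes = {
    idx: tuple(chunks[dim][i] for dim, i in enumerate(idx))
    for idx in product(*(range(n) for n in grid_shape))
}

print(grid_shape)            # (3, 2)
print(chunk_shapes[(0, 0)])  # (10, 50)
print(chunk_shapes[(2, 1)])  # (30, 50)

# The chunk sizes along each dimension sum to the array shape (60, 100).
print(tuple(sum(edges) for edges in chunks))  # (60, 100)
```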

### Reading and writing data

Rectilinear arrays support the same indexing interface as regular arrays.
Reads and writes that cross chunk boundaries of different sizes are handled
automatically:

```python exec="true" session="arrays" source="above" result="ansi"
import numpy as np
data = np.arange(60 * 100, dtype='int32').reshape(60, 100)
z[:] = data
# Read a slice that spans the first two chunks (sizes 10 and 20) along axis 0
print(z[5:25, 0:5])
```
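
Conceptually, a slice that crosses chunk boundaries is split into one sub-slice per touched chunk. The following sketch shows that decomposition for the `5:25` selection along axis 0; `split_slice` is a hypothetical helper for illustration and is not how Zarr's indexers are necessarily implemented.

```python
from itertools import accumulate


def split_slice(edges, start, stop):
    """Yield (chunk_index, slice_within_chunk) for each chunk touched by [start, stop)."""
    offsets = [0, *accumulate(edges)]
    for k, (lo, hi) in enumerate(zip(offsets, offsets[1:])):
        # Intersect the requested range with the range covered by chunk k.
        a, b = max(start, lo), min(stop, hi)
        if a < b:
            yield k, slice(a - lo, b - lo)


parts = list(split_slice([10, 20, 30], 5, 25))
print(parts)  # [(0, slice(5, 10, None)), (1, slice(0, 15, None))]
```

The last five elements of the first chunk and the first fifteen of the second together make up the 20 requested rows.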

### Inspecting chunk sizes

The `.write_chunk_sizes` property returns the actual data size of each storage
chunk along every dimension. It works for both regular and rectilinear arrays
and returns a tuple of tuples (matching the dask `Array.chunks` convention).
When sharding is used, `.read_chunk_sizes` returns the inner chunk sizes instead:

```python exec="true" session="arrays" source="above" result="ansi"
print(z.write_chunk_sizes)
```

For regular arrays, this includes the boundary chunk:

```python exec="true" session="arrays" source="above" result="ansi"
z_regular = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(100, 80),
    chunks=(30, 40),
    dtype='int32',
)
print(z_regular.write_chunk_sizes)
```
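
The sizes one would expect for a regular grid can be worked out by hand. This sketch assumes, as stated above, that the boundary chunk is reported with its actual (clipped) size; `regular_chunk_sizes` is an illustrative helper, not a Zarr API.

```python
def regular_chunk_sizes(extent, chunk):
    """Chunk sizes along one dimension, with the last chunk clipped to the extent."""
    full, remainder = divmod(extent, chunk)
    return (chunk,) * full + ((remainder,) if remainder else ())


shape, chunks = (100, 80), (30, 40)
sizes = tuple(regular_chunk_sizes(n, c) for n, c in zip(shape, chunks))
print(sizes)  # ((30, 30, 30, 10), (40, 40))
```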

Note that the `.chunks` property is only available for regular chunk grids. For
rectilinear arrays, use `.write_chunk_sizes` (or `.read_chunk_sizes`) instead.

### Resizing and appending

Rectilinear arrays can be resized. When an array grows beyond the sum of its
current chunk sizes along a dimension, a new chunk covering the additional
extent is appended. When an array shrinks, the existing chunk sizes are
preserved and only the array extent changes; chunks that lie entirely beyond
the new extent become inactive:

```python exec="true" session="arrays" source="above" result="ansi"
z = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(30,),
    chunks=[[10, 20]],
    dtype='float64',
)
z[:] = np.arange(30, dtype='float64')
print(f"Before resize: chunk_sizes={z.write_chunk_sizes}")
z.resize((50,))
print(f"After resize: chunk_sizes={z.write_chunk_sizes}")
```

The `append` method also works with rectilinear arrays:

```python exec="true" session="arrays" source="above" result="ansi"
z.append(np.arange(10, dtype='float64'))
print(f"After append: shape={z.shape}, chunk_sizes={z.write_chunk_sizes}")
```
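
The resize rules described in this section can be sketched for a single dimension as follows. This is a simplified model under the assumptions stated in the comments, with hypothetical helper names; the real implementation may differ in details such as how it treats previously inactive chunks.

```python
from itertools import accumulate


def resize_edges(edges, new_extent):
    """Hypothetical model: chunk sizes along one dimension after a resize."""
    total = sum(edges)
    if new_extent > total:
        # Growing past the existing chunks: append one chunk for the new extent.
        return [*edges, new_extent - total]
    # Shrinking (or growing within existing chunks): sizes are left untouched.
    return list(edges)


def active_chunks(edges, extent):
    """Count the chunks that start before ``extent`` and so still hold data."""
    starts = [0, *accumulate(edges)][:-1]
    return sum(1 for start in starts if start < extent)


edges = resize_edges([10, 20], 50)
print(edges)  # [10, 20, 20]

# Assumption: appending 10 elements behaves like a resize to extent 60.
edges = resize_edges(edges, 60)
print(edges)  # [10, 20, 20, 10]

print(resize_edges(edges, 25), active_chunks(edges, 25))  # [10, 20, 20, 10] 2
```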

### Compressors and filters

Rectilinear arrays work with all codecs — compressors, filters, and checksums.
Since each chunk may have a different size, the codec pipeline processes each
chunk independently:

```python exec="true" session="arrays" source="above" result="ansi"
z = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(60, 100),
    chunks=[[10, 20, 30], [50, 50]],
    dtype='float64',
    filters=[zarr.codecs.TransposeCodec(order=(1, 0))],
    compressors=[zarr.codecs.BloscCodec(cname='zstd', clevel=3)],
)
z[:] = np.arange(60 * 100, dtype='float64').reshape(60, 100)
np.testing.assert_array_equal(z[:], np.arange(60 * 100, dtype='float64').reshape(60, 100))
print("Roundtrip OK")
```
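
The point about independent processing can be shown without Zarr at all. The sketch below uses the standard-library `zlib` module as a stand-in for a real compressor such as Blosc, and compresses three chunks of different sizes separately before reassembling them.

```python
import zlib
from itertools import accumulate

edges = [10, 20, 30]     # chunk sizes along one dimension
data = bytes(range(60))  # stand-in for the array contents

offsets = [0, *accumulate(edges)]

# Each chunk is encoded on its own, so differing sizes pose no problem.
encoded = [zlib.compress(data[lo:hi]) for lo, hi in zip(offsets, offsets[1:])]

# Decoding each chunk independently and concatenating restores the data.
decoded = b"".join(zlib.decompress(blob) for blob in encoded)
print(decoded == data)  # True
```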

### Rectilinear shard boundaries

Rectilinear chunk grids can also define shard boundaries when combined with
sharding. In this case, the outer grid (shards) is rectilinear while the
inner chunks remain regular. Every shard size along a dimension must be
divisible by the inner chunk size for that dimension:

```python exec="true" session="arrays" source="above" result="ansi"
z = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(120, 100),
    chunks=(10, 10),
    shards=[[60, 40, 20], [50, 50]],
    dtype='int32',
)
z[:] = np.arange(120 * 100, dtype='int32').reshape(120, 100)
print(z[50:70, 40:60])
```

Note that rectilinear inner chunks with sharding are not supported — only the
shard boundaries can be rectilinear.
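
The divisibility requirement can be checked ahead of time with a few lines of Python. `invalid_shard_edges` is a hypothetical helper for this example; it is not the validation routine Zarr itself uses.

```python
def invalid_shard_edges(shard_edges, inner_chunks):
    """Return (dimension, shard_size) pairs that are not multiples of the inner chunk size."""
    return [
        (dim, edge)
        for dim, (edges, chunk) in enumerate(zip(shard_edges, inner_chunks))
        for edge in edges
        if edge % chunk != 0
    ]


# The configuration from the example above satisfies the constraint.
print(invalid_shard_edges([[60, 40, 20], [50, 50]], (10, 10)))  # []

# Shard sizes 45 and 15 are not multiples of the inner chunk size 10.
print(invalid_shard_edges([[60, 45, 15], [50, 50]], (10, 10)))  # [(0, 45), (0, 15)]
```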

### Metadata format

Rectilinear chunk grid metadata uses run-length encoding (RLE) for compact
serialization. When reading metadata, both bare integers and `[value, count]`
pairs are accepted:

- `[10, 20, 30]` — three chunks with explicit sizes
- `[[10, 3]]` — three chunks of size 10 (RLE shorthand)
- `[[10, 3], 5]` — three chunks of size 10, then one chunk of size 5

When writing, Zarr automatically compresses repeated values into RLE format.
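
A minimal sketch of the encoding described above, for a single dimension. The helpers `expand_rle` and `compress_rle` are illustrative names, and the choice to emit a bare integer for a run of length one is an assumption of this example rather than a statement about what Zarr writes.

```python
from itertools import groupby


def expand_rle(encoded):
    """Expand a mix of bare integers and [value, count] pairs into explicit sizes."""
    sizes = []
    for item in encoded:
        if isinstance(item, int):
            sizes.append(item)
        else:
            value, count = item
            sizes.extend([value] * count)
    return sizes


def compress_rle(sizes):
    """Collapse runs of equal sizes into [value, count] pairs."""
    out = []
    for value, run in groupby(sizes):
        count = len(list(run))
        # Assumption: runs of length one are written as a bare integer.
        out.append([value, count] if count > 1 else value)
    return out


print(expand_rle([10, 20, 30]))        # [10, 20, 30]
print(expand_rle([[10, 3]]))           # [10, 10, 10]
print(expand_rle([[10, 3], 5]))        # [10, 10, 10, 5]
print(compress_rle([10, 10, 10, 5]))   # [[10, 3], 5]
```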

## Missing features in 3.0

The following features have not been ported to 3.0 yet.
1 change: 1 addition & 0 deletions docs/user-guide/config.md
@@ -30,6 +30,7 @@ Configuration options include the following:
- Default Zarr format `default_zarr_version`
- Default array order in memory `array.order`
- Whether empty chunks are written to storage `array.write_empty_chunks`
- Enable experimental rectilinear chunk grids `array.rectilinear_chunks`
- Async and threading options, e.g. `async.concurrency` and `threading.max_workers`
- Selections of implementations of codecs, codec pipelines and buffers
- Enabling GPU support with `zarr.config.enable_gpu()`. See GPU support for more.
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -76,6 +76,8 @@ nav:
- Creation sub-module: api/zarr/deprecated/creation.md
- release-notes.md
- contributing.md
- Design documents:
- design/chunk-grid.md
watch:
- src/zarr
- docs
14 changes: 7 additions & 7 deletions src/zarr/abc/codec.py
@@ -17,10 +17,10 @@

from zarr.abc.store import ByteGetter, ByteSetter, Store
from zarr.core.array_spec import ArraySpec
- from zarr.core.chunk_grids import ChunkGrid
from zarr.core.dtype.wrapper import TBaseDType, TBaseScalar, ZDType
from zarr.core.indexing import SelectorTuple
from zarr.core.metadata import ArrayMetadata
+ from zarr.core.metadata.v3 import ChunkGridMetadata

__all__ = [
"ArrayArrayCodec",
@@ -140,7 +140,7 @@ def validate(
*,
shape: tuple[int, ...],
dtype: ZDType[TBaseDType, TBaseScalar],
- chunk_grid: ChunkGrid,
+ chunk_grid: ChunkGridMetadata,
) -> None:
"""Validates that the codec configuration is compatible with the array metadata.
Raises errors when the codec configuration is not compatible.
@@ -151,8 +151,8 @@ def validate(
The array shape
dtype : np.dtype[Any]
The array data type
- chunk_grid : ChunkGrid
- The array chunk grid
+ chunk_grid : ChunkGridMetadata
+ The array chunk grid metadata
"""

async def _decode_single(self, chunk_data: CodecOutput, chunk_spec: ArraySpec) -> CodecInput:
@@ -357,7 +357,7 @@ def validate(
*,
shape: tuple[int, ...],
dtype: ZDType[TBaseDType, TBaseScalar],
- chunk_grid: ChunkGrid,
+ chunk_grid: ChunkGridMetadata,
) -> None:
"""Validates that all codec configurations are compatible with the array metadata.
Raises errors when a codec configuration is not compatible.
@@ -368,8 +368,8 @@ def validate(
The array shape
dtype : np.dtype[Any]
The array data type
- chunk_grid : ChunkGrid
- The array chunk grid
+ chunk_grid : ChunkGridMetadata
+ The array chunk grid metadata
"""
...

22 changes: 15 additions & 7 deletions src/zarr/api/synchronous.py
@@ -13,7 +13,7 @@
from zarr.errors import ZarrDeprecationWarning

if TYPE_CHECKING:
- from collections.abc import Iterable
+ from collections.abc import Iterable, Sequence

import numpy as np
import numpy.typing as npt
@@ -822,7 +822,7 @@ def create_array(
shape: ShapeLike | None = None,
dtype: ZDTypeLike | None = None,
data: np.ndarray[Any, np.dtype[Any]] | None = None,
- chunks: tuple[int, ...] | Literal["auto"] = "auto",
+ chunks: tuple[int, ...] | Sequence[Sequence[int]] | Literal["auto"] = "auto",
shards: ShardsLike | None = None,
filters: FiltersLike = "auto",
compressors: CompressorsLike = "auto",
@@ -858,9 +858,13 @@ def create_array(
data : np.ndarray, optional
Array-like data to use for initializing the array. If this parameter is provided, the
``shape`` and ``dtype`` parameters must be ``None``.
- chunks : tuple[int, ...] | Literal["auto"], default="auto"
+ chunks : tuple[int, ...] | Sequence[Sequence[int]] | Literal["auto"], default="auto"
Chunk shape of the array.
If chunks is "auto", a chunk shape is guessed based on the shape of the array and the dtype.
+ A nested list of per-dimension edge sizes creates a rectilinear grid.
+ Rectilinear chunk grids are experimental and must be explicitly enabled
+ with ``zarr.config.set({'array.rectilinear_chunks': True})`` while the
+ feature is stabilizing.
shards : tuple[int, ...], optional
Shard shape of the array. The default value of ``None`` results in no sharding at all.
filters : Iterable[Codec] | Literal["auto"], optional
@@ -993,7 +997,7 @@ def from_array(
data: AnyArray | npt.ArrayLike,
write_data: bool = True,
name: str | None = None,
- chunks: Literal["auto", "keep"] | tuple[int, ...] = "keep",
+ chunks: Literal["auto", "keep"] | tuple[int, ...] | Sequence[Sequence[int]] = "keep",
shards: ShardsLike | None | Literal["keep"] = "keep",
filters: FiltersLike | Literal["keep"] = "keep",
compressors: CompressorsLike | Literal["keep"] = "keep",
@@ -1025,13 +1029,17 @@ def from_array(
name : str or None, optional
The name of the array within the store. If ``name`` is ``None``, the array will be located
at the root of the store.
- chunks : tuple[int, ...] or "auto" or "keep", optional
+ chunks : tuple[int, ...] or Sequence[Sequence[int]] or "auto" or "keep", optional
Chunk shape of the array.
Following values are supported:

- "auto": Automatically determine the chunk shape based on the array's shape and dtype.
- - "keep": Retain the chunk shape of the data array if it is a zarr Array.
- - tuple[int, ...]: A tuple of integers representing the chunk shape.
+ - "keep": Retain the chunk grid of the data array if it is a zarr Array.
+ - tuple[int, ...]: A tuple of integers representing the chunk shape (regular grid).
+ - Sequence[Sequence[int]]: Per-dimension chunk edge lists (rectilinear grid).
+ Rectilinear chunk grids are experimental and must be explicitly enabled
+ with ``zarr.config.set({'array.rectilinear_chunks': True})`` while the
+ feature is stabilizing.

If not specified, defaults to "keep" if data is a zarr Array, otherwise "auto".
shards : tuple[int, ...], optional