
Fixes #881: dask-safe unique in zonal_stats and crosstab #894

Merged
brendancol merged 1 commit into master from fix/881-zonal-dask-materialise
Feb 25, 2026

@brendancol
Contributor

Summary

  • zonal_stats() and crosstab() called np.unique(zones[np.isfinite(zones)]) at the top of their dask code paths. Both the boolean-mask indexing with np.isfinite and the np.unique call silently force full materialisation of the dask array into RAM, which means OOM on rasters that do not fit in memory.
  • Added _unique_finite_zones() and _unique_finite_cats() helpers that use da.unique() (per-chunk reduction) then .compute() only the tiny result (just the distinct zone IDs). Updated all 5 call sites (2 dask paths + 2 numpy paths for consistency + 1 in _find_cats).
  • Wrapped the zone column in _stats_dask_numpy with da.from_delayed(..., shape=(np.nan,)) to preserve unknown-chunk metadata expected by downstream dd.from_dask_array.
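
The last point above, wrapping a delayed result so that its length stays unknown, can be illustrated with a minimal sketch. The zone_ids() function and its values are hypothetical placeholders, not code from this PR; only da.from_delayed and the shape=(np.nan,) argument come from the description.

```python
import numpy as np
import dask
import dask.array as da


@dask.delayed
def zone_ids():
    # Hypothetical placeholder for a delayed computation whose
    # output length is not known until it actually runs.
    return np.array([1.0, 2.0, 3.0])


# shape=(np.nan,) declares a 1-D array of unknown length, so the
# resulting dask array carries unknown-chunk metadata.
zone_column = da.from_delayed(zone_ids(), shape=(np.nan,), dtype=float)

print(zone_column.chunks)     # the single chunk size is nan (unknown)
print(zone_column.compute())  # the values appear only after compute()
```

Declaring the length as unknown keeps the column's metadata honest: nothing downstream can rely on a concrete size that was never computed.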

Test plan

  • All 47 existing test_zonal.py tests pass unchanged
  • Added test_stats_does_not_materialise_dask_zones — monkeypatches np.unique to raise if passed a da.Array
  • Added test_crosstab_does_not_materialise_dask_zones — same guard for crosstab

…onal.py

np.unique(zones[np.isfinite(zones)]) silently materialises the full dask
array into RAM, causing OOM on large rasters. Replace with da.unique()
which reduces per-chunk and only .compute()s the tiny set of distinct
zone IDs.
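
A minimal sketch of the replacement described in this commit message follows. The name unique_finite_values() is a hypothetical stand-in for the PR's _unique_finite_zones() helper, whose actual body is not shown here and may differ.

```python
import numpy as np
import dask.array as da


def unique_finite_values(values):
    """Return the sorted distinct finite values of a NumPy or dask array.

    Hypothetical stand-in for the PR's helper, written as an assumption.
    """
    if isinstance(values, da.Array):
        # da.unique builds a lazy reduction over the chunks; compute()
        # then returns only the distinct values, not the whole raster.
        uniques = da.unique(values).compute()
    else:
        uniques = np.unique(values)
    # Filter NaN/inf on the small in-memory result instead of
    # masking the full array up front.
    return uniques[np.isfinite(uniques)]


zones = da.from_array(
    np.array([[1.0, 1.0, np.nan], [2.0, 3.0, 3.0]]), chunks=(1, 3)
)
print(unique_finite_values(zones))
```

Filtering for finite values after the reduction gives the same set of zone IDs as masking first, but the mask is applied to a handful of values rather than to every pixel.
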
@brendancol merged commit 0cf6ce2 into master on Feb 25, 2026
10 checks passed
@brendancol deleted the fix/881-zonal-dask-materialise branch on February 26, 2026 14:46