Fixes #899, #902: fix dask zonal.stats() bug, add dask+cupy backend, edge-case tests #911

Merged
brendancol merged 3 commits into master from fix-899-zonal-stats-boolean on Feb 27, 2026

@brendancol brendancol commented Feb 27, 2026

Summary

  • Fix a boolean short-circuit bug in _stats_dask_numpy(): the condition if 'mean' or 'std' or 'var' in stats_funcs always evaluated to True (the non-empty string 'mean' is truthy), so compute_sum, compute_count, and compute_sum_squares were set on every call regardless of the requested stats
  • Replace it with any(s in stats_funcs for s in ('mean', 'std', 'var')) for correct membership testing
  • Add a _stats_dask_cupy() backend (#902, "Add zonal.stats() dask+cupy backend"): converts dask+cupy blocks to numpy via map_blocks(x.get()), then delegates to the existing _stats_dask_numpy pipeline
  • Wire _stats_dask_cupy into the ArrayTypeFunctionMapping dispatcher (replacing not_implemented_func)
  • Add 'dask+cupy' to the test_default_stats and test_zone_ids_stats parametrize lists; fix the skip guards to check 'cupy' in backend
  • Add 5 new edge-case test groups (18 test cases across backends):
    • all-NaN zone: documents per-backend empty-zone behavior
    • single-cell zones: std/var must be 0, not NaN
    • negative zone IDs: exercises sort-and-stride with negatives
    • nodata wipes zone: all finite values match nodata_values
    • zone in subset of blocks: zone present in only some dask chunks

Test plan

  • test_stats_subset_columns — 7 parametrized stat subsets on numpy and dask+numpy
  • test_default_stats and test_zone_ids_stats — now include dask+cupy backend
  • 5 new edge-case tests pass across all applicable backends
  • Full test_zonal.py suite passes (106 tests)

The conditions `if 'mean' or 'std' or 'var' in stats_funcs` always
evaluated to True because the string 'mean' is truthy. This caused
compute_sum, compute_count, and compute_sum_squares to always be set,
wasting work on every dask zonal.stats() call regardless of which
stats were requested.

Fix: use `any(s in stats_funcs for s in (...))` for correct membership
testing. Add regression tests covering 7 stat subsets on both numpy
and dask backends to exercise each compute flag independently.
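
The precedence issue described above can be reproduced in isolation. The following is a minimal sketch, not the library's code: the helper names `needs_sum_buggy` and `needs_sum_fixed` are hypothetical and stand in for whatever logic sets the compute flags.

```python
def needs_sum_buggy(stats_funcs):
    # Python parses this as: 'mean' or 'std' or ('var' in stats_funcs).
    # The non-empty string 'mean' is truthy, so `or` short-circuits and
    # the result never depends on stats_funcs.
    if 'mean' or 'std' or 'var' in stats_funcs:
        return True
    return False


def needs_sum_fixed(stats_funcs):
    # Test each name for membership individually.
    return any(s in stats_funcs for s in ('mean', 'std', 'var'))


requested = ['max', 'min']
print(needs_sum_buggy(requested))  # True, although no sum-based stat was requested
print(needs_sum_fixed(requested))  # False
print(needs_sum_fixed(['std']))    # True
```

Because the buggy form only over-computes, results stay correct and the bug is visible only as wasted work, which is presumably why the regression tests exercise stat subsets rather than output values alone.
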
… semantics

Replaces the numpy-only, input-mutating apply() with a proper
multi-backend implementation (numpy, cupy, dask+numpy, dask+cupy) that
returns a new DataArray instead of mutating the input. Uses
ArrayTypeFunctionMapping for dispatch and .data.dtype for validation
to avoid materializing dask/cupy arrays.

Add _stats_dask_cupy() that converts dask+cupy blocks to numpy via
map_blocks(x.get()) then delegates to the existing _stats_dask_numpy
pipeline. Wire it into the ArrayTypeFunctionMapping dispatcher.

Add five new edge-case test groups (18 test cases across backends):
- all-NaN zone: documents per-backend empty-zone behavior
- single-cell zones: std/var must be 0, not NaN
- negative zone IDs: exercises sort-and-stride with negatives
- nodata wipes zone: all finite values match nodata_values
- zone in subset of blocks: zone present in only some dask chunks
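
Several of the edge cases listed above can be illustrated with a pure-Python sketch of per-block partial sums (count, sum, sum of squares) merged across blocks. This is an illustration under stated assumptions, not the xarray-spatial implementation: `block_partials` and `combine` are hypothetical names, variance is the population variance, and an empty zone is reported as NaN here even though the tests document that this behavior differs per backend.

```python
import math


def block_partials(zones, values, nodata=None):
    """Per-zone (count, sum, sum of squares) for one block, skipping NaN and nodata."""
    out = {}
    for z, v in zip(zones, values):
        if math.isnan(v) or (nodata is not None and v == nodata):
            continue
        c, s, ss = out.get(z, (0, 0.0, 0.0))
        out[z] = (c + 1, s + v, ss + v * v)
    return out


def combine(partials, zone_ids):
    """Merge block partials and derive mean/var/std for each requested zone."""
    result = {}
    for z in sorted(zone_ids):  # sorting also handles negative zone IDs
        c, s, ss = 0, 0.0, 0.0
        for p in partials:
            pc, ps, pss = p.get(z, (0, 0.0, 0.0))  # zone may be absent from a block
            c, s, ss = c + pc, s + ps, ss + pss
        if c == 0:
            # Assumption: empty zones yield NaN.
            result[z] = {'count': 0, 'mean': math.nan, 'var': math.nan, 'std': math.nan}
            continue
        mean = s / c
        var = max(ss / c - mean * mean, 0.0)  # clamp tiny negative rounding error
        result[z] = {'count': c, 'mean': mean, 'var': var, 'std': math.sqrt(var)}
    return result


nan = math.nan
block_a = block_partials([-5, -5, 1, 2], [2.0, 4.0, 7.0, nan], nodata=-9999.0)
block_b = block_partials([-5, 2, 2, 3], [6.0, nan, nan, -9999.0], nodata=-9999.0)
stats = combine([block_a, block_b], zone_ids=[3, 2, 1, -5])
```

In this data, zone 1 is a single cell that appears only in the first block, zone 2 is all NaN, zone 3 is wiped by nodata, and zone -5 spans both blocks.
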
@brendancol brendancol changed the title from "Fixes #899: fix boolean short-circuit bug in dask zonal.stats()" to "Fixes #899, #902: fix dask zonal.stats() bug, add dask+cupy backend, edge-case tests" Feb 27, 2026
@brendancol brendancol merged commit fe7921a into master Feb 27, 2026
10 checks passed