
Rewrite zonal.apply() for multi-backend support and return semantics #900

@brendancol

Problem

zonal.apply() (xrspatial/zonal.py lines 1190–1297) has multiple issues that make it unusable in production:

  1. Calls .values (lines 1256, 1260), which silently materialises dask arrays and copies GPU arrays to host, destroying scalability.
  2. Uses np.vectorize(func) (line 1289), which is a Python loop disguised as vectorisation and carries heavy per-element call overhead.
  3. Mutates the input DataArray in-place (values.values = ..., line 1291). It is the only function in the library that does this; the behaviour is incompatible with dask's lazy evaluation and violates the principle of least surprise.
  4. Has no ArrayTypeFunctionMapping dispatch. It is NumPy-only, unlike every other function in the library.
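
To illustrate point 2, here is a minimal sketch that counts Python-level calls made by np.vectorize versus a single array-aware call. The `make_counter` helper and the doubling function are hypothetical and exist only for this demonstration; the direct call works only when the user function accepts whole arrays.

```python
import numpy as np


def make_counter(func):
    """Wrap func and record how many times Python calls it (illustrative helper)."""
    state = {"calls": 0}

    def wrapped(x):
        state["calls"] += 1
        return func(x)

    return wrapped, state


values = np.arange(12, dtype=np.float64).reshape(3, 4)

# np.vectorize invokes the Python function once per element.
# otypes is given so that no extra trial call is made to infer the output type.
per_element, loop_state = make_counter(lambda x: x * 2.0)
looped = np.vectorize(per_element, otypes=[np.float64])(values)

# An array-aware function is called once for the whole array.
whole_array, direct_state = make_counter(lambda x: x * 2.0)
direct = whole_array(values)

print(loop_state["calls"], direct_state["calls"])  # 12 calls vs 1 call
```

Both paths produce the same result here; the difference is that the np.vectorize path pays the Python call overhead once per cell, which dominates on raster-sized inputs.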

Proposed Fix

  • Rewrite with ArrayTypeFunctionMapping dispatch pattern.
  • Add _apply_numpy, _apply_dask_numpy, _apply_cupy, _apply_dask_cupy backends.
  • Return a new DataArray instead of mutating the input (breaking change — document in changelog).
  • For dask: use map_blocks since zones and values are chunk-aligned.
  • Replace np.vectorize with proper masked array operations.
  • Update README feature matrix row for Apply.
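
A rough sketch of how the NumPy and dask backends from the list above might look is given below. It assumes that apply is meant to run func on every cell whose zone is not nodata, that func is array-aware (it accepts and returns a NumPy array), and that the output is float64. `apply_arrays` is a hypothetical stand-in for the dispatch step, not the library's ArrayTypeFunctionMapping, and the CuPy backends are omitted.

```python
from functools import partial

import numpy as np

try:  # dask is treated as optional in this sketch
    import dask.array as da
except ImportError:
    da = None


def _apply_numpy(zones, values, func, nodata):
    """Return a new array with func applied where zones != nodata."""
    mask = zones != nodata
    out = values.astype(np.float64, copy=True)  # never write into the input
    out[mask] = func(values[mask])  # one call on all selected cells
    return out


def _apply_dask_numpy(zones, values, func, nodata):
    """Run the NumPy kernel lazily on each pair of corresponding blocks."""
    # partial is used because map_blocks already has a parameter named `func`.
    kernel = partial(_apply_numpy, func=func, nodata=nodata)
    return da.map_blocks(kernel, zones, values, dtype=np.float64)


def apply_arrays(zones, values, func, nodata=0):
    """Hypothetical stand-in for dispatch: pick a backend by array type."""
    if da is not None and isinstance(values, da.Array):
        return _apply_dask_numpy(da.asarray(zones), values, func, nodata)
    return _apply_numpy(np.asarray(zones), np.asarray(values), func, nodata)


zones = np.array([[0, 1], [2, 0]])
values = np.array([[1.0, 2.0], [3.0, 4.0]])
result = apply_arrays(zones, values, lambda v: v * 10)
```

Because the kernel only looks at the cells of its own block, the dask path needs no communication between chunks, which is what makes map_blocks a natural fit here.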

Breaking Change

The current API mutates values in-place and returns None. The new API should return a new DataArray. This is a deliberate breaking change to align with the rest of the library.
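
The intended return semantics could look roughly like the sketch below. `apply_returning_new` is a hypothetical, NumPy-backed illustration rather than the proposed implementation; it assumes func is array-aware and that coordinates, dims, attrs and name should be carried over to the result.

```python
import numpy as np
import xarray as xr


def apply_returning_new(zones, values, func, nodata=0):
    """Hypothetical wrapper: build a new DataArray and leave `values` untouched."""
    mask = zones.data != nodata
    out = np.array(values.data, dtype=np.float64, copy=True)
    out[mask] = func(out[mask])
    return xr.DataArray(
        out,
        coords=values.coords,
        dims=values.dims,
        attrs=values.attrs,
        name=values.name,
    )


zones = xr.DataArray(np.array([[0, 1], [1, 2]]), dims=("y", "x"))
values = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0]]), dims=("y", "x"), attrs={"res": 1}
)

result = apply_returning_new(zones, values, lambda v: v + 100)
```

Callers that relied on the old in-place behaviour would need to use the returned object instead, for example by assigning the result back to their own variable.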
