Fix zonal.stats() dask boolean short-circuit bug #899

@brendancol

Description

Bug

_stats_dask_numpy() in xrspatial/zonal.py (lines 243, 247) has a Python boolean short-circuit bug:

if 'mean' or 'std' or 'var' in stats_funcs:   # ALWAYS True
    compute_sum = True
    compute_count = True

if 'std' or 'var' in stats_funcs:              # ALWAYS True
    compute_sum_squares = True

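In Python, the in operator binds more tightly than or, so 'mean' or 'std' or 'var' in stats_funcs evaluates as ('mean') or ('std') or ('var' in stats_funcs). The non-empty string 'mean' is truthy, so or short-circuits and returns it without ever consulting stats_funcs. The second condition fails the same way on 'std'. As a result, compute_sum, compute_count, and compute_sum_squares are always set to True regardless of which statistics the user actually requested.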
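The evaluation order described above can be checked in isolation. This is a minimal sketch using a made-up stats_funcs list; it does not touch xrspatial at all and only shows how Python parses the condition.

```python
# Hypothetical request that contains none of mean/std/var.
stats_funcs = ['min', 'max']

# The buggy condition: `or` returns its first truthy operand,
# so the membership test on stats_funcs is never evaluated.
buggy = 'mean' or 'std' or 'var' in stats_funcs
print(repr(buggy))   # 'mean'
print(bool(buggy))   # True

# What the condition was meant to express, written out explicitly.
intended = (
    ('mean' in stats_funcs)
    or ('std' in stats_funcs)
    or ('var' in stats_funcs)
)
print(intended)      # False
```

The buggy expression does not even produce a boolean: it yields the string 'mean', which the if statement then treats as True.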

Impact

  • Wasted computation: Users asking for just ['min', 'max'] on a large dask array pay for sum, count, and sum-of-squares aggregations they never asked for.
  • Masked bugs: Because the flags are always True, the code paths that skip these aggregations are never exercised, so any future refactoring could introduce subtle data corruption without tests catching it.

Fix

if any(s in stats_funcs for s in ('mean', 'std', 'var')):
    compute_sum = True
    compute_count = True

if any(s in stats_funcs for s in ('std', 'var')):
    compute_sum_squares = True

Add tests that verify only the requested statistics trigger computation.
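One possible way to make this testable without building a dask array is to factor the flag logic into a small pure function and check it case by case. The sketch below is only an illustration: required_aggregations is a hypothetical helper that does not exist in xrspatial, and it models only the two conditions quoted in this issue, not any other logic in _stats_dask_numpy().

```python
def required_aggregations(stats_funcs):
    """Hypothetical helper (not part of xrspatial).

    Returns which intermediate aggregations the two conditions
    from this issue would enable for the requested statistics.
    """
    needs_moments = any(s in stats_funcs for s in ('mean', 'std', 'var'))
    needs_squares = any(s in stats_funcs for s in ('std', 'var'))
    return {
        'compute_sum': needs_moments,
        'compute_count': needs_moments,
        'compute_sum_squares': needs_squares,
    }


# (requested stats, expected sum/count flag, expected sum-of-squares flag)
cases = [
    (['min', 'max'], False, False),  # nothing extra should be computed
    (['mean'], True, False),         # mean needs sum and count only
    (['std'], True, True),           # std needs all three
    (['var', 'max'], True, True),    # var needs all three
]

for stats, want_moments, want_squares in cases:
    flags = required_aggregations(stats)
    assert flags['compute_sum'] is want_moments, stats
    assert flags['compute_count'] is want_moments, stats
    assert flags['compute_sum_squares'] is want_squares, stats
```

With the original conditions, the ['min', 'max'] and ['mean'] cases would fail, which is the regression such a test is meant to catch. A test against the real function would additionally need to confirm that the skipped aggregations are absent from the computed result or the dask graph.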
