Bug
_stats_dask_numpy() in xrspatial/zonal.py (lines 243, 247) has a Python operator-precedence bug that makes two `or` chains short-circuit on a truthy string:

    if 'mean' or 'std' or 'var' in stats_funcs:  # ALWAYS truthy
        compute_sum = True
        compute_count = True
    if 'std' or 'var' in stats_funcs:  # ALWAYS truthy
        compute_sum_squares = True

In Python, 'mean' or 'std' or 'var' in stats_funcs evaluates as ('mean') or ('std') or ('var' in stats_funcs), because `in` binds more tightly than `or`. The non-empty string 'mean' is truthy and short-circuits the entire expression, and 'std' does the same in the second condition. As a result, compute_sum, compute_count, and compute_sum_squares are always set to True regardless of what statistics the user actually requested.
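The evaluation order can be checked in isolation. The snippet below is a minimal sketch, not code from xrspatial; the stats_funcs value is a hypothetical request chosen to show the difference between the buggy condition and a membership test.

```python
# Hypothetical request: the user only wants min and max.
stats_funcs = ['min', 'max']

# Buggy form: `in` binds tighter than `or`, so this parses as
# ('mean') or ('std') or ('var' in stats_funcs) and returns the
# first truthy operand, the string 'mean'.
buggy = 'mean' or 'std' or 'var' in stats_funcs
print(repr(buggy))  # 'mean'

# Membership test: checks each name against the requested list.
fixed = any(s in stats_funcs for s in ('mean', 'std', 'var'))
print(fixed)  # False
```

Note that the buggy expression does not even produce a boolean: it yields the string 'mean', which the if statement then treats as true.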
Impact
- Wasted computation: Users asking for just ['min', 'max'] on a large dask array pay for sum, count, and sum-of-squares aggregations they never asked for.
- Masked bugs: These code branches can never be disabled, so any future refactoring could introduce subtle data corruption without tests catching it.
Fix
    if any(s in stats_funcs for s in ('mean', 'std', 'var')):
        compute_sum = True
        compute_count = True
    if any(s in stats_funcs for s in ('std', 'var')):
        compute_sum_squares = True

Add tests that verify only the requested statistics trigger computation.
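One way such tests could look is sketched below. required_aggregations is a hypothetical helper, not an existing xrspatial function: it assumes the flag logic can be pulled out of _stats_dask_numpy() into a pure function, and it mirrors only the two conditions shown above. Any other logic in the real function is omitted.

```python
def required_aggregations(stats_funcs):
    """Hypothetical helper: map requested stats to the intermediate
    aggregations they need, mirroring the two fixed conditions above."""
    requested = set(stats_funcs)
    needs_sum_and_count = bool(requested & {'mean', 'std', 'var'})
    needs_sum_squares = bool(requested & {'std', 'var'})
    return {
        'sum': needs_sum_and_count,
        'count': needs_sum_and_count,
        'sum_squares': needs_sum_squares,
    }


def test_min_max_triggers_no_extra_aggregations():
    flags = required_aggregations(['min', 'max'])
    assert flags == {'sum': False, 'count': False, 'sum_squares': False}


def test_mean_needs_sum_and_count_only():
    flags = required_aggregations(['mean'])
    assert flags == {'sum': True, 'count': True, 'sum_squares': False}


def test_std_needs_all_three():
    flags = required_aggregations(['std'])
    assert flags == {'sum': True, 'count': True, 'sum_squares': True}
```

Against the buggy conditions, the first two tests would fail, because every flag would come back True.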