Add GPU (CuPy) backend for cost_distance by brendancol · Pull Request #910 · xarray-contrib/xarray-spatial

brendancol · 2026-02-26T17:46:45Z

Summary

Replaces CPU spill-to-host fallback with native CUDA iterative parallel relaxation (parallel Bellman-Ford)
Each CUDA thread processes one pixel per iteration, checking all 4/8 neighbours for shorter paths; the wavefront advances at least one pixel per iteration
Convergence in O(height + width) iterations with early termination when no pixel changes
CuPy path: seeds sources at cost 0, iterates relaxation kernel, converts inf/over-budget to NaN
Dask+CuPy: bounded max_cost uses da.map_overlap with per-chunk GPU relaxation (out-of-core, no OOM); unbounded falls back to CPU iterative tile Dijkstra
Tests extended with cupy and dask+cupy across 8 parametrized test functions + 2 new GPU-specific tests
README updated: Cost Distance row now shows checkmarks for CuPy and Dask GPU columns
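
The relaxation loop summarised above can be sketched on the CPU with NumPy. This is a minimal illustration, not the CUDA kernel from this PR: `relax_cost_distance` is a hypothetical name, each whole-array update stands in for one kernel launch (one thread per pixel), and the edge cost is assumed to be step length times the mean friction of the two pixels. Non-finite or non-positive friction is assumed to mark a barrier.

```python
import numpy as np

def relax_cost_distance(friction, sources, max_cost=np.inf,
                        connectivity=8, cellsize=1.0):
    """Jacobi-style relaxation: every pixel reads the previous iteration's costs."""
    h, w = friction.shape
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    # Assumption: NaN / non-positive friction means "impassable".
    passable = np.isfinite(friction) & (friction > 0)
    fric = np.where(passable, friction, np.inf)
    fpad = np.pad(fric, 1, constant_values=np.inf)

    # Seed sources at cost 0, everything else at +inf.
    cost = np.full((h, w), np.inf)
    cost[sources & passable] = 0.0

    # h * w is only a safety cap; the loop normally exits much earlier.
    for _ in range(h * w):
        cpad = np.pad(cost, 1, constant_values=np.inf)
        best = cost.copy()
        for dy, dx in offsets:
            step = cellsize * np.hypot(dy, dx)
            n_cost = cpad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            n_fric = fpad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            # Candidate cost of reaching each pixel through this neighbour.
            best = np.minimum(best, n_cost + step * 0.5 * (n_fric + fric))
        if np.array_equal(best, cost):
            break  # early termination: no pixel changed
        cost = best

    # Unreachable (inf) and over-budget pixels become NaN.
    return np.where(np.isfinite(cost) & (cost <= max_cost), cost, np.nan)
```

Because every pixel is updated from the previous iteration's array, the result does not depend on update order, which is what makes the scheme safe to run with one thread per pixel.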

Fixes #905

Test plan

All 44 cost_distance tests pass (including 16 new cupy/dask+cupy variants)
Verified on GPU (NVIDIA RTX A6000) with cupy 13.6.0
Uniform friction, analytic grids, barriers, multiple sources, max_cost truncation, target_values, 4/8 connectivity all validated against numpy reference
Bounded dask+cupy hits map_overlap GPU path (verified no iterative warning)
Unbounded dask+cupy correctly falls back to CPU iterative tile Dijkstra
Output types verified: cupy returns cupy array, dask+cupy returns dask+cupy chunks
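
The kind of reference check listed in the test plan can be illustrated with a small Dijkstra implementation. This is a hedged sketch rather than the project's test code: `dijkstra_cost_distance` is a hypothetical helper, and it assumes the same edge-cost model as above (step length times the mean friction of the two pixels, with NaN or non-positive friction treated as a barrier).

```python
import heapq
import numpy as np

def dijkstra_cost_distance(friction, source, connectivity=4, cellsize=1.0):
    """Single-source reference: accumulated cost from `source` (row, col)."""
    h, w = friction.shape
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    cost = np.full((h, w), np.inf)
    cost[source] = 0.0
    heap = [(0.0, source[0], source[1])]

    while heap:
        c, y, x = heapq.heappop(heap)
        if c > cost[y, x]:
            continue  # stale heap entry, a cheaper path was already found
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w):
                continue
            f = friction[ny, nx]
            if not (np.isfinite(f) and f > 0):
                continue  # assumption: barrier pixel
            step = cellsize * float(np.hypot(dy, dx))
            cand = c + step * 0.5 * float(friction[y, x] + f)
            if cand < cost[ny, nx]:
                cost[ny, nx] = cand
                heapq.heappush(heap, (cand, ny, nx))
    return cost

# Analytic grid: with uniform friction and 4-connectivity the cost is
# friction * Manhattan distance from the source.
friction = np.full((6, 7), 2.0)
reference = dijkstra_cost_distance(friction, (2, 3))
yy, xx = np.indices(friction.shape)
expected = 2.0 * (np.abs(yy - 2) + np.abs(xx - 3))
```

Comparing a GPU result against such a reference with `np.allclose(..., equal_nan=True)` after mapping unreachable pixels to NaN is one way to cover the uniform-friction and barrier cases.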

Replace CPU fallback with native CUDA iterative parallel relaxation (parallel Bellman-Ford). Each thread processes one pixel per iteration, checking all neighbours for shorter paths. The wavefront advances at least one pixel per iteration; convergence takes O(height + width) iterations with early termination when no changes occur.

CuPy path: seeds source pixels at cost 0, runs relaxation kernel until convergence, converts inf/over-budget to NaN.

Dask+CuPy: bounded max_cost uses da.map_overlap with per-chunk GPU relaxation; unbounded falls back to CPU iterative tile Dijkstra (O(N log N) Dijkstra beats O(N * diameter) relaxation for global paths).

Tests extended with cupy and dask+cupy backends across 8 parametrized test functions. New tests for unbounded dask+cupy and cupy chunk type.
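
Why a bounded max_cost makes a per-chunk approach possible can be shown with a little arithmetic. The PR text does not say how the overlap depth is chosen, so the following is an assumption: if every step between adjacent pixels costs at least `min_friction * cellsize`, no path within budget can extend more than `max_cost / (min_friction * cellsize)` pixels from its source, and a halo of that depth gives each chunk everything it needs. `halo_depth` is a hypothetical helper name.

```python
import math

def halo_depth(max_cost, min_friction, cellsize=1.0):
    """Upper bound, in pixels, on how far a within-budget path can reach."""
    if not math.isfinite(max_cost):
        # Unbounded budget: no finite halo exists, so a global method is needed.
        raise ValueError("max_cost must be finite to bound the halo")
    if not (min_friction > 0 and cellsize > 0):
        raise ValueError("min_friction and cellsize must be positive")
    # Each step moves at most one pixel per axis and costs at least
    # min_friction * cellsize, so this many steps exhaust the budget.
    return math.ceil(max_cost / (min_friction * cellsize))
```

A depth computed this way could be passed as the `depth` argument of `da.map_overlap`. With an unbounded budget the function raises, which mirrors the case where the PR falls back to the CPU tile Dijkstra.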

brendancol merged commit 86da4e0 into master Feb 26, 2026
10 checks passed

brendancol merged 1 commit into master from fix-905-gpu-cost-distance