
Add GPU (CuPy) backend for cost_distance #910

Merged
brendancol merged 1 commit into master from fix-905-gpu-cost-distance
Feb 26, 2026

@brendancol
Contributor

Summary

  • Replaces the CPU spill-to-host fallback with native CUDA iterative parallel relaxation (parallel Bellman-Ford)
  • Each CUDA thread processes one pixel per iteration, checking all 4/8 neighbours for shorter paths; the wavefront advances at least one pixel per iteration
  • Convergence in O(height + width) iterations when shortest paths run roughly straight (paths that wind around barriers need about one iteration per pixel along the longest path), with early termination when no pixel changes
  • CuPy path: seeds sources at cost 0, runs the relaxation kernel until convergence, converts inf/over-budget pixels to NaN
  • Dask+CuPy: bounded max_cost uses da.map_overlap with per-chunk GPU relaxation (out-of-core, no OOM); unbounded falls back to CPU iterative tile Dijkstra
  • Tests extended with cupy and dask+cupy across 8 parametrized test functions + 2 new GPU-specific tests
  • README updated: Cost Distance row now shows checkmarks for CuPy and Dask GPU columns
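For readers new to parallel relaxation on a raster, here is a minimal NumPy sketch of the iteration described above. It is not the PR's CUDA kernel: each neighbour offset is applied as a whole-array operation, every pixel reads only the previous sweep's distances (Jacobi style, which is what one thread per pixel computes), and the loop stops when a sweep changes nothing. The function name `parallel_relaxation`, the barrier rule (non-finite or non-positive friction) and the edge cost (step length times the mean friction of the two pixels) are assumptions for illustration.

```python
import numpy as np

# (dy, dx) neighbour offsets; the first four are the 4-connected moves.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1),
           (-1, -1), (-1, 1), (1, -1), (1, 1)]


def parallel_relaxation(friction, sources, connectivity=8, cellsize=1.0):
    """Jacobi-style parallel Bellman-Ford on a friction grid.

    Every pixel is updated from the previous sweep's distances, mirroring
    one-thread-per-pixel relaxation. Returns (dist, sweeps).
    """
    friction = np.asarray(friction, dtype=np.float64)
    h, w = friction.shape
    # Assumed barrier rule: non-finite or non-positive friction is impassable.
    passable = np.isfinite(friction) & (friction > 0)
    fr = np.where(passable, friction, np.inf)
    # Seed source pixels at cost 0; everything else starts unreached (inf).
    dist = np.where(np.asarray(sources, dtype=bool) & passable, 0.0, np.inf)
    sweeps = 0
    while True:
        sweeps += 1
        new = dist.copy()
        for dy, dx in OFFSETS[:connectivity]:
            step = cellsize * (np.sqrt(2.0) if dy and dx else 1.0)
            # dst: pixels that have a neighbour at (dy, dx); src: that neighbour.
            dst = (slice(max(0, -dy), h - max(0, dy)),
                   slice(max(0, -dx), w - max(0, dx)))
            src = (slice(max(0, dy), h - max(0, -dy)),
                   slice(max(0, dx), w - max(0, -dx)))
            # Assumed edge cost: step length times mean friction of both pixels.
            cand = dist[src] + step * (fr[src] + fr[dst]) / 2.0
            new[dst] = np.minimum(new[dst], cand)
        if np.array_equal(new, dist):  # early termination: nothing improved
            return dist, sweeps
        dist = new
```

Most of these calls (`where`, `minimum`, `array_equal`, basic slicing) have direct `cupy` equivalents, so the same array code should run on the GPU with `cupy` in place of `np`; a dedicated kernel can also avoid the per-offset temporary arrays this version allocates.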

Fixes #905

Test plan

  • All 44 cost_distance tests pass (including 16 new cupy/dask+cupy variants)
  • Verified on GPU (NVIDIA RTX A6000) with cupy 13.6.0
  • Uniform friction, analytic grids, barriers, multiple sources, max_cost truncation, target_values, 4/8 connectivity all validated against numpy reference
  • Bounded dask+cupy hits map_overlap GPU path (verified no iterative warning)
  • Unbounded dask+cupy correctly falls back to CPU iterative tile Dijkstra
  • Output types verified: cupy returns cupy array, dask+cupy returns dask+cupy chunks
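The "validated against numpy reference" step can be pictured with the sketch below, which checks a per-pixel relaxation against a serial multi-source Dijkstra on random friction with barriers. The body of the `(y, x)` loop in `per_pixel_relaxation` is what one CUDA thread would do per iteration. Both functions, `edge_cost`, and its cost model are hypothetical stand-ins, not the project's numpy backend or test suite.

```python
import heapq
import math

import numpy as np

NBRS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def edge_cost(fr, y0, x0, y1, x1):
    """Assumed cost model: step length times mean friction; NaN is a barrier."""
    a, b = fr[y0, x0], fr[y1, x1]
    if not (math.isfinite(a) and math.isfinite(b)):
        return math.inf
    step = math.sqrt(2.0) if (y0 != y1 and x0 != x1) else 1.0
    return step * (a + b) / 2.0


def dijkstra_reference(fr, sources, connectivity=8):
    """Serial CPU reference: multi-source Dijkstra with a binary heap."""
    h, w = fr.shape
    dist = np.full((h, w), np.inf)
    heap = []
    for y, x in zip(*np.nonzero(sources)):
        dist[y, x] = 0.0
        heap.append((0.0, int(y), int(x)))
    heapq.heapify(heap)
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue  # stale heap entry
        for dy, dx in NBRS[:connectivity]:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + edge_cost(fr, y, x, ny, nx)
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist


def per_pixel_relaxation(fr, sources, connectivity=8):
    """Parallel relaxation; the body of the (y, x) loop is one GPU thread."""
    h, w = fr.shape
    dist = np.where(sources, 0.0, np.inf)
    while True:
        new = dist.copy()  # threads read `dist` and write `new`
        changed = False
        for y in range(h):
            for x in range(w):
                best = dist[y, x]
                for dy, dx in NBRS[:connectivity]:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        best = min(best, dist[ny, nx] + edge_cost(fr, ny, nx, y, x))
                if best < dist[y, x]:
                    new[y, x] = best
                    changed = True
        dist = new
        if not changed:  # early termination: no pixel improved
            return dist
```

Unreachable pixels stay `inf` in both versions, so a comparison should check that the `inf` masks match before comparing the finite values with a tolerance.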

Replace CPU fallback with native CUDA iterative parallel relaxation
(parallel Bellman-Ford).  Each thread processes one pixel per iteration,
checking all neighbours for shorter paths.  The wavefront advances at
least one pixel per iteration, so convergence takes O(height + width)
iterations when shortest paths run roughly straight; paths that wind
around barriers need about one iteration per pixel along the longest
path.  Iteration stops early once no pixel changes.
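To see where the iteration count comes from, the sketch below counts Jacobi sweeps for 4-connected relaxation. After k sweeps, every pixel whose shortest path has at most k steps holds its final value, so the count is the largest step count of any shortest path plus one confirming sweep. On open terrain that tracks height + width; a serpentine barrier makes paths wind and drives the count well past it. `count_sweeps`, `serpentine_friction` and the mean-friction edge cost are assumptions for illustration, and the PR's kernel may behave differently (an in-place update, for example, can converge in fewer sweeps).

```python
import numpy as np


def count_sweeps(friction, sources):
    """4-connected Jacobi relaxation; returns (dist, number of sweeps).

    Assumed edge cost: mean friction of the two pixels (unit cell size);
    non-finite friction is a barrier.
    """
    fr = np.where(np.isfinite(friction), friction, np.inf)
    dist = np.where(sources, 0.0, np.inf)
    sweeps = 0
    while True:
        sweeps += 1
        new = dist.copy()
        # Pull from the neighbour above, below, left and right, reading only
        # last sweep's distances (every pixel updates "in parallel").
        new[1:, :] = np.minimum(new[1:, :], dist[:-1, :] + (fr[:-1, :] + fr[1:, :]) / 2)
        new[:-1, :] = np.minimum(new[:-1, :], dist[1:, :] + (fr[1:, :] + fr[:-1, :]) / 2)
        new[:, 1:] = np.minimum(new[:, 1:], dist[:, :-1] + (fr[:, :-1] + fr[:, 1:]) / 2)
        new[:, :-1] = np.minimum(new[:, :-1], dist[:, 1:] + (fr[:, 1:] + fr[:, :-1]) / 2)
        if np.array_equal(new, dist):  # early termination
            return dist, sweeps
        dist = new


def serpentine_friction(h, w):
    """Unit friction with wall rows 1, 3, 5, ...; each wall's one gap alternates sides."""
    fr = np.ones((h, w))
    for k, r in enumerate(range(1, h - 1, 2)):
        fr[r, :] = np.nan
        fr[r, w - 1 if k % 2 == 0 else 0] = 1.0  # the gap
    return fr


if __name__ == "__main__":
    src = np.zeros((10, 10), dtype=bool)
    src[0, 0] = True
    print(count_sweeps(np.ones((10, 10)), src)[1])           # 19
    print(count_sweeps(serpentine_friction(10, 10), src)[1])  # 55
```

With a single corner source on a 10x10 grid, open terrain needs 19 sweeps (height + width - 1), while the serpentine layout needs 55 because its longest shortest path is 54 steps.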

CuPy path: seeds source pixels at cost 0, runs relaxation kernel until
convergence, converts inf/over-budget to NaN.
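The seeding and clean-up steps around the kernel are plain array operations. Below is a minimal sketch with NumPy standing in for CuPy (`np.where` and `np.isfinite` have direct `cupy` counterparts). `seed_distances` and `finalize` are hypothetical helpers, and the default source rule (every finite, non-zero pixel unless `target_values` is given) is an assumption, not something this PR states.

```python
import numpy as np


def seed_distances(raster, target_values=None):
    """Initial distance grid: 0 at source pixels, inf everywhere else.

    Assumed rule: with target_values, sources are pixels whose value is in
    that list; otherwise every finite, non-zero pixel is a source.
    """
    raster = np.asarray(raster, dtype=np.float64)
    if target_values is not None and len(target_values) > 0:
        is_source = np.isin(raster, target_values)
    else:
        is_source = np.isfinite(raster) & (raster != 0)
    return np.where(is_source, 0.0, np.inf)


def finalize(dist, max_cost=np.inf):
    """Unreached pixels (inf) and pixels over budget become NaN."""
    keep = np.isfinite(dist) & (dist <= max_cost)
    return np.where(keep, dist, np.nan)
```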

Dask+CuPy: bounded max_cost uses da.map_overlap with per-chunk GPU
relaxation; unbounded falls back to CPU iterative tile Dijkstra
(O(N log N) Dijkstra beats O(N * diameter) relaxation for global paths).
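A bounded `max_cost` is what makes `map_overlap` safe: if every step costs at least `cellsize * min_friction`, a path within budget takes at most `floor(max_cost / (cellsize * min_friction))` steps, and each step moves at most one pixel along each axis, so a halo that deep contains every in-budget path to a chunk's interior pixels. The PR does not say how it sizes the overlap, so `halo_depth` below is a hypothetical helper showing one sound bound; its result would be passed as `depth=` to `dask.array.map_overlap`. `work_estimates` puts rough numbers on the Dijkstra-versus-relaxation remark above; it ignores constants and GPU parallelism, so treat it as an order-of-magnitude illustration, not a benchmark.

```python
import math


def halo_depth(max_cost, min_friction, cellsize=1.0):
    """Overlap (in pixels) that contains every path costing <= max_cost.

    Each step costs at least cellsize * min_friction (diagonals cost more),
    so an in-budget path has at most this many steps, and each step moves
    at most one pixel along each axis.
    """
    if not math.isfinite(max_cost):
        raise ValueError("unbounded max_cost has no finite halo")
    if min_friction <= 0 or cellsize <= 0:
        raise ValueError("min_friction and cellsize must be positive")
    return math.floor(max_cost / (cellsize * min_friction))


def work_estimates(h, w):
    """Rough operation counts: Dijkstra ~ N log2 N, relaxation ~ N * (h + w)."""
    n = h * w
    return n * math.log2(n), n * (h + w)


if __name__ == "__main__":
    print(halo_depth(max_cost=10.0, min_friction=2.0))  # 5
    dijkstra, relaxation = work_estimates(10_000, 10_000)
    print(f"relaxation / dijkstra ~ {relaxation / dijkstra:.0f}x")
```

For a 10,000 x 10,000 raster the relaxation estimate is roughly 750 times the Dijkstra one, which is why the unbounded case, where no finite halo exists, is left to the CPU tile Dijkstra.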

Tests extended with cupy and dask+cupy backends across 8 parametrized
test functions.  New tests for unbounded dask+cupy and cupy chunk type.
brendancol merged commit 86da4e0 into master on Feb 26, 2026
10 checks passed

Development

Successfully merging this pull request may close these issues.

Add GPU (CuPy) backend for cost_distance()
