Add GPU (CuPy) backends for proximity, allocation, direction #909

Merged: brendancol merged 2 commits into master from fix-901-gpu-proximity on Feb 26, 2026

@brendancol commented Feb 26, 2026

Summary

  • Adds CUDA brute-force nearest-target kernel with device functions for Euclidean, Manhattan, and great-circle distance metrics
  • Each CUDA thread processes one pixel and scans all target pixels to find the nearest; the kernel supports proximity (distance), allocation (target value), and direction output modes
  • Adds _process_cupy() and _process_dask_cupy() host functions with dispatch wired into _process()
  • Out-of-core dask+cupy: with a bounded max_distance, da.map_overlap runs the GPU kernel per chunk, so only one chunk plus its overlap padding is on the GPU at a time; with an unbounded max_distance, it falls back to the existing CPU KDTree path, whose O(N log T) cost beats brute-force O(NT) for N pixels and T targets
  • Tests parametrized to include cupy and dask+cupy backends across all 9 proximity/allocation/direction test cases
  • README feature matrix updated with checkmarks for CuPy and Dask GPU columns
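As a CPU reference for what a single kernel thread does, here is a minimal pure-Python sketch of the brute-force scan. It is not the CUDA kernel from this PR: the function names, the `(x, y, value)` target layout, the NaN fill for out-of-range pixels, and the sphere radius are all assumptions made for illustration. Direction mode is omitted; it would derive an angle from the pixel and the winning target's coordinates.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius for the great-circle metric


def euclidean(x1, y1, x2, y2):
    return math.hypot(x2 - x1, y2 - y1)


def manhattan(x1, y1, x2, y2):
    return abs(x2 - x1) + abs(y2 - y1)


def great_circle(lon1, lat1, lon2, lat2):
    # Haversine formula; inputs in degrees, result in metres.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_target(px, py, targets, metric=euclidean, max_distance=math.inf):
    """Scan every target for one pixel, as one GPU thread would.

    targets is a list of (x, y, value) tuples (hypothetical layout).
    Returns (distance, value) of the nearest target within max_distance,
    or (nan, nan) when no target qualifies.
    """
    best_d, best_v = math.inf, math.nan
    for tx, ty, tv in targets:
        d = metric(px, py, tx, ty)
        if d <= max_distance and d < best_d:
            best_d, best_v = d, tv
    if math.isinf(best_d):
        return math.nan, math.nan
    return best_d, best_v


def brute_force(xs, ys, targets, metric=euclidean, max_distance=math.inf):
    """Return (proximity, allocation) grids as nested lists indexed [row][col]."""
    prox, alloc = [], []
    for y in ys:
        row = [nearest_target(x, y, targets, metric, max_distance) for x in xs]
        prox.append([d for d, _ in row])
        alloc.append([v for _, v in row])
    return prox, alloc
```

The inner loop over `targets` is the O(T) work per pixel; running it for every pixel gives the O(NT) total that the GPU spreads across threads.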

Fixes #901

Test plan

  • All 69 proximity tests pass (9 cupy + 9 dask+cupy + 51 existing)
  • Verified on GPU (NVIDIA RTX A6000) with cupy 13.6.0
  • Bounded dask+cupy path processes chunks independently via map_overlap (no full materialization)
  • Unbounded dask+cupy path converts to dask+numpy for KDTree (avoids OOM on large rasters)

Add CUDA brute-force nearest-target kernel with device functions for
Euclidean, Manhattan, and great-circle distance metrics. Each thread
processes one pixel, scanning all targets to find the nearest. Supports
proximity (distance), allocation (target value), and direction modes.

Adds _process_cupy() and _process_dask_cupy() host functions with
dispatch wired into _process(). Tests parametrized over cupy backend.
@brendancol force-pushed the fix-901-gpu-proximity branch from 39b2fdc to 549cddf on February 26, 2026 16:43
…or unbounded

The previous _process_dask_cupy called .compute(), which would materialise
the entire raster into GPU memory and cause OOM on large datasets.

Bounded max_distance: use da.map_overlap with per-chunk CUDA kernel, so
only one chunk + overlap padding is on GPU at a time.
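To see why a bounded max_distance makes per-chunk processing safe, the sketch below compares a whole-raster brute-force pass with a chunked pass in which each chunk is padded by a halo of ceil(max_distance) cells. It is a pure-Python stand-in for the da.map_overlap approach, not the PR's code: the function names, unit cell size, Euclidean metric, and NaN fill for out-of-range pixels are assumptions.

```python
import math


def proximity(grid, max_distance):
    """Brute-force Euclidean distance to the nearest non-zero cell (cell size 1)."""
    rows, cols = len(grid), len(grid[0])
    targets = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] != 0]
    out = [[math.nan] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best = min((math.hypot(r - tr, c - tc) for tr, tc in targets),
                       default=math.inf)
            if best <= max_distance:
                out[r][c] = best
    return out


def proximity_chunked(grid, max_distance, chunk):
    """Process chunk by chunk, each padded by a halo of ceil(max_distance) cells."""
    rows, cols = len(grid), len(grid[0])
    depth = math.ceil(max_distance)  # halo width in cells, assuming unit cell size
    out = [[math.nan] * cols for _ in range(rows)]
    for r0 in range(0, rows, chunk):
        for c0 in range(0, cols, chunk):
            r1, c1 = min(r0 + chunk, rows), min(c0 + chunk, cols)
            # Padded window, clipped at the raster edge.
            pr0, pc0 = max(r0 - depth, 0), max(c0 - depth, 0)
            pr1, pc1 = min(r1 + depth, rows), min(c1 + depth, cols)
            window = [row[pc0:pc1] for row in grid[pr0:pr1]]
            local = proximity(window, max_distance)
            # Trim the halo and copy only the chunk interior back.
            for r in range(r0, r1):
                out[r][c0:c1] = local[r - pr0][c0 - pc0:c1 - pc0]
    return out
```

Any target that could affect a pixel lies within max_distance of it, so it always falls inside the padded window; only the window, never the full raster, has to be resident at once.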

Unbounded max_distance: convert dask+cupy to dask+numpy and use the
existing KDTree path (CPU-based O(N log T) beats brute-force O(NT),
and KDTree is inherently a CPU data structure).
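A rough operation count illustrates the gap between the two approaches. The raster and target sizes below are hypothetical, the KDTree estimate assumes a balanced tree with about log2(T) node visits per query, and constant factors, tree build time, and GPU parallelism are all ignored.

```python
import math


def brute_force_ops(n_pixels, n_targets):
    # Every pixel compares against every target: O(N * T).
    return n_pixels * n_targets


def kdtree_query_ops(n_pixels, n_targets):
    # Roughly log2(T) node visits per nearest-neighbour query: O(N log T).
    return n_pixels * math.log2(n_targets)


# Hypothetical sizes: a 10,000 x 10,000 raster with one million target pixels.
n_pixels = 10_000 * 10_000
n_targets = 1_000_000

ratio = brute_force_ops(n_pixels, n_targets) / kdtree_query_ops(n_pixels, n_targets)
print(f"brute force does about {ratio:,.0f}x more distance evaluations")
```

With a bounded max_distance each chunk only sees the targets inside its padded window, which keeps T small per kernel launch; without a bound, T is every target in the raster and the ratio above applies in full.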

Also adds 'dask+cupy' backend to all 9 proximity test parametrize lists.
@brendancol merged commit 593e559 into master on Feb 26, 2026
10 checks passed