Add GPU (CuPy) backends for proximity, allocation, direction#909
Merged
brendancol merged 2 commits into master on Feb 26, 2026
Add CUDA brute-force nearest-target kernel with device functions for Euclidean, Manhattan, and great-circle distance metrics. Each thread processes one pixel, scanning all targets to find the nearest. Supports proximity (distance), allocation (target value), and direction modes. Adds _process_cupy() and _process_dask_cupy() host functions with dispatch wired into _process(). Tests parametrized over cupy backend.
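The per-pixel logic described above can be sketched on the CPU. This is a NumPy reference under stated assumptions, not the PR's CUDA kernel: the function name nearest_target, the "nonzero finite cells are targets" rule, the sphere radius, and the direction convention (degrees counter-clockwise from the +x axis) are all illustrative choices and may differ from the library's. Each iteration of the inner loop plays the role of one GPU thread.

```python
import numpy as np

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius for the great-circle metric


def euclidean(x1, y1, x2, y2):
    return np.hypot(x2 - x1, y2 - y1)


def manhattan(x1, y1, x2, y2):
    return np.abs(x2 - x1) + np.abs(y2 - y1)


def great_circle(lon1, lat1, lon2, lat2):
    # Haversine formula; inputs are in degrees.
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


METRICS = {"euclidean": euclidean, "manhattan": manhattan,
           "great_circle": great_circle}


def nearest_target(raster, xs, ys, mode="proximity", metric="euclidean",
                   max_distance=np.inf):
    """Brute-force O(N*T) scan: every pixel checks every target."""
    dist_fn = METRICS[metric]
    # Assumption: a target is any finite, nonzero cell.
    rows, cols = np.nonzero(np.isfinite(raster) & (raster != 0))
    out = np.full(raster.shape, np.nan, dtype=np.float64)
    if rows.size == 0:
        return out
    tx, ty, tv = xs[cols], ys[rows], raster[rows, cols]
    for i in range(raster.shape[0]):
        for j in range(raster.shape[1]):   # one (i, j) pair == one GPU thread
            d = dist_fn(xs[j], ys[i], tx, ty)
            k = int(np.argmin(d))          # ties resolve to the first target
            if d[k] > max_distance:
                continue                   # no target in range: leave NaN
            if mode == "proximity":
                out[i, j] = d[k]
            elif mode == "allocation":
                out[i, j] = tv[k]
            elif mode == "direction":
                # Assumed convention: math angle, degrees CCW from +x.
                angle = np.degrees(np.arctan2(ty[k] - ys[i], tx[k] - xs[j]))
                out[i, j] = angle % 360.0
            else:
                raise ValueError(f"unknown mode: {mode}")
    return out
```

The three modes share the same scan and differ only in what is written for the winning target, which is presumably why a single kernel can serve all of them.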
…or unbounded

The previous _process_dask_cupy called .compute(), which would materialise the entire raster into GPU memory and cause OOM on large datasets.

Bounded max_distance: use da.map_overlap with a per-chunk CUDA kernel, so only one chunk plus its overlap padding is on the GPU at a time.

Unbounded max_distance: convert dask+cupy to dask+numpy and use the existing KDTree path (CPU-based O(N log T) beats brute-force O(NT), and KDTree is inherently a CPU data structure).

Also adds the 'dask+cupy' backend to all 9 proximity test parametrize lists.
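Why a bounded max_distance makes chunking safe can be shown without dask or a GPU. The sketch below emulates the chunk-plus-halo idea by hand in NumPy. It is not the PR's da.map_overlap call, and the names proximity_pixels and chunked_proximity are hypothetical. Distances are measured in pixel units, so a halo of ceil(max_distance) cells is enough to see every target that could matter for a chunk.

```python
import numpy as np


def proximity_pixels(block, max_distance):
    """Brute-force distance (pixel units) to the nearest nonzero cell."""
    rows, cols = np.nonzero(block)
    out = np.full(block.shape, np.nan)
    if rows.size == 0:
        return out
    ii, jj = np.indices(block.shape)
    # Shape (H, W, T): distance from every pixel to every target.
    d = np.hypot(ii[..., None] - rows, jj[..., None] - cols).min(axis=-1)
    in_range = d <= max_distance
    out[in_range] = d[in_range]
    return out


def chunked_proximity(raster, max_distance, chunk=4):
    """Process one chunk plus halo at a time; only that block is 'live'."""
    depth = int(np.ceil(max_distance))      # halo width in pixels
    # Pad with zeros, i.e. no targets beyond the raster edge.
    padded = np.pad(raster, depth, mode="constant", constant_values=0)
    out = np.empty(raster.shape, dtype=np.float64)
    n_rows, n_cols = raster.shape
    for r0 in range(0, n_rows, chunk):
        for c0 in range(0, n_cols, chunk):
            r1, c1 = min(r0 + chunk, n_rows), min(c0 + chunk, n_cols)
            # Chunk core plus `depth` cells of halo on every side.
            block = padded[r0:r1 + 2 * depth, c0:c1 + 2 * depth]
            res = proximity_pixels(block, max_distance)
            # Trim the halo and keep only the core result.
            out[r0:r1, c0:c1] = res[depth:depth + (r1 - r0),
                                    depth:depth + (c1 - c0)]
    return out
```

Any target within max_distance of a core pixel lies inside the halo, so the trimmed per-chunk result matches a whole-raster computation. With an unbounded max_distance no finite halo gives that guarantee, which is consistent with the commit routing that case to a different path.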
Summary
- _process_cupy() and _process_dask_cupy() host functions with dispatch wired into _process()
- Bounded max_distance uses da.map_overlap with per-chunk GPU kernels (only one chunk + overlap padding on GPU at a time); unbounded falls back to the existing CPU KDTree path (O(N log T) beats brute-force O(NT))
- Tests parametrized over cupy and dask+cupy backends across all 9 proximity/allocation/direction test cases

Fixes #901
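The O(N log T) fallback mentioned in the summary can be illustrated with SciPy's cKDTree. This is a minimal sketch in pixel units, assuming nonzero cells are targets. The function name kdtree_proximity is hypothetical and the code is not the library's existing KDTree path.

```python
import numpy as np
from scipy.spatial import cKDTree


def kdtree_proximity(raster):
    """Unbounded proximity: one tree build, then one lookup per pixel."""
    rows, cols = np.nonzero(raster)
    out = np.full(raster.shape, np.nan)
    if rows.size == 0:
        return out                        # no targets: everything stays NaN
    tree = cKDTree(np.column_stack([rows, cols]))   # built over T targets
    ii, jj = np.indices(raster.shape)
    pixels = np.column_stack([ii.ravel(), jj.ravel()])
    dist, _ = tree.query(pixels, k=1)     # nearest target for all N pixels
    return dist.reshape(raster.shape)
```

Each query walks the tree instead of scanning all T targets, which is where the log T factor comes from. The tree lives in host memory, which matches the summary's choice of a CPU path here.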
Test plan