Implement selective GPU dispatch for optimal CPU-GPU heterogeneous computing #87

## Problem

The current implementation sends every function to the GPU when `options(propr.use_gpu = TRUE)` is set. Benchmarks show that some functions run **much slower** on the GPU (up to 100x slower for the `count_*` functions) because transfer overhead and kernel launch costs outweigh the compute savings.
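The trade-off can be made concrete with a rough cost model. This is only a sketch: `GpuCostModel`, its fields, and both helper functions are hypothetical names, and the model (fixed launch cost plus per-byte transfer cost plus accelerated compute) is a simplifying assumption, not something measured in this issue.

```cpp
#include <cstddef>

// Hypothetical cost model. All fields are illustrative assumptions,
// not values taken from the propr benchmarks.
struct GpuCostModel {
    double launch_overhead_us;    // fixed cost paid per kernel launch
    double transfer_us_per_byte;  // host<->device copy cost per byte moved
    double compute_speedup;       // how much faster the kernel body runs on GPU
};

// Estimated GPU wall time = launch + transfer + accelerated compute.
inline double estimated_gpu_time_us(const GpuCostModel& m,
                                    double cpu_time_us,
                                    std::size_t bytes_moved) {
    return m.launch_overhead_us
         + m.transfer_us_per_byte * static_cast<double>(bytes_moved)
         + cpu_time_us / m.compute_speedup;
}

// GPU dispatch pays off only if the estimate beats the CPU time.
inline bool gpu_is_faster(const GpuCostModel& m,
                          double cpu_time_us,
                          std::size_t bytes_moved) {
    return estimated_gpu_time_us(m, cpu_time_us, bytes_moved) < cpu_time_us;
}
```

Under this model a cheap function loses on the GPU even when its kernel is much faster, because the fixed launch and transfer costs dominate. That is consistent with the pattern in the benchmarks, though the real causes would need profiling to confirm.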

## Benchmark Results

**GPU Winners (always beneficial):**
- `lrm`, `lrv`: 500x-2500x speedup 🔥
- `corRcpp`, `linRcpp`, `rhoRcpp`: 100x-700x speedup
- `clrRcpp`, `phiRcpp`: 20x-370x speedup

**GPU Losers (slower than CPU):**
- `coordToIndex`: 0.1x-0.4x (3-9x SLOWER)
- `count_*` functions: 0.01x-0.03x (30-100x SLOWER)
- `wtvRcpp`, `wtmRcpp`: 0.6x-2x (mixed: from about 1.7x slower to 2x faster)

## Solution

Implement **internal selective dispatch**: automatically route each function to the GPU only when it is beneficial, even when the global `options(propr.use_gpu = TRUE)` setting is enabled.

```cpp
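#include <string>
#include <unordered_set>

bool should_use_gpu_internal(const char* func_name) {
    const std::string name(func_name);
    // Never use GPU for these (slower than CPU in the benchmarks above)
    static const std::unordered_set<std::string> cpu_only = {
        "coordToIndex", "wtmRcpp", "wtvRcpp"};
    if (cpu_only.count(name) || name.rfind("count_", 0) == 0) {  // "count_*" is a prefix match
        return false;
    }
    // Always use GPU for these (when available)
    static const std::unordered_set<std::string> gpu_preferred = {
        "lrm", "lrv", "corRcpp", "linRcpp", "rhoRcpp"};
    if (gpu_preferred.count(name)) {
        return true;
    }
    // Unlisted functions (e.g. clrRcpp, phiRcpp): policy not specified; assumed CPU here
    return false;
}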
```

## User Experience

```r
options(propr.use_gpu = TRUE)  # Enable GPU globally

# Package automatically decides:
# - lrm/lrv/corRcpp → GPU (fast!)
# - count_*/coordToIndex → CPU (avoid slowdown)

pr <- propr(counts, metric = "rho")  # Optimal performance by default
```

## Benefits

- Optimal performance by default
- No GPU-induced slowdowns for the functions benchmarked above
- Fewer unnecessary CPU↔GPU transfers
- Users don't need to think about which functions benefit from GPU

## Next Steps

1. Implement `should_use_gpu_internal()` dispatch logic
2. Keep global `options(propr.use_gpu)` as master switch
3. Test on different hardware configurations
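Steps 1 and 2 might combine as follows: the global option acts as a master switch, and the per-function table is consulted only when the switch is on and a device is present. This is a minimal sketch, not the package's actual code. `Backend`, `benefits_from_gpu`, `resolve_backend`, and the `gpu_available` flag are hypothetical names, and routing unlisted functions to the CPU is an assumption.

```cpp
#include <string>
#include <unordered_set>

enum class Backend { CPU, GPU };

// Per-function routing table, mirroring the "always use GPU" list
// from the Solution section of this issue.
inline bool benefits_from_gpu(const std::string& func_name) {
    static const std::unordered_set<std::string> gpu_preferred = {
        "lrm", "lrv", "corRcpp", "linRcpp", "rhoRcpp"};
    return gpu_preferred.count(func_name) > 0;
}

// use_gpu_option: value of options(propr.use_gpu), passed in by the caller.
// gpu_available:  whether a usable device was detected (hypothetical flag).
inline Backend resolve_backend(bool use_gpu_option,
                               bool gpu_available,
                               const std::string& func_name) {
    // Master switch: never touch the GPU unless the user opted in
    // and a device is actually present.
    if (!use_gpu_option || !gpu_available) {
        return Backend::CPU;
    }
    // Selective dispatch: only listed functions go to the GPU;
    // everything else stays on the CPU (assumed default).
    return benefits_from_gpu(func_name) ? Backend::GPU : Backend::CPU;
}
```

With this shape, `resolve_backend(true, true, "lrm")` selects the GPU, while `coordToIndex` stays on the CPU under the same settings. Turning the option off forces the CPU for every function.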

---
*Issue created by Luna (AI assistant for @suzannejin)*
