
fix(qdp-core): link cudart locally and stub FFI when toolkit is absent #1321

Open: andrewmusselman wants to merge 1 commit into main from qdp-core-cudart-linkage

@andrewmusselman (Contributor) commented May 17, 2026

Closes #1318.

qdp-core/build.rs (extended; it already existed for protoc) now:

  • probes for nvcc with the same logic as qdp-kernels/build.rs,
  • emits cargo:rustc-link-search=native=$CUDA_PATH/lib64 and
    cargo:rustc-link-lib=cudart when found,
  • emits cargo:rustc-cfg=qdp_no_cuda when not found, with a clear
    cargo:warning pointing at the toolkit install,
  • respects QDP_NO_CUDA=1 for explicit forcing (matching qdp-kernels).
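
The decision these bullets describe can be sketched as a small pure function. This is a minimal sketch, not the PR's code: `cuda_directives` and `emit` are hypothetical helpers, the probe result is passed in as `has_nvcc`, and the exact QDP_NO_CUDA comparison, the /usr/local/cuda fallback for CUDA_PATH, and the warning wording are all assumptions.

```rust
use std::env;

/// Hypothetical helper: the cargo directives build.rs would print.
fn cuda_directives(force_no_cuda: bool, has_nvcc: bool, cuda_path: &str) -> Vec<String> {
    if force_no_cuda || !has_nvcc {
        // No usable toolkit (or explicitly disabled): gate the FFI on qdp_no_cuda.
        let mut out = vec!["cargo:rustc-cfg=qdp_no_cuda".to_string()];
        if !force_no_cuda {
            // Assumed wording; only warn when nvcc was genuinely not found.
            out.push(
                "cargo:warning=nvcc not found; building qdp-core with CUDA stubs. \
                 Install the CUDA toolkit for GPU support."
                    .to_string(),
            );
        }
        out
    } else {
        // Toolkit present: qdp-core links cudart itself.
        vec![
            format!("cargo:rustc-link-search=native={cuda_path}/lib64"),
            "cargo:rustc-link-lib=cudart".to_string(),
        ]
    }
}

/// What build.rs `main` might call; `has_nvcc` comes from the shared nvcc probe.
fn emit(has_nvcc: bool) {
    // Assumption: the build is forced to no-CUDA only when QDP_NO_CUDA is exactly "1".
    let force = env::var("QDP_NO_CUDA").map(|v| v == "1").unwrap_or(false);
    // Assumption: fall back to /usr/local/cuda when CUDA_PATH is unset.
    let cuda_path = env::var("CUDA_PATH").unwrap_or_else(|_| "/usr/local/cuda".to_string());
    for line in cuda_directives(force, has_nvcc, &cuda_path) {
        println!("{line}");
    }
}
```

Keeping the decision separate from the environment lookups makes each row of the behaviour table checkable without a CUDA install.
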

qdp-core/src/gpu/cuda_ffi.rs wraps the extern "C" block in #[cfg(not(qdp_no_cuda))] and adds a matching #[cfg(qdp_no_cuda)] mod no_cuda_stubs { ... } containing pub(crate) unsafe fn stubs for all 14 declared functions. Each stub returns 999, the same sentinel qdp-kernels uses for its kernel-launcher stubs, so if anyone calls a CUDA function on a no-toolkit build, the existing caller error paths (if ret != 0 { return Err(...) }) surface a clean runtime error instead of the build failing at link time. The whole stub module is wrapped in a single #[allow(non_snake_case)], since the stub names keep the camelCase of the real CUDA Runtime API.
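
A minimal sketch of the stub pattern for two of the 14 functions, assuming i32 as the Rust stand-in for cudaError_t and the pointer types shown; the real declarations in cuda_ffi.rs may differ. The cfg gates are omitted so the sketch compiles standalone, and device_memory_info is a hypothetical caller illustrating the existing if ret != 0 path.

```rust
/// Sentinel returned by every stub (the value the PR says qdp-kernels also uses).
const NO_CUDA_SENTINEL: i32 = 999;

// In qdp-core this module would sit behind #[cfg(qdp_no_cuda)] and the real
// extern "C" block behind #[cfg(not(qdp_no_cuda))]; the gates are left out
// here so the sketch compiles and runs on its own.
#[allow(non_snake_case)]
mod no_cuda_stubs {
    use std::ffi::c_void;
    use std::os::raw::c_uint;

    /// Stub for cudaMemGetInfo; never writes through its out-pointers.
    pub(crate) unsafe fn cudaMemGetInfo(_free: *mut usize, _total: *mut usize) -> i32 {
        super::NO_CUDA_SENTINEL
    }

    /// Stub for cudaHostAlloc; never allocates or writes the pointer.
    pub(crate) unsafe fn cudaHostAlloc(
        _ptr: *mut *mut c_void,
        _size: usize,
        _flags: c_uint,
    ) -> i32 {
        super::NO_CUDA_SENTINEL
    }
}

use no_cuda_stubs::*;

/// Hypothetical caller: the non-zero return code becomes an Err, not a crash.
fn device_memory_info() -> Result<(usize, usize), String> {
    let (mut free, mut total) = (0usize, 0usize);
    let ret = unsafe { cudaMemGetInfo(&mut free, &mut total) };
    if ret != 0 {
        return Err(format!("cudaMemGetInfo failed with error code {ret}"));
    }
    Ok((free, total))
}
```

Because the stubs never write through their out-pointers, callers must not read those values on the error path; the early return guarantees that.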

Cross-crate behaviour after this PR:

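| environment              | qdp-kernels    | qdp-core         | result                                      |
|--------------------------|----------------|------------------|---------------------------------------------|
| toolkit installed        | links cudart   | links cudart     | normal GPU build                            |
| driver only / macOS / CI | stub launchers | stub Runtime API | links; CUDA calls return runtime error 999  |
| QDP_NO_CUDA=1            | stub launchers | stub Runtime API | links; CUDA calls return runtime error 999  |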

Verified on Linux + CUDA 12.4: cargo build --workspace --tests --exclude qdp-python succeeds both with and without QDP_NO_CUDA=1;
make test_rust runs the full integration test suite on the GPU; and the 12 lint
warnings introduced by the stubs are silenced.

@ryankert01 (Member) commented May 19, 2026

I like the high-level idea:

  • With CUDA toolkit installed: normal GPU-enabled build.
  • Without CUDA toolkit: build still links successfully, but CUDA calls return a controlled runtime error.
  • With QDP_NO_CUDA=1: the same no-CUDA behavior, forced explicitly.

@ryankert01 (Member) left a review:

LGTM. Two small things are worth doing. Also, the pre-commit checks need to be fixed.

Comment thread qdp/qdp-core/build.rs
/// This function:
/// * emits `cargo:rustc-link-lib=cudart` and the appropriate
/// `cargo:rustc-link-search` path when nvcc is found, and
/// * emits `cargo:rustc-cfg=qdp_no_cuda` when it is not, gating the

output().is_ok() is true even if nvcc exits non-zero: a half-installed nvcc would set has_cuda = true and fall through to a link error. Suggest .map(|o| o.status.success()).unwrap_or(false). The same idiom appears in qdp-kernels/build.rs:177; fix both so they can't disagree.
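
The difference between the two idioms, sketched with a hypothetical probe_succeeds helper; passing --version to nvcc is an assumption about what the shared probe runs.

```rust
use std::process::Command;

/// Hypothetical helper: true only if `program` could be spawned AND exited
/// with status 0. `output().is_ok()` alone only says the spawn succeeded.
fn probe_succeeds(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .output()
        .map(|o| o.status.success())
        .unwrap_or(false)
}

/// Hypothetical shared probe for both build scripts (the argument is assumed).
fn nvcc_available() -> bool {
    probe_succeeds("nvcc", &["--version"])
}
```

Sharing one such function between qdp-core and qdp-kernels would be one way to keep the two build scripts from disagreeing.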

Comment thread qdp/qdp-core/build.rs
///
/// `qdp-core` declares CUDA Runtime API extern symbols in `src/gpu/cuda_ffi.rs`
/// (cudaHostAlloc, cudaMemGetInfo, cudaEventCreateWithFlags, ...). Those symbols
/// must be resolved at link time, which requires `libcudart` from the CUDA

Missing cargo:rerun-if-env-changed=PATH. The whole decision hinges on whether nvcc is on PATH, so installing CUDA after a stub build won't re-trigger the script until a cargo clean. qdp-kernels has the same gap.
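
A sketch of the rerun triggers, assuming build.rs also reads CUDA_PATH and QDP_NO_CUDA (both mentioned in the PR description); rerun_directives is a hypothetical helper. Emitting any rerun-if directive replaces Cargo's default of re-running whenever a package file changes, so the existing protoc inputs would need to keep their own rerun-if-changed lines.

```rust
/// Hypothetical helper: every input the CUDA decision depends on.
fn rerun_directives() -> Vec<String> {
    // Once any rerun-if directive is emitted, Cargo stops re-running on
    // arbitrary file changes, so build.rs itself is listed explicitly.
    let mut out = vec!["cargo:rerun-if-changed=build.rs".to_string()];
    // PATH decides whether nvcc is found; CUDA_PATH picks the lib64 dir;
    // QDP_NO_CUDA forces the stub build.
    for var in ["PATH", "CUDA_PATH", "QDP_NO_CUDA"] {
        out.push(format!("cargo:rerun-if-env-changed={var}"));
    }
    out
}

/// What build.rs `main` might call alongside the CUDA directives.
fn emit_rerun_directives() {
    for line in rerun_directives() {
        println!("{line}");
    }
}
```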

@ryankert01 ryankert01 changed the title qdp-core: own the cudart link directive; gate extern block on qdp_no_cuda feat(qdp-core): own the cudart link directive; gate extern block on qdp_no_cuda May 19, 2026
@ryankert01 ryankert01 changed the title feat(qdp-core): own the cudart link directive; gate extern block on qdp_no_cuda fix(qdp-core): link cudart locally and stub FFI when toolkit is absent May 19, 2026

Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close these issues:
  [Bug] qdp-core does not own its libcudart link directive; build fails on driver-only systems
2 participants