Fix CI crash on RTX6000 #1184
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
📝 Walkthrough: A compile-time preprocessor definition
Changes: Build/CMake change
🧹 Nitpick comments (1)
cpp/CMakeLists.txt (1)
166-166: ⚡ Quick win

Gate `CCCL_DISABLE_WARPSPEED_SCAN` to CUDA 13.1 and earlier to avoid unnecessary performance loss on newer toolchains.

Your own comments confirm this workaround is required only for CUDA 13.1 and fixed in NVCC ≥ 13.2. Applying it unconditionally disables the optimized warpspeed scan path on all later versions. Gate the definition by CUDA compiler version to preserve performance where the bug doesn't exist.

Suggested patch

    -add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN)
    +if (CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 13.2)
    +  add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN)
    +endif ()

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/CMakeLists.txt` at line 166, Gate the add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN) so it only applies to NVCC <= 13.1: check the CUDA compiler version (CMAKE_CUDA_COMPILER_VERSION) and wrap the add_definitions(...) call in an if block that only executes when CMAKE_CUDA_COMPILER_VERSION VERSION_LESS "13.2" (or VERSION_LESS_EQUAL "13.1"), leaving the definition out for >= 13.2 so the optimized warpspeed scan remains enabled; update CMakeLists.txt around the existing add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN) line accordingly.
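For readers who want to sanity-check the version boundary in the suggestion above, here is a minimal C++ sketch of the same "older than 13.2" comparison. The helper names `needs_warpspeed_workaround` and `workaround_active` are hypothetical and not part of cuOpt or CCCL. The only real identifiers used are NVCC's `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__` macros. The sketch assumes, as the review comment states, that the bug is fixed from NVCC 13.2 onward.

```cpp
#include <cassert>

// Hypothetical helper (illustration only): true when the given NVCC version
// is older than 13.2, the range in which the review comment says the
// CCCL_DISABLE_WARPSPEED_SCAN workaround is still needed.
constexpr bool needs_warpspeed_workaround(int major, int minor)
{
  return major < 13 || (major == 13 && minor < 2);
}

// Same boundary as `CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 13.2`.
static_assert(needs_warpspeed_workaround(13, 1), "13.1 is in the affected range");
static_assert(!needs_warpspeed_workaround(13, 2), "13.2 is assumed to carry the fix");

// Hypothetical query for the current translation unit.
constexpr bool workaround_active()
{
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
  // Compiled by NVCC: use the compiler's own version macros.
  return needs_warpspeed_workaround(__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__);
#else
  // Not compiled by NVCC (for example a host-only build of this sketch).
  return false;
#endif
}
```

The CMake gate in the suggested patch remains the simpler option for this PR, since it keeps the decision in one place in the build system; the sketch only spells out which versions fall on each side of the boundary.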
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
This PR disables the warpspeed scan in CUB, which is causing the CI test on RTX6000 to crash. More specifically, there is a Warp MMU Fault in `cub::detail::scan::DeviceScanKernel` during `thrust::inclusive_scan(..., thrust::maximum<int>{})` called from `trivial_presolve.cuh:124`.

CCCL 3.4.0 introduced an SM90+ "warpspeed" scan kernel that uses Hopper/Blackwell TMA (`cp_async_bulk`). When computing the byte mask for a partial TMA copy (`cp_async_bulk_cp_mask`), the code has two branches:

On NVCC 13.1, the `#else` formula can produce a non-contiguous byte mask. Blackwell's TMA hardware requires a strictly contiguous bit range in the mask; a non-contiguous mask causes a hardware MMU fault.

Checklist