Fix CI crash on RTX6000 #1184

Open
nguidotti wants to merge 2 commits into NVIDIA:main from nguidotti:fix-rtx-6000-crash

Conversation

@nguidotti nguidotti commented May 6, 2026

This PR disables the warpspeed scan in CUB, which is causing the CI test on RTX6000 to crash. More specifically, there is a Warp MMU Fault in cub::detail::scan::DeviceScanKernel during thrust::inclusive_scan(..., thrust::maximum<int>{}) called from trivial_presolve.cuh:124.
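
For readers unfamiliar with the operation involved: an inclusive scan with a maximum operator computes a running maximum, so each output element is the largest input seen so far. The sketch below is a host-side analogue using the C++17 standard library, not the thrust call from trivial_presolve.cuh; the function name `running_max` is hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Host-side analogue of an inclusive scan with a maximum operator:
// out[i] = max(in[0], ..., in[i]).
// `running_max` is a hypothetical name, not part of thrust or CUB.
std::vector<int> running_max(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin(),
                        [](int a, int b) { return std::max(a, b); });
    assert(out.size() == in.size());  // a scan preserves the element count
    return out;
}
```

For the input {3, 1, 4, 1, 5, 2} this yields {3, 3, 4, 4, 5, 5}. The device version dispatches to a CUB scan kernel, which is where the fault described above occurs.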

CCCL 3.4.0 introduced an SM90+ "warpspeed" scan kernel that uses Hopper/Blackwell TMA (cp_async_bulk). When computing the byte mask for a partial TMA copy (cp_async_bulk_cp_mask), the code has two branches:

  #if _CCCL_CUDA_COMPILER(NVCC, >=, 13, 2)
      byteMaskSmall = byteMaskStart & byteMaskEnd;              // correct
  #else
      byteMaskSmall = byteMaskStart & (byteMask >> (16 - (ptrGmemEnd - ptrGmemStartAlignDown)));
  #endif

On NVCC 13.1, the #else formula can produce a non-contiguous byte mask. Blackwell's TMA hardware requires a strictly contiguous bit range in the mask — a non-contiguous mask causes a hardware MMU fault.
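
To make the contiguity requirement concrete, here is a minimal sketch of a check for whether a byte mask consists of one unbroken run of set bits. It assumes a 16-bit mask, inferred from the `16 -` term in the snippet above. The helper `is_contiguous_mask` is hypothetical and is not part of CCCL; it does not reproduce the faulty mask computation itself.

```cpp
#include <cstdint>

// Returns true when the set bits of a 16-bit byte mask form a single unbroken
// run (for example 0x0FF0). Treating an empty mask as invalid is an assumption
// of this sketch. `is_contiguous_mask` is a hypothetical helper, not a CCCL API.
constexpr bool is_contiguous_mask(std::uint16_t mask) {
    if (mask == 0) return false;
    std::uint32_t m = mask;
    // Shift out trailing zeros so a contiguous run would start at bit 0.
    while ((m & 1u) == 0u) m >>= 1;
    // A run of ones starting at bit 0 has the form 2^k - 1, so m & (m + 1) == 0.
    return (m & (m + 1u)) == 0u;
}

static_assert(is_contiguous_mask(0x0FF0), "one run of eight bytes");
static_assert(!is_contiguous_mask(0x0F0F), "two separate runs");
```

A check of this kind could be used in a debug build to assert on a computed mask before it reaches the hardware, though whether that is practical inside the CUB kernel is not something this PR addresses.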

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti nguidotti added this to the 26.06 milestone May 6, 2026
@nguidotti nguidotti self-assigned this May 6, 2026
@nguidotti nguidotti requested review from a team as code owners May 6, 2026 17:29
@nguidotti nguidotti requested a review from rgsl888prabhu May 6, 2026 17:29
@nguidotti nguidotti added the bug (Something isn't working) and non-breaking (Introduces a non-breaking change) labels May 6, 2026
coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0dc99e8a-3baf-47bc-928a-aa1874dcf534

📥 Commits

Reviewing files that changed from the base of the PR and between 861c324 and 4508695.

📒 Files selected for processing (1)
  • cpp/CMakeLists.txt

📝 Walkthrough

Walkthrough

A compile-time preprocessor definition CCCL_DISABLE_WARPSPEED_SCAN is added to cpp/CMakeLists.txt, guarded by a CUDA version check (<= 13.2) with explanatory comments to disable CCCL warp-speed scanning and avoid Warp MMU faults. No public APIs or exported signatures were changed.

Changes

Build/CMake change

Layer / File(s) | Summary
Build Definition (cpp/CMakeLists.txt) | Adds CCCL_DISABLE_WARPSPEED_SCAN to target compile definitions.
Version Guard / Rationale (cpp/CMakeLists.txt) | Wraps the define with a CUDA compiler version check (<= 13.2) and includes explanatory comments about Warp MMU fault behavior.
Documentation in-code (cpp/CMakeLists.txt) | Comments inserted near existing DEFINE_PDLP_VERBOSE_MODE block to explain the reason for the compile-time toggle.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title 'Fix CI crash on RTX6000' directly describes the main change: disabling warpspeed scan to resolve a hardware-specific crash.
Description check | ✅ Passed | The description clearly explains the problem (Warp MMU Fault), root cause (non-contiguous byte mask in NVCC 13.1), and solution (disabling warpspeed scan), all directly relevant to the changeset.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
cpp/CMakeLists.txt (1)

166-166: ⚡ Quick win

Gate CCCL_DISABLE_WARPSPEED_SCAN to CUDA 13.1 and earlier to avoid unnecessary performance loss on newer toolchains.

Your own comments confirm this workaround is required only for CUDA 13.1 and fixed in NVCC ≥ 13.2. Applying it unconditionally disables the optimized warpspeed scan path on all later versions. Gate the definition by CUDA compiler version to preserve performance where the bug doesn't exist.

Suggested patch
-add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN)
+if (CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 13.2)
+    add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN)
+endif ()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/CMakeLists.txt` at line 166, Gate the
add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN) so it only applies to NVCC <=
13.1: check the CUDA compiler version (CMAKE_CUDA_COMPILER_VERSION) and wrap the
add_definitions(...) call in an if block that only executes when
CMAKE_CUDA_COMPILER_VERSION VERSION_LESS "13.2" (or VERSION_LESS_EQUAL "13.1"),
leaving the definition out for >= 13.2 so the optimized warpspeed scan remains
enabled; update CMakeLists.txt around the existing
add_definitions(-DCCCL_DISABLE_WARPSPEED_SCAN) line accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bbf808d6-5876-4748-b966-14128204a756

📥 Commits

Reviewing files that changed from the base of the PR and between 285990b and 861c324.

📒 Files selected for processing (1)
  • cpp/CMakeLists.txt

Comment thread cpp/CMakeLists.txt Outdated
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Labels

bug Something isn't working non-breaking Introduces a non-breaking change

4 participants