
【Hackathon 10th Spring No.45-part2】Add missing cpp_extensions.cc compile guards for SM70/SM75 #6977

Open
cloudforge1 wants to merge 1 commit into PaddlePaddle:develop from cloudforge1:task/045-t4-v100-compile-guards-replace

cloudforge1 (Contributor) commented Mar 23, 2026

Motivation

PR #6488 (merged as -part) introduced T4/V100 compile support but left two registration blocks in cpp_extensions.cc unguarded:

  1. 5 cutlass/FP8 ops (lines 1635–1673): .cu sources compile only at SM≥75, but registration is unconditional → linker error on V100 (SM70)
  2. 7 tail MoE/MLA ops (lines 1890–1925): .cu sources compile only at SM≥80, but registration is unconditional → linker error on SM70/SM75
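
The two thresholds above can be written down as a small compile-time table. This is only a sketch: `ExtOpTiers` and `tiers_for_sm` are hypothetical names, not taken from `setup_ops.py`, and it assumes a single target compute capability rather than a list of architectures.

```cpp
#include <cassert>

// Illustrative only: which op groups have compiled sources at a given
// compute capability. Names are hypothetical, not from the repository.
struct ExtOpTiers {
  bool sm75_ext_ops;  // cutlass/FP8 group: sources compile only at SM >= 75
  bool sm80_ext_ops;  // tail MoE/MLA group: sources compile only at SM >= 80
};

constexpr ExtOpTiers tiers_for_sm(int sm) {
  return ExtOpTiers{sm >= 75, sm >= 80};
}

// V100 (SM70): neither group is built, so neither may be registered.
static_assert(!tiers_for_sm(70).sm75_ext_ops, "SM70 has no cutlass/FP8 ops");
static_assert(!tiers_for_sm(70).sm80_ext_ops, "SM70 has no tail MoE/MLA ops");
// T4 (SM75): only the cutlass/FP8 group is built.
static_assert(tiers_for_sm(75).sm75_ext_ops, "SM75 has cutlass/FP8 ops");
static_assert(!tiers_for_sm(75).sm80_ext_ops, "SM75 has no tail MoE/MLA ops");
```

Registering an op whose tier is `false` for the target is what produces the undefined-symbol link errors described above.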

This is a minimal, additive-only fix: 4 lines added, 0 lines removed. See PR #6941 for the wholesale-replacement alternative.

Modifications

  • custom_ops/gpu_ops/cpp_extensions.cc: Add #ifdef ENABLE_SM75_EXT_OPS / #endif around the 5 cutlass/FP8 op registrations. Add #ifdef ENABLE_SM80_EXT_OPS / #endif around the 7 tail MoE/MLA op registrations.
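
A minimal sketch of the guard pattern, for readers who do not have the file open. Only the two macro names come from this PR. The registry function and the op names (`base_op`, `cutlass_fp8_op`, `tail_moe_op`) are hypothetical stand-ins for the real registration code in `cpp_extensions.cc`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the op registration list. Build with
// -DENABLE_SM75_EXT_OPS and/or -DENABLE_SM80_EXT_OPS to enable a tier.
inline std::vector<std::string> registered_ops() {
  std::vector<std::string> ops;
  ops.push_back("base_op");  // unguarded: sources are built on every target

#ifdef ENABLE_SM75_EXT_OPS
  // Registered only when the matching .cu sources were compiled (SM >= 75).
  ops.push_back("cutlass_fp8_op");
#endif

#ifdef ENABLE_SM80_EXT_OPS
  // Registered only when the matching .cu sources were compiled (SM >= 80).
  ops.push_back("tail_moe_op");
#endif

  return ops;
}
```

With neither macro defined, the guarded registrations vanish at preprocessing time, so the host compiler never emits references to symbols that were not built.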

No changes to setup_ops.py — keeps #6488's code as-is.

Usage or Command

No user-facing changes. After this fix, the build gates op registration per SM tier.

Accuracy Tests

Guard macros verified in setup_ops.py: ENABLE_SM75_EXT_OPS and ENABLE_SM80_EXT_OPS are both in cc_compile_args, so they are visible to the host compiler, which is required for .cc files.

Wholesale version tested on Baidu AI Studio V100 (pipeline p-1051a228d3c7).

Checklist

  • 4 additive lines, 0 deletions — minimal diff
  • Pre-commit hooks pass (clang-format)
  • Guards use macros visible to host compiler (cc_compile_args)
  • Correct SM tier: cutlass→SM75, MoE→SM80

paddle-bot bot commented Mar 23, 2026

Thanks for your contribution!

paddle-bot bot added the "contributor" (External developers) label Mar 23, 2026
cloudforge1 force-pushed the task/045-t4-v100-compile-guards-replace branch from 87e2426 to d45e88c on March 23, 2026 12:48
…0/SM75

Additive fix on top of merged PaddlePaddle#6488:
- Add #ifdef ENABLE_SM75_EXT_OPS guard for 5 cutlass/FP8 op
  registrations (prevents linker error on SM70)
- Add #ifdef ENABLE_SM80_EXT_OPS guard for 7 tail MoE/MLA op
  registrations (prevents linker error on SM70/SM75)

Uses ENABLE_SM75_EXT_OPS (passed to both cxx and nvcc compilers)
instead of ENABLE_SCALED_MM_C2X (nvcc-only) for the cutlass guard,
since cpp_extensions.cc is compiled by the host C++ compiler.

cloudforge1 force-pushed the task/045-t4-v100-compile-guards-replace branch from d45e88c to 975e788 on March 23, 2026 12:56
cloudforge1 changed the title from "【Hackathon 10th Spring No.45】Fix compile guard bugs in #6488 — V100-tested replacement" to "【Hackathon 10th Spring No.45-part2】Add missing cpp_extensions.cc compile guards for SM70/SM75" on Mar 23, 2026
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@defaffd).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6977   +/-   ##
==========================================
  Coverage           ?   73.71%           
==========================================
  Files              ?      399           
  Lines              ?    55950           
  Branches           ?     8828           
==========================================
  Hits               ?    41246           
  Misses             ?    11784           
  Partials           ?     2920           
Flag Coverage Δ
GPU 73.71% <ø> (?)


3 participants