fix(quantization): enforce Blackwell MX-unit alignment for NVFP4 block-size validation #1413
makroumi wants to merge 2 commits into NVIDIA:main
Conversation
NVFP4 quantization (num_bits=(2,1), scale_bits=(4,3)) targets Blackwell
MMA tiles which are hardwired for block sizes of 16 or 32 elements.
Prior to this change, illegal block sizes (e.g. 64, 128) passed
validation silently, corrupting scale tensors at export time after
wasting GPU hours on calibration.
Add a guard in QuantizerAttributeConfig.validate_block_sizes that
rejects any integer block_size not in {16, 32} when the NVFP4
signature is detected. Non-NVFP4 formats are unaffected.
Signed-off-by: Mehdi Makroumi <134870510+makroumi@users.noreply.github.com>
📝 Walkthrough: Adds NVFP4-specific validation that restricts quantization block_size values to 16 or 32 at config construction, accompanied by unit tests covering valid, invalid, and non-applicable cases, and documents the change in the 0.45 changelog.
Force-pushed 4fcb798 to 19d26f8.
Actionable comments posted: 1
Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around lines 458-464: The Pydantic validator validate_block_sizes uses Python assert statements, which are stripped when the interpreter runs with -O, so validation can silently stop executing. Replace each assert with an explicit exception (e.g., raise ValueError with the same message) so the checks always run, including the new NVFP4 check (`assert _v in (16, 32)`) and the other asserts in validate_block_sizes. Preserve the original assertion messages, and keep the loop/conditional logic and return behavior unchanged.
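For illustration, the before/after pattern this comment asks for, sketched on a stand-in Pydantic v2 model rather than the actual config class:

```python
from pydantic import BaseModel, field_validator


class ExampleConfig(BaseModel):
    """Stand-in for the real config class, using Pydantic v2 syntax."""

    block_size: int

    @field_validator("block_size")
    @classmethod
    def validate_block_size(cls, v: int) -> int:
        # Before: `assert v in (16, 32), "..."` is stripped when Python runs
        # with -O, so invalid values would pass validation silently.
        # After: an explicit raise always executes, and Pydantic wraps the
        # ValueError into a ValidationError at construction time.
        if v not in (16, 32):
            raise ValueError(f"block_size must be 16 or 32, got {v}")
        return v
```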
📒 Files selected for processing (3)
- CHANGELOG.rst
- modelopt/torch/quantization/config.py
- tests/unit/torch/quantization/test_config_validation.py
What does this PR do?
Type of change: Bug fix
QuantizerAttributeConfig.validate_block_sizes validates axis conflicts, dynamic single-axis constraints, and key types, but does not constrain the integer block-size values for NVFP4 quantization. NVFP4 (num_bits=(2,1), scale_bits=(4,3)) targets Blackwell MMA tiles hardwired for block sizes of 16 or 32 elements. An illegal block size (e.g. 64, 128, 4) passes validation silently, propagates through calibration, and corrupts scale tensors at TensorRT-LLM export or produces garbage at deployment.

This PR adds a guard that rejects any integer block size not in {16, 32} when the NVFP4 format signature is detected. The check fires at Pydantic config construction time, before any GPU work is spent. Non-NVFP4 formats (INT8, FP8, INT4, MXFP4, etc.) are unaffected.
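The shape of the guard, as a minimal standalone sketch. The real check lives inside QuantizerAttributeConfig.validate_block_sizes; the helper name and detection details below are assumptions based on the description above, not the PR's actual diff:

```python
# Hypothetical sketch, not the merged code.
def _check_nvfp4_block_sizes(num_bits, block_sizes: dict) -> None:
    # NVFP4 signature: FP4 values (num_bits=(2, 1)) with FP8 E4M3 scales.
    is_nvfp4 = num_bits == (2, 1) and block_sizes.get("scale_bits") == (4, 3)
    if not is_nvfp4:
        return  # INT8, FP8, INT4, MXFP4, etc. are unaffected
    for key, value in block_sizes.items():
        # Integer keys map a tensor axis to its block size; string keys
        # ("type", "scale_bits", ...) carry metadata and are skipped.
        if isinstance(key, int) and isinstance(value, int) and value not in (16, 32):
            raise ValueError(
                f"NVFP4 requires block size 16 or 32 (Blackwell MMA tile), got {value}"
            )
```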
Usage
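A hedged illustration of the new fail-fast behavior; the block_sizes dict layout is inferred from the NVFP4 signature described above and may not match the preset configs exactly:

```python
from pydantic import ValidationError

from modelopt.torch.quantization.config import QuantizerAttributeConfig

# Canonical NVFP4 config: 16-element blocks match the Blackwell MMA tile.
cfg = QuantizerAttributeConfig(
    num_bits=(2, 1),
    block_sizes={-1: 16, "type": "dynamic", "scale_bits": (4, 3)},
)

# An illegal block size now fails fast, before any calibration is run.
try:
    QuantizerAttributeConfig(
        num_bits=(2, 1),
        block_sizes={-1: 64, "type": "dynamic", "scale_bits": (4, 3)},
    )
except ValidationError as err:
    print(err)  # reports that NVFP4 block sizes must be 16 or 32
```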
Testing
8 new test cases in TestNVFP4BlockSizeValidation:

- test_nvfp4_block_16_accepted: canonical tile passes
- test_nvfp4_block_32_accepted: alternative tile passes
- test_nvfp4_illegal_block_size_rejected[8] through [256]: 5 parametrized illegal values rejected with the correct error message
- test_non_nvfp4_block_size_unaffected: INT4 block_size=128 still passes
- test_nvfp4_without_scale_bits_unaffected: MXFP4 (scale_bits=(8,0)) skips the constraint

All 60 tests in test_config_validation.py pass (52 existing + 8 new). All existing NVFP4 preset configs (NVFP4_DEFAULT_CFG, etc.) validated against the new constraint with zero regressions. A sketch of the parametrized rejection test appears below.
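How the parametrized rejection test might look; the test name comes from the list above, while the parameter set, error-message match, and block_sizes layout are assumptions rather than the PR's actual test code:

```python
import pytest
from pydantic import ValidationError

from modelopt.torch.quantization.config import QuantizerAttributeConfig


# NOTE: the five illegal values are inferred from the description ([8] through [256]).
@pytest.mark.parametrize("block_size", [8, 4, 64, 128, 256])
def test_nvfp4_illegal_block_size_rejected(block_size):
    """Illegal NVFP4 block sizes must fail at Pydantic construction time."""
    with pytest.raises(ValidationError, match="16 or 32"):
        QuantizerAttributeConfig(
            num_bits=(2, 1),
            block_sizes={-1: block_size, "type": "dynamic", "scale_bits": (4, 3)},
        )
```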
Before your PR is "Ready for review"

- Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
- CONTRIBUTING.md: N/A

Additional Information
7 lines added to modelopt/torch/quantization/config.py, 85 lines added to tests/unit/torch/quantization/test_config_validation.py, 4 lines added to CHANGELOG.rst. Zero lines modified, zero lines deleted, zero new imports, zero new functions.

Summary by CodeRabbit
- Bug Fixes: NVFP4 quantization configs now reject block sizes other than 16 or 32 at construction time.
- Tests: Added unit tests covering valid, invalid, and non-applicable block-size cases.