fix(quantization): enforce Blackwell MX-unit alignment for NVFP4 block-size validation #1413
makroumi wants to merge 2 commits into NVIDIA:main
Conversation
NVFP4 quantization (num_bits=(2,1), scale_bits=(4,3)) targets Blackwell
MMA tiles which are hardwired for block sizes of 16 or 32 elements.
Prior to this change, illegal block sizes (e.g. 64, 128) passed
validation silently, corrupting scale tensors at export time after
wasting GPU hours on calibration.
Add a guard in QuantizerAttributeConfig.validate_block_sizes that
rejects any integer block_size not in {16, 32} when the NVFP4
signature is detected. Non-NVFP4 formats are unaffected.
Signed-off-by: Mehdi Makroumi <134870510+makroumi@users.noreply.github.com>
📝 Walkthrough: Adds NVFP4-specific validation that restricts quantization block_size values to 16 or 32 at config construction, accompanied by unit tests covering valid, invalid, and non-applicable cases, and documents the change in the 0.45 changelog.
Force-pushed 4fcb798 to 19d26f8.
Actionable comments posted: 1
Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around lines 458-464: The Pydantic validator validate_block_sizes uses Python assert statements, which are stripped when the interpreter runs with -O, so validation can silently stop executing. Replace each assert with an explicit exception (e.g., raise ValueError with the same message) so the checks always run, including the new NVFP4 check (`assert _v in (16, 32)`) and the other asserts in validate_block_sizes. Preserve the original assertion messages, and keep the loop/conditional logic and return behavior unchanged.
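For illustration, the before/after pattern this comment asks for, sketched on a stand-in Pydantic v2 model rather than the actual config class:

```python
from pydantic import BaseModel, field_validator


class ExampleConfig(BaseModel):
    """Stand-in for the real config class, using Pydantic v2 syntax."""

    block_size: int

    @field_validator("block_size")
    @classmethod
    def validate_block_size(cls, v: int) -> int:
        # Before: `assert v in (16, 32), "..."` is stripped when Python runs
        # with -O, so invalid values would pass validation silently.
        # After: an explicit raise always executes, and Pydantic wraps the
        # ValueError into a ValidationError at construction time.
        if v not in (16, 32):
            raise ValueError(f"block_size must be 16 or 32, got {v}")
        return v
```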
📒 Files selected for processing (3)
- CHANGELOG.rst
- modelopt/torch/quantization/config.py
- tests/unit/torch/quantization/test_config_validation.py
What does this PR do?
Type of change: Bug fix
QuantizerAttributeConfig.validate_block_sizes validates axis conflicts, dynamic single-axis constraints, and key types, but does not constrain the integer block-size values for NVFP4 quantization. NVFP4 (num_bits=(2,1), scale_bits=(4,3)) targets Blackwell MMA tiles hardwired for block sizes of 16 or 32 elements. An illegal block size (e.g. 64, 128, 4) passes validation silently, propagates through calibration, and corrupts scale tensors at TensorRT-LLM export or produces garbage at deployment.

This PR adds a guard that rejects any integer block size not in {16, 32} when the NVFP4 format signature is detected. The check fires at Pydantic config construction time, before any GPU work is spent. Non-NVFP4 formats (INT8, FP8, INT4, MXFP4, etc.) are unaffected.
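The shape of the guard, as a minimal standalone sketch. The real check lives inside QuantizerAttributeConfig.validate_block_sizes; the helper name and detection details below are assumptions based on the description above, not the PR's actual diff:

```python
# Hypothetical sketch, not the merged code.
def _check_nvfp4_block_sizes(num_bits, block_sizes: dict) -> None:
    # NVFP4 signature: FP4 values (num_bits=(2, 1)) with FP8 E4M3 scales.
    is_nvfp4 = num_bits == (2, 1) and block_sizes.get("scale_bits") == (4, 3)
    if not is_nvfp4:
        return  # INT8, FP8, INT4, MXFP4, etc. are unaffected
    for key, value in block_sizes.items():
        # Integer keys map a tensor axis to its block size; string keys
        # ("type", "scale_bits", ...) carry metadata and are skipped.
        if isinstance(key, int) and isinstance(value, int) and value not in (16, 32):
            raise ValueError(
                f"NVFP4 requires block size 16 or 32 (Blackwell MMA tile), got {value}"
            )
```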
Usage
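A hedged illustration of the new fail-fast behavior; the block_sizes dict layout is inferred from the NVFP4 signature described above and may not match the preset configs exactly:

```python
from pydantic import ValidationError

from modelopt.torch.quantization.config import QuantizerAttributeConfig

# Canonical NVFP4 config: 16-element blocks match the Blackwell MMA tile.
cfg = QuantizerAttributeConfig(
    num_bits=(2, 1),
    block_sizes={-1: 16, "type": "dynamic", "scale_bits": (4, 3)},
)

# An illegal block size now fails fast, before any calibration is run.
try:
    QuantizerAttributeConfig(
        num_bits=(2, 1),
        block_sizes={-1: 64, "type": "dynamic", "scale_bits": (4, 3)},
    )
except ValidationError as err:
    print(err)  # reports that NVFP4 block sizes must be 16 or 32
```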
Testing
8 new test cases in TestNVFP4BlockSizeValidation:

- test_nvfp4_block_16_accepted: canonical tile passes
- test_nvfp4_block_32_accepted: alternative tile passes
- test_nvfp4_illegal_block_size_rejected[8] through [256]: 5 parametrized illegal values rejected with the correct error message
- test_non_nvfp4_block_size_unaffected: INT4 block_size=128 still passes
- test_nvfp4_without_scale_bits_unaffected: MXFP4 (scale_bits=(8,0)) skips the constraint

All 60 tests in test_config_validation.py pass (52 existing + 8 new). All existing NVFP4 preset configs (NVFP4_DEFAULT_CFG, etc.) validated against the new constraint with zero regressions. A sketch of the parametrized rejection test appears below.
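How the parametrized rejection test might look; the test name comes from the list above, while the parameter set, error-message match, and block_sizes layout are assumptions rather than the PR's actual test code:

```python
import pytest
from pydantic import ValidationError

from modelopt.torch.quantization.config import QuantizerAttributeConfig


# NOTE: the five illegal values are inferred from the description ([8] through [256]).
@pytest.mark.parametrize("block_size", [8, 4, 64, 128, 256])
def test_nvfp4_illegal_block_size_rejected(block_size):
    """Illegal NVFP4 block sizes must fail at Pydantic construction time."""
    with pytest.raises(ValidationError, match="16 or 32"):
        QuantizerAttributeConfig(
            num_bits=(2, 1),
            block_sizes={-1: block_size, "type": "dynamic", "scale_bits": (4, 3)},
        )
```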
Before your PR is "Ready for review"

- Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
- CONTRIBUTING.md: N/A

Additional Information
7 lines added to modelopt/torch/quantization/config.py, 85 lines added to tests/unit/torch/quantization/test_config_validation.py, 4 lines added to CHANGELOG.rst. Zero lines modified, zero lines deleted, zero new imports, zero new functions.

Summary by CodeRabbit
- Bug Fixes: NVFP4 quantization configs now reject block sizes other than 16 or 32 at construction time.
- Tests: Added unit tests covering valid, invalid, and non-applicable block-size cases.