
[OMNIML-3689] PTQ quant_cfg semantic correction. Design in doc _quant_cfg.rst#1094

Open
shengliangxu wants to merge 28 commits into main from
shengliangx/quant_cfg-list

Conversation

@shengliangxu
Contributor

@shengliangxu shengliangxu commented Mar 22, 2026

What does this PR do?

Summary

Redesigns the quant_cfg configuration format in ModelOpt's PyTorch quantization stack, replacing the previous dict-based format with an ordered list of typed QuantizerCfgEntry dicts.

Motivation

The old quant_cfg dict had several pain points:

  • Ambiguous precedence: no explicit way to reason about which entry wins when multiple keys match a quantizer
  • Mixed key namespaces: wildcard paths and PyTorch class names lived in the same dict level, requiring ad-hoc dispatch
  • Magic "default" key: an implicit, undocumented catch-all that was easy to misuse
  • Poor composability: merging two configs required dict updates that silently discarded keys
  • No YAML round-trip fidelity: the nested structure couldn't be expressed cleanly in YAML
New format

quant_cfg is now an ordered list of QuantizerCfgEntry TypedDicts. Each entry has:

  • quantizer_path (required): fnmatch wildcard matched against quantizer module names
  • cfg (optional): dict (or list of dicts) of QuantizerAttributeConfig fields
  • enable (optional): toggles quantizer on/off independently of cfg
  • parent_class (optional): restricts match to quantizers inside the given PyTorch module class

Entries are applied in list order; later entries override earlier ones. The canonical pattern is deny-all first (_base_disable_all), then selectively re-enable, then apply standard exclusions (_default_disabled_quantizer_cfg).
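The deny-all-then-enable pattern described above can be sketched as a hypothetical config list (field names follow the PR description; the concrete values and patterns are illustrative, not the real built-in defaults):

```python
# Sketch of the new list-based quant_cfg format. Entry fields follow the
# PR description (quantizer_path, cfg, enable, parent_class); values are
# illustrative only.
quant_cfg = {
    "quant_cfg": [
        # 1. Deny-all first: disable every quantizer.
        {"quantizer_path": "*", "enable": False},
        # 2. Selectively re-enable with full attribute configs.
        {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
        {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": -1}},
        # 3. Standard exclusions last; later entries override earlier ones.
        {"quantizer_path": "*lm_head*", "enable": False},
    ],
    "algorithm": "max",
}
```

Because entries apply in order, the trailing exclusion wins over the broader `*weight_quantizer` rule for any quantizer under `lm_head`.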

Changes
  • config.py: QuantizerCfgEntry TypedDict and normalize_quant_cfg_list() added; _default_disabled_quantizer_cfg / _mamba_moe_disabled_quantizer_cfg converted from dicts to lists; all built-in configs (INT8_DEFAULT_CFG, FP8_DEFAULT_CFG, NVFP4_DEFAULT_CFG, etc.) converted to list format; two QuantizeConfig field validators added for normalization and early validation
  • conversion.py: set_quantizer_by_cfg rewritten to iterate the list directly; set_quantizer_attributes_full and set_quantizer_attributes_partial distinguished; parent_class resolution uses new _DMRegistryCls.get_nn_class() against the original nn.Module class
  • dynamic.py: _DMRegistryCls.get_nn_class() added to resolve string class names to their original nn.Module subclass for isinstance checks
  • tensor_quantizer.py: set_from_attribute_config signature narrowed to QuantizerAttributeConfig; uses model_dump() (not exclude_unset) to enforce entry atomicity — unset fields reset to defaults rather than inheriting from prior entries
  • YAML recipes: all modelopt_recipes/general/ptq/*.yml files updated to the new list format
  • Docs: new docs/source/guides/_quant_cfg.rst covering entry format, ordering, atomicity, and common patterns
Backward compatibility

normalize_quant_cfg_list() is called automatically by the QuantizeConfig Pydantic validator, so existing code using the old single-key dict format ({"*weight_quantizer": {"num_bits": 8}}) continues to work without modification.
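Conceptually, the backward-compatibility shim converts each legacy dict key into a list entry. A hypothetical sketch (the real normalize_quant_cfg_list() may differ in details):

```python
# Hypothetical sketch of normalizing a legacy dict-style quant_cfg into
# the new list form; not the actual modelopt implementation.
def normalize_legacy(quant_cfg):
    if isinstance(quant_cfg, list):
        return quant_cfg  # already in the new list format
    entries = []
    for pattern, cfg in quant_cfg.items():
        entry = {"quantizer_path": pattern}
        if isinstance(cfg, dict):
            cfg = dict(cfg)
            # Hoist an embedded enable flag to the entry level.
            if "enable" in cfg:
                entry["enable"] = cfg.pop("enable")
            if cfg:
                entry["cfg"] = cfg
        entries.append(entry)
    return entries

print(normalize_legacy({"*weight_quantizer": {"num_bits": 8}}))
# [{'quantizer_path': '*weight_quantizer', 'cfg': {'num_bits': 8}}]
```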

Test coverage

  • Unit testing: new unit tests to cover the semantic changes
  • System testing:
python examples/llm_ptq/hf_ptq.py \
      --model Qwen/Qwen3-8B \
      --recipe general/ptq/fp8_default-fp8_kv \
      --export_path=build/fp8_default-fp8_kv42 \
      --calib_size=16 \
      --batch_size=0 \
      --trust_remote_code \
      --export_fmt=hf

Additional Information

Summary by CodeRabbit

  • New Features

    • Introduced an ordered, list-based quantization config format with explicit rule entries and parent-class selectors.
  • Documentation

    • Added a comprehensive guide for the new quant_cfg schema and updated examples across guides and recipes.
  • Refactor

    • Migrated configs, examples, and APIs to the new list-oriented quant_cfg model for predictable precedence and composition.
  • Tests

    • Updated test suites and examples to use and validate the new quant_cfg structure.

@copy-pr-bot

copy-pr-bot bot commented Mar 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@coderabbitai
Contributor

coderabbitai bot commented Mar 22, 2026

Walkthrough

Refactors quantization configuration from a dictionary-of-patterns schema to an ordered list-of-entry schema (entries with quantizer_path, optional parent_class, optional cfg, optional enable). Updates core quantization internals, conversion APIs, recipes, examples, tests, and docs to use, normalize, and validate the new list format.

Changes

Cohort / File(s) Summary
Core Config & Validation
modelopt/torch/quantization/config.py
Replaced dict-based quant_cfg with list[QuantizerCfgEntry]. Added QuantizerCfgEntry TypedDict, normalize_quant_cfg_list(), Pydantic validators, and stricter validation (entries must include cfg or enable). Updated defaults and canonicalization logic.
Conversion API
modelopt/torch/quantization/conversion.py
set_quantizer_by_cfg now accepts normalized list configs; added set_quantizer_attributes_full and set_quantizer_attributes_partial; introduced _match_quantizer; split full vs partial attribute application; context manager typing updated.
Algorithms & Recipe Integration
modelopt/torch/quantization/algorithms.py, modelopt/torch/quantization/compress.py
Adapted estimation and recipe generation to read cfg from list entries; updated KV-cache handling to append list entries; switched some callers to set_quantizer_attributes_partial.
TensorQuantizer / Sequential Handling
modelopt/torch/quantization/nn/modules/tensor_quantizer.py
Tightened type annotations; added attribute setters for axis and block_sizes that update calibrator state; narrowed SequentialQuantizer.set_from_attribute_config to list forms.
Backend Availability Checks
modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py, modelopt/torch/quantization/backends/nvfp4_gemm.py
Now locate *input_quantizer/*weight_quantizer by scanning list entries and assert extracted cfgs are dicts before validation.
Conversion Entry Points & Model APIs
modelopt/torch/quantization/model_quant.py, modelopt/torch/quantization/model_calib.py, modelopt/torch/export/unified_export_hf.py
Updated runtime/example calls to pass list-based configs to set_quantizer_by_cfg/context managers; changed disable/enable helpers to use partial attribute setter.
Conversion Consumer Utilities
modelopt/torch/quantization/utils/core_utils.py
disable_lora_quantizers_in_config and update_quant_cfg_with_kv_cache_quant now append/concatenate list entries and normalize shapes; KV-cache param typing updated.
ONNX / Export Utilities
modelopt/onnx/llm_export_utils/quantization_utils.py
get_quant_config rebuilds quant_cfg as a list and appends lm_head overrides as list entries.
Examples & Notebooks
examples/... (diffusers, llm_autodeploy, llm_eval, llm_ptq, llm_qat, vllm_serve, windows/...)
Converted example configs and helper logic from dict-based quant_cfg to list-based entries; updated pattern lookups, appends, calibration/context calls, and some function signatures (e.g., MLA KV config).
Recipes (YAML)
modelopt_recipes/general/ptq/*.yml
Refactored recipe quant_cfg sections from key-mapped forms to ordered lists of rule objects (quantizer_path, enable, cfg, optional parent_class); replaced implicit default with explicit catch-all entries.
Tests & Test Utilities
tests/_test_utils/..., tests/.../torch/quantization/*
Migrated test fixtures/helpers to list-based quant_cfg, updated many call sites to use set_quantizer_attributes_partial and new traversal logic; added unit tests for normalize_quant_cfg_list() and set_quantizer_attributes_full behavior; updated assertions to iterate/search list entries.
Docs
docs/source/guides/1_quantization.rst, docs/source/guides/_pytorch_quantization.rst, docs/source/guides/_quant_cfg.rst
Added _quant_cfg.rst documenting the new list schema, entry fields, validation rules, precedence and atomicity semantics; updated PyTorch guide examples to deep-copy defaults and use list-based quant_cfg; included new page in toctree.
Misc Examples/Helpers
examples/diffusers/quantization/config.py, examples/llm_ptq/*, examples/vllm_serve/fakequant_worker.py, examples/windows/...
Reworked module-level default config constants and helper functions to produce and mutate list-form quant_cfg; replaced direct dict indexing with list searches and appends.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 60.40%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (3 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The PR title directly and clearly describes the primary change: redesigning the PTQ quant_cfg format from dict-based to list-based, with documentation added in _quant_cfg.rst.
  • Security Anti-Patterns ✅ Passed: The pull request introduces no new security anti-patterns. Changes are configuration schema refactoring with safe normalization and validation functions.


Right now quant_cfg is a dict, but we use it as if it were a list. When we apply the quant_cfg, we enumerate the items in the dict and apply the configs one by one in modelopt/torch/quantization/conversion.py. This implementation actually has the semantics that later configs have higher precedence than earlier configs. However, dicts do not have reliable ordering.

Therefore, we make quant_cfg a list of patterns:

1. Later config patterns have higher precedence. A later config in
   the list overrides a former config if they target the same module.

2. The config applied to each module is atomic; each config provides the full
   information. We do not compose a quant module config from multiple
   config lines.
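Rule 1 above (last match wins) can be sketched with fnmatch-based matching; this is an illustrative helper, not the actual code in conversion.py:

```python
from fnmatch import fnmatch

# Illustrative last-match-wins resolution over a list of config entries;
# not the actual modelopt implementation.
def resolve(quantizer_name, entries):
    winner = None
    for entry in entries:  # later entries override earlier matches
        if fnmatch(quantizer_name, entry["quantizer_path"]):
            winner = entry
    return winner

entries = [
    {"quantizer_path": "*", "enable": False},
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8}},
]
print(resolve("layers.0.weight_quantizer", entries))
# {'quantizer_path': '*weight_quantizer', 'cfg': {'num_bits': 8}}
print(resolve("layers.0.input_quantizer", entries))
# {'quantizer_path': '*', 'enable': False}
```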

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
…s_partial

set_quantizer_attributes_full updates the full set of quantizer attributes; it
has atomic semantics.

set_quantizer_attributes_partial updates just a partial set of quantizer
attributes; it has merge semantics.
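The atomic-versus-merge distinction can be sketched with plain dicts (the defaults and field names here are illustrative assumptions, not the real quantizer attributes):

```python
# Sketch of atomic vs merge update semantics. Field names and defaults
# are illustrative, not the real quantizer attribute set.
DEFAULTS = {"num_bits": 8, "axis": None, "enable": True}

def apply_full(current: dict, entry_cfg: dict) -> dict:
    """Atomic semantics: fields unset in the entry reset to defaults."""
    return {**DEFAULTS, **entry_cfg}

def apply_partial(current: dict, entry_cfg: dict) -> dict:
    """Merge semantics: fields unset in the entry keep their prior values."""
    return {**current, **entry_cfg}

state = apply_full({}, {"num_bits": 4, "axis": 0})
atomic = apply_full(state, {"num_bits": 8})     # axis resets to None
merged = apply_partial(state, {"num_bits": 8})  # axis stays 0
```

This mirrors why set_from_attribute_config uses model_dump() rather than exclude_unset: with atomic semantics, an entry never silently inherits fields from an earlier entry.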

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu force-pushed the shengliangx/quant_cfg-list branch from 192ea05 to fb3bb07 Compare March 23, 2026 00:19
@codecov

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 83.76963% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.25%. Comparing base (b61fb4e) to head (3a3b112).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/quantization/algorithms.py 75.00% 8 Missing ⚠️
modelopt/torch/quantization/utils/core_utils.py 14.28% 6 Missing ⚠️
modelopt/torch/quantization/backends/nvfp4_gemm.py 0.00% 5 Missing ⚠️
modelopt/torch/quantization/conversion.py 90.74% 5 Missing ⚠️
modelopt/torch/quantization/config.py 94.59% 4 Missing ⚠️
modelopt/torch/quantization/model_quant.py 50.00% 2 Missing ⚠️
modelopt/torch/quantization/compress.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1094    +/-   ##
========================================
  Coverage   70.24%   70.25%            
========================================
  Files         227      227            
  Lines       25909    26014   +105     
========================================
+ Hits        18201    18277    +76     
- Misses       7708     7737    +29     


@shengliangxu shengliangxu force-pushed the shengliangx/quant_cfg-list branch from 4f38294 to 5115452 Compare March 23, 2026 03:18
@shengliangxu shengliangxu marked this pull request as ready for review March 23, 2026 04:11
@shengliangxu shengliangxu requested review from a team as code owners March 23, 2026 04:11
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
examples/llm_autodeploy/run_auto_quantize.py (1)

134-143: ⚠️ Potential issue | 🔴 Critical

Remove hardcoded trust_remote_code=True in HF model and tokenizer loading.

Both AutoModelForCausalLM.from_pretrained() (line 134) and AutoTokenizer.from_pretrained() (line 142) hardcode trust_remote_code=True, creating a critical RCE vector from untrusted model repositories. Expose this as a user-controlled parameter defaulting to False in the modelopt_ptq() function signature.

🔒 Proposed fix
 def modelopt_ptq(
     model_path: str,
     output_dir: str,
     qformat: str | None = None,
     num_samples: int = 512,
     auto_quantize_bits: float | None = None,
     calib_dataset: str = "cnn_dailymail",
     calib_batch_size: int = 8,
+    trust_remote_code: bool = False,
 ) -> torch.nn.Module:
     """Quantize the model with modelopt."""
     model = AutoModelForCausalLM.from_pretrained(
-        model_path, trust_remote_code=True, torch_dtype="auto", device_map="auto"
+        model_path,
+        trust_remote_code=trust_remote_code,
+        torch_dtype="auto",
+        device_map="auto",
     )
     model.eval()
 
     tokenizer = AutoTokenizer.from_pretrained(
         model_path,
         model_max_length=2048,
         padding_side="left",
-        trust_remote_code=True,
+        trust_remote_code=trust_remote_code,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_autodeploy/run_auto_quantize.py` around lines 134 - 143, The
code currently hardcodes trust_remote_code=True when calling
AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained, creating
an RCE risk; update the modelopt_ptq(...) function signature to accept a new
parameter (e.g., trust_remote_code: bool = False) and replace the hardcoded True
with that parameter for both AutoModelForCausalLM.from_pretrained and
AutoTokenizer.from_pretrained so callers can opt-in explicitly while defaulting
to False.
examples/llm_ptq/example_utils.py (1)

208-247: ⚠️ Potential issue | 🟠 Major

Normalize quant_cfg["quant_cfg"] before mutating it.

Lines 210 and 244-247 now assume the new list schema. If a caller still passes a legacy dict-style config, the AWQ branch will fail before mtq.quantize() reaches the backward-compat normalizer, and the phi4mm branch will hit .append() on a dict. Even in list form, walking from the front can rewrite a shadowed *weight_quantizer rule instead of the effective last match.

💡 Suggested fix
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
@@
 def build_quant_cfg(
@@
 ) -> dict[str, Any]:
     quant_cfg = copy.deepcopy(quant_cfg)
+    quant_cfg["quant_cfg"] = normalize_quant_cfg_list(quant_cfg["quant_cfg"])
     if "awq" in str(quant_cfg.get("algorithm")):
         weight_quantizer_entry = next(
             e
-            for e in quant_cfg["quant_cfg"]
-            if isinstance(e, dict) and e.get("quantizer_path") == "*weight_quantizer"
+            for e in reversed(quant_cfg["quant_cfg"])
+            if e.get("quantizer_path") == "*weight_quantizer"
         )
         weight_quantizer = weight_quantizer_entry.get("cfg", {})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/example_utils.py` around lines 208 - 247, The code mutates
quant_cfg["quant_cfg"] assuming it's a list; normalize quant_cfg["quant_cfg"] to
the list schema up front (convert legacy dict form into the new list form)
before any mutations so weight_quantizer lookup and .append() calls are safe;
when locating the weight quantizer use a reverse-scan (e.g., search from the
end) to find the effective "*weight_quantizer" entry referenced by
weight_quantizer_entry so awq_block_size handling (weight_quantizer_entry /
weight_quantizer) and the model_type == "phi4mm" .append() calls operate on the
correct, normalized list.
modelopt/onnx/llm_export_utils/quantization_utils.py (1)

57-111: ⚠️ Potential issue | 🟠 Major

Copy the preset before rewriting quant_cfg.

Line 111 mutates mtq.FP8_DEFAULT_CFG / mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place. With list-based rules, repeated calls can now retain or duplicate previously appended lm_head entries, so get_quant_config() becomes order-dependent across calls in the same process.

💡 Suggested fix
+import copy
 import time
@@
     if precision == "fp8":
-        quant_cfg = mtq.FP8_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG)
 
     elif precision == "nvfp4":
-        quant_cfg = mtq.NVFP4_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.NVFP4_DEFAULT_CFG)
 
     elif precision == "int4_awq":
-        quant_cfg = mtq.INT4_AWQ_CFG
+        quant_cfg = copy.deepcopy(mtq.INT4_AWQ_CFG)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/quantization_utils.py` around lines 57 - 111,
get_quant_config currently mutates the shared presets mtq.FP8_DEFAULT_CFG /
mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place which causes duplicated
lm_head entries on repeated calls; fix by making a deep copy of the chosen
preset (e.g., import copy and do quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG)
/ copy.deepcopy(mtq.NVFP4_DEFAULT_CFG) / copy.deepcopy(mtq.INT4_AWQ_CFG) inside
get_quant_config) before building quant_cfg_list and appending lm_head entries
so the original mtq presets remain unchanged.
modelopt/torch/quantization/conversion.py (1)

423-454: ⚠️ Potential issue | 🟠 Major

Reject sequential cfgs here or restore full module topology.

This context manager snapshots only TensorQuantizer properties. If a temporary config contains a list cfg, set_quantizer_by_cfg() can replace a TensorQuantizer with a SequentialQuantizer, and the exit path never converts it back. The docstring already says sequential cfg lists are unsupported here, but that isn’t enforced.

🧯 Suggested fix
     quant_cfg = normalize_quant_cfg_list(quant_cfg)
+    if any(isinstance(entry["cfg"], list) for entry in quant_cfg):
+        raise ValueError("set_quantizer_by_cfg_context does not support sequential cfg lists.")
 
     original_attributes = {}
modelopt/torch/quantization/config.py (1)

541-545: ⚠️ Potential issue | 🔴 Critical

Remove enable from _nvfp4_cfg_bs32.

The nested "enable": True in _nvfp4_cfg_bs32 (line 545) causes a duplicate keyword argument error at runtime. When _nvfp4_selective_quant_cfg() passes this dict as the cfg field in QuantizerCfgEntry (line 559), the conversion code later unpacks it with QuantizerAttributeConfig(**cfg, enable=enable) (conversion.py line 266). This passes enable twice and raises TypeError before conversion can complete.

Suggested fix
 _nvfp4_cfg_bs32 = {
     "num_bits": (2, 1),
     "block_sizes": {-1: 32, "type": "dynamic", "scale_bits": (4, 3)},
-    "enable": True,
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 541 - 545, The dict
_nvfp4_cfg_bs32 contains an internal "enable" key which causes a duplicate
keyword when later unpacked into QuantizerAttributeConfig; remove the "enable"
entry from _nvfp4_cfg_bs32 so it only contains its attribute settings (num_bits,
block_sizes, scale_bits/type entries) and let the caller
(_nvfp4_selective_quant_cfg -> QuantizerCfgEntry) supply the enable flag
separately; specifically edit the _nvfp4_cfg_bs32 constant to drop the "enable":
True key so that conversion code that does QuantizerAttributeConfig(**cfg,
enable=enable) no longer passes enable twice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/guides/_pytorch_quantization.rst`:
- Around line 260-272: The example references _default_disabled_quantizer_cfg
but doesn't define or import it; fix by either adding an import for it (e.g.,
from mtq.config import _default_disabled_quantizer_cfg) at the top of the
snippet or qualify the reference inline as
mtq.config._default_disabled_quantizer_cfg so the example is self-contained and
runnable; update the code block containing "quant_cfg" to use the qualified name
or add the import near the example.

In `@docs/source/guides/_quant_cfg.rst`:
- Around line 123-143: The docs currently state that _base_disable_all (a.k.a.
the deny-all entry) is prepended to every built-in config; update the text to
accurately say that the deny-all entry (referenced as _base_disable_all or
mtq.config._default_disabled_quantizer_cfg) is included in most built-in configs
but not all, and call out that some specialized recipes (e.g., the KV-only
configs in the quantization config set) intentionally omit the deny-all prelude;
adjust the example wording and the note to avoid the absolute "every built-in
config" claim and add a short parenthetical pointer to the specialized recipes
that omit it (referencing their config names such as the KV-only configs).

In `@examples/diffusers/quantization/config.py`:
- Around line 106-109: The loop that injects trt_high_precision_dtype into
quant_config["quant_cfg"] only handles when cfg is a dict (variable p) and skips
the case where cfg is a list of attribute dicts; update the logic in the loop
that iterates quant_config["quant_cfg"] to detect if p is a list and, when so,
iterate its elements and for each element that is a dict and contains "num_bits"
but not "trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.
- Around line 128-147: The current append in reset_set_int8_config causes
duplicate entries in quant_config["quant_cfg"]; instead, make the operation
idempotent by searching quant_config["quant_cfg"] for an existing dict whose
"quantizer_path" equals aq_name and replace that dict in-place (or update its
"cfg") if found, otherwise append the new entry; modify the code around the
append that builds the payload (the block referencing aq_name,
quant_config["quant_cfg"], and the PercentileCalibrator tuple) to perform a
find-and-replace by quantizer_path rather than always appending.

In `@examples/llm_eval/quantization_utils.py`:
- Around line 36-49: The loop that detects BMM-attention entries fails because
mtq_cfg["quant_cfg"] is now a list of dicts; update _quantize_model_with_dataset
to handle both list and dict formats by inspecting each entry: if an entry is a
dict, read its "quantizer_path" (or keys for legacy string entries) and test
whether the path string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.

In `@examples/llm_ptq/hf_ptq.py`:
- Line 332: The code is using attribute access on plain dicts in
_default_disabled_quantizer_cfg which will raise AttributeError; update the
disabled_layers construction to access the dict key "quantizer_path" for each
entry (i.e., replace entry.quantizer_path with entry["quantizer_path"]) so that
disabled_layers is built from the correct dictionary key.

In `@examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb`:
- Around line 193-198: The code fails because quant_cfg["quant_cfg"] contains
dict entries, not 2-tuples; change the extraction to find the dict whose
quantizer_path (or appropriate key) equals "*weight_quantizer" and then take its
"cfg" field (e.g. use next(entry["cfg"] for entry in quant_cfg["quant_cfg"] if
entry.get("quantizer_path") == "*weight_quantizer")), then handle the case where
weight_quantizer is a list as before and set weight_quantizer["block_sizes"][-1]
= 128; update references to quant_cfg, weight_quantizer, and
quant_cfg["quant_cfg"] accordingly.

In `@examples/vllm_serve/fakequant_worker.py`:
- Around line 173-184: The code currently finds only the first kv rule
(kv_entry) and appends minimal clones at the end losing order and full fields;
instead iterate over kv_quant_cfg with indexes, find every entry where
isinstance(e, dict) and e.get("quantizer_path") == "*[kv]_bmm_quantizer", and
for each matching entry insert (not append) two full deep-copied clones of that
entire entry right after the original entry with quantizer_path set to
"*kv_c_bmm_quantizer" and "*k_pe_bmm_quantizer" respectively so you preserve
list order and all other fields (enable, parent_class, etc.) — operate on
kv_quant_cfg and use copy.deepcopy to clone the whole entry before changing
quantizer_path.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`:
- Around line 260-271: The quant_cfg list currently uses tuple entries and
embeds "enable" inside the quantizer config dict (_nvfp4_cfg), which fails
validation in normalize_quant_cfg_list(); change quant_cfg to a list of
QuantizerCfgEntry dicts where each entry is e.g. {"quantizer_path": "<pattern>",
"cfg": <cfg_dict>} for enabled entries (move "enable" out of _nvfp4_cfg) and
{"quantizer_path": "<pattern>", "enable": False} for disabled patterns (use
SENSITIVE_LAYER_PATTERNS and the f"*transformer_blocks.{i}.*" patterns for
exclude_blocks), ensuring _nvfp4_cfg no longer contains a top-level "enable" key
and only used as the "cfg" value.

In `@modelopt/torch/quantization/algorithms.py`:
- Around line 1333-1344: The loop in get_auto_quantize_config is calling
_match_quantizer_cfg with only the short names
("input_quantizer"/"weight_quantizer"), so path- or parent_class-scoped
selectors never match and enable-only matches are dropped; change the call to
pass the fully qualified quantizer name (e.g.,
f"{module_name}.{quantizer_attr}") to _match_quantizer_cfg and ensure the caller
appends an entry when either matched_cfg is not None or matched_enable is True
(so enable-only matches are preserved); apply the same fix for the analogous
block that starts at the later occurrence (the other loop referencing
_match_quantizer_cfg, module_name, and quantizer_attr).
- Around line 96-98: The code currently drops the entry-level "enable" flag and
estimates compression from every entry's "cfg"; change the selection to only
include cfgs from QuantizerCfgEntry items where e.get("enable", True) is truthy
and the "cfg" is not None. Concretely, in the block using quant_cfg.quant_cfg
and calling estimate_quant_compression_for_quantizer, filter entries by
e.get("enable", True) before extracting e.get("cfg") (and still skip None cfgs)
so estimate_quant_compression_for_quantizer receives only enabled cfgs.

In `@modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py`:
- Around line 100-116: The availability probe currently uses next(...) without a
default and assertions on input_cfg/weight_cfg which can raise
StopIteration/AssertionError; change the two next(...) calls that produce
input_cfg and weight_cfg to include a default (e.g., next(..., None)) and
replace the assert isinstance(...) checks with explicit conditional checks that
return False from the probe when input_cfg or weight_cfg is missing or not a
dict; ensure you reference FP8_DEFAULT_CFG["quant_cfg"], quant_cfg_list,
input_cfg and weight_cfg in the probe logic so malformed or missing default
entries cause the function to return False instead of throwing.

In `@modelopt/torch/quantization/backends/nvfp4_gemm.py`:
- Around line 216-231: The availability check currently uses next(...) without
defaults and asserts which can raise instead of returning False; update the
_nvfp4_availability_check implementation to retrieve input_cfg and weight_cfg
from quant_cfg_list using next(..., None) (or equivalent), verify they are dicts
with proper "cfg" entries and that their "quantizer_path" matches
"*input_quantizer" / "*weight_quantizer", and if either is missing or not a dict
return False instead of asserting or allowing StopIteration; reference the
symbols quant_cfg_list, input_cfg, weight_cfg and the existing next(...)/assert
lines to locate and change the lookup and validation to defensive checks that
return False on malformed/missing configs.

In `@modelopt/torch/quantization/config.py`:
- Around line 1625-1631: The before-validator normalize_quant_cfg currently
returns non-list inputs unchanged causing legacy dict-style quant_cfg to bypass
normalization; update normalize_quant_cfg to detect dict inputs and pass them
through normalize_quant_cfg_list (which handles dict -> entry conversion via
_dict_to_entry) so legacy {"*weight_quantizer": {...}} shapes are converted to
the expected list form; ensure this change preserves existing behavior for lists
and other non-dict types so downstream callers like need_calibration (which
iterates entries) receive normalized entry lists.

In `@modelopt/torch/quantization/conversion.py`:
- Around line 341-356: The current logic in _match_quantizer branch mutates
existing SequentialQuantizer instances instead of fully replacing them when the
incoming attributes change shape; to fix, ensure full replacement instead of
partial assignment: when attributes is a list, if module is not a
SequentialQuantizer create a new SequentialQuantizer (as before) OR if module is
a SequentialQuantizer but len(attributes) != len(module) replace the existing
module with a new SequentialQuantizer of the correct length before calling
set_from_attribute_config; conversely, when attributes is not a list and module
is a SequentialQuantizer, replace it with a fresh TensorQuantizer instance and
then call set_from_attribute_config on that new instance (use
quant_model.get_submodule/name.rpartition to locate parent_module and setattr to
swap the module).
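
The replace-don't-mutate rule above can be sketched with stand-in classes (`Container`, `TensorQuantizer`, and `SequentialQuantizer` below are simplified stubs standing in for the real ModelOpt/PyTorch modules, not the actual implementations):

```python
class TensorQuantizer:
    """Stand-in for a single quantizer module."""

    def __init__(self):
        self.attrs = {}

    def set_from_attribute_config(self, cfg):
        self.attrs = dict(cfg)


class SequentialQuantizer(list):
    """Stand-in: holds one TensorQuantizer per attribute dict."""

    def __init__(self, n):
        super().__init__(TensorQuantizer() for _ in range(n))

    def set_from_attribute_config(self, cfgs):
        for quantizer, cfg in zip(self, cfgs):
            quantizer.set_from_attribute_config(cfg)


class Container:
    """Minimal model stub exposing get_submodule like torch.nn.Module."""

    def get_submodule(self, name):
        obj = self
        for part in name.split(".") if name else []:
            obj = getattr(obj, part)
        return obj


def replace_and_configure(model, name, attributes):
    """Fully replace the quantizer at `name` when its shape no longer matches.

    attributes: one attribute dict (plain quantizer) or a list of dicts
    (sequential quantizer with one sub-quantizer per dict).
    """
    module = model.get_submodule(name)
    if isinstance(attributes, list):
        # Replace unless it is already a SequentialQuantizer of matching length.
        if not isinstance(module, SequentialQuantizer) or len(attributes) != len(module):
            module = SequentialQuantizer(len(attributes))
    elif isinstance(module, SequentialQuantizer):
        # Shape changed from sequential back to single: swap in a fresh quantizer.
        module = TensorQuantizer()
    parent_name, _, child_name = name.rpartition(".")
    parent = model.get_submodule(parent_name) if parent_name else model
    setattr(parent, child_name, module)  # full swap, never partial mutation
    module.set_from_attribute_config(attributes)
    return module
```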

In `@modelopt/torch/quantization/model_quant.py`:
- Around line 183-191: The docstring for quantize() is out of sync: it still
describes quant_cfg as a dict mapping wildcard keys, but the implementation and
example use a list-of-dicts schema. Update the prose in the quantize() doc block
to describe quant_cfg as a list where each item is a dict with keys like
"quantizer_path" (wildcard pattern), optional "enable" (bool), optional "cfg"
(dict, e.g., {"num_bits": int, "axis": int}), and that multiple entries are
applied in order; also mention the top-level "algorithm" field (e.g., "max") and
how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.
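
As a concrete illustration of the schema the docstring should describe, a config in the list form might look like the hypothetical `MY_INT8_CFG` below; the `effective_cfg` helper sketches last-match-wins resolution, and the rule that a cfg-bearing match implies enable is an assumption here, not confirmed ModelOpt semantics:

```python
import fnmatch

# Hypothetical config in the new list schema: entries apply in order, the
# deny-all entry comes first, and later matching entries override earlier ones.
MY_INT8_CFG = {
    "quant_cfg": [
        {"quantizer_path": "*", "enable": False},
        {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
        {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": -1}},
    ],
    "algorithm": "max",
}


def effective_cfg(quantizer_name, quant_cfg):
    """Resolve the attributes a named quantizer ends up with (last match wins)."""
    result = {"enable": False}
    for entry in quant_cfg:
        if fnmatch.fnmatch(quantizer_name, entry["quantizer_path"]):
            result.update(entry.get("cfg", {}))
            if "enable" in entry:
                result["enable"] = entry["enable"]
            elif "cfg" in entry:
                result["enable"] = True  # assumed: a cfg-bearing match enables
    return result
```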

In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 835-836: The code is appending dict.items() tuples into
quant_cfg["quant_cfg"], creating mixed tuples instead of the expected
list[QuantizerCfgEntry] and can error when quant_cfg["quant_cfg"] is a dict;
normalize both sources into proper QuantizerCfgEntry dicts: ensure inner is a
list (if quant_cfg.get("quant_cfg") is a dict, convert it to a list of entries),
and transform kv_cache_quant_cfg (the dict of pattern->cfg) into a list of
{"quantizer_path": pattern, ...cfg} or at least {"quantizer_path": pattern,
"enable": cfg.get("enable", False)} entries before concatenating; assign the
concatenated list back to quant_cfg["quant_cfg"] so downstream consumers only
see a homogeneous list[QuantizerCfgEntry].
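
One way to sketch that normalization, treating `kv_cache_quant_cfg` as a legacy pattern-to-attributes dict (the entry shape follows the prompt; the `enable` default is an assumption):

```python
def merge_kv_cache_cfg(quant_cfg, kv_cache_quant_cfg):
    """Merge a legacy pattern->attrs dict into quant_cfg["quant_cfg"] as entry dicts."""
    inner = quant_cfg.get("quant_cfg", [])
    if isinstance(inner, dict):
        # Legacy dict form: each key is a wildcard pattern mapped to attributes.
        inner = [{"quantizer_path": pat, "cfg": dict(attrs)} for pat, attrs in inner.items()]
    else:
        inner = list(inner)
    kv_entries = [
        {
            "quantizer_path": pat,
            "cfg": {k: v for k, v in attrs.items() if k != "enable"},
            "enable": attrs.get("enable", True),
        }
        for pat, attrs in kv_cache_quant_cfg.items()
    ]
    quant_cfg["quant_cfg"] = inner + kv_entries  # homogeneous list of entry dicts
    return quant_cfg
```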

In `@tests/_test_utils/torch/quantization/quantize_common.py`:
- Around line 50-61: The loop over config["quant_cfg"] looks for a
"*weight_quantizer" entry and mutates its "block_sizes" but silently does
nothing if no match is found; add a fail-fast check by tracking whether a
matching entry was found (e.g., set a boolean flag when pat ==
"*weight_quantizer") and after the loop raise an explicit error (ValueError or
AssertionError) if no match was found so tests fail early rather than silently
proceeding without applying block_size to the intended entry.

In `@tests/gpu/torch/quantization/test_quantize_cuda.py`:
- Around line 133-135: The test is mutating the shared constant
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by editing config["quant_cfg"] in place;
instead, create a deep copy of that constant (e.g., via copy.deepcopy) into the
local variable config before patching so changes to entry.setdefault("cfg",
{})["block_sizes"] = {-1: 8, -2: 8} only affect this test; locate the code that
assigns config from mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a
deep-copied version so modifications to quant_cfg/ block_sizes/quantizer_path do
not leak to other parametrized cases.

In `@tests/unit/torch/quantization/test_autoquant.py`:
- Around line 492-507: The current assertion allows tuple-style payloads by
falling back to entry[0]/entry[1]; restrict the check to dict-style
QuantizerCfgEntry entries only by iterating over config["quant_cfg"] and
filtering with isinstance(entry, dict), then assert that some dict has
entry.get("quantizer_path") == "*" and entry.get("enable") is False; update the
assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").

---

Outside diff comments:
In `@examples/llm_autodeploy/run_auto_quantize.py`:
- Around line 134-143: The code currently hardcodes trust_remote_code=True when
calling AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained,
creating an RCE risk; update the modelopt_ptq(...) function signature to accept
a new parameter (e.g., trust_remote_code: bool = False) and replace the
hardcoded True with that parameter for both AutoModelForCausalLM.from_pretrained
and AutoTokenizer.from_pretrained so callers can opt-in explicitly while
defaulting to False.
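
The opt-in pattern described above amounts to threading one flag through both loads. A sketch (the `_loader` parameter is a hypothetical test seam added here for illustration; the real script would call the transformers loaders directly):

```python
def modelopt_ptq(model_name, trust_remote_code=False, _loader=None):
    """Load model and tokenizer with remote-code execution as an explicit opt-in.

    _loader is a test seam; when None, the real transformers loaders are used.
    """
    if _loader is None:
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_fn = AutoModelForCausalLM.from_pretrained
        tokenizer_fn = AutoTokenizer.from_pretrained
    else:
        model_fn, tokenizer_fn = _loader
    model = model_fn(model_name, trust_remote_code=trust_remote_code)
    tokenizer = tokenizer_fn(model_name, trust_remote_code=trust_remote_code)
    return model, tokenizer
```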

In `@examples/llm_ptq/example_utils.py`:
- Around line 208-247: The code mutates quant_cfg["quant_cfg"] assuming it's a
list; normalize quant_cfg["quant_cfg"] to the list schema up front (convert
legacy dict form into the new list form) before any mutations so
weight_quantizer lookup and .append() calls are safe; when locating the weight
quantizer use a reverse-scan (e.g., search from the end) to find the effective
"*weight_quantizer" entry referenced by weight_quantizer_entry so awq_block_size
handling (weight_quantizer_entry / weight_quantizer) and the model_type ==
"phi4mm" .append() calls operate on the correct, normalized list.
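
The reverse-scan for the effective entry can be sketched as follows (entry shape per the new schema; this is a simplified illustration, not the exact example_utils.py code):

```python
def find_effective_weight_entry(quant_cfg_list):
    """Return the last '*weight_quantizer' entry -- the one that actually wins,
    since later entries override earlier ones under the new list semantics."""
    for entry in reversed(quant_cfg_list):
        if isinstance(entry, dict) and entry.get("quantizer_path") == "*weight_quantizer":
            return entry
    raise ValueError("no '*weight_quantizer' entry in quant_cfg")
```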

In `@modelopt/onnx/llm_export_utils/quantization_utils.py`:
- Around line 57-111: get_quant_config currently mutates the shared presets
mtq.FP8_DEFAULT_CFG / mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place which
causes duplicated lm_head entries on repeated calls; fix by making a deep copy
of the chosen preset (e.g., import copy and do quant_cfg =
copy.deepcopy(mtq.FP8_DEFAULT_CFG) / copy.deepcopy(mtq.NVFP4_DEFAULT_CFG) /
copy.deepcopy(mtq.INT4_AWQ_CFG) inside get_quant_config) before building
quant_cfg_list and appending lm_head entries so the original mtq presets remain
unchanged.

In `@modelopt/torch/quantization/config.py`:
- Around line 541-545: The dict _nvfp4_cfg_bs32 contains an internal "enable"
key which causes a duplicate keyword when later unpacked into
QuantizerAttributeConfig; remove the "enable" entry from _nvfp4_cfg_bs32 so it
only contains its attribute settings (num_bits, block_sizes, scale_bits/type
entries) and let the caller (_nvfp4_selective_quant_cfg -> QuantizerCfgEntry)
supply the enable flag separately; specifically edit the _nvfp4_cfg_bs32
constant to drop the "enable": True key so that conversion code that does
QuantizerAttributeConfig(**cfg, enable=enable) no longer passes enable twice.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fcb69c76-5a46-4dd0-82f2-905613c6cc83

📥 Commits

Reviewing files that changed from the base of the PR and between 08e5f92 and 5115452.

📒 Files selected for processing (53)
  • docs/source/guides/1_quantization.rst
  • docs/source/guides/_pytorch_quantization.rst
  • docs/source/guides/_quant_cfg.rst
  • examples/diffusers/quantization/config.py
  • examples/llm_autodeploy/run_auto_quantize.py
  • examples/llm_eval/quantization_utils.py
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb
  • examples/llm_ptq/notebooks/3_PTQ_AutoQuantization.ipynb
  • examples/llm_qat/main.py
  • examples/vllm_serve/fakequant_worker.py
  • examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py
  • modelopt/onnx/llm_export_utils/quantization_utils.py
  • modelopt/torch/export/unified_export_hf.py
  • modelopt/torch/quantization/algorithms.py
  • modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py
  • modelopt/torch/quantization/backends/nvfp4_gemm.py
  • modelopt/torch/quantization/compress.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/conversion.py
  • modelopt/torch/quantization/model_calib.py
  • modelopt/torch/quantization/model_quant.py
  • modelopt/torch/quantization/nn/modules/tensor_quantizer.py
  • modelopt/torch/quantization/utils/core_utils.py
  • modelopt/torch/sparsity/attention_sparsity/conversion.py
  • modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
  • tests/_test_utils/torch/export/utils.py
  • tests/_test_utils/torch/quantization/onnx_export.py
  • tests/_test_utils/torch/quantization/quantize_common.py
  • tests/gpu/torch/quantization/test_hadamard.py
  • tests/gpu/torch/quantization/test_quant_rnn_cuda.py
  • tests/gpu/torch/quantization/test_quantize_cuda.py
  • tests/gpu_megatron/torch/peft/plugins/test_megatron_peft.py
  • tests/gpu_megatron/torch/quantization/plugins/test_apex.py
  • tests/gpu_megatron/torch/quantization/plugins/test_megatron.py
  • tests/unit/recipe/test_loader.py
  • tests/unit/torch/quantization/plugins/test_attention_quant.py
  • tests/unit/torch/quantization/plugins/test_huggingface.py
  • tests/unit/torch/quantization/plugins/test_peft.py
  • tests/unit/torch/quantization/test_autoquant.py
  • tests/unit/torch/quantization/test_compute_quantization_mse.py
  • tests/unit/torch/quantization/test_config_validation.py
  • tests/unit/torch/quantization/test_custom_backend.py
  • tests/unit/torch/quantization/test_quant_activations.py
  • tests/unit/torch/quantization/test_quant_batchnorm.py
  • tests/unit/torch/quantization/test_quant_rnn.py
  • tests/unit/torch/quantization/test_quantize_cpu.py
  • tests/unit/torch/quantization/test_quantize_replace.py
  • tests/unit/torch/quantization/test_tensor_quant_cpu.py

Comment on lines +260 to +272
"quant_cfg": [
# Disable all quantizers by default, then enable selectively
{"quantizer_path": "*", "enable": False},

# Configure weight quantizers with 4-bit precision and 128-element blocks
-"*weight_quantizer": {"num_bits": 4, "block_sizes": {-1: 128}, "enable": True},
+{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}, "enable": True},

# Configure input quantizers with 8-bit dynamic quantization
-"*input_quantizer": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}},
+{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}}},

# Include default disabled quantizer configurations
-**_default_disabled_quantizer_cfg,
-},
+*_default_disabled_quantizer_cfg,
+],

⚠️ Potential issue | 🟡 Minor

Define or qualify _default_disabled_quantizer_cfg in this example.

This snippet now references _default_disabled_quantizer_cfg without importing or qualifying it, so it fails when copied verbatim. Please either add the import or qualify it as mtq.config._default_disabled_quantizer_cfg.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/guides/_pytorch_quantization.rst` around lines 260 - 272, The
example references _default_disabled_quantizer_cfg but doesn't define or import
it; fix by either adding an import for it (e.g., from mtq.config import
_default_disabled_quantizer_cfg) at the top of the snippet or qualify the
reference inline as mtq.config._default_disabled_quantizer_cfg so the example is
self-contained and runnable; update the code block containing "quant_cfg" to use
the qualified name or add the import near the example.

Comment on lines +123 to +143
The recommended pattern used by all built-in configs is:

.. code-block:: python
"quant_cfg": [
# 1. Deny all quantizers by default
{"quantizer_path": "*", "enable": False},
# 2. Enable and configure the target quantizers
{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
# 3. Apply standard exclusions last (BatchNorm, LM head, MoE routers, etc.)
*mtq.config._default_disabled_quantizer_cfg,
]
.. note::

The deny-all entry ``{"quantizer_path": "*", "enable": False}`` is available as
:data:`modelopt.torch.quantization.config._base_disable_all` and is prepended to every
built-in config. This ensures quantizers not explicitly targeted remain disabled.

⚠️ Potential issue | 🟡 Minor

Don’t describe _base_disable_all as universal across built-ins.

Several specialized recipes in modelopt/torch/quantization/config.py intentionally omit the deny-all prelude (for example the KV-only configs). Saying it is prepended to every built-in config will send users to the wrong pattern for those formats.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/guides/_quant_cfg.rst` around lines 123 - 143, The docs currently
state that _base_disable_all (a.k.a. the deny-all entry) is prepended to every
built-in config; update the text to accurately say that the deny-all entry
(referenced as _base_disable_all or mtq.config._default_disabled_quantizer_cfg)
is included in most built-in configs but not all, and call out that some
specialized recipes (e.g., the KV-only configs in the quantization config set)
intentionally omit the deny-all prelude; adjust the example wording and the note
to avoid the absolute "every built-in config" claim and add a short
parenthetical pointer to the specialized recipes that omit it (referencing their
config names such as the KV-only configs).

Comment on lines +106 to 109
for entry in quant_config["quant_cfg"]:
p = entry.get("cfg", {})
if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p:
p["trt_high_precision_dtype"] = trt_high_precision_dtype

⚠️ Potential issue | 🟡 Minor

Handle sequential cfg lists when injecting trt_high_precision_dtype.

The new quant_cfg format allows cfg to be a list of attribute dicts, but Line 107 only updates plain dicts. Any sequential entry will skip this dtype injection entirely.

🛠️ Proposed fix
-    for entry in quant_config["quant_cfg"]:
-        p = entry.get("cfg", {})
-        if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p:
-            p["trt_high_precision_dtype"] = trt_high_precision_dtype
+    for entry in quant_config["quant_cfg"]:
+        cfg = entry.get("cfg")
+        cfg_items = [cfg] if isinstance(cfg, dict) else (cfg or [])
+        for attrs in cfg_items:
+            if "num_bits" in attrs and "trt_high_precision_dtype" not in attrs:
+                attrs["trt_high_precision_dtype"] = trt_high_precision_dtype
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/diffusers/quantization/config.py` around lines 106 - 109, The loop
that injects trt_high_precision_dtype into quant_config["quant_cfg"] only
handles when cfg is a dict (variable p) and skips the case where cfg is a list
of attribute dicts; update the logic in the loop that iterates
quant_config["quant_cfg"] to detect if p is a list and, when so, iterate its
elements and for each element that is a dict and contains "num_bits" but not
"trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.

Comment on lines +128 to +147
quant_config["quant_cfg"].append(
{
"quantizer_path": aq_name,
"cfg": {
"num_bits": 8,
"axis": None,
"percentile": percentile,
"total_step": n_steps,
"collect_method": collect_method,
"calibrator": (
PercentileCalibrator,
(),
{
"num_bits": 8,
"axis": None,
"percentile": percentile,
"total_step": n_steps,
"collect_method": collect_method,
},
),
},
-),
-}
}
)

⚠️ Potential issue | 🟡 Minor

Make reset_set_int8_config() idempotent under list-based quant_cfg.

Line 128 switched this helper from a replace-by-key update to an unconditional append. Reusing the same quant_config object now accumulates duplicate or stale *{name}*input_quantizer* rules, which changes later-match behavior under the new schema.

🧼 Proposed fix
         if isinstance(module, nn.Conv2d):
             aq_name = f"*{name}*input_quantizer*"
+            quant_config["quant_cfg"] = [
+                entry
+                for entry in quant_config["quant_cfg"]
+                if not (isinstance(entry, dict) and entry.get("quantizer_path") == aq_name)
+            ]
             quant_config["quant_cfg"].append(
                 {
                     "quantizer_path": aq_name,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/diffusers/quantization/config.py` around lines 128 - 147, The
current append in reset_set_int8_config causes duplicate entries in
quant_config["quant_cfg"]; instead, make the operation idempotent by searching
quant_config["quant_cfg"] for an existing dict whose "quantizer_path" equals
aq_name and replace that dict in-place (or update its "cfg") if found, otherwise
append the new entry; modify the code around the append that builds the payload
(the block referencing aq_name, quant_config["quant_cfg"], and the
PercentileCalibrator tuple) to perform a find-and-replace by quantizer_path
rather than always appending.
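
A find-and-replace helper along those lines (a sketch; the real payload includes the full PercentileCalibrator tuple shown in the snippet above):

```python
def upsert_quant_entry(quant_cfg_list, quantizer_path, cfg):
    """Replace the existing entry for quantizer_path in place, or append one.

    Repeated calls stay idempotent: list length and match order are stable,
    so later-entry-wins resolution is unaffected by re-running the helper.
    """
    new_entry = {"quantizer_path": quantizer_path, "cfg": cfg}
    for i, entry in enumerate(quant_cfg_list):
        if isinstance(entry, dict) and entry.get("quantizer_path") == quantizer_path:
            quant_cfg_list[i] = new_entry
            return
    quant_cfg_list.append(new_entry)
```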

Comment on lines +36 to +49
"quant_cfg": [
*mtq.config._base_disable_all,
{
"quantizer_path": "*weight_quantizer",
"cfg": {"num_bits": 4, "block_sizes": {-1: 128}},
"enable": True,
},
{
"quantizer_path": "*input_quantizer",
"cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}},
},
# Disable sensitive layers such as `lm_head`, gate layers in MoE etc.
-**mtq.config._default_disabled_quantizer_cfg,
-},
+*mtq.config._default_disabled_quantizer_cfg,
+],

⚠️ Potential issue | 🟠 Major

Update the BMM-attention detection for list-based quant_cfg.

After switching quant_cfg to a list of entries, _quantize_model_with_dataset()'s later for key in mtq_cfg["quant_cfg"] / "bmm_quantizer" in key check no longer works: iterating the list yields dicts, so that membership test only inspects dict keys. Configs that target *bmm_quantizer now skip register_hf_attentions_on_the_fly(net) and silently miss KV-attention quantization.

💡 Suggested update outside this hunk
-        quantize_bmm_attention = False
-        for key in mtq_cfg["quant_cfg"]:
-            if "bmm_quantizer" in key:
-                quantize_bmm_attention = True
+        quantize_bmm_attention = any(
+            "bmm_quantizer" in entry["quantizer_path"]
+            for entry in mtq.config.normalize_quant_cfg_list(mtq_cfg["quant_cfg"])
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_eval/quantization_utils.py` around lines 36 - 49, The loop that
detects BMM-attention entries fails because mtq_cfg["quant_cfg"] is now a list
of dicts; update _quantize_model_with_dataset to handle both list and dict
formats by inspecting each entry: if an entry is a dict, read its
"quantizer_path" (or keys for legacy string entries) and test whether the path
string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.

Comment on lines +183 to +191
"quant_cfg": [
# Disable all quantizers by default
{"quantizer_path": "*", "enable": False},
# "num_bits" specifies the number of bits for quantization
# "axis" specifies the axis for quantization
-"*weight_quantizer": {"num_bits": 8, "axis": 0},
-"*input_quantizer": {"num_bits": 8, "axis": -1},
-
-# Default quantization settings
-"default": {"num_bits": 8, "axis": None},
-}
-"algorithm": "max"
+{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
+{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": -1}},
],
"algorithm": "max",

⚠️ Potential issue | 🟡 Minor

Finish updating this quantize() doc block to the list-based schema.

Lines 183-191 show the new list-of-dicts format, but the prose immediately above still describes quant_cfg as a dictionary mapping wildcard keys to attributes. That leaves the public API docs contradictory in the same section.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_quant.py` around lines 183 - 191, The
docstring for quantize() is out of sync: it still describes quant_cfg as a dict
mapping wildcard keys, but the implementation and example use a list-of-dicts
schema. Update the prose in the quantize() doc block to describe quant_cfg as a
list where each item is a dict with keys like "quantizer_path" (wildcard
pattern), optional "enable" (bool), optional "cfg" (dict, e.g., {"num_bits":
int, "axis": int}), and that multiple entries are applied in order; also mention
the top-level "algorithm" field (e.g., "max") and how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.

Comment on lines +50 to +61
for entry in config["quant_cfg"]:
pat = (
entry["quantizer_path"]
if isinstance(entry, dict) and "quantizer_path" in entry
else entry[0]
)
if pat == "*weight_quantizer":
if isinstance(entry, dict) and "quantizer_path" in entry:
entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
else:
entry[1]["block_sizes"] = {-1: block_size}
break

⚠️ Potential issue | 🟡 Minor

Fail fast when *weight_quantizer is missing.

If no matching entry exists, this function currently returns without applying block_sizes, which can silently invalidate downstream test intent.

🔧 Proposed fix
 def get_awq_config(algorithm="awq_lite", block_size=8):
     config = copy.deepcopy(mtq.INT4_AWQ_CFG)
+    found_weight_entry = False
     for entry in config["quant_cfg"]:
         pat = (
             entry["quantizer_path"]
             if isinstance(entry, dict) and "quantizer_path" in entry
             else entry[0]
         )
         if pat == "*weight_quantizer":
+            found_weight_entry = True
             if isinstance(entry, dict) and "quantizer_path" in entry:
                 entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
             else:
                 entry[1]["block_sizes"] = {-1: block_size}
             break
+    if not found_weight_entry:
+        raise KeyError("No '*weight_quantizer' entry found in quant_cfg")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/_test_utils/torch/quantization/quantize_common.py` around lines 50 -
61, The loop over config["quant_cfg"] looks for a "*weight_quantizer" entry and
mutates its "block_sizes" but silently does nothing if no match is found; add a
fail-fast check by tracking whether a matching entry was found (e.g., set a
boolean flag when pat == "*weight_quantizer") and after the loop raise an
explicit error (ValueError or AssertionError) if no match was found so tests
fail early rather than silently proceeding without applying block_size to the
intended entry.

Comment on lines +133 to +135
for entry in config["quant_cfg"]:
if entry.get("quantizer_path") == "*weight_quantizer":
entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}

⚠️ Potential issue | 🟡 Minor

Deep-copy the shared config before patching block_sizes.

config here is the imported mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG object. Mutating config["quant_cfg"] in place leaks the reduced block size into later parametrized cases and any other tests that reuse the same constant.

💡 Suggested fix
     if config == mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG:
+        import copy
+
+        config = copy.deepcopy(config)
         # reduce block sizes for simple testing models
         for entry in config["quant_cfg"]:
             if entry.get("quantizer_path") == "*weight_quantizer":
                 entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/gpu/torch/quantization/test_quantize_cuda.py` around lines 133 - 135,
The test is mutating the shared constant mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by
editing config["quant_cfg"] in place; instead, create a deep copy of that
constant (e.g., via copy.deepcopy) into the local variable config before
patching so changes to entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2:
8} only affect this test; locate the code that assigns config from
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a deep-copied version
so modifications to quant_cfg/ block_sizes/quantizer_path do not leak to other
parametrized cases.

Comment on lines +492 to +507
assert isinstance(config["quant_cfg"], list)
assert any(
(
entry["quantizer_path"]
if isinstance(entry, dict) and "quantizer_path" in entry
else entry[0]
)
== "*"
and (
entry.get("enable")
if isinstance(entry, dict) and "quantizer_path" in entry
else entry[1].get("enable")
)
is False
for entry in config["quant_cfg"]
)

⚠️ Potential issue | 🟡 Minor

Restrict this assertion to QuantizerCfgEntry dicts.

The fallback to entry[0] / entry[1] means this test now accepts tuple-style payloads, so it no longer protects the new list-of-dicts contract from regressions. Assert on dict entries directly.

🧪 Proposed fix
-    assert any(
-        (
-            entry["quantizer_path"]
-            if isinstance(entry, dict) and "quantizer_path" in entry
-            else entry[0]
-        )
-        == "*"
-        and (
-            entry.get("enable")
-            if isinstance(entry, dict) and "quantizer_path" in entry
-            else entry[1].get("enable")
-        )
-        is False
-        for entry in config["quant_cfg"]
-    )
+    assert all(isinstance(entry, dict) for entry in config["quant_cfg"])
+    assert any(
+        entry.get("quantizer_path") == "*" and entry.get("enable") is False
+        for entry in config["quant_cfg"]
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/quantization/test_autoquant.py` around lines 492 - 507, The
current assertion allows tuple-style payloads by falling back to
entry[0]/entry[1]; restrict the check to dict-style QuantizerCfgEntry entries
only by iterating over config["quant_cfg"] and filtering with isinstance(entry,
dict), then assert that some dict has entry.get("quantizer_path") == "*" and
entry.get("enable") is False; update the assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu force-pushed the shengliangx/quant_cfg-list branch from 5971a7b to fe2d2f3 Compare March 23, 2026 07:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
modelopt/torch/quantization/config.py (1)

1622-1628: ⚠️ Potential issue | 🟠 Major

Legacy top-level dict format bypasses normalization.

The before validator returns non-list inputs unchanged (line 1626-1627). If a user passes a legacy dict-style quant_cfg like {"*weight_quantizer": {"num_bits": 8}} directly (not wrapped in a list), it will not be normalized. This can cause need_calibration() and downstream consumers to fail when iterating, as they expect list entries.

Consider converting top-level dict inputs to list form before calling normalize_quant_cfg_list():

Suggested fix
@field_validator("quant_cfg", mode="before")
@classmethod
 def normalize_quant_cfg(cls, v):
     """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
+    if isinstance(v, dict):
+        # Convert legacy top-level dict format to list of single-key dicts
+        v = [{k: val} for k, val in v.items()]
     if not isinstance(v, list):
         return v
     return normalize_quant_cfg_list(v)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The
before-validator normalize_quant_cfg currently returns non-list inputs
unchanged, so legacy top-level dict forms for quant_cfg bypass normalization;
update normalize_quant_cfg (the `@field_validator` for "quant_cfg") to detect a
dict input (e.g., mapping of pattern->entry) and wrap/convert it into a one-item
list (or otherwise translate into the list form expected by
normalize_quant_cfg_list) before calling normalize_quant_cfg_list, then return
the normalized list so downstream functions like need_calibration() always
receive a list of QuantizerCfgEntry dicts.
🧹 Nitpick comments (2)
modelopt/torch/quantization/config.py (2)

1630-1639: Consider validating list-form cfg entries as well.

The validator validates dict cfg values against QuantizerAttributeConfig, but skips list cfg (used for SequentialQuantizer). While downstream processing may catch errors, early validation would surface misconfigurations sooner.

Optional enhancement
@field_validator("quant_cfg", mode="after")
@classmethod
 def validate_quant_cfg_entries(cls, v):
     """Validate quantizer attribute configs to surface errors (e.g. invalid axis/block_sizes)."""
     qac_fields = set(QuantizerAttributeConfig.model_fields.keys())
     for entry in v:
         cfg = entry.get("cfg", {})
         if isinstance(cfg, dict) and qac_fields & set(cfg.keys()):
             QuantizerAttributeConfig.model_validate(cfg)
+        elif isinstance(cfg, list):
+            for sub_cfg in cfg:
+                if isinstance(sub_cfg, dict) and qac_fields & set(sub_cfg.keys()):
+                    QuantizerAttributeConfig.model_validate(sub_cfg)
     return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1630 - 1639, The
validator validate_quant_cfg_entries currently only validates dict cfg values
against QuantizerAttributeConfig but ignores list-form cfg (used by
SequentialQuantizer); update validate_quant_cfg_entries to also handle when
entry.get("cfg") is a list by iterating over its items and, for each item that
is a dict and contains keys in QuantizerAttributeConfig.model_fields, call
QuantizerAttributeConfig.model_validate(item) so each element is validated
early; preserve existing behavior for non-dict/non-list cfgs and return v as
before.
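The suggested list-aware validation can be exercised outside pydantic with a standalone sketch; `KNOWN_FIELDS` and `validate_entries` below are illustrative stand-ins for `QuantizerAttributeConfig.model_fields` and the real validator (the sketch rejects unknown keys rather than running pydantic validation):

```python
# Stand-in for QuantizerAttributeConfig.model_fields (illustrative only).
KNOWN_FIELDS = {"num_bits", "axis", "block_sizes", "enable", "type"}


def validate_entries(quant_cfg: list[dict]) -> list[dict]:
    """Validate dict-form and list-form cfg values early, as the review suggests."""

    def _check(sub_cfg: dict) -> None:
        unknown = set(sub_cfg) - KNOWN_FIELDS
        if unknown:
            raise ValueError(f"unknown quantizer attribute(s): {sorted(unknown)}")

    for entry in quant_cfg:
        cfg = entry.get("cfg", {})
        if isinstance(cfg, dict):
            _check(cfg)
        elif isinstance(cfg, list):  # SequentialQuantizer: list of sub-configs
            for sub_cfg in cfg:
                if isinstance(sub_cfg, dict):
                    _check(sub_cfg)
    return quant_cfg


# A W4A8-style entry with a list-form cfg passes; a typo is caught early.
validate_entries(
    [{"quantizer_path": "*weight_quantizer", "cfg": [{"num_bits": 4}, {"num_bits": 8}]}]
)
```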

1554-1567: Replace assert with ValueError for clearer error messaging.

The assert at line 1556 will produce an unclear AssertionError if the legacy nn.* format has unexpected structure. A ValueError with a descriptive message would be more helpful for users debugging config issues.

Suggested improvement
     def _dict_to_entry(key: str, value) -> QuantizerCfgEntry:
         if isinstance(key, str) and key.startswith("nn."):
-            assert isinstance(value, dict) and len(value) == 1
+            if not isinstance(value, dict) or len(value) != 1:
+                raise ValueError(
+                    f"Legacy nn.*-scoped entry expects a single-key dict value, got: {value!r}"
+                )
             q_path, sub_cfg = next(iter(value.items()))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1554 - 1567, In
_dict_to_entry, replace the brittle assert inside the branch that handles legacy
"nn." keys with a ValueError that gives a clear message; specifically in the
function _dict_to_entry where you check if key.startswith("nn."), validate that
value is a dict with exactly one item and if not raise ValueError("invalid
legacy nn.* quantizer entry: expected a dict with a single mapping from
quantizer path to config, got: {repr(value)}") (include the offending value in
the message) so callers get a descriptive error instead of an AssertionError.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a4a13435-75c6-4c01-93a8-09afd75e09ad

📥 Commits

Reviewing files that changed from the base of the PR and between 5115452 and 5971a7b.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/utils/core_utils.py

Comment on lines +1677 to 1699
quant_cfg: list = config.get("quant_cfg") or []
for entry in quant_cfg:
    name = (
        entry["quantizer_path"]
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[0]
    )
    if isinstance(entry, dict) and "quantizer_path" in entry:
        cfg = dict(entry.get("cfg") or {})
        if "enable" in entry:
            cfg["enable"] = entry["enable"]
    else:
        cfg = entry[1]
    if "weight_quantizer" in name:
        # We don't calibrate weight quantizer
        continue
    # quantization like W4A8 has a list of weight quantizers
    if isinstance(cfg, list):
        for _config in cfg:
            if _not_dynamic(_config):
                print(f"{cfg}: True")
                return True
    elif _not_dynamic(cfg):
        print(f"{cfg}: True")
    elif isinstance(cfg, dict) and _not_dynamic(cfg):
        return True
⚠️ Potential issue | 🟡 Minor

Fallback logic for tuple format may mask errors from unnormalized dict inputs.

The fallback to entry[0]/entry[1] (lines 1682, 1689) is intended for legacy tuple format support. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and entry[0]/entry[1] would index into string characters (e.g., "*weight_quantizer"[0] == "*"), leading to silent misbehavior rather than a clear error.

Consider adding explicit type checking or removing the tuple fallback if tuples are no longer a supported input format:

Suggested defensive check
 quant_cfg: list = config.get("quant_cfg") or []
 for entry in quant_cfg:
+    if not isinstance(entry, dict):
+        raise TypeError(f"Expected dict entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
     name = (
         entry["quantizer_path"]
         if isinstance(entry, dict) and "quantizer_path" in entry
         else entry[0]
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop
over quant_cfg can silently mis-index string keys when an unnormalized dict is
present; update the logic in the loop that determines name/cfg (the code using
entry["quantizer_path"] vs entry[0]/entry[1]) to explicitly check types: if
entry is a dict require the "quantizer_path" key and use entry.get("cfg"),
otherwise require entry to be a sequence/tuple/list of length >= 2 before using
entry[0] and entry[1]; if the input is neither a valid dict nor a valid 2+
element sequence, raise a clear ValueError (or log and skip) so _not_dynamic and
downstream logic never index into a string or malformed structure. Ensure you
reference quant_cfg, entry, name, cfg, and _not_dynamic when implementing the
checks.
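A small illustration of the failure mode described above, plus the explicit type check the suggestion asks for; `name_and_cfg` is a hypothetical helper, not ModelOpt code:

```python
# Iterating an unnormalized dict yields its string keys, and indexing a
# string does not raise -- it silently returns single characters.
legacy = {"*weight_quantizer": {"num_bits": 8}}
first = next(iter(legacy))
assert (first[0], first[1]) == ("*", "w")  # characters, not (path, cfg)


def name_and_cfg(entry):
    """Resolve (quantizer_path, cfg) from an entry, rejecting malformed input."""
    if isinstance(entry, dict):
        if "quantizer_path" not in entry:
            raise ValueError(f"dict entry missing 'quantizer_path': {entry!r}")
        return entry["quantizer_path"], entry.get("cfg")
    if isinstance(entry, (tuple, list)) and len(entry) >= 2:
        return entry[0], entry[1]  # legacy tuple form
    raise TypeError(f"unsupported quant_cfg entry: {entry!r}")
```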

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)

1677-1699: ⚠️ Potential issue | 🟡 Minor

Fallback logic for tuple format may mask errors from unnormalized dict inputs.

The fallback to entry[0]/entry[1] (lines 1682, 1689) is intended for legacy tuple format. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and entry[0]/entry[1] would index into string characters (e.g., "*weight_quantizer"[0] == "*"), causing silent misbehavior.

Suggested defensive check
 quant_cfg: list = config.get("quant_cfg") or []
 for entry in quant_cfg:
+    if not isinstance(entry, (dict, tuple, list)):
+        raise TypeError(f"Expected dict or tuple entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
     name = (
         entry["quantizer_path"]
         if isinstance(entry, dict) and "quantizer_path" in entry
         else entry[0]
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop
over quant_cfg can mis-index string keys when a dict slips through the legacy
tuple/path fallback; update the logic in the quant_cfg iteration (around
variables entry, name, cfg) to first check types explicitly: if entry is a dict
use entry["quantizer_path"] and entry.get("cfg"), else if entry is a sequence
(tuple/list) with length >= 2 use entry[0]/entry[1]; otherwise raise a clear
ValueError or skip with a logged warning. Ensure the branch that checks
"weight_quantizer" still uses the computed name and that _not_dynamic is only
called with validated cfg objects.

1622-1628: ⚠️ Potential issue | 🟠 Major

Legacy dict-style quant_cfg is not normalized by the before-validator.

The validator returns non-list inputs unchanged (line 1626-1627), so a legacy dict format like {"*weight_quantizer": {"num_bits": 8}} bypasses normalize_quant_cfg_list() entirely. If such a dict reaches downstream code expecting a list, it will fail.

Since normalize_quant_cfg_list() already supports converting legacy dict formats via _dict_to_entry(), the validator should route dicts to it:

Suggested fix
 @field_validator("quant_cfg", mode="before")
 @classmethod
 def normalize_quant_cfg(cls, v):
     """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
-    if not isinstance(v, list):
-        return v
-    return normalize_quant_cfg_list(v)
+    if isinstance(v, dict):
+        # Legacy top-level dict format: convert to list of single-key dicts
+        v = [{k: val} for k, val in v.items()]
+    if isinstance(v, list):
+        return normalize_quant_cfg_list(v)
+    return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The
before-validator normalize_quant_cfg currently returns non-list values
unchanged, so legacy dict-style quant_cfg like {"*weight_quantizer":
{"num_bits": 8}} bypasses normalization; update normalize_quant_cfg to detect
dict inputs and pass them into normalize_quant_cfg_list (which uses
_dict_to_entry) so all legacy dict formats are converted into the expected list
of QuantizerCfgEntry objects—i.e., in normalize_quant_cfg (the `@field_validator`
for "quant_cfg") call normalize_quant_cfg_list(v) for dicts as well as lists so
downstream code always receives a normalized list.
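The suggested dict branch can be sketched standalone; `_dict_to_entry` below is a simplified stand-in for the real helper, and the whole function approximates `normalize_quant_cfg_list` rather than reproducing it:

```python
def _dict_to_entry(key: str, value) -> dict:
    # Simplified stand-in: a wildcard path -> cfg dict becomes a typed entry.
    return {"quantizer_path": key, "cfg": value}


def normalize_quant_cfg(v):
    """Normalize quant_cfg: route legacy top-level dicts through the list path."""
    if isinstance(v, dict):
        # Legacy top-level dict format: each key/value pair becomes one entry.
        v = [{k: val} for k, val in v.items()]
    if isinstance(v, list):
        return [
            e
            if isinstance(e, dict) and "quantizer_path" in e
            else _dict_to_entry(*next(iter(e.items())))
            for e in v
        ]
    return v


legacy = {"*weight_quantizer": {"num_bits": 8}}
normalized = normalize_quant_cfg(legacy)
# -> [{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8}}]
```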
🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)

162-168: TypedDict total=False conflicts with required quantizer_path documentation.

The docstring states quantizer_path is required, but total=False makes all fields optional at the type level. Consider using typing.Required from typing_extensions to mark quantizer_path as required while keeping other fields optional:

-class QuantizerCfgEntry(TypedDict, total=False):
+from typing_extensions import Required
+
+class QuantizerCfgEntry(TypedDict, total=False):
     """A single entry in a ``quant_cfg`` list."""
 
-    quantizer_path: str  # required; matched against quantizer module names
+    quantizer_path: Required[str]  # required; matched against quantizer module names
     parent_class: str | None  # optional; filters by pytorch module class name (e.g. "nn.Linear")

This ensures static type checkers flag missing quantizer_path while allowing other fields to be omitted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 162 - 168,
QuantizerCfgEntry currently uses TypedDict(total=False) which makes all keys
optional but the docstring says quantizer_path is required; update
QuantizerCfgEntry to mark quantizer_path as required using
typing_extensions.Required (or typing.Required on newer Python): change the
TypedDict definition so quantizer_path: Required[str] while keeping
parent_class, cfg, and enable as optional, and import Required from
typing_extensions if not available in typing to ensure static type-checkers
enforce the required key.
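A minimal, hedged demonstration of the `Required` approach; the class below is illustrative, not the real ModelOpt definition, and assumes Python 3.11+ (or `typing_extensions` on older interpreters):

```python
try:
    from typing import Required, TypedDict  # Python 3.11+
except ImportError:
    from typing_extensions import Required, TypedDict  # older interpreters

from typing import Optional


class QuantizerCfgEntry(TypedDict, total=False):
    """A single entry in a quant_cfg list (illustrative sketch)."""

    quantizer_path: Required[str]  # type checkers flag entries missing this key
    parent_class: Optional[str]
    cfg: dict
    enable: bool


ok: QuantizerCfgEntry = {"quantizer_path": "*weight_quantizer", "enable": True}
# A checker such as mypy or pyright rejects {"enable": True} alone:
#   error: missing required TypedDict key "quantizer_path"
```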

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ea0ca383-fd43-415d-8bef-252ce66fbfc5

📥 Commits

Reviewing files that changed from the base of the PR and between 5971a7b and fe2d2f3.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/utils/core_utils.py

Comment on lines 830 to +849
 def update_quant_cfg_with_kv_cache_quant(
-    quant_cfg: dict[str, Any], kv_cache_quant_cfg: dict[str, Any]
+    quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
 ) -> dict[str, Any]:
-    """Update the quant_cfg with the kv cache quant_cfg."""
+    """Update the quant_cfg with the kv cache quant_cfg.
+
+    Args:
+        quant_cfg: The outer quantization config dict (with ``"quant_cfg"`` and ``"algorithm"`` keys).
+        kv_cache_quant_cfg: A list of :class:`QuantizerCfgEntry
+            <modelopt.torch.quantization.config.QuantizerCfgEntry>` dicts for KV cache quantization,
+            typically ``some_kv_cfg["quant_cfg"]``.
+
+    Returns:
+        A deep copy of ``quant_cfg`` with the KV cache entries appended to ``quant_cfg["quant_cfg"]``.
+    """
     # If quant_cfg["quant_cfg"] is None, it corresponds to only kv cache quantization case
     quant_cfg = copy.deepcopy(quant_cfg)
-    quant_cfg["quant_cfg"] = quant_cfg.get("quant_cfg") or {"default": {"enable": False}}
-    quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)
+    inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
+        {"quantizer_path": "*", "enable": False}
+    ]
+    quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
⚠️ Potential issue | 🟡 Minor

Potential issue if quant_cfg["quant_cfg"] is a legacy dict format.

The function assumes quant_cfg.get("quant_cfg") returns a list (or falsey value). If a caller passes an unnormalized config where quant_cfg["quant_cfg"] is still a legacy dict, line 849 would raise TypeError when attempting dict + list.

Consider normalizing the inner config or adding a type check:

Suggested defensive fix
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
+
 def update_quant_cfg_with_kv_cache_quant(
     quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
 ) -> dict[str, Any]:
     ...
     quant_cfg = copy.deepcopy(quant_cfg)
-    inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
-        {"quantizer_path": "*", "enable": False}
-    ]
+    raw_inner = quant_cfg.get("quant_cfg")
+    if raw_inner is None:
+        inner = [{"quantizer_path": "*", "enable": False}]
+    elif isinstance(raw_inner, dict):
+        # Legacy dict format - convert to list
+        inner = normalize_quant_cfg_list([{k: v} for k, v in raw_inner.items()])
+    else:
+        inner = list(raw_inner)
     quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/utils/core_utils.py` around lines 830 - 849, The
function update_quant_cfg_with_kv_cache_quant can raise a TypeError if
quant_cfg.get("quant_cfg") returns a legacy dict instead of a list; change it to
defensively normalize inner by checking the type of quant_cfg.get("quant_cfg")
and converting a dict into a single-item list (or treating None/Falsey as the
existing default list), then set quant_cfg["quant_cfg"] = inner +
list(kv_cache_quant_cfg) and return the deep-copied quant_cfg; update the logic
around the inner variable in update_quant_cfg_with_kv_cache_quant to handle
list, dict, and None cases explicitly.
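The defensive variant can be sketched standalone; the dict branch below inlines a simplified legacy conversion instead of calling the real `normalize_quant_cfg_list`:

```python
import copy


def update_quant_cfg_with_kv_cache_quant(quant_cfg, kv_cache_quant_cfg):
    """Append KV cache entries, tolerating list, legacy dict, and None inner configs."""
    quant_cfg = copy.deepcopy(quant_cfg)
    raw_inner = quant_cfg.get("quant_cfg")
    if not raw_inner:
        # None/empty: KV-cache-only quantization, deny everything else.
        inner = [{"quantizer_path": "*", "enable": False}]
    elif isinstance(raw_inner, dict):
        # Legacy dict format: convert pattern -> cfg pairs to typed entries.
        inner = [{"quantizer_path": k, "cfg": v} for k, v in raw_inner.items()]
    else:
        inner = list(raw_inner)
    quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
    return quant_cfg
```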

@github-actions
Copy link
Contributor

PR Preview Action v1.8.1

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1094/

Built to branch gh-pages at 2026-03-23 20:23 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Comment on lines +128 to +133
# 1. Deny all quantizers by default
{"quantizer_path": "*", "enable": False},
# 2. Enable and configure the target quantizers
{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
Suggested change
# 1. Deny all quantizers by default
{"quantizer_path": "*", "enable": False},
# 2. Enable and configure the target quantizers
{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
# 1. Deny all quantizers by default
{"*": {"enable": False}},
# 2. Enable and configure the target quantizers
{"*weight_quantizer": {"num_bits": 8, "axis": 0}},
{"*input_quantizer": {"num_bits": 8, "axis": None}},

Why do we need the following fields, "quantizer_path" and "cfg"?
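The ordered-list semantics described in this PR can be sketched to show what the explicit "quantizer_path" and "cfg" fields buy: entries are applied top to bottom, with later matches overriding earlier ones. The resolver below is a simplification for illustration, not ModelOpt code, and it assumes that configuring a quantizer via "cfg" re-enables it:

```python
import fnmatch

quant_cfg = [
    {"quantizer_path": "*", "enable": False},  # 1. deny all quantizers by default
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
]


def resolve(name: str) -> dict:
    """Fold all matching entries, in list order, into one attribute dict."""
    attrs: dict = {}
    for entry in quant_cfg:
        if fnmatch.fnmatch(name, entry["quantizer_path"]):
            attrs.update(entry.get("cfg") or {})
            if "enable" in entry:
                attrs["enable"] = entry["enable"]
            elif "cfg" in entry:
                attrs["enable"] = True  # assumption: configuring implies enabling
    return attrs


resolve("layers.0.linear.weight_quantizer")
# -> {"enable": True, "num_bits": 8, "axis": 0}
resolve("layers.0.norm.output_quantizer")
# -> {"enable": False}
```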
