[OMNIML-3689] PTQ quant_cfg semantic correction. Design in doc _quant_cfg.rst#1094
shengliangxu wants to merge 28 commits into main from
Conversation
📝 Walkthrough: Refactors the quantization configuration from a dictionary-of-patterns schema to an ordered list-of-entry schema.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Right now the quant_cfg is a dict, but we use it as if it were a list. When we apply the quant_cfg, we enumerate the items in the dict and apply the configs one by one in modelopt/torch/quantization/conversion.py. This implementation has the semantics that later configs take precedence over earlier ones; however, dicts do not have reliable ordering. Therefore, we make quant_cfg a list of patterns:

1. Later config patterns have higher precedence. A later config in the list overrides an earlier config if they target the same module.
2. The config for each module is atomic: each config provides the full information. We do not compose a module's quant config from multiple config lines.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
…s_partial

set_quantizer_attributes_full updates the full set of quantizer attributes; it has atomic semantics. set_quantizer_attributes_partial updates just a partial set of quantizer attributes; it has merge semantics.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
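A minimal illustration of the two semantics, using plain dicts as stand-ins for quantizer state (these are simplified sketches, not the actual modelopt functions):

```python
def set_attributes_full(quantizer: dict, attrs: dict) -> None:
    """Atomic semantics: the new attributes fully replace the old state."""
    quantizer.clear()
    quantizer.update(attrs)


def set_attributes_partial(quantizer: dict, attrs: dict) -> None:
    """Merge semantics: only the given keys change; everything else survives."""
    quantizer.update(attrs)


q = {"num_bits": 8, "axis": 0, "enable": True}

set_attributes_partial(q, {"num_bits": 4})
assert q == {"num_bits": 4, "axis": 0, "enable": True}  # axis/enable kept

set_attributes_full(q, {"num_bits": 4, "block_sizes": {-1: 128}})
assert q == {"num_bits": 4, "block_sizes": {-1: 128}}  # axis/enable gone
```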
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #1094    +/-  ##
========================================
  Coverage   70.24%   70.25%
========================================
  Files         227      227
  Lines       25909    26014   +105
========================================
+ Hits        18201    18277    +76
- Misses       7708     7737    +29

☔ View full report in Codecov by Sentry.
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
examples/llm_autodeploy/run_auto_quantize.py (1)

134-143: ⚠️ Potential issue | 🔴 Critical — Remove hardcoded trust_remote_code=True in HF model and tokenizer loading.

Both AutoModelForCausalLM.from_pretrained() (line 134) and AutoTokenizer.from_pretrained() (line 142) hardcode trust_remote_code=True, creating a critical RCE vector from untrusted model repositories. Expose this as a user-controlled parameter defaulting to False in the modelopt_ptq() function signature.

🔒 Proposed fix
 def modelopt_ptq(
     model_path: str,
     output_dir: str,
     qformat: str | None = None,
     num_samples: int = 512,
     auto_quantize_bits: float | None = None,
     calib_dataset: str = "cnn_dailymail",
     calib_batch_size: int = 8,
+    trust_remote_code: bool = False,
 ) -> torch.nn.Module:
     """Quantize the model with modelopt."""
     model = AutoModelForCausalLM.from_pretrained(
-        model_path, trust_remote_code=True, torch_dtype="auto", device_map="auto"
+        model_path,
+        trust_remote_code=trust_remote_code,
+        torch_dtype="auto",
+        device_map="auto",
     )
     model.eval()
     tokenizer = AutoTokenizer.from_pretrained(
         model_path,
         model_max_length=2048,
         padding_side="left",
-        trust_remote_code=True,
+        trust_remote_code=trust_remote_code,
     )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_autodeploy/run_auto_quantize.py` around lines 134-143: the code currently hardcodes trust_remote_code=True when calling AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained, creating an RCE risk; update the modelopt_ptq(...) function signature to accept a new parameter (e.g., trust_remote_code: bool = False) and replace the hardcoded True with that parameter for both calls so callers can opt in explicitly while defaulting to False.

examples/llm_ptq/example_utils.py (1)
208-247: ⚠️ Potential issue | 🟠 Major — Normalize quant_cfg["quant_cfg"] before mutating it.

Lines 210 and 244-247 now assume the new list schema. If a caller still passes a legacy dict-style config, the AWQ branch will fail before mtq.quantize() reaches the backward-compat normalizer, and the phi4mm branch will hit .append() on a dict. Even in list form, walking from the front can rewrite a shadowed *weight_quantizer rule instead of the effective last match.

💡 Suggested fix
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
@@
 def build_quant_cfg(
@@
 ) -> dict[str, Any]:
     quant_cfg = copy.deepcopy(quant_cfg)
+    quant_cfg["quant_cfg"] = normalize_quant_cfg_list(quant_cfg["quant_cfg"])
     if "awq" in str(quant_cfg.get("algorithm")):
         weight_quantizer_entry = next(
             e
-            for e in quant_cfg["quant_cfg"]
-            if isinstance(e, dict) and e.get("quantizer_path") == "*weight_quantizer"
+            for e in reversed(quant_cfg["quant_cfg"])
+            if e.get("quantizer_path") == "*weight_quantizer"
         )
         weight_quantizer = weight_quantizer_entry.get("cfg", {})

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_ptq/example_utils.py` around lines 208-247: the code mutates quant_cfg["quant_cfg"] assuming it's a list; normalize quant_cfg["quant_cfg"] to the list schema up front (convert the legacy dict form into the new list form) before any mutations so the weight_quantizer lookup and .append() calls are safe; when locating the weight quantizer, use a reverse scan (search from the end) to find the effective "*weight_quantizer" entry so the awq_block_size handling and the model_type == "phi4mm" .append() calls operate on the correct, normalized list.

modelopt/onnx/llm_export_utils/quantization_utils.py (1)
57-111: ⚠️ Potential issue | 🟠 Major — Copy the preset before rewriting quant_cfg.

Line 111 mutates mtq.FP8_DEFAULT_CFG / mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place. With list-based rules, repeated calls can now retain or duplicate previously appended lm_head entries, so get_quant_config() becomes order-dependent across calls in the same process.

💡 Suggested fix
+import copy
 import time
@@
     if precision == "fp8":
-        quant_cfg = mtq.FP8_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG)
     elif precision == "nvfp4":
-        quant_cfg = mtq.NVFP4_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.NVFP4_DEFAULT_CFG)
     elif precision == "int4_awq":
-        quant_cfg = mtq.INT4_AWQ_CFG
+        quant_cfg = copy.deepcopy(mtq.INT4_AWQ_CFG)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/quantization_utils.py` around lines 57-111: get_quant_config currently mutates the shared presets mtq.FP8_DEFAULT_CFG / mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place, which causes duplicated lm_head entries on repeated calls; fix by making a deep copy of the chosen preset (e.g., import copy and do quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG), etc., inside get_quant_config) before building quant_cfg_list and appending lm_head entries so the original mtq presets remain unchanged.

modelopt/torch/quantization/conversion.py (1)
423-454: ⚠️ Potential issue | 🟠 Major — Reject sequential cfgs here or restore full module topology.

This context manager snapshots only TensorQuantizer properties. If a temporary config contains a list cfg, set_quantizer_by_cfg() can replace a TensorQuantizer with a SequentialQuantizer, and the exit path never converts it back. The docstring already says sequential cfg lists are unsupported here, but that isn't enforced.

🧯 Suggested fix

     quant_cfg = normalize_quant_cfg_list(quant_cfg)
+    if any(isinstance(entry["cfg"], list) for entry in quant_cfg):
+        raise ValueError("set_quantizer_by_cfg_context does not support sequential cfg lists.")
     original_attributes = {}

modelopt/torch/quantization/config.py (1)
541-545: ⚠️ Potential issue | 🔴 Critical — Remove enable from _nvfp4_cfg_bs32.

The nested "enable": True in _nvfp4_cfg_bs32 (line 545) causes a duplicate keyword argument error at runtime. When _nvfp4_selective_quant_cfg() passes this dict as the cfg field in QuantizerCfgEntry (line 559), the conversion code later unpacks it with QuantizerAttributeConfig(**cfg, enable=enable) (conversion.py line 266). This passes enable twice and raises TypeError before conversion can complete.

Suggested fix

 _nvfp4_cfg_bs32 = {
     "num_bits": (2, 1),
     "block_sizes": {-1: 32, "type": "dynamic", "scale_bits": (4, 3)},
-    "enable": True,
 }
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 541 - 545, The dict _nvfp4_cfg_bs32 contains an internal "enable" key which causes a duplicate keyword when later unpacked into QuantizerAttributeConfig; remove the "enable" entry from _nvfp4_cfg_bs32 so it only contains its attribute settings (num_bits, block_sizes, scale_bits/type entries) and let the caller (_nvfp4_selective_quant_cfg -> QuantizerCfgEntry) supply the enable flag separately; specifically edit the _nvfp4_cfg_bs32 constant to drop the "enable": True key so that conversion code that does QuantizerAttributeConfig(**cfg, enable=enable) no longer passes enable twice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/guides/_pytorch_quantization.rst`:
- Around line 260-272: The example references _default_disabled_quantizer_cfg
but doesn't define or import it; fix by either adding an import for it (e.g.,
from mtq.config import _default_disabled_quantizer_cfg) at the top of the
snippet or qualify the reference inline as
mtq.config._default_disabled_quantizer_cfg so the example is self-contained and
runnable; update the code block containing "quant_cfg" to use the qualified name
or add the import near the example.
In `@docs/source/guides/_quant_cfg.rst`:
- Around line 123-143: The docs currently state that _base_disable_all (a.k.a.
the deny-all entry) is prepended to every built-in config; update the text to
accurately say that the deny-all entry (referenced as _base_disable_all or
mtq.config._default_disabled_quantizer_cfg) is included in most built-in configs
but not all, and call out that some specialized recipes (e.g., the KV-only
configs in the quantization config set) intentionally omit the deny-all prelude;
adjust the example wording and the note to avoid the absolute "every built-in
config" claim and add a short parenthetical pointer to the specialized recipes
that omit it (referencing their config names such as the KV-only configs).
In `@examples/diffusers/quantization/config.py`:
- Around line 106-109: The loop that injects trt_high_precision_dtype into
quant_config["quant_cfg"] only handles when cfg is a dict (variable p) and skips
the case where cfg is a list of attribute dicts; update the logic in the loop
that iterates quant_config["quant_cfg"] to detect if p is a list and, when so,
iterate its elements and for each element that is a dict and contains "num_bits"
but not "trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.
- Around line 128-147: The current append in reset_set_int8_config causes
duplicate entries in quant_config["quant_cfg"]; instead, make the operation
idempotent by searching quant_config["quant_cfg"] for an existing dict whose
"quantizer_path" equals aq_name and replace that dict in-place (or update its
"cfg") if found, otherwise append the new entry; modify the code around the
append that builds the payload (the block referencing aq_name,
quant_config["quant_cfg"], and the PercentileCalibrator tuple) to perform a
find-and-replace by quantizer_path rather than always appending.
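The idempotent find-and-replace that this prompt describes can be sketched as follows (the helper name is hypothetical, not part of the modelopt API):

```python
def upsert_entry(quant_cfg: list[dict], entry: dict) -> None:
    """Replace the existing entry with the same quantizer_path in place,
    or append if none exists, so repeated calls stay idempotent."""
    for i, existing in enumerate(quant_cfg):
        if isinstance(existing, dict) and existing.get("quantizer_path") == entry["quantizer_path"]:
            quant_cfg[i] = entry  # in-place replacement preserves list order
            return
    quant_cfg.append(entry)


cfg = [{"quantizer_path": "*", "enable": False}]
upsert_entry(cfg, {"quantizer_path": "*.input_quantizer", "cfg": {"num_bits": 8}})
upsert_entry(cfg, {"quantizer_path": "*.input_quantizer", "cfg": {"num_bits": 4}})
assert len(cfg) == 2  # the second call replaced the entry instead of duplicating it
```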
In `@examples/llm_eval/quantization_utils.py`:
- Around line 36-49: The loop that detects BMM-attention entries fails because
mtq_cfg["quant_cfg"] is now a list of dicts; update _quantize_model_with_dataset
to handle both list and dict formats by inspecting each entry: if an entry is a
dict, read its "quantizer_path" (or keys for legacy string entries) and test
whether the path string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.
In `@examples/llm_ptq/hf_ptq.py`:
- Line 332: The code is using attribute access on plain dicts in
_default_disabled_quantizer_cfg which will raise AttributeError; update the
disabled_layers construction to access the dict key "quantizer_path" for each
entry (i.e., replace entry.quantizer_path with entry["quantizer_path"]) so that
disabled_layers is built from the correct dictionary key.
In `@examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb`:
- Around line 193-198: The code fails because quant_cfg["quant_cfg"] contains
dict entries, not 2-tuples; change the extraction to find the dict whose
quantizer_path (or appropriate key) equals "*weight_quantizer" and then take its
"cfg" field (e.g. use next(entry["cfg"] for entry in quant_cfg["quant_cfg"] if
entry.get("quantizer_path") == "*weight_quantizer")), then handle the case where
weight_quantizer is a list as before and set weight_quantizer["block_sizes"][-1]
= 128; update references to quant_cfg, weight_quantizer, and
quant_cfg["quant_cfg"] accordingly.
In `@examples/vllm_serve/fakequant_worker.py`:
- Around line 173-184: The code currently finds only the first kv rule
(kv_entry) and appends minimal clones at the end losing order and full fields;
instead iterate over kv_quant_cfg with indexes, find every entry where
isinstance(e, dict) and e.get("quantizer_path") == "*[kv]_bmm_quantizer", and
for each matching entry insert (not append) two full deep-copied clones of that
entire entry right after the original entry with quantizer_path set to
"*kv_c_bmm_quantizer" and "*k_pe_bmm_quantizer" respectively so you preserve
list order and all other fields (enable, parent_class, etc.) — operate on
kv_quant_cfg and use copy.deepcopy to clone the whole entry before changing
quantizer_path.
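The insert-after-with-deepcopy approach described above can be sketched like this (an illustrative helper, not the actual fakequant_worker code):

```python
import copy


def expand_kv_entry(quant_cfg: list[dict]) -> None:
    """After each '*[kv]_bmm_quantizer' rule, insert full clones targeting the
    split kv_c / k_pe quantizers so list order and all other fields survive."""
    i = 0
    while i < len(quant_cfg):
        e = quant_cfg[i]
        if isinstance(e, dict) and e.get("quantizer_path") == "*[kv]_bmm_quantizer":
            for offset, path in enumerate(("*kv_c_bmm_quantizer", "*k_pe_bmm_quantizer"), start=1):
                clone = copy.deepcopy(e)  # keeps enable, cfg, parent_class, ...
                clone["quantizer_path"] = path
                quant_cfg.insert(i + offset, clone)
            i += 2  # skip over the clones we just inserted
        i += 1


cfg = [
    {"quantizer_path": "*[kv]_bmm_quantizer", "cfg": {"num_bits": (4, 3)}, "enable": True},
    {"quantizer_path": "lm_head*", "enable": False},
]
expand_kv_entry(cfg)
# cfg now holds the original entry, its two clones right after it, then lm_head
```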
In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`:
- Around line 260-271: The quant_cfg list currently uses tuple entries and
embeds "enable" inside the quantizer config dict (_nvfp4_cfg), which fails
validation in normalize_quant_cfg_list(); change quant_cfg to a list of
QuantizerCfgEntry dicts where each entry is e.g. {"quantizer_path": "<pattern>",
"cfg": <cfg_dict>} for enabled entries (move "enable" out of _nvfp4_cfg) and
{"quantizer_path": "<pattern>", "enable": False} for disabled patterns (use
SENSITIVE_LAYER_PATTERNS and the f"*transformer_blocks.{i}.*" patterns for
exclude_blocks), ensuring _nvfp4_cfg no longer contains a top-level "enable" key
and only used as the "cfg" value.
In `@modelopt/torch/quantization/algorithms.py`:
- Around line 1333-1344: The loop in get_auto_quantize_config is calling
_match_quantizer_cfg with only the short names
("input_quantizer"/"weight_quantizer"), so path- or parent_class-scoped
selectors never match and enable-only matches are dropped; change the call to
pass the fully qualified quantizer name (e.g.,
f"{module_name}.{quantizer_attr}") to _match_quantizer_cfg and ensure the caller
appends an entry when either matched_cfg is not None or matched_enable is True
(so enable-only matches are preserved); apply the same fix for the analogous
block that starts at the later occurrence (the other loop referencing
_match_quantizer_cfg, module_name, and quantizer_attr).
- Around line 96-98: The code currently drops the entry-level "enable" flag and
estimates compression from every entry's "cfg"; change the selection to only
include cfgs from QuantizerCfgEntry items where e.get("enable", True) is truthy
and the "cfg" is not None. Concretely, in the block using quant_cfg.quant_cfg
and calling estimate_quant_compression_for_quantizer, filter entries by
e.get("enable", True) before extracting e.get("cfg") (and still skip None cfgs)
so estimate_quant_compression_for_quantizer receives only enabled cfgs.
In `@modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py`:
- Around line 100-116: The availability probe currently uses next(...) without a
default and assertions on input_cfg/weight_cfg which can raise
StopIteration/AssertionError; change the two next(...) calls that produce
input_cfg and weight_cfg to include a default (e.g., next(..., None)) and
replace the assert isinstance(...) checks with explicit conditional checks that
return False from the probe when input_cfg or weight_cfg is missing or not a
dict; ensure you reference FP8_DEFAULT_CFG["quant_cfg"], quant_cfg_list,
input_cfg and weight_cfg in the probe logic so malformed or missing default
entries cause the function to return False instead of throwing.
In `@modelopt/torch/quantization/backends/nvfp4_gemm.py`:
- Around line 216-231: The availability check currently uses next(...) without
defaults and asserts which can raise instead of returning False; update the
_nvfp4_availability_check implementation to retrieve input_cfg and weight_cfg
from quant_cfg_list using next(..., None) (or equivalent), verify they are dicts
with proper "cfg" entries and that their "quantizer_path" matches
"*input_quantizer" / "*weight_quantizer", and if either is missing or not a dict
return False instead of asserting or allowing StopIteration; reference the
symbols quant_cfg_list, input_cfg, weight_cfg and the existing next(...)/assert
lines to locate and change the lookup and validation to defensive checks that
return False on malformed/missing configs.
In `@modelopt/torch/quantization/config.py`:
- Around line 1625-1631: The before-validator normalize_quant_cfg currently
returns non-list inputs unchanged causing legacy dict-style quant_cfg to bypass
normalization; update normalize_quant_cfg to detect dict inputs and pass them
through normalize_quant_cfg_list (which handles dict -> entry conversion via
_dict_to_entry) so legacy {"*weight_quantizer": {...}} shapes are converted to
the expected list form; ensure this change preserves existing behavior for lists
and other non-dict types so downstream callers like need_calibration (which
iterates entries) receive normalized entry lists.
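The dict-to-list conversion described in this prompt might look roughly like the following sketch (the real normalize_quant_cfg_list / _dict_to_entry in modelopt may differ in details):

```python
def normalize_quant_cfg_list(quant_cfg) -> list[dict]:
    """Accept either the legacy {pattern: attrs} dict or the new list of
    entries, and always return the list-of-entries form."""
    if isinstance(quant_cfg, dict):
        entries = []
        for pattern, attrs in quant_cfg.items():
            entry = {"quantizer_path": pattern}
            if isinstance(attrs, dict) and len(attrs) == 1 and attrs.get("enable") is False:
                entry["enable"] = False  # pure disable rule, no cfg payload
            else:
                entry["cfg"] = attrs
            entries.append(entry)
        return entries
    return list(quant_cfg)  # already list-shaped; return a shallow copy


legacy = {"*weight_quantizer": {"num_bits": 8, "axis": 0}, "default": {"enable": False}}
normalized = normalize_quant_cfg_list(legacy)
# insertion order of the legacy dict is preserved in the resulting list
```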
In `@modelopt/torch/quantization/conversion.py`:
- Around line 341-356: The current logic in _match_quantizer branch mutates
existing SequentialQuantizer instances instead of fully replacing them when the
incoming attributes change shape; to fix, ensure full replacement instead of
partial assignment: when attributes is a list, if module is not a
SequentialQuantizer create a new SequentialQuantizer (as before) OR if module is
a SequentialQuantizer but len(attributes) != len(module) replace the existing
module with a new SequentialQuantizer of the correct length before calling
set_from_attribute_config; conversely, when attributes is not a list and module
is a SequentialQuantizer, replace it with a fresh TensorQuantizer instance and
then call set_from_attribute_config on that new instance (use
quant_model.get_submodule/name.rpartition to locate parent_module and setattr to
swap the module).
In `@modelopt/torch/quantization/model_quant.py`:
- Around line 183-191: The docstring for quantize() is out of sync: it still
describes quant_cfg as a dict mapping wildcard keys, but the implementation and
example use a list-of-dicts schema. Update the prose in the quantize() doc block
to describe quant_cfg as a list where each item is a dict with keys like
"quantizer_path" (wildcard pattern), optional "enable" (bool), optional "cfg"
(dict, e.g., {"num_bits": int, "axis": int}), and that multiple entries are
applied in order; also mention the top-level "algorithm" field (e.g., "max") and
how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.
In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 835-836: The code is appending dict.items() tuples into
quant_cfg["quant_cfg"], creating mixed tuples instead of the expected
list[QuantizerCfgEntry] and can error when quant_cfg["quant_cfg"] is a dict;
normalize both sources into proper QuantizerCfgEntry dicts: ensure inner is a
list (if quant_cfg.get("quant_cfg") is a dict, convert it to a list of entries),
and transform kv_cache_quant_cfg (the dict of pattern->cfg) into a list of
{"quantizer_path": pattern, ...cfg} or at least {"quantizer_path": pattern,
"enable": cfg.get("enable", False)} entries before concatenating; assign the
concatenated list back to quant_cfg["quant_cfg"] so downstream consumers only
see a homogeneous list[QuantizerCfgEntry].
In `@tests/_test_utils/torch/quantization/quantize_common.py`:
- Around line 50-61: The loop over config["quant_cfg"] looks for a
"*weight_quantizer" entry and mutates its "block_sizes" but silently does
nothing if no match is found; add a fail-fast check by tracking whether a
matching entry was found (e.g., set a boolean flag when pat ==
"*weight_quantizer") and after the loop raise an explicit error (ValueError or
AssertionError) if no match was found so tests fail early rather than silently
proceeding without applying block_size to the intended entry.
In `@tests/gpu/torch/quantization/test_quantize_cuda.py`:
- Around line 133-135: The test is mutating the shared constant
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by editing config["quant_cfg"] in place;
instead, create a deep copy of that constant (e.g., via copy.deepcopy) into the
local variable config before patching so changes to entry.setdefault("cfg",
{})["block_sizes"] = {-1: 8, -2: 8} only affect this test; locate the code that
assigns config from mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a
deep-copied version so modifications to quant_cfg/ block_sizes/quantizer_path do
not leak to other parametrized cases.
In `@tests/unit/torch/quantization/test_autoquant.py`:
- Around line 492-507: The current assertion allows tuple-style payloads by
falling back to entry[0]/entry[1]; restrict the check to dict-style
QuantizerCfgEntry entries only by iterating over config["quant_cfg"] and
filtering with isinstance(entry, dict), then assert that some dict has
entry.get("quantizer_path") == "*" and entry.get("enable") is False; update the
assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fcb69c76-5a46-4dd0-82f2-905613c6cc83
📒 Files selected for processing (53)

docs/source/guides/1_quantization.rst
docs/source/guides/_pytorch_quantization.rst
docs/source/guides/_quant_cfg.rst
examples/diffusers/quantization/config.py
examples/llm_autodeploy/run_auto_quantize.py
examples/llm_eval/quantization_utils.py
examples/llm_ptq/example_utils.py
examples/llm_ptq/hf_ptq.py
examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb
examples/llm_ptq/notebooks/3_PTQ_AutoQuantization.ipynb
examples/llm_qat/main.py
examples/vllm_serve/fakequant_worker.py
examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py
modelopt/onnx/llm_export_utils/quantization_utils.py
modelopt/torch/export/unified_export_hf.py
modelopt/torch/quantization/algorithms.py
modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py
modelopt/torch/quantization/backends/nvfp4_gemm.py
modelopt/torch/quantization/compress.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/conversion.py
modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/model_quant.py
modelopt/torch/quantization/nn/modules/tensor_quantizer.py
modelopt/torch/quantization/utils/core_utils.py
modelopt/torch/sparsity/attention_sparsity/conversion.py
modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
tests/_test_utils/torch/export/utils.py
tests/_test_utils/torch/quantization/onnx_export.py
tests/_test_utils/torch/quantization/quantize_common.py
tests/gpu/torch/quantization/test_hadamard.py
tests/gpu/torch/quantization/test_quant_rnn_cuda.py
tests/gpu/torch/quantization/test_quantize_cuda.py
tests/gpu_megatron/torch/peft/plugins/test_megatron_peft.py
tests/gpu_megatron/torch/quantization/plugins/test_apex.py
tests/gpu_megatron/torch/quantization/plugins/test_megatron.py
tests/unit/recipe/test_loader.py
tests/unit/torch/quantization/plugins/test_attention_quant.py
tests/unit/torch/quantization/plugins/test_huggingface.py
tests/unit/torch/quantization/plugins/test_peft.py
tests/unit/torch/quantization/test_autoquant.py
tests/unit/torch/quantization/test_compute_quantization_mse.py
tests/unit/torch/quantization/test_config_validation.py
tests/unit/torch/quantization/test_custom_backend.py
tests/unit/torch/quantization/test_quant_activations.py
tests/unit/torch/quantization/test_quant_batchnorm.py
tests/unit/torch/quantization/test_quant_rnn.py
tests/unit/torch/quantization/test_quantize_cpu.py
tests/unit/torch/quantization/test_quantize_replace.py
tests/unit/torch/quantization/test_tensor_quant_cpu.py
 "quant_cfg": [
     # Disable all quantizers by default, then enable selectively
     {"quantizer_path": "*", "enable": False},

     # Configure weight quantizers with 4-bit precision and 128-element blocks
-    "*weight_quantizer": {"num_bits": 4, "block_sizes": {-1: 128}, "enable": True},
+    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}, "enable": True},

     # Configure input quantizers with 8-bit dynamic quantization
-    "*input_quantizer": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}},
+    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}}},

     # Include default disabled quantizer configurations
-    **_default_disabled_quantizer_cfg,
-},
+    *_default_disabled_quantizer_cfg,
+],
Define or qualify _default_disabled_quantizer_cfg in this example.
This snippet now references _default_disabled_quantizer_cfg without importing or qualifying it, so it fails when copied verbatim. Please either add the import or qualify it as mtq.config._default_disabled_quantizer_cfg.
| The recommended pattern used by all built-in configs is: | ||
|
|
||
| .. code-block:: python | ||
| "quant_cfg": [ | ||
| # 1. Deny all quantizers by default | ||
| {"quantizer_path": "*", "enable": False}, | ||
| # 2. Enable and configure the target quantizers | ||
| {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}}, | ||
| {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}}, | ||
| # 3. Apply standard exclusions last (BatchNorm, LM head, MoE routers, etc.) | ||
| *mtq.config._default_disabled_quantizer_cfg, | ||
| ] | ||
| .. note:: | ||
|
|
||
| The deny-all entry ``{"quantizer_path": "*", "enable": False}`` is available as | ||
| :data:`modelopt.torch.quantization.config._base_disable_all` and is prepended to every | ||
| built-in config. This ensures quantizers not explicitly targeted remain disabled. |
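The note above relies on last-match-wins precedence: later entries in the list override earlier ones for the same quantizer. As a rough illustration (not ModelOpt code; `resolve_quantizer_cfg` and the glob matching via `fnmatch` are assumptions made for this sketch), resolution over an ordered `quant_cfg` list could look like:

```python
import fnmatch


def resolve_quantizer_cfg(quant_cfg, quantizer_name):
    """Return the effective config for one quantizer: later matching entries win."""
    resolved = None
    for entry in quant_cfg:
        if fnmatch.fnmatch(quantizer_name, entry["quantizer_path"]):
            # An entry without "cfg" contributes only its enable flag.
            resolved = entry.get("cfg", {"enable": entry.get("enable", True)})
    return resolved


quant_cfg = [
    {"quantizer_path": "*", "enable": False},  # deny-all prelude
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "*lm_head*", "enable": False},  # exclusion applied last
]
```

Note how the `*lm_head*` exclusion overrides the earlier `*weight_quantizer` match because it comes later in the list, which is exactly the ordering semantic this PR introduces.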
Don’t describe _base_disable_all as universal across built-ins.
Several specialized recipes in modelopt/torch/quantization/config.py intentionally omit the deny-all prelude (for example the KV-only configs). Saying it is prepended to every built-in config will send users to the wrong pattern for those formats.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/source/guides/_quant_cfg.rst` around lines 123 - 143, The docs currently
state that _base_disable_all (a.k.a. the deny-all entry) is prepended to every
built-in config; update the text to accurately say that the deny-all entry
(referenced as _base_disable_all or mtq.config._default_disabled_quantizer_cfg)
is included in most built-in configs but not all, and call out that some
specialized recipes (e.g., the KV-only configs in the quantization config set)
intentionally omit the deny-all prelude; adjust the example wording and the note
to avoid the absolute "every built-in config" claim and add a short
parenthetical pointer to the specialized recipes that omit it (referencing their
config names such as the KV-only configs).
| for entry in quant_config["quant_cfg"]: | ||
| p = entry.get("cfg", {}) | ||
| if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p: | ||
| p["trt_high_precision_dtype"] = trt_high_precision_dtype |
Handle sequential cfg lists when injecting trt_high_precision_dtype.
The new quant_cfg format allows cfg to be a list of attribute dicts, but Line 107 only updates plain dicts. Any sequential entry will skip this dtype injection entirely.
🛠️ Proposed fix
- for entry in quant_config["quant_cfg"]:
- p = entry.get("cfg", {})
- if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p:
- p["trt_high_precision_dtype"] = trt_high_precision_dtype
+ for entry in quant_config["quant_cfg"]:
+ cfg = entry.get("cfg")
+ cfg_items = [cfg] if isinstance(cfg, dict) else (cfg or [])
+ for attrs in cfg_items:
+ if "num_bits" in attrs and "trt_high_precision_dtype" not in attrs:
+ attrs["trt_high_precision_dtype"] = trt_high_precision_dtype🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/diffusers/quantization/config.py` around lines 106 - 109, The loop
that injects trt_high_precision_dtype into quant_config["quant_cfg"] only
handles when cfg is a dict (variable p) and skips the case where cfg is a list
of attribute dicts; update the logic in the loop that iterates
quant_config["quant_cfg"] to detect if p is a list and, when so, iterate its
elements and for each element that is a dict and contains "num_bits" but not
"trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.
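A sketch of the combined dict-or-list handling the prompt asks for; `inject_high_precision_dtype` is a hypothetical standalone helper written for illustration, not the actual `examples/diffusers` code:

```python
def inject_high_precision_dtype(quant_cfg_entries, dtype):
    """Add trt_high_precision_dtype to every attribute dict that sets num_bits.

    Handles both a plain dict ``cfg`` and the list-of-dicts form used for
    SequentialQuantizer-style entries; entries without ``cfg`` are untouched.
    """
    for entry in quant_cfg_entries:
        cfg = entry.get("cfg")
        attrs_list = [cfg] if isinstance(cfg, dict) else (cfg or [])
        for attrs in attrs_list:
            if isinstance(attrs, dict) and "num_bits" in attrs and "trt_high_precision_dtype" not in attrs:
                attrs["trt_high_precision_dtype"] = dtype


entries = [
    {"quantizer_path": "*weight_quantizer", "cfg": [{"num_bits": 4}, {"num_bits": 8}]},
    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8}},
    {"quantizer_path": "*", "enable": False},  # no cfg: must stay untouched
]
inject_high_precision_dtype(entries, "Half")
```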
| quant_config["quant_cfg"].append( | ||
| { | ||
| "quantizer_path": aq_name, | ||
| "cfg": { | ||
| "num_bits": 8, | ||
| "axis": None, | ||
| "percentile": percentile, | ||
| "total_step": n_steps, | ||
| "collect_method": collect_method, | ||
| "calibrator": ( | ||
| PercentileCalibrator, | ||
| (), | ||
| { | ||
| "num_bits": 8, | ||
| "axis": None, | ||
| "percentile": percentile, | ||
| "total_step": n_steps, | ||
| "collect_method": collect_method, | ||
| }, | ||
| ), | ||
| }, | ||
| ), | ||
| } | ||
| } | ||
| ) |
Make reset_set_int8_config() idempotent under list-based quant_cfg.
Line 128 switched this helper from a replace-by-key update to an unconditional append. Reusing the same quant_config object now accumulates duplicate or stale *{name}*input_quantizer* rules, which changes later-match behavior under the new schema.
🧼 Proposed fix
if isinstance(module, nn.Conv2d):
aq_name = f"*{name}*input_quantizer*"
+ quant_config["quant_cfg"] = [
+ entry
+ for entry in quant_config["quant_cfg"]
+ if not (isinstance(entry, dict) and entry.get("quantizer_path") == aq_name)
+ ]
quant_config["quant_cfg"].append(
{
"quantizer_path": aq_name,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/diffusers/quantization/config.py` around lines 128 - 147, The
current append in reset_set_int8_config causes duplicate entries in
quant_config["quant_cfg"]; instead, make the operation idempotent by searching
quant_config["quant_cfg"] for an existing dict whose "quantizer_path" equals
aq_name and replace that dict in-place (or update its "cfg") if found, otherwise
append the new entry; modify the code around the append that builds the payload
(the block referencing aq_name, quant_config["quant_cfg"], and the
PercentileCalibrator tuple) to perform a find-and-replace by quantizer_path
rather than always appending.
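The find-and-replace-by-`quantizer_path` behavior could be factored into a small helper; `upsert_quantizer_entry` is an illustrative name for this sketch, not part of the PR:

```python
def upsert_quantizer_entry(quant_cfg, entry):
    """Replace an existing entry with the same quantizer_path in place, else append.

    Calling this repeatedly with the same path is idempotent, so reusing one
    quant_config object no longer accumulates duplicate rules.
    """
    for i, existing in enumerate(quant_cfg):
        if isinstance(existing, dict) and existing.get("quantizer_path") == entry["quantizer_path"]:
            quant_cfg[i] = entry
            return
    quant_cfg.append(entry)


cfg = [{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8}}]
# Second call with the same path updates in place instead of appending a duplicate.
upsert_quantizer_entry(cfg, {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "percentile": 99.9}})
upsert_quantizer_entry(cfg, {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}})
```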
| "quant_cfg": [ | ||
| *mtq.config._base_disable_all, | ||
| { | ||
| "quantizer_path": "*weight_quantizer", | ||
| "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}, | ||
| "enable": True, | ||
| }, | ||
| { | ||
| "quantizer_path": "*input_quantizer", | ||
| "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}}, | ||
| }, | ||
| # Disable sensitive layers such as `lm_head`, gate layers in MoE etc. | ||
| **mtq.config._default_disabled_quantizer_cfg, | ||
| }, | ||
| *mtq.config._default_disabled_quantizer_cfg, | ||
| ], |
Update the BMM-attention detection for list-based quant_cfg.
After switching quant_cfg to a list of entries, the later for key in mtq_cfg["quant_cfg"] / "bmm_quantizer" in key check in _quantize_model_with_dataset() no longer works: iterating the list yields entry dicts, so the membership test inspects each entry's keys (quantizer_path, cfg) rather than the wildcard patterns. Configs that target *bmm_quantizer now skip register_hf_attentions_on_the_fly(net) and silently miss KV-attention quantization.
💡 Suggested update outside this hunk
- quantize_bmm_attention = False
- for key in mtq_cfg["quant_cfg"]:
- if "bmm_quantizer" in key:
- quantize_bmm_attention = True
+ quantize_bmm_attention = any(
+ "bmm_quantizer" in entry["quantizer_path"]
+ for entry in mtq.config.normalize_quant_cfg_list(mtq_cfg["quant_cfg"])
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llm_eval/quantization_utils.py` around lines 36 - 49, The loop that
detects BMM-attention entries fails because mtq_cfg["quant_cfg"] is now a list
of dicts; update _quantize_model_with_dataset to handle both list and dict
formats by inspecting each entry: if an entry is a dict, read its
"quantizer_path" (or keys for legacy string entries) and test whether the path
string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.
| "quant_cfg": [ | ||
| # Disable all quantizers by default | ||
| {"quantizer_path": "*", "enable": False}, | ||
| # "num_bits" specifies the number of bits for quantization | ||
| # "axis" specifies the axis for quantization | ||
| "*weight_quantizer": {"num_bits": 8, "axis": 0}, | ||
| "*input_quantizer": {"num_bits": 8, "axis": -1}, | ||
|
|
| # Default quantization settings | ||
| "default": {"num_bits": 8, "axis": None}, | ||
| } | ||
| "algorithm": "max" | ||
| {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}}, | ||
| {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": -1}}, | ||
| ], | ||
| "algorithm": "max", |
Finish updating this quantize() doc block to the list-based schema.
Lines 183-191 show the new list-of-dicts format, but the prose immediately above still describes quant_cfg as a dictionary mapping wildcard keys to attributes. That leaves the public API docs contradictory in the same section.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/model_quant.py` around lines 183 - 191, The
docstring for quantize() is out of sync: it still describes quant_cfg as a dict
mapping wildcard keys, but the implementation and example use a list-of-dicts
schema. Update the prose in the quantize() doc block to describe quant_cfg as a
list where each item is a dict with keys like "quantizer_path" (wildcard
pattern), optional "enable" (bool), optional "cfg" (dict, e.g., {"num_bits":
int, "axis": int}), and that multiple entries are applied in order; also mention
the top-level "algorithm" field (e.g., "max") and how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.
| for entry in config["quant_cfg"]: | ||
| pat = ( | ||
| entry["quantizer_path"] | ||
| if isinstance(entry, dict) and "quantizer_path" in entry | ||
| else entry[0] | ||
| ) | ||
| if pat == "*weight_quantizer": | ||
| if isinstance(entry, dict) and "quantizer_path" in entry: | ||
| entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size} | ||
| else: | ||
| entry[1]["block_sizes"] = {-1: block_size} | ||
| break |
Fail fast when *weight_quantizer is missing.
If no matching entry exists, this function currently returns without applying block_sizes, which can silently invalidate downstream test intent.
🔧 Proposed fix
def get_awq_config(algorithm="awq_lite", block_size=8):
config = copy.deepcopy(mtq.INT4_AWQ_CFG)
+ found_weight_entry = False
for entry in config["quant_cfg"]:
pat = (
entry["quantizer_path"]
if isinstance(entry, dict) and "quantizer_path" in entry
else entry[0]
)
if pat == "*weight_quantizer":
+ found_weight_entry = True
if isinstance(entry, dict) and "quantizer_path" in entry:
entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
else:
entry[1]["block_sizes"] = {-1: block_size}
break
+ if not found_weight_entry:
+ raise KeyError("No '*weight_quantizer' entry found in quant_cfg")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/_test_utils/torch/quantization/quantize_common.py` around lines 50 -
61, The loop over config["quant_cfg"] looks for a "*weight_quantizer" entry and
mutates its "block_sizes" but silently does nothing if no match is found; add a
fail-fast check by tracking whether a matching entry was found (e.g., set a
boolean flag when pat == "*weight_quantizer") and after the loop raise an
explicit error (ValueError or AssertionError) if no match was found so tests
fail early rather than silently proceeding without applying block_size to the
intended entry.
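The fail-fast variant can be sketched as a standalone helper; `set_weight_block_size` is an illustrative name, not the actual test-utils API:

```python
def set_weight_block_size(quant_cfg, block_size):
    """Patch block_sizes on the '*weight_quantizer' entry; raise if none exists."""
    for entry in quant_cfg:
        if isinstance(entry, dict) and entry.get("quantizer_path") == "*weight_quantizer":
            entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
            return entry
    # Fail fast instead of silently leaving the config unpatched.
    raise ValueError("no '*weight_quantizer' entry found in quant_cfg")


cfg = [{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}}]
set_weight_block_size(cfg, 8)
```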
| for entry in config["quant_cfg"]: | ||
| if entry.get("quantizer_path") == "*weight_quantizer": | ||
| entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8} |
Deep-copy the shared config before patching block_sizes.
config here is the imported mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG object. Mutating config["quant_cfg"] in place leaks the reduced block size into later parametrized cases and any other tests that reuse the same constant.
💡 Suggested fix
if config == mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG:
+ import copy
+
+ config = copy.deepcopy(config)
# reduce block sizes for simple testing models
for entry in config["quant_cfg"]:
if entry.get("quantizer_path") == "*weight_quantizer":
entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for entry in config["quant_cfg"]: | |
| if entry.get("quantizer_path") == "*weight_quantizer": | |
| entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8} | |
| if config == mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG: | |
| import copy | |
| config = copy.deepcopy(config) | |
| # reduce block sizes for simple testing models | |
| for entry in config["quant_cfg"]: | |
| if entry.get("quantizer_path") == "*weight_quantizer": | |
| entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/gpu/torch/quantization/test_quantize_cuda.py` around lines 133 - 135,
The test is mutating the shared constant mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by
editing config["quant_cfg"] in place; instead, create a deep copy of that
constant (e.g., via copy.deepcopy) into the local variable config before
patching so changes to entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2:
8} only affect this test; locate the code that assigns config from
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a deep-copied version
so modifications to quant_cfg/ block_sizes/quantizer_path do not leak to other
parametrized cases.
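A minimal reproduction of the deep-copy fix, with `SHARED_CFG` standing in for a module-level constant such as `mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG` (the values below are made up for the sketch):

```python
import copy

SHARED_CFG = {  # stand-in for the imported module-level constant
    "quant_cfg": [
        {"quantizer_path": "*weight_quantizer", "cfg": {"block_sizes": {-1: 128, -2: 128}}},
    ],
}

# Deep-copy first so the test's mutation cannot leak into later parametrized cases.
config = copy.deepcopy(SHARED_CFG)
for entry in config["quant_cfg"]:
    if entry.get("quantizer_path") == "*weight_quantizer":
        entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}
```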
| assert isinstance(config["quant_cfg"], list) | ||
| assert any( | ||
| ( | ||
| entry["quantizer_path"] | ||
| if isinstance(entry, dict) and "quantizer_path" in entry | ||
| else entry[0] | ||
| ) | ||
| == "*" | ||
| and ( | ||
| entry.get("enable") | ||
| if isinstance(entry, dict) and "quantizer_path" in entry | ||
| else entry[1].get("enable") | ||
| ) | ||
| is False | ||
| for entry in config["quant_cfg"] | ||
| ) |
Keep this assertion strict to QuantizerCfgEntry dicts.
The fallback to entry[0] / entry[1] means this test now accepts tuple-style payloads, so it no longer protects the new list-of-dicts contract from regressions. Assert on dict entries directly.
🧪 Proposed fix
- assert any(
- (
- entry["quantizer_path"]
- if isinstance(entry, dict) and "quantizer_path" in entry
- else entry[0]
- )
- == "*"
- and (
- entry.get("enable")
- if isinstance(entry, dict) and "quantizer_path" in entry
- else entry[1].get("enable")
- )
- is False
- for entry in config["quant_cfg"]
- )
+ assert all(isinstance(entry, dict) for entry in config["quant_cfg"])
+ assert any(
+ entry.get("quantizer_path") == "*" and entry.get("enable") is False
+ for entry in config["quant_cfg"]
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/torch/quantization/test_autoquant.py` around lines 492 - 507, The
current assertion allows tuple-style payloads by falling back to
entry[0]/entry[1]; restrict the check to dict-style QuantizerCfgEntry entries
only by iterating over config["quant_cfg"] and filtering with isinstance(entry,
dict), then assert that some dict has entry.get("quantizer_path") == "*" and
entry.get("enable") is False; update the assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Force-pushed from 5971a7b to fe2d2f3
Actionable comments posted: 1
♻️ Duplicate comments (1)
modelopt/torch/quantization/config.py (1)
1622-1628: ⚠️ Potential issue | 🟠 Major: Legacy top-level dict format bypasses normalization.

The `mode="before"` validator returns non-list inputs unchanged (lines 1626-1627). If a user passes a legacy dict-style `quant_cfg` like `{"*weight_quantizer": {"num_bits": 8}}` directly (not wrapped in a list), it will not be normalized. This can cause `need_calibration()` and downstream consumers to fail when iterating, as they expect list entries.

Consider converting top-level dict inputs to list form before calling `normalize_quant_cfg_list()`:

Suggested fix

      @field_validator("quant_cfg", mode="before")
      @classmethod
      def normalize_quant_cfg(cls, v):
          """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
  +       if isinstance(v, dict):
  +           # Convert legacy top-level dict format to list of single-key dicts
  +           v = [{k: val} for k, val in v.items()]
          if not isinstance(v, list):
              return v
          return normalize_quant_cfg_list(v)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The before-validator normalize_quant_cfg currently returns non-list inputs unchanged, so legacy top-level dict forms for quant_cfg bypass normalization; update normalize_quant_cfg (the `@field_validator` for "quant_cfg") to detect a dict input (e.g., mapping of pattern->entry) and wrap/convert it into a one-item list (or otherwise translate into the list form expected by normalize_quant_cfg_list) before calling normalize_quant_cfg_list, then return the normalized list so downstream functions like need_calibration() always receive a list of QuantizerCfgEntry dicts.
🧹 Nitpick comments (2)
modelopt/torch/quantization/config.py (2)
1630-1639: Consider validating list-form `cfg` entries as well. The validator validates dict `cfg` values against `QuantizerAttributeConfig`, but skips list `cfg` (used for `SequentialQuantizer`). While downstream processing may catch errors, early validation would surface misconfigurations sooner.

Optional enhancement

      @field_validator("quant_cfg", mode="after")
      @classmethod
      def validate_quant_cfg_entries(cls, v):
          """Validate quantizer attribute configs to surface errors (e.g. invalid axis/block_sizes)."""
          qac_fields = set(QuantizerAttributeConfig.model_fields.keys())
          for entry in v:
              cfg = entry.get("cfg", {})
              if isinstance(cfg, dict) and qac_fields & set(cfg.keys()):
                  QuantizerAttributeConfig.model_validate(cfg)
  +           elif isinstance(cfg, list):
  +               for sub_cfg in cfg:
  +                   if isinstance(sub_cfg, dict) and qac_fields & set(sub_cfg.keys()):
  +                       QuantizerAttributeConfig.model_validate(sub_cfg)
          return v

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1630 - 1639, The validator validate_quant_cfg_entries currently only validates dict cfg values against QuantizerAttributeConfig but ignores list-form cfg (used by SequentialQuantizer); update validate_quant_cfg_entries to also handle when entry.get("cfg") is a list by iterating over its items and, for each item that is a dict and contains keys in QuantizerAttributeConfig.model_fields, call QuantizerAttributeConfig.model_validate(item) so each element is validated early; preserve existing behavior for non-dict/non-list cfgs and return v as before.
1554-1567: Replace assert with ValueError for clearer error messaging. The `assert` at line 1556 will produce an unclear `AssertionError` if the legacy `nn.*` format has unexpected structure. A `ValueError` with a descriptive message would be more helpful for users debugging config issues.

Suggested improvement

      def _dict_to_entry(key: str, value) -> QuantizerCfgEntry:
          if isinstance(key, str) and key.startswith("nn."):
  -           assert isinstance(value, dict) and len(value) == 1
  +           if not isinstance(value, dict) or len(value) != 1:
  +               raise ValueError(
  +                   f"Legacy nn.*-scoped entry expects a single-key dict value, got: {value!r}"
  +               )
              q_path, sub_cfg = next(iter(value.items()))

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1554 - 1567, In _dict_to_entry, replace the brittle assert inside the branch that handles legacy "nn." keys with a ValueError that gives a clear message; specifically in the function _dict_to_entry where you check if key.startswith("nn."), validate that value is a dict with exactly one item and if not raise ValueError("invalid legacy nn.* quantizer entry: expected a dict with a single mapping from quantizer path to config, got: {repr(value)}") (include the offending value in the message) so callers get a descriptive error instead of an AssertionError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1677-1699: The loop over quant_cfg can silently mis-index string
keys when an unnormalized dict is present; update the logic in the loop that
determines name/cfg (the code using entry["quantizer_path"] vs
entry[0]/entry[1]) to explicitly check types: if entry is a dict require the
"quantizer_path" key and use entry.get("cfg"), otherwise require entry to be a
sequence/tuple/list of length >= 2 before using entry[0] and entry[1]; if the
input is neither a valid dict nor a valid 2+ element sequence, raise a clear
ValueError (or log and skip) so _not_dynamic and downstream logic never index
into a string or malformed structure. Ensure you reference quant_cfg, entry,
name, cfg, and _not_dynamic when implementing the checks.
---
Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1622-1628: The before-validator normalize_quant_cfg currently
returns non-list inputs unchanged, so legacy top-level dict forms for quant_cfg
bypass normalization; update normalize_quant_cfg (the `@field_validator` for
"quant_cfg") to detect a dict input (e.g., mapping of pattern->entry) and
wrap/convert it into a one-item list (or otherwise translate into the list form
expected by normalize_quant_cfg_list) before calling normalize_quant_cfg_list,
then return the normalized list so downstream functions like need_calibration()
always receive a list of QuantizerCfgEntry dicts.
---
Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1630-1639: The validator validate_quant_cfg_entries currently only
validates dict cfg values against QuantizerAttributeConfig but ignores list-form
cfg (used by SequentialQuantizer); update validate_quant_cfg_entries to also
handle when entry.get("cfg") is a list by iterating over its items and, for each
item that is a dict and contains keys in QuantizerAttributeConfig.model_fields,
call QuantizerAttributeConfig.model_validate(item) so each element is validated
early; preserve existing behavior for non-dict/non-list cfgs and return v as
before.
- Around line 1554-1567: In _dict_to_entry, replace the brittle assert inside
the branch that handles legacy "nn." keys with a ValueError that gives a clear
message; specifically in the function _dict_to_entry where you check if
key.startswith("nn."), validate that value is a dict with exactly one item and
if not raise ValueError("invalid legacy nn.* quantizer entry: expected a dict
with a single mapping from quantizer path to config, got: {repr(value)}")
(include the offending value in the message) so callers get a descriptive error
instead of an AssertionError.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a4a13435-75c6-4c01-93a8-09afd75e09ad
📒 Files selected for processing (2)
modelopt/torch/quantization/config.py
modelopt/torch/quantization/utils/core_utils.py
| quant_cfg: list = config.get("quant_cfg") or [] | ||
| for entry in quant_cfg: | ||
| name = ( | ||
| entry["quantizer_path"] | ||
| if isinstance(entry, dict) and "quantizer_path" in entry | ||
| else entry[0] | ||
| ) | ||
| if isinstance(entry, dict) and "quantizer_path" in entry: | ||
| cfg = dict(entry.get("cfg") or {}) | ||
| if "enable" in entry: | ||
| cfg["enable"] = entry["enable"] | ||
| else: | ||
| cfg = entry[1] | ||
| if "weight_quantizer" in name: | ||
| # We don't calibrate weight quantizer | ||
| continue | ||
| # quantization like W4A8 has a list of weight quantizers | ||
| if isinstance(cfg, list): | ||
| for _config in cfg: | ||
| if _not_dynamic(_config): | ||
| print(f"{cfg}: True") | ||
| return True | ||
| elif _not_dynamic(cfg): | ||
| print(f"{cfg}: True") | ||
| elif isinstance(cfg, dict) and _not_dynamic(cfg): | ||
| return True |
There was a problem hiding this comment.
Fallback logic for tuple format may mask errors from unnormalized dict inputs.
The fallback to entry[0]/entry[1] (lines 1682, 1689) is intended for legacy tuple format support. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and entry[0]/entry[1] would index into string characters (e.g., "*"[0] → "*"), leading to silent misbehavior rather than a clear error.
Consider adding explicit type checking or removing the tuple fallback if tuples are no longer a supported input format:
Suggested defensive check
quant_cfg: list = config.get("quant_cfg") or []
for entry in quant_cfg:
+ if not isinstance(entry, dict):
+ raise TypeError(f"Expected dict entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
name = (
entry["quantizer_path"]
if isinstance(entry, dict) and "quantizer_path" in entry
else entry[0]
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop
over quant_cfg can silently mis-index string keys when an unnormalized dict is
present; update the logic in the loop that determines name/cfg (the code using
entry["quantizer_path"] vs entry[0]/entry[1]) to explicitly check types: if
entry is a dict require the "quantizer_path" key and use entry.get("cfg"),
otherwise require entry to be a sequence/tuple/list of length >= 2 before using
entry[0] and entry[1]; if the input is neither a valid dict nor a valid 2+
element sequence, raise a clear ValueError (or log and skip) so _not_dynamic and
downstream logic never index into a string or malformed structure. Ensure you
reference quant_cfg, entry, name, cfg, and _not_dynamic when implementing the
checks.
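A sketch of the explicit type checks this prompt describes, pulled out as a standalone extraction helper; `entry_name_and_cfg` is a hypothetical name used for illustration:

```python
def entry_name_and_cfg(entry):
    """Extract (name, cfg) from a quant_cfg entry, rejecting malformed input.

    Prevents the string-indexing hazard: a stray string entry like
    "*weight_quantizer" would otherwise yield entry[0] == "*".
    """
    if isinstance(entry, dict):
        if "quantizer_path" not in entry:
            raise ValueError(f"dict entry missing 'quantizer_path': {entry!r}")
        cfg = dict(entry.get("cfg") or {})
        if "enable" in entry:
            cfg["enable"] = entry["enable"]
        return entry["quantizer_path"], cfg
    if isinstance(entry, (tuple, list)) and len(entry) >= 2:
        # Legacy (pattern, attrs) tuple form.
        return entry[0], entry[1]
    raise ValueError(f"malformed quant_cfg entry: {entry!r}")
```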
Actionable comments posted: 1
♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)
1677-1699: ⚠️ Potential issue | 🟡 Minor: Fallback logic for tuple format may mask errors from unnormalized dict inputs.

The fallback to `entry[0]`/`entry[1]` (lines 1682, 1689) is intended for legacy tuple format. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and `entry[0]`/`entry[1]` would index into string characters (e.g., `"*weight_quantizer"[0]` → `"*"`), causing silent misbehavior.

Suggested defensive check

      quant_cfg: list = config.get("quant_cfg") or []
      for entry in quant_cfg:
  +       if not isinstance(entry, (dict, tuple, list)):
  +           raise TypeError(f"Expected dict or tuple entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
          name = (
              entry["quantizer_path"]
              if isinstance(entry, dict) and "quantizer_path" in entry
              else entry[0]
          )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop over quant_cfg can mis-index string keys when a dict slips through the legacy tuple/path fallback; update the logic in the quant_cfg iteration (around variables entry, name, cfg) to first check types explicitly: if entry is a dict use entry["quantizer_path"] and entry.get("cfg"), else if entry is a sequence (tuple/list) with length >= 2 use entry[0]/entry[1]; otherwise raise a clear ValueError or skip with a logged warning. Ensure the branch that checks "weight_quantizer" still uses the computed name and that _not_dynamic is only called with validated cfg objects.
1622-1628: ⚠️ Potential issue | 🟠 Major: Legacy dict-style `quant_cfg` is not normalized by the before-validator.

The validator returns non-list inputs unchanged (lines 1626-1627), so a legacy dict format like `{"*weight_quantizer": {"num_bits": 8}}` bypasses `normalize_quant_cfg_list()` entirely. If such a dict reaches downstream code expecting a list, it will fail.

Since `normalize_quant_cfg_list()` already supports converting legacy dict formats via `_dict_to_entry()`, the validator should route dicts to it:

Suggested fix

      @field_validator("quant_cfg", mode="before")
      @classmethod
      def normalize_quant_cfg(cls, v):
          """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
  -       if not isinstance(v, list):
  -           return v
  -       return normalize_quant_cfg_list(v)
  +       if isinstance(v, dict):
  +           # Legacy top-level dict format: convert to list of single-key dicts
  +           v = [{k: val} for k, val in v.items()]
  +       if isinstance(v, list):
  +           return normalize_quant_cfg_list(v)
  +       return v

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The before-validator normalize_quant_cfg currently returns non-list values unchanged, so legacy dict-style quant_cfg like {"*weight_quantizer": {"num_bits": 8}} bypasses normalization; update normalize_quant_cfg to detect dict inputs and pass them into normalize_quant_cfg_list (which uses _dict_to_entry) so all legacy dict formats are converted into the expected list of QuantizerCfgEntry objects—i.e., in normalize_quant_cfg (the `@field_validator` for "quant_cfg") call normalize_quant_cfg_list(v) for dicts as well as lists so downstream code always receives a normalized list.
🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)
162-168: TypedDict `total=False` conflicts with required `quantizer_path` documentation. The docstring states `quantizer_path` is required, but `total=False` makes all fields optional at the type level. Consider using `Required` from `typing_extensions` (or `typing.Required` on newer Python) to mark `quantizer_path` as required while keeping other fields optional:

  -class QuantizerCfgEntry(TypedDict, total=False):
  +from typing_extensions import Required
  +
  +class QuantizerCfgEntry(TypedDict, total=False):
       """A single entry in a ``quant_cfg`` list."""
  -    quantizer_path: str  # required; matched against quantizer module names
  +    quantizer_path: Required[str]  # required; matched against quantizer module names
       parent_class: str | None  # optional; filters by pytorch module class name (e.g. "nn.Linear")

This ensures static type checkers flag missing `quantizer_path` while allowing other fields to be omitted.
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 162 - 168, QuantizerCfgEntry currently uses TypedDict(total=False) which makes all keys optional but the docstring says quantizer_path is required; update QuantizerCfgEntry to mark quantizer_path as required using typing_extensions.Required (or typing.Required on newer Python): change the TypedDict definition so quantizer_path: Required[str] while keeping parent_class, cfg, and enable as optional, and import Required from typing_extensions if not available in typing to ensure static type-checkers enforce the required key.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 830-849: The function update_quant_cfg_with_kv_cache_quant can
raise a TypeError if quant_cfg.get("quant_cfg") returns a legacy dict instead of
a list; change it to defensively normalize inner by checking the type of
quant_cfg.get("quant_cfg") and converting a dict into a single-item list (or
treating None/Falsey as the existing default list), then set
quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg) and return the
deep-copied quant_cfg; update the logic around the inner variable in
update_quant_cfg_with_kv_cache_quant to handle list, dict, and None cases
explicitly.
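The defensive normalization described here, sketched as a standalone function mirroring the core_utils signature; the `{"quantizer_path": "*", "enable": False}` default for the empty/None case is an assumption for this sketch:

```python
import copy


def update_quant_cfg_with_kv_cache_quant(quant_cfg, kv_cache_entries):
    """Append KV-cache quantizer entries, tolerating legacy inner formats.

    Handles the inner "quant_cfg" being a list (new schema), a dict (legacy
    pattern -> attrs), or missing entirely, and never mutates the input.
    """
    quant_cfg = copy.deepcopy(quant_cfg)
    inner = quant_cfg.get("quant_cfg")
    if isinstance(inner, dict):
        # Legacy inner format: wildcard pattern -> attribute dict.
        inner = [{"quantizer_path": k, "cfg": v} for k, v in inner.items()]
    elif not inner:
        # Assumed default when no entries exist yet.
        inner = [{"quantizer_path": "*", "enable": False}]
    quant_cfg["quant_cfg"] = inner + list(kv_cache_entries)
    return quant_cfg
```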
---
Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1677-1699: The loop over quant_cfg can mis-index string keys when
a dict slips through the legacy tuple/path fallback; update the logic in the
quant_cfg iteration (around variables entry, name, cfg) to first check types
explicitly: if entry is a dict use entry["quantizer_path"] and entry.get("cfg"),
else if entry is a sequence (tuple/list) with length >= 2 use entry[0]/entry[1];
otherwise raise a clear ValueError or skip with a logged warning. Ensure the
branch that checks "weight_quantizer" still uses the computed name and that
_not_dynamic is only called with validated cfg objects.
- Around line 1622-1628: The before-validator normalize_quant_cfg currently
returns non-list values unchanged, so legacy dict-style quant_cfg like
{"*weight_quantizer": {"num_bits": 8}} bypasses normalization; update
normalize_quant_cfg to detect dict inputs and pass them into
normalize_quant_cfg_list (which uses _dict_to_entry) so all legacy dict formats
are converted into the expected list of QuantizerCfgEntry objects—i.e., in
normalize_quant_cfg (the `@field_validator` for "quant_cfg") call
normalize_quant_cfg_list(v) for dicts as well as lists so downstream code always
receives a normalized list.
---
Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 162-168: QuantizerCfgEntry currently uses TypedDict(total=False)
which makes all keys optional but the docstring says quantizer_path is required;
update QuantizerCfgEntry to mark quantizer_path as required using
typing_extensions.Required (or typing.Required on newer Python): change the
TypedDict definition so quantizer_path: Required[str] while keeping
parent_class, cfg, and enable as optional, and import Required from
typing_extensions if not available in typing to ensure static type-checkers
enforce the required key.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ea0ca383-fd43-415d-8bef-252ce66fbfc5
📒 Files selected for processing (2)
- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/utils/core_utils.py
 def update_quant_cfg_with_kv_cache_quant(
-    quant_cfg: dict[str, Any], kv_cache_quant_cfg: dict[str, Any]
+    quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
 ) -> dict[str, Any]:
-    """Update the quant_cfg with the kv cache quant_cfg."""
+    """Update the quant_cfg with the kv cache quant_cfg.
+
+    Args:
+        quant_cfg: The outer quantization config dict (with ``"quant_cfg"`` and ``"algorithm"`` keys).
+        kv_cache_quant_cfg: A list of :class:`QuantizerCfgEntry
+            <modelopt.torch.quantization.config.QuantizerCfgEntry>` dicts for KV cache quantization,
+            typically ``some_kv_cfg["quant_cfg"]``.
+
+    Returns:
+        A deep copy of ``quant_cfg`` with the KV cache entries appended to ``quant_cfg["quant_cfg"]``.
+    """
     # If quant_cfg["quant_cfg"] is None, it corresponds to only kv cache quantization case
     quant_cfg = copy.deepcopy(quant_cfg)
-    quant_cfg["quant_cfg"] = quant_cfg.get("quant_cfg") or {"default": {"enable": False}}
-    quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)
+    inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
+        {"quantizer_path": "*", "enable": False}
+    ]
+    quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
Potential issue if quant_cfg["quant_cfg"] is a legacy dict format.
The function assumes quant_cfg.get("quant_cfg") returns a list (or falsey value). If a caller passes an unnormalized config where quant_cfg["quant_cfg"] is still a legacy dict, line 849 would raise TypeError when attempting dict + list.
Consider normalizing the inner config or adding a type check:
Suggested defensive fix
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
+
def update_quant_cfg_with_kv_cache_quant(
quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
) -> dict[str, Any]:
...
quant_cfg = copy.deepcopy(quant_cfg)
- inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
- {"quantizer_path": "*", "enable": False}
- ]
+ raw_inner = quant_cfg.get("quant_cfg")
+ if raw_inner is None:
+ inner = [{"quantizer_path": "*", "enable": False}]
+ elif isinstance(raw_inner, dict):
+ # Legacy dict format - convert to list
+ inner = normalize_quant_cfg_list([{k: v} for k, v in raw_inner.items()])
+ else:
+ inner = list(raw_inner)
     quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/utils/core_utils.py` around lines 830 - 849, The
function update_quant_cfg_with_kv_cache_quant can raise a TypeError if
quant_cfg.get("quant_cfg") returns a legacy dict instead of a list; change it to
defensively normalize inner by checking the type of quant_cfg.get("quant_cfg")
and converting a dict into a single-item list (or treating None/Falsey as the
existing default list), then set quant_cfg["quant_cfg"] = inner +
list(kv_cache_quant_cfg) and return the deep-copied quant_cfg; update the logic
around the inner variable in update_quant_cfg_with_kv_cache_quant to handle
list, dict, and None cases explicitly.
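A self-contained sketch of that defensive handling. The legacy-dict branch here is a simplified stand-in for `normalize_quant_cfg_list`, and the function is illustrative rather than the ModelOpt source:

```python
import copy
from typing import Any

def update_quant_cfg_with_kv_cache_quant(
    quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[dict[str, Any]]
) -> dict[str, Any]:
    """Append KV-cache entries, tolerating list, legacy-dict, or missing inner config."""
    quant_cfg = copy.deepcopy(quant_cfg)
    raw = quant_cfg.get("quant_cfg")
    if not raw:
        # None or empty: KV-cache-only quantization, deny everything else by default
        inner = [{"quantizer_path": "*", "enable": False}]
    elif isinstance(raw, dict):
        # Legacy {"pattern": cfg} format: convert each key into a list entry
        inner = [{"quantizer_path": k, "cfg": v} for k, v in raw.items()]
    else:
        # Already the new list-of-entries format
        inner = list(raw)
    quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
    return quant_cfg

kv = [{"quantizer_path": "*output_quantizer", "cfg": {"num_bits": 8}}]
legacy = {"quant_cfg": {"*weight_quantizer": {"num_bits": 4}}, "algorithm": "max"}
print(update_quant_cfg_with_kv_cache_quant(legacy, kv)["quant_cfg"])
```

With a legacy dict the inner config is converted before concatenation, so the `dict + list` `TypeError` can no longer occur.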
    # 1. Deny all quantizers by default
    {"quantizer_path": "*", "enable": False},
    # 2. Enable and configure the target quantizers
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
Suggested change:
-    # 1. Deny all quantizers by default
-    {"quantizer_path": "*", "enable": False},
-    # 2. Enable and configure the target quantizers
-    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
-    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
+    # 1. Deny all quantizers by default
+    {"*": {"enable": False}},
+    # 2. Enable and configure the target quantizers
+    {"*weight_quantizer": {"num_bits": 8, "axis": 0}},
+    {"*input_quantizer": {"num_bits": 8, "axis": None}},
Why do we need the following fields, "quantizer_path" and "cfg"?
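As background for this thread: under the new schema each entry carries its own `quantizer_path`, matching is last-entry-wins, and each matched entry is atomic. A minimal sketch of those semantics, including the automatic normalization of legacy single-key dicts (`resolve` and `normalize_entry` are hypothetical helpers, not ModelOpt APIs):

```python
from fnmatch import fnmatch
from typing import Any, Optional

def normalize_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Pass new-format entries through; rewrite legacy {"pattern": cfg} dicts."""
    if "quantizer_path" in entry:
        return entry
    (path, cfg), = entry.items()  # legacy entries have exactly one key
    return {"quantizer_path": path, "cfg": cfg}

def resolve(quant_cfg: list[dict[str, Any]], quantizer_name: str) -> Optional[dict[str, Any]]:
    """Last matching entry wins; entries are atomic (no merging across entries)."""
    match = None
    for entry in map(normalize_entry, quant_cfg):
        if fnmatch(quantizer_name, entry["quantizer_path"]):
            match = entry
    return match

cfg = [
    {"quantizer_path": "*", "enable": False},           # 1. deny all quantizers by default
    {"*weight_quantizer": {"num_bits": 8, "axis": 0}},  # 2. legacy-style entry, auto-normalized
]
print(resolve(cfg, "layers.0.weight_quantizer"))  # the weight entry overrides the deny-all
print(resolve(cfg, "layers.0.input_quantizer"))   # only the deny-all entry matches
```

Making `quantizer_path` an explicit field (rather than the dict key) is what leaves room for the sibling fields `cfg`, `enable`, and `parent_class` on the same entry.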
What does this PR do?
Summary
Redesigns the `quant_cfg` configuration format in ModelOpt's PyTorch quantization stack, replacing the previous dict-based format with an ordered list of typed `QuantizerCfgEntry` dicts.

Motivation

The old `quant_cfg` dict had several pain points:
- "default" key: an implicit, undocumented catch-all that was easy to misuse

New format

`quant_cfg` is now an ordered list of `QuantizerCfgEntry` TypedDicts. Each entry has:
- `quantizer_path` (required): `fnmatch` wildcard matched against quantizer module names
- `cfg` (optional): dict (or list of dicts) of `QuantizerAttributeConfig` fields
- `enable` (optional): toggles the quantizer on/off independently of `cfg`
- `parent_class` (optional): restricts the match to quantizers inside the given PyTorch module class

Entries are applied in list order; later entries override earlier ones. The canonical pattern is deny-all first (`_base_disable_all`), then selectively re-enable, then apply the standard exclusions (`_default_disabled_quantizer_cfg`).

Changes

- `config.py`: `QuantizerCfgEntry` TypedDict and `normalize_quant_cfg_list()` added; `_default_disabled_quantizer_cfg` / `_mamba_moe_disabled_quantizer_cfg` converted from dicts to lists; all built-in configs (`INT8_DEFAULT_CFG`, `FP8_DEFAULT_CFG`, `NVFP4_DEFAULT_CFG`, etc.) converted to list format; two `QuantizeConfig` field validators added for normalization and early validation
- `conversion.py`: `set_quantizer_by_cfg` rewritten to iterate the list directly; `set_quantizer_attributes_full` and `set_quantizer_attributes_partial` distinguished; `parent_class` resolution uses the new `_DMRegistryCls.get_nn_class()` against the original `nn.Module` class
- `dynamic.py`: `_DMRegistryCls.get_nn_class()` added to resolve string class names to their original `nn.Module` subclass for `isinstance` checks
- `tensor_quantizer.py`: `set_from_attribute_config` signature narrowed to `QuantizerAttributeConfig`; uses `model_dump()` (not `exclude_unset`) to enforce entry atomicity: unset fields reset to defaults rather than inheriting from prior entries
- `modelopt_recipes/general/ptq/*.yml` files updated to the new list format
- `docs/source/guides/_quant_cfg.rst`: new guide covering entry format, ordering, atomicity, and common patterns

Backward compatibility

`normalize_quant_cfg_list()` is called automatically by the `QuantizeConfig` Pydantic validator, so existing code using the old single-key dict format (`{"*weight_quantizer": {"num_bits": 8}}`) continues to work without modification.

Test coverage
Additional Information
Summary by CodeRabbit
New Features
Documentation
Refactor
Tests