Conversation
📝 Walkthrough

This PR adds recipe-based quantization as an alternative to format-based quantization in the HuggingFace PTQ example scripts. The parser accepts a new `--recipe` option that is mutually exclusive with `--quant`.

Changes: Recipe-based PTQ Support

🎯 2 (Simple) | ⏱️ ~12 minutes | 🚥 Pre-merge checks: ✅ Passed checks (6 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/llm_ptq/scripts/huggingface_example.sh`:
- Around line 64-70: The script leaves unquoted expansions and a plain string
for multi-word args which risks word-splitting: update the basename calls that
build MODEL_NAME/RECIPE_TAG to quote MODEL_PATH and RECIPE (use basename
"$MODEL_PATH" and basename "$RECIPE"), and change QUANT_SPEC_ARGS into an array
(e.g., declare -a QUANT_SPEC_ARGS and populate elements) so when invoking
hf_ptq.py you pass the arguments safely as "${QUANT_SPEC_ARGS[@]}"; ensure any
places that appended suffixes to MODEL_NAME still use the quoted basename
results to avoid splitting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8d11f0f3-5321-403b-a703-3ab3972dfe59
📒 Files selected for processing (4)

- examples/llm_ptq/scripts/huggingface_example.sh
- examples/llm_ptq/scripts/parser.sh
- modelopt_recipes/general/ptq/nvfp4_experts_only_mse-fp8_cast_kv.yaml
- modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-fp8_cast_kv.yaml
```bash
# When using --recipe, build the model name from the recipe basename (without
# directory or .yaml suffix) so each recipe gets its own SAVE_PATH.
if [ -n "$RECIPE" ]; then
    RECIPE_TAG=$(basename "$RECIPE" .yaml | sed 's/[^0-9a-zA-Z\-]/_/g')
    MODEL_NAME=$(basename $MODEL_PATH | sed 's/[^0-9a-zA-Z\-]/_/g')_recipe_${RECIPE_TAG}
else
    MODEL_NAME=$(basename $MODEL_PATH | sed 's/[^0-9a-zA-Z\-]/_/g')_${QFORMAT_MODIFIED}${KV_CACHE_QUANT:+_kv_${KV_CACHE_QUANT}}
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Read-only verification of word-splitting-sensitive expansions in changed recipe flow.
rg -n 'basename \$MODEL_PATH|QUANT_SPEC_ARGS=.*--recipe=\$RECIPE|^\s+\$QUANT_SPEC_ARGS\s+\\$' examples/llm_ptq/scripts/huggingface_example.sh
```

Repository: NVIDIA/Model-Optimizer
Length of output: 375

🏁 Script executed:

```bash
cat -n examples/llm_ptq/scripts/huggingface_example.sh | sed -n '60,75p'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 782

🏁 Script executed:

```bash
cat -n examples/llm_ptq/scripts/huggingface_example.sh | sed -n '170,190p'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 890
Quote `$MODEL_PATH` and use array syntax for `QUANT_SPEC_ARGS` to prevent shell word-splitting bugs.

The unquoted `basename $MODEL_PATH` calls on lines 68–70 and the unquoted `$QUANT_SPEC_ARGS` expansion on line 185 can break if paths or the recipe argument contain spaces or special characters. This is also inconsistent with line 67, which already quotes `$RECIPE` in a similar `basename` call. Convert `QUANT_SPEC_ARGS` to an array to safely pass multi-word arguments to `hf_ptq.py`.
Proposed fix

```diff
 if [ -n "$RECIPE" ]; then
     RECIPE_TAG=$(basename "$RECIPE" .yaml | sed 's/[^0-9a-zA-Z\-]/_/g')
-    MODEL_NAME=$(basename $MODEL_PATH | sed 's/[^0-9a-zA-Z\-]/_/g')_recipe_${RECIPE_TAG}
+    MODEL_NAME=$(basename "$MODEL_PATH" | sed 's/[^0-9a-zA-Z\-]/_/g')_recipe_${RECIPE_TAG}
 else
-    MODEL_NAME=$(basename $MODEL_PATH | sed 's/[^0-9a-zA-Z\-]/_/g')_${QFORMAT_MODIFIED}${KV_CACHE_QUANT:+_kv_${KV_CACHE_QUANT}}
+    MODEL_NAME=$(basename "$MODEL_PATH" | sed 's/[^0-9a-zA-Z\-]/_/g')_${QFORMAT_MODIFIED}${KV_CACHE_QUANT:+_kv_${KV_CACHE_QUANT}}
 fi
@@
-if [ -n "$RECIPE" ]; then
-    QUANT_SPEC_ARGS="--recipe=$RECIPE"
-else
-    QUANT_SPEC_ARGS="--qformat=${QFORMAT// /,}"
-fi
+if [ -n "$RECIPE" ]; then
+    QUANT_SPEC_ARGS=(--recipe="$RECIPE")
+else
+    QUANT_SPEC_ARGS=(--qformat="${QFORMAT// /,}")
+fi
 python hf_ptq.py \
     --pyt_ckpt_path=$MODEL_PATH \
     --export_path=$SAVE_PATH \
     --sparsity_fmt=$SPARSITY_FMT \
-    $QUANT_SPEC_ARGS \
+    "${QUANT_SPEC_ARGS[@]}" \
     --calib_size=$CALIB_SIZE \
```

Also applies to lines 177–185.
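To make the failure mode concrete, here is a minimal standalone sketch; the paths are hypothetical, not from the repository:

```bash
#!/usr/bin/env bash
MODEL_PATH="/models/llama 3"   # hypothetical path containing a space

# Unquoted: the expansion splits into two words, so GNU basename treats
# "3" as its SUFFIX operand and silently returns the wrong name.
echo "unquoted: $(basename $MODEL_PATH)"      # -> unquoted: llama

# Quoted: a single argument, the intended behavior.
echo "quoted:   $(basename "$MODEL_PATH")"    # -> quoted:   llama 3

# Array form keeps a flag value containing a space as one argv entry.
QUANT_SPEC_ARGS=(--recipe="/recipes/my recipe.yaml")
printf '<%s>\n' "${QUANT_SPEC_ARGS[@]}"       # -> <--recipe=/recipes/my recipe.yaml>
```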
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 68-68: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 70-70: Double quote to prevent globbing and word splitting.
(SC2086)
…recipe support in scripts

- Add `modelopt_recipes/general/ptq/nvfp4_experts_only_mse-fp8_cast_kv.yaml`, combining experts-only NVFP4 W4A4 with the MSE FP8 scale-sweep weight calibration (`algorithm: mse`, `fp8_scale_sweep: true`; expert weight blocks use `nvfp4_static` so the static FP8 sweep applies) and FP8 KV cache via the `kv_fp8_cast` unit (`use_constant_amax: true`).
- `examples/llm_ptq/scripts`: thread a new `--recipe` flag through `parser.sh` and `huggingface_example.sh`. Either `--quant` or `--recipe` is required; passing both errors out. When `--recipe` is used, the script derives `MODEL_NAME` from the recipe basename, passes `--recipe=` to `hf_ptq.py`, and exits after export with a TRT-LLM deployment hint (recipes can produce arbitrary configs).
- Drop the qformat case-statement whitelist in `huggingface_example.sh`; let `hf_ptq.py` be the single source of truth for valid qformats / recipes.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
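For context, a hypothetical invocation of the new recipe path might look like the following; the `--model` flag name and checkpoint path are assumptions for illustration, and only `--recipe` is confirmed by the commit above:

```bash
# Hypothetical usage sketch of the new --recipe path (not from the PR).
cd examples/llm_ptq
./scripts/huggingface_example.sh \
    --model /models/Llama-3-8B \
    --recipe modelopt_recipes/general/ptq/nvfp4_experts_only_mse-fp8_cast_kv.yaml
# Per the commit: MODEL_NAME is derived from the recipe basename,
# --recipe=... is forwarded to hf_ptq.py, and the script exits after
# export with a TRT-LLM deployment hint.
```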
Same shape as nvfp4_experts_only_mse-fp8_cast_kv but with the broader
*mlp* / *block_sparse_moe* / *.experts.* patterns from
nvfp4_mlp_only-kv_fp8.yaml so it covers both dense MLP and MoE expert
weights:
- algorithm: { method: mse, fp8_scale_sweep: true, layerwise: false }
- All MLP weight quantizers use nvfp4_static so the static FP8 scale
sweep applies (otherwise mse_calibrate skips them).
- Input quantizers use nvfp4 (dynamic).
- KV bmm uses kv_fp8_cast (skips KV calibration; amax hardcoded to FP8
E4M3 max 448.0).
Pre-commit hook check-modelopt-recipes was skipped because the host
conda env has a broken torchvision install that prevents the validator
from importing modelopt; the same hook fails on the existing committed
sibling nvfp4_experts_only-kv_fp8.yaml in this env.
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1407      +/-   ##
==========================================
+ Coverage   76.74%   77.38%    +0.64%
==========================================
  Files         476      476
  Lines       51307    51307
==========================================
+ Hits        39377    39706      +329
+ Misses      11930    11601      -329
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Force-pushed d17bcc8 to 2fdfe86.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-fp8_cast_kv.yaml`:
- Line 25: The YAML description incorrectly claims quantization applies to “all
linear layers”; update the description field in
nvfp4_mlp_only_mse-fp8_cast_kv.yaml to accurately state that quantizers are
enabled only for MLP/MoE/expert (MLP-only) patterns and that other linear layers
are not affected, so users aren’t misled about the scope of W4A4/static weight
and FP8 KV settings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c1e82d9b-9fb7-44bc-bbcb-509d457b8621
📒 Files selected for processing (4)

- examples/llm_ptq/scripts/huggingface_example.sh
- examples/llm_ptq/scripts/parser.sh
- modelopt_recipes/general/ptq/nvfp4_experts_only_mse-fp8_cast_kv.yaml
- modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-fp8_cast_kv.yaml
✅ Files skipped from review due to trivial changes (1)
- modelopt_recipes/general/ptq/nvfp4_experts_only_mse-fp8_cast_kv.yaml
```yaml
metadata:
  recipe_type: ptq
  description: NVFP4 static weight (MSE FP8-scale sweep) and dynamic activation for all linear layers (W4A4), FP8 KV cache with constant amax.
```
Metadata description overstates the quantization scope.
Line 25 says “all linear layers,” but this recipe only enables quantizers for MLP/MoE/expert patterns. Please align the text to the actual scope to avoid user confusion.
Proposed fix

```diff
-  description: NVFP4 static weight (MSE FP8-scale sweep) and dynamic activation for all linear layers (W4A4), FP8 KV cache with constant amax.
+  description: NVFP4 static weight (MSE FP8-scale sweep) and dynamic activation for MLP/MoE linear layers (W4A4), FP8 KV cache with constant amax.
```
meenchen left a comment:

Bot review — DM the bot to share feedback.
The recipe YAMLs themselves look correct — they reuse existing `nvfp4`, `nvfp4_static`, and `kv_fp8_cast` units, and only differ from the `-kv_fp8` siblings in the ways the PR body describes (weight quantizers switch to `type: static` to be eligible for the MSE FP8 scale sweep; algorithm switches to `{method: mse, fp8_scale_sweep: true, layerwise: false}`). The `hf_ptq.py` `--recipe` flag is already wired up from the prior PR and correctly bypasses the post-hoc `_set_kv_cache_constant_amax`, so the recipe is authoritative.

One real regression in `huggingface_example.sh`: deleting the `for qformat in $QFORMAT; do … done` whitelist also removes the implicit binding of the lowercase loop variable `$qformat`, which is still referenced later in the bf16/fp16 shortcut branch (`if [ "$qformat" == "bf16" ] || [ "$qformat" == "fp16" ]`). With the loop gone, that variable is empty, so the symlink-the-model shortcut for pure-bf16/fp16 runs is now dead code. Replace `$qformat` with `$QFORMAT` (or reintroduce a single assignment) to preserve the prior behavior.

Minor stylistic nit: the new files are named `*_mse-fp8_cast_kv.yaml` while the existing convention in `modelopt_recipes/general/ptq/` is `*-kv_fp8_cast` (KV descriptor after the dash, with `kv_` prefix). Not blocking, but inconsistent with siblings.

Tests: no new unit tests, and the PR notes the pre-commit `check-modelopt-recipes` hook was skipped locally due to a broken env. The recipes are YAML-only and are covered by the standard recipe loader check in CI, so this is low-risk provided CI runs that check — but the author should confirm the pre-commit hook passes in CI on this branch.
```bash
esac
done
IFS=" "
# Quant format / recipe validation is delegated to hf_ptq.py.
```
Bot comment.

Regression: deleting the `for qformat in $QFORMAT; do … done` loop also drops the implicit binding of the lowercase loop variable `$qformat`, which is still used below at `if [ "$qformat" == "bf16" ] || [ "$qformat" == "fp16" ]`. With the loop removed, `$qformat` is empty and that bf16/fp16 shortcut (which symlinks the source model into `$SAVE_PATH` and marks `MODEL_CONFIG_EXIST=true`) will never trigger — users running `--quant=bf16` or `--quant=fp16` will now fall through to `python hf_ptq.py --qformat=bf16` instead. Either replace `$qformat` with `$QFORMAT` in that check, or add a dedicated `qformat="$QFORMAT"` assignment here. A minimal repro of the dead branch is sketched below.
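A minimal standalone sketch of the dead branch; the `QFORMAT` value is illustrative and this is not the actual script:

```bash
#!/usr/bin/env bash
QFORMAT="bf16"

# Old code: the deleted whitelist loop implicitly bound the lowercase
# variable, i.e. `for qformat in $QFORMAT; do ...; done` left
# qformat="bf16" behind after the loop.

# With the loop removed, $qformat is unset, so this branch never fires:
if [ "$qformat" == "bf16" ] || [ "$qformat" == "fp16" ]; then
    echo "shortcut taken"   # unreachable after the deletion
else
    echo "falls through to: python hf_ptq.py --qformat=$QFORMAT"
fi
```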
Do we still need bf16/fp16 path anyway? Maybe we can deprecate them
Not sure if we still have the use cases where we quantize fp32 to fp16.
yeah I think we can delete. Let me add this to the PR
```
@@ -0,0 +1,48 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
Bot comment.
Stylistic nit: sibling recipes in this directory follow the `<numerics>-kv_<kv_fmt>` naming pattern (e.g. `nvfp4_experts_only-kv_fp8.yaml`, `nvfp4_default-kv_fp8_cast.yaml`). This file uses `_mse-fp8_cast_kv`, which flips the KV descriptor order. Consider renaming to something like `nvfp4_experts_only_mse-kv_fp8_cast.yaml` for consistency with the existing convention. Same for `nvfp4_mlp_only_mse-fp8_cast_kv.yaml`.
- huggingface_example.sh: remove the bf16/fp16 shortcut block. The lowercase
$qformat variable it referenced was bound by the `for qformat in $QFORMAT;
do … done` whitelist loop deleted in the parent commit, so the shortcut
became dead code (regression flagged by reviewers). Per consensus on the
PR, drop the path entirely rather than reintroduce the binding — the
bf16/fp16 case still goes through `python hf_ptq.py --qformat=...` and the
TRT-LLM whitelist on line 210 already covers it.
- huggingface_example.sh: quote $MODEL_PATH in the basename calls used to
build MODEL_NAME (CodeRabbit shellcheck SC2086).
- nvfp4_mlp_only_mse-kv_fp8_cast.yaml: tighten metadata description from
"all linear layers" to "MLP/MoE linear layers" — quantization is only
enabled for *mlp* / *block_sparse_moe* / *.experts.* patterns.
- Rename both recipes to match the sibling `<numerics>_<scope>-kv_<kv_fmt>`
convention used by `nvfp4_default-kv_fp8_cast.yaml` etc.:
nvfp4_experts_only_mse-fp8_cast_kv.yaml
→ nvfp4_experts_only_mse-kv_fp8_cast.yaml
nvfp4_mlp_only_mse-fp8_cast_kv.yaml
→ nvfp4_mlp_only_mse-kv_fp8_cast.yaml
Pre-commit hook check-modelopt-recipes was skipped (same env-broken
torchvision issue as the prior commits); the renamed recipes were verified
independently with tools/precommit/check_modelopt_recipes.py against the
working-tree modelopt — both load and produce the expected effective
config.
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
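A rough sketch of that out-of-band verification; the validator's CLI is an assumption (only the script path appears in the commit), so treat the filename arguments as illustrative:

```bash
# Hypothetical direct run of the recipe validator referenced above.
python tools/precommit/check_modelopt_recipes.py \
    modelopt_recipes/general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml \
    modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-kv_fp8_cast.yaml \
  && echo "recipes load and produce the expected effective config (exit 0)"
```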
♻️ Duplicate comments (1)

examples/llm_ptq/scripts/huggingface_example.sh (1)

165-169: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

`QUANT_SPEC_ARGS` still a plain string — unquoted expansion risks word-splitting.

`QUANT_SPEC_ARGS` is assigned as a plain string and then expanded unquoted on line 174. ShellCheck flags this (SC2086). If `$RECIPE` contains spaces or special characters, the argument will be split incorrectly. Convert to a Bash array so the expansion is safe.

🛠️ Proposed fix — array syntax

```diff
-if [ -n "$RECIPE" ]; then
-    QUANT_SPEC_ARGS="--recipe=$RECIPE"
-else
-    QUANT_SPEC_ARGS="--qformat=${QFORMAT// /,}"
-fi
+if [ -n "$RECIPE" ]; then
+    QUANT_SPEC_ARGS=(--recipe="$RECIPE")
+else
+    QUANT_SPEC_ARGS=(--qformat="${QFORMAT// /,}")
+fi
 python hf_ptq.py \
     --pyt_ckpt_path=$MODEL_PATH \
     --export_path=$SAVE_PATH \
     --sparsity_fmt=$SPARSITY_FMT \
-    $QUANT_SPEC_ARGS \
+    "${QUANT_SPEC_ARGS[@]}" \
     --calib_size=$CALIB_SIZE \
```

Also applies to: 174-174
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@examples/llm_ptq/scripts/huggingface_example.sh`:
- Around line 165-169: QUANT_SPEC_ARGS is built as a plain string which risks
word-splitting when expanded; change it to a Bash array and quote expansions:
set QUANT_SPEC_ARGS as an array in the branches (e.g., QUANT_SPEC_ARGS=(--recipe
"$RECIPE") or QUANT_SPEC_ARGS=(--qformat "${QFORMAT// /,}")) and later invoke it
with "${QUANT_SPEC_ARGS[@]}" where it is expanded (the current unquoted
expansion on the line that uses QUANT_SPEC_ARGS should be replaced with the
array expansion).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6278d544-daa2-4b0f-8ff8-b5a53b6b86d8
📒 Files selected for processing (3)

- examples/llm_ptq/scripts/huggingface_example.sh
- modelopt_recipes/general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml
- modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-kv_fp8_cast.yaml
✅ Files skipped from review due to trivial changes (2)
- modelopt_recipes/general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml
- modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-kv_fp8_cast.yaml
meenchen left a comment:

Bot review — DM the bot to share feedback.
Re-review: all critical previous comments have been addressed.

- Regression with the `$qformat` bf16/fp16 shortcut: fixed by removing the block entirely (author and meenchen agreed to deprecate; no use case for bf16/fp16 "quantize" today).
- Naming convention: files renamed to `nvfp4_experts_only_mse-kv_fp8_cast.yaml` and `nvfp4_mlp_only_mse-kv_fp8_cast.yaml`, matching the sibling `*-kv_fp8_cast` convention.
- `mlp_only` description wording: updated from "all linear layers" to "MLP/MoE linear layers".
- Basename quoting: `$MODEL_PATH` and `$RECIPE` are now quoted in the new `basename` calls.

The remaining CodeRabbit nit (`QUANT_SPEC_ARGS` as a plain string instead of a Bash array) is stylistic; recipe paths are repo-relative without spaces, and the prior script already uses unquoted expansions elsewhere. Not a blocker.

Licensing: new YAML files carry the standard NVIDIA Apache-2.0 header; no licensing concern. Recipes are covered by the existing `check-modelopt-recipes` loader test in CI.
Summary

- Two new PTQ recipes pair NVFP4 weights (MSE FP8-scale-sweep calibration) with an FP8 KV cache via `use_constant_amax: true` (skips KV calibration; matches the `nvfp4_default-fp8_cast_kv` contract):
  - `modelopt_recipes/general/ptq/nvfp4_experts_only_mse-fp8_cast_kv.yaml` — applies to `*mlp.experts*` / `*block_sparse_moe*` only.
  - `modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-fp8_cast_kv.yaml` — applies to all `*mlp*` / `*block_sparse_moe*` (dense MLP + MoE).
- Threads a new `--recipe` flag through `examples/llm_ptq/scripts/parser.sh` and `huggingface_example.sh`. Either `--quant` or `--recipe` is required; passing both errors out (a sketch of this check follows this list). Recipe names are not validated in the script — `hf_ptq.py` is the source of truth.
- Drops the `qformat` whitelist case-statement in `huggingface_example.sh` for the same reason.
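A minimal sketch of the one-of-{`--quant`, `--recipe`} check described above; variable names follow the PR text, but the actual `parser.sh` implementation may differ:

```bash
# Sketch only: QFORMAT is assumed to be set by --quant, RECIPE by --recipe.
if [ -n "$QFORMAT" ] && [ -n "$RECIPE" ]; then
    echo "Error: --quant and --recipe are mutually exclusive." >&2
    exit 1
elif [ -z "$QFORMAT" ] && [ -z "$RECIPE" ]; then
    echo "Error: one of --quant or --recipe is required." >&2
    exit 1
fi
```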
Files

New recipes (`modelopt_recipes/general/ptq/`):

- `nvfp4_experts_only_mse-fp8_cast_kv.yaml` — same patterns as `nvfp4_experts_only-fp8_kv.yaml`.
- `nvfp4_mlp_only_mse-fp8_cast_kv.yaml` — same patterns as `nvfp4_mlp_only-fp8_kv.yaml`.

Both differ from their `_kv` siblings by:

- `algorithm: max` → `{ method: mse, fp8_scale_sweep: true, layerwise: false }`
- Weight quantizers `type: dynamic` → `type: static` (otherwise `mse_calibrate` skips them: only static block-quant weight quantizers are recognized for the FP8 sweep — see `model_calib.py:369-374`).
- `use_constant_amax: true` (the `_cast_kv` flavor).

Scripts (`examples/llm_ptq/scripts/`):

- `parser.sh` — adds a `--recipe` long-option, default `RECIPE=""`, validates one-of-{`--quant`, `--recipe`} and not-both.
- `huggingface_example.sh` — when `RECIPE` is set, derives `MODEL_NAME` from the recipe basename, passes `--recipe=…` to `hf_ptq.py` instead of `--qformat=…`, and exits after export with a TRT-LLM deployment hint (recipes can produce arbitrary configs that the script's downstream `run_tensorrt_llm.py` path doesn't know how to handle generically). Drops the `qformat` whitelist; defers to `hf_ptq.py`.

Behavior
Test plan

- `experts_only_mse-fp8_cast_kv` loads via `modelopt.recipe.load_recipe(...)` and produces the expected algorithm + per-pattern `quant_cfg` (verified in a working env: `algorithm == {'method': 'mse', 'fp8_scale_sweep': True, 'layerwise': False}`; expert weight quantizers `type: static`; KV bmm has `use_constant_amax: True`). See the sketch after this list.
- Flag combinations (…, only `--quant`, only `--recipe`) all behave as designed.
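A sketch of that load check, assuming the `modelopt.recipe.loader.load_recipe` import path quoted in the note below and a single path argument:

```bash
# Hedged sketch: load a recipe and eyeball the effective config.
RECIPE=modelopt_recipes/general/ptq/nvfp4_experts_only_mse-fp8_cast_kv.yaml
python -c "
from modelopt.recipe.loader import load_recipe
recipe = load_recipe('$RECIPE')
print(recipe)  # inspect algorithm + per-pattern quant_cfg by eye
"
```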
Pre-commit hook
check-modelopt-recipeswas skipped on both commits because the local conda env has a brokentorchvisioninstall (AttributeError: partially initialized module 'torchvision' has no attribute 'extension') that preventsfrom modelopt.recipe.loader import load_recipe. Theexperts_onlyrecipe was validated independently by runningtools/precommit/check_modelopt_recipes.pyin a working environment (exits 0); themlp_onlyone is the same shape with a different glob.Rebased onto
mainfrom #1391 (which targetedchenjiel/nvfp4-fp8-sweep-triton). The diff is scoped to the recipes + script wiring; no kernel/sweep changes are included here.🤖 Generated with Claude Code
Summary by CodeRabbit

- New Features
  - Recipe-based quantization via a new `--recipe` CLI option.
- Configuration
  - The `--quant` and `--recipe` options are now mutually exclusive; specify one to configure quantization behavior.