
Merge any_model tutorial #1035

Open

danielkorzekwa wants to merge 91 commits into feature/puzzletron from dkorzekwa/anymodel_tutorial

Conversation


@danielkorzekwa danielkorzekwa commented Mar 13, 2026

What does this PR do?

Merge any_model tutorial for Puzzletron.

Summary by CodeRabbit

  • New Features
    • MIP sweep mode for multi-rate memory-compression searches
    • Triton-ready HuggingFace deployable, lm-eval adapter, and Ray-compatible inference pathways
    • Megatron-Bridge distillation CLI/workflow with optional HF export
  • New Configurations
    • Extensive pruning/memory-sweep profiles for GPT‑Oss, Llama, Mistral, Nemotron, Qwen families
  • Documentation
    • GptOss guide and conversion example, expanded READMEs, MIP Quick Start, NeMo evaluator notes
  • Chores
    • Requirements updated (lm-eval bumped; math-verify, ray added)

- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)

Base automatically changed from dkorzekwa/anymodel_gptoss to feature/puzzletron March 17, 2026 13:16
@danielkorzekwa danielkorzekwa requested review from a team as code owners March 17, 2026 13:16
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 18

🧹 Nitpick comments (10)
examples/puzzletron/evaluation/nemo_evaluator_instructions.md (1)

24-24: Consider documenting the docker image version requirement.

The docker image version nvcr.io/nvidia/nemo:26.02 is hardcoded. As container versions evolve, this specific version may become outdated or incompatible with newer features.

📝 Suggested documentation improvement
-export DOCKER_IMAGE=nvcr.io/nvidia/nemo:26.02
+# Use a NeMo container version compatible with your Model-Optimizer version
+# Version 26.02 or later is recommended
+export DOCKER_IMAGE=nvcr.io/nvidia/nemo:26.02
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/evaluation/nemo_evaluator_instructions.md` at line 24,
The DOCKER_IMAGE variable is hardcoded to nvcr.io/nvidia/nemo:26.02; update the
instructions to document the version requirement and allow overrides by
documenting that DOCKER_IMAGE can be set by the user (i.e., reference the
DOCKER_IMAGE variable) rather than assuming the specific tag, add a short note
explaining compatibility (which NeMo/GPUs/CUDA combinations the tag targets) and
point readers to NVIDIA NGC or the project's compatibility matrix for the latest
recommended tag so maintainers and users can update the image safely.
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml (1)

14-17: Consider adding the optional sweep configuration for consistency.

The llama-3_1-8B_pruneffn_memory.yaml includes an optional mip.sweep configuration block. For feature parity and user convenience, you may want to add the same block here.

💡 Optional: Add sweep configuration
 # MIP memory constraint (in MiB) 
 mip:
   human_constraints:
     target_memory: 78_000 # 78 GiB
+  # Memory sweep configuration (optional)
+  sweep:
+    enabled: false
+    memory_compression_rates: [0.5, 0.6, 0.7, 0.8, 0.9]
+    output_csv: ${puzzle_dir}/mip_sweep_results.csv
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml`
around lines 14 - 17, Add an optional mip.sweep block to this config to match
the other file: update the mip section (which currently contains
mip.human_constraints.target_memory) to also include a mip.sweep configuration
with the same keys/structure used in llama-3_1-8B_pruneffn_memory.yaml so users
get feature parity and can run parameter sweeps; locate the mip section and
append the sweep block using the same option names as in the reference config.
examples/puzzletron/requirements.txt (1)

1-3: Pin versions for math-verify and ray for reproducibility.

Both math-verify and ray lack version pinning, which may lead to non-reproducible builds across different installations and CI runs. Consider pinning them to specific versions (e.g., math-verify==0.9.0 and ray==2.54.0 or similar stable releases) to ensure consistent behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/requirements.txt` around lines 1 - 3, Pin the unpinned
dependencies by updating the requirements entry for math-verify and ray to
specific versions (e.g., change math-verify to math-verify==0.9.0 and ray to
ray==2.54.0) so installs are reproducible; make this change alongside the
existing lm-eval==0.4.10 line in the same requirements list to ensure
deterministic builds and CI behavior.
examples/puzzletron/mbridge_distillation/distill_hf.py (1)

286-312: Consider guarding HF export against race conditions.

The export logic destroys the process group before rank 0 exports. While the barrier at line 290 synchronizes completion, if export_to_hf_and_copy_config fails partway through on rank 0, other ranks have already destroyed their process group and exited cleanly—no indication of partial failure propagates to them.

This is acceptable for a script but worth noting for robustness. The current error handling (lines 310-312) logs the failure but doesn't propagate it as a non-zero exit code.

💡 Optional: Re-raise exception after logging to signal failure
             except Exception as e:
                 print(f"⚠️  Export failed: {e}")
                 traceback.print_exc()
+                raise  # Propagate failure as non-zero exit code
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/distill_hf.py` around lines 286 -
312, The export step can silently fail on rank 0 after other ranks destroy their
process group; update the block around export_to_hf_and_copy_config so that if
export_to_hf_and_copy_config throws on is_rank_0 you both log the error and
propagate failure (e.g., re-raise the exception or call sys.exit(non_zero)) so
the process ends with a non-zero exit code; ensure this change references the
existing is_rank_0 check, the export_to_hf_and_copy_config call, and the
torch.distributed.destroy_process_group call so the re-raise/exit happens after
logging but before the script exits.
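
The re-raise pattern the diff suggests can be sketched in isolation. This is illustrative only: `finalize_export` and `export_fn` are hypothetical names standing in for the script's rank-0 branch and `export_to_hf_and_copy_config`, and the barrier/`destroy_process_group` calls are elided.

```python
import traceback

def finalize_export(is_rank_0: bool, export_fn) -> None:
    # Sketch of the suggested fix: log the failure, then re-raise so the
    # process exits non-zero instead of reporting a silent partial export.
    if not is_rank_0:
        return
    try:
        export_fn()
    except Exception as e:
        print(f"Export failed: {e}")
        traceback.print_exc()
        raise
```

With a bare `except ... print` the script would still exit 0 after a failed export; the trailing `raise` is what turns the logged error into a job failure.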
examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/hidden_dim_pruning.yaml (1)

1-15: LGTM with TODO comments noted.

The configuration is well-structured for hidden dimension pruning. The TODO comments on lines 13-14 indicate planned improvements for mlp_init_mode and gqa_init_mode—consider creating tracking issues to ensure these are addressed.

Would you like me to open issues to track the TODO items for CopyAsIs/FromTeacher support in mlp_init_mode and gqa_init_mode?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/hidden_dim_pruning.yaml`
around lines 1 - 15, Create two tracking issues for the TODOs: one titled
"Support CopyAsIs/FromTeacher for mlp_init_mode" and another "Support
CopyAsIs/FromTeacher for gqa_init_mode"; each issue should reference the TODO in
the pruning config, describe current behavior (mlp_init_mode: Truncate,
gqa_init_mode: AverageKV), enumerate desired behavior (support CopyAsIs and
FromTeacher initialization modes), list acceptance criteria (unit/integration
test that verifies correct weight/shape handling and comparable performance to
teacher for each mode), propose implementation notes (where
pruning/initialization logic lives and relevant functions/classes to modify),
and add labels like enhancement and pruning. Ensure to link or mention the
config keys mlp_init_mode and gqa_init_mode and assign to the pruning/ML team or
leave unassigned if unknown.
examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml (1)

6-12: Consider env-overridable defaults for paths.

Hardcoded absolute paths make this tutorial config less portable outside the default container layout.

Proposed refactor
-input_hf_model_path: /workspace/hf_models/nvidia/Nemotron-Nano-12B-v2
+input_hf_model_path: ${oc.env:PUZZLETRON_INPUT_HF_MODEL_PATH,/workspace/hf_models/nvidia/Nemotron-Nano-12B-v2}
@@
-dataset_path: /workspace/datasets/Nemotron-Post-Training-Dataset-v2
+dataset_path: ${oc.env:PUZZLETRON_DATASET_PATH,/workspace/datasets/Nemotron-Post-Training-Dataset-v2}
@@
-puzzle_dir: /workspace/puzzle_dir
+puzzle_dir: ${oc.env:PUZZLETRON_PUZZLE_DIR,/workspace/puzzle_dir}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml`
around lines 6 - 12, Replace the hardcoded absolute paths by reading from
environment variables with sensible defaults so the config is portable: change
input_hf_model_path, dataset_path, and puzzle_dir to use env-overridable values
(e.g., check PROCESS env vars like INPUT_HF_MODEL_PATH, DATASET_PATH, PUZZLE_DIR
or similar) and fall back to the current absolute paths only if the env var is
unset; update any code or loader that parses this YAML to prefer process.env.*
values before the YAML defaults so existing keys (input_hf_model_path,
dataset_path, puzzle_dir) remain the canonical identifiers but become
configurable via environment variables.
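
The fallback semantics of the proposed `${oc.env:VAR,default}` interpolation can be mimicked with plain `os.environ.get`; `PUZZLETRON_PUZZLE_DIR` is the hypothetical variable name from the suggestion above, not an existing knob.

```python
import os

def env_or_default(var: str, default: str) -> str:
    # Same behavior as the ${oc.env:VAR,default} refactor: prefer the
    # environment variable when set, else fall back to the baked-in path.
    return os.environ.get(var, default)

print(env_or_default("PUZZLETRON_PUZZLE_DIR", "/workspace/puzzle_dir"))
```

The YAML keys (`input_hf_model_path`, `dataset_path`, `puzzle_dir`) stay canonical either way; only the source of their values becomes overridable.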
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct_pruneffn_memory.yaml (1)

14-17: Minor: Comment states 78 GiB but value is 78,000 MiB (~76.2 GiB).

The comment says "78 GiB" but 78_000 MiB equals approximately 76.17 GiB (78000 / 1024). Consider updating the comment for accuracy or adjusting the value to 79872 MiB for exactly 78 GiB.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct_pruneffn_memory.yaml`
around lines 14 - 17, The comment for the MIP memory target is inconsistent:
`mip.human_constraints.target_memory` is set to 78_000 (MiB ≈ 76.17 GiB) but the
comment says "78 GiB"; either change the comment to reflect ~76.2 GiB or set
`target_memory` to 79_872 (MiB) to represent exactly 78 GiB (78*1024). Update
the comment or the `target_memory` value accordingly and ensure the keys `mip`,
`human_constraints`, and `target_memory` remain unchanged.
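
The MiB/GiB arithmetic behind this finding (and the analogous Mistral one below) is easy to double-check:

```python
def mib_to_gib(mib: float) -> float:
    # Binary units: 1 GiB = 1024 MiB
    return mib / 1024

def gib_to_mib(gib: float) -> int:
    return int(gib * 1024)

print(mib_to_gib(78_000))   # 76.171875 -> ~76.17 GiB, not 78 GiB
print(gib_to_mib(78))       # 79872 MiB is exactly 78 GiB
print(mib_to_gib(234_000))  # 228.515625 -> ~228.5 GiB, not 234 GiB
```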
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/mistral-small-24b-instruct-2501_pruneffn_memory.yaml (1)

14-17: Minor: Comment states 234 GiB but value is 234,000 MiB (~228.5 GiB).

Similar to the qwen config, 234_000 MiB equals approximately 228.5 GiB, not 234 GiB. Consider updating for accuracy or adjusting the value to 239616 MiB for exactly 234 GiB.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/mistral-small-24b-instruct-2501_pruneffn_memory.yaml`
around lines 14 - 17, The comment and value disagree:
mip.human_constraints.target_memory is set to 234_000 MiB (~228.5 GiB) but the
comment claims 234 GiB; update either the comment or the value. Edit the YAML
entry for target_memory (mip.human_constraints.target_memory) to use 239616
(MiB) if you want exactly 234 GiB, or change the comment to the correct
approximate GiB for 234_000 MiB (≈228.5 GiB).
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/ffn_pruning.yaml (1)

17-18: Verify default intermediate_size_list value.

The default intermediate_size_list: [256] is extremely small relative to the teacher size (14336). While the parent config overrides this with [4096, 7808, 11520, 15104], this default could cause issues if used directly. Consider whether this should be a more reasonable default or explicitly marked as a required override.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/ffn_pruning.yaml`
around lines 17 - 18, The default intermediate_size_list ([256]) in this pruning
config is orders of magnitude smaller than teacher_intermediate_size (14336) and
risks misconfiguration; update the file so intermediate_size_list is set to a
sensible default (e.g., a set of progressive sizes closer to teacher size like
[4096,7808,11520,15104]) or make it explicitly required to override by adding a
clear sentinel (e.g., null or an empty array) and a comment indicating it must
be overridden; modify the symbol intermediate_size_list in this YAML and leave
mlp_init_mode unchanged (PruneByActivationsLog) so downstream code that reads
intermediate_size_list will either receive a sensible default or fail-fast
prompting an override.
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/hidden_dim_pruning.yaml (1)

14-15: Noted: TODO comments for init mode compatibility.

The TODO comments indicate that mlp_init_mode and gqa_init_mode need work to support CopyAsIs/FromTeacher. Consider creating tracking issues for these if they're planned improvements.

Would you like me to open issues to track these TODO items?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/hidden_dim_pruning.yaml`
around lines 14 - 15, Create tracking issues for the TODOs so the missing
init-mode support is captured and referenced: open one issue for mlp_init_mode
(currently "Truncate") to add compatibility with CopyAsIs and FromTeacher and
one issue for gqa_init_mode (currently "AverageKV") to add the same support; in
each issue include current config key (mlp_init_mode / gqa_init_mode), current
value, desired behavior for CopyAsIs and FromTeacher, implementation notes
(where init logic lives and tests needed), and a short checklist; then replace
the inline TODO comments in the YAML with brief references to the new issue IDs
so future readers can find the work item.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml`:
- Line 17: The target_memory value and its comment disagree: update the
target_memory entry (symbol name: target_memory) so the numeric MiB matches the
annotated 45 GiB or adjust the comment to the intended MiB; either set
target_memory to 46_080 (45 GiB in MiB) to keep the comment, or change the
comment to match 16_000 MiB (≈15.6 GiB) so they are consistent.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/ffn_pruning.yaml`:
- Around line 13-15: The YAML sets activation_hooks_kwargs to null which will
cause an AttributeError when the hook (e.g.,
modelopt.torch.nas.plugins.megatron_hooks.base_hooks.RankedChoiceVotingHook)
calls .get(); change activation_hooks_kwargs to an explicit empty mapping
(activation_hooks_kwargs: {}) so the hook receives a dict even when no extra
args are provided.
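
The failure mode is plain Python, independent of the hook class: a bare `activation_hooks_kwargs:` key parses to `None`, and `None` has no `.get()`. A minimal illustration (the `"method"` key is a stand-in, not necessarily what the hook looks up):

```python
broken_kwargs = None  # what a bare "activation_hooks_kwargs:" YAML key parses to
fixed_kwargs = {}     # what "activation_hooks_kwargs: {}" parses to

assert fixed_kwargs.get("method") is None  # safe: a missing key just returns None
try:
    broken_kwargs.get("method")            # the hook's .get() call on None
except AttributeError:
    print("null kwargs would crash the hook")
```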

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_model_defaults.yaml`:
- Around line 11-12: The YAML mapping for source_datasets_to_discard is broken
because its child key varlen is not indented, making source_datasets_to_discard
null and varlen a top-level key; fix by nesting varlen (and any other child
keys) under source_datasets_to_discard with proper indentation so that
source_datasets_to_discard contains the varlen entry (e.g., indent varlen one
level beneath source_datasets_to_discard).
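
The indentation bug described here (and in the sibling `validate_model_defaults.yaml` findings below) can be reproduced directly, assuming PyYAML is available; `true` is a placeholder value, not the config's actual `varlen` entry:

```python
import yaml  # assumes PyYAML is installed

broken = yaml.safe_load(
    "source_datasets_to_discard:\n"
    "varlen: true\n"    # not indented: becomes a sibling top-level key
)
fixed = yaml.safe_load(
    "source_datasets_to_discard:\n"
    "  varlen: true\n"  # indented: nests under the parent mapping
)

print(broken)  # {'source_datasets_to_discard': None, 'varlen': True}
print(fixed)   # {'source_datasets_to_discard': {'varlen': True}}
```

Both variants are syntactically valid YAML, which is why this slips past parsers and only surfaces when code reads a `None` where it expected a mapping.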

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_solutions_defaults.yaml`:
- Around line 5-10: The YAML block "solutions_to_validate" is not nested so its
fields become root-level and the block is null; fix by indenting the keys
(skip_validation, save_models, bigger_is_better, sort_solutions_by,
calculate_full_score_ablations) under the "solutions_to_validate" mapping so
they are children of that key (ensure consistent spacing—e.g., two spaces—or
follow the repo's YAML indentation convention) while preserving the existing key
names and boolean values.

In `@examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml`:
- Line 13: Fix the typo in the comment next to the config key dataset_path:
change "ppath" to "path" so the line reads a correct inline comment referring to
the path to Nemotron-Post-Training-Dataset-v2; update the comment text in the
Llama-3_1-8B.yaml entry for dataset_path accordingly.

In
`@examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_model_defaults.yaml`:
- Around line 11-12: The YAML key varlen is not indented under
source_datasets_to_discard; update the validate_model_defaults.yaml so that
varlen is a child of source_datasets_to_discard (e.g., indent varlen under
source_datasets_to_discard with the same indentation style used elsewhere) so
the keys source_datasets_to_discard and varlen form a proper mapping.

In
`@examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_solutions_defaults.yaml`:
- Around line 5-10: The YAML keys skip_validation, save_models,
bigger_is_better, sort_solutions_by, and calculate_full_score_ablations are
currently at the root instead of nested under the solutions_to_validate mapping;
move these keys so they are children of solutions_to_validate (i.e., indent them
under the solutions_to_validate key) so the parser reads them as part of that
block and not top-level entries.

In
`@examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/validate_solutions_defaults.yaml`:
- Around line 5-10: The YAML keys after solutions_to_validate are incorrectly at
root level; indent the properties (skip_validation, save_models,
bigger_is_better, sort_solutions_by, calculate_full_score_ablations) two spaces
so they become children of solutions_to_validate (i.e., nest them under the
solutions_to_validate mapping) to restore the intended schema.

In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/validate_model_defaults.yaml`:
- Around line 11-12: The key varlen is mistakenly at the top level instead of
nested under source_datasets_to_discard, which leaves source_datasets_to_discard
null; fix it by indenting varlen so it becomes a child of
source_datasets_to_discard (i.e., move the varlen line under
source_datasets_to_discard with proper indentation) so the YAML key
source_datasets_to_discard contains the varlen entry.

In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml`:
- Around line 5-10: The YAML keys skip_validation, save_models,
bigger_is_better, sort_solutions_by, and calculate_full_score_ablations are
currently at the root but should be nested under the solutions_to_validate
mapping; fix by indenting those keys under solutions_to_validate (e.g., two
spaces) so the block reads as a single mapping for solutions_to_validate and
produces valid YAML structure.

In
`@examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/attn_pruning.yaml`:
- Line 4: The activations_log_dir interpolation uses the long path
${modelopt.torch.puzzletron.pruning.activation_hooks_kwargs.method} and
${modelopt.torch.puzzletron.pruning.experiment_id} which is inconsistent with
other pruning configs; update the keys in attn_pruning.yaml (and mirror the same
change in hidden_dim_pruning.yaml) to use the shorter Hydra interpolation
${pruning.activation_hooks_kwargs.method} and ${pruning.experiment_id} so they
match ffn_pruning.yaml and the rest of the pruning configs.

In
`@examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_model_defaults.yaml`:
- Around line 11-12: The YAML has incorrect indentation: the key varlen should
be nested under source_datasets_to_discard but is currently a top-level key;
update the file so that varlen is indented as a child of
source_datasets_to_discard (i.e., make varlen a nested key under
source_datasets_to_discard) to ensure correct parsing and that the
source_datasets_to_discard mapping contains the varlen entry.

In
`@examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_solutions_defaults.yaml`:
- Around line 5-10: The keys skip_validation, save_models, bigger_is_better,
sort_solutions_by, and calculate_full_score_ablations are at the top level
instead of being nested under solutions_to_validate, making
cfg.scoring.solutions_to_validate null; fix by indenting those keys beneath
solutions_to_validate in validate_solutions_defaults.yaml so the block becomes a
proper mapping (preserve existing boolean/string values), which will restore
cfg.scoring.solutions_to_validate for consumers like scoring.py.

In
`@examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_solutions_defaults.yaml`:
- Around line 5-10: The YAML block for solutions_to_validate is malformed
because its child keys are not nested; indent the keys skip_validation,
save_models, bigger_is_better, sort_solutions_by, and
calculate_full_score_ablations two spaces under the parent key
solutions_to_validate so they become properties of that map (i.e., make them
children of solutions_to_validate instead of top-level keys).

In `@examples/puzzletron/evaluation/hf_deployable_anymodel.py`:
- Around line 353-364: The get_triton_input function contains a duplicated
Tensor input named "max_length"; remove the redundant Tensor(name="max_length",
...) entry so each Triton input name is unique and the intended max_length
definition remains (verify you keep the correct dtype/optional settings in
get_triton_input and update the inputs tuple accordingly).
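
A duplicate-name check like the one this finding implies is a one-liner; the name list below is hypothetical and mirrors the duplicated `"max_length"` entry rather than reproducing the real `get_triton_input` signature:

```python
from collections import Counter

input_names = ["prompts", "max_length", "max_length", "temperature"]
duplicates = [name for name, n in Counter(input_names).items() if n > 1]
print(duplicates)  # ['max_length']
```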

In `@examples/puzzletron/evaluation/nemo_evaluator_instructions.md`:
- Line 32: Replace the hardcoded Python version in the Docker mount path string
"-v ${MODELOPT_DIR}/modelopt:/opt/venv/lib/python3.12/site-packages/modelopt"
with a robust approach: either document that users must confirm the container
Python version (mentioning ${MODELOPT_DIR} and the target path) or change the
instructions to compute/parameterize the Python minor version (e.g., use a
variable like PYTHON_VERSION or detect it at runtime) and use
"/opt/venv/lib/python${PYTHON_VERSION}/site-packages/modelopt" so the mount path
matches the container Python; update the README/example note near the mount to
explain how to set PYTHON_VERSION if you choose parameterization.

In `@examples/puzzletron/GPTOSS.md`:
- Line 7: Fix the typo "prunning" to "pruning" in the sentence inside
examples/puzzletron/GPTOSS.md (the line that currently reads "In the prunning
steps puzzle utilizes decompressed model..."); update the word to "pruning" so
the sentence becomes "In the pruning steps puzzle utilizes decompressed model
(back to BF16) for statistics and scores computation." Ensure only the
misspelled word is changed and punctuation/capitalization remains consistent.

In `@examples/puzzletron/mbridge_distillation/distill_hf.py`:
- Around line 180-182: Add a new boolean CLI argument --trust_remote_code to the
script's argument parsing and use its value when loading HuggingFace models: in
the _build_model_provider function pass trust_remote_code=args.trust_remote_code
to AutoBridge.from_hf_pretrained(hf_path). Update the argparse setup to define
--trust_remote_code (store_true/False default as appropriate) and ensure the
_build_model_provider call receives the parsed args so it can access
args.trust_remote_code.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4082f6b4-39eb-404c-bd09-049134a788a5

📥 Commits

Reviewing files that changed from the base of the PR and between 67999eb and b47f846.

⛔ Files ignored due to path filters (1)
  • examples/puzzletron/mip_sweep_example.png is excluded by !**/*.png
📒 Files selected for processing (57)
  • examples/puzzletron/GPTOSS.md
  • examples/puzzletron/README.md
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/Llama-3_2-3B.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/Mistral-Small-24B.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/mistral-small-24b-instruct-2501_pruneffn_memory.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/validate_model_defaults.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct_pruneffn_memory.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/evaluation/hf_deployable_anymodel.py
  • examples/puzzletron/evaluation/lm_eval_anymodel.py
  • examples/puzzletron/evaluation/nemo_evaluator_instructions.md
  • examples/puzzletron/main.py
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • examples/puzzletron/requirements.txt

# MIP memory constraint (in MiB)
mip:
  human_constraints:
    target_memory: 16_000 # 45 GiB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Align target_memory with the 45 GiB annotation.

Line 17 is inconsistent: 16_000 MiB is ~15.6 GiB, not 45 GiB. Please update either the value or the comment so the constraint is unambiguous.

Possible fix (if 16,000 MiB is intended)
-    target_memory: 16_000 # 45 GiB
+    target_memory: 16_000 # ~15.6 GiB
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
target_memory: 16_000 # 45 GiB
target_memory: 16_000 # ~15.6 GiB
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml`
at line 17, The target_memory value and its comment disagree: update the
target_memory entry (symbol name: target_memory) so the numeric MiB matches the
annotated 45 GiB or adjust the comment to the intended MiB; either set
target_memory to 46_080 (45 GiB in MiB) to keep the comment, or change the
comment to match 16_000 MiB (≈15.6 GiB) so they are consistent.
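The unit mismatch is easy to verify with a quick conversion, assuming binary units (1 GiB = 1024 MiB):

```python
MIB_PER_GIB = 1024  # binary units: 1 GiB = 1024 MiB

target_memory_mib = 16_000
print(target_memory_mib / MIB_PER_GIB)  # 15.625 -> ~15.6 GiB, not 45 GiB

# Value that would actually match the "45 GiB" annotation:
print(45 * MIB_PER_GIB)  # 46080
```

So either the comment should read ~15.6 GiB, or the value should be 46_080 MiB.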

Comment on lines +13 to +15
hook_class: ${get_object:modelopt.torch.nas.plugins.megatron_hooks.base_hooks.RankedChoiceVotingHook}
activation_hooks_kwargs: # Additional kwargs to pass to the hook init

Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether activation_hooks_kwargs is ever unpacked as kwargs
rg -n -C3 'activation_hooks_kwargs'
rg -n -P '\*\*\s*[^,)\s]*activation_hooks_kwargs'

Repository: NVIDIA/Model-Optimizer

Length of output: 50380


Use an explicit empty mapping for activation_hooks_kwargs.

Line 14 currently resolves to null in YAML. Hook classes expect activation_hooks_kwargs to be a dict and call the .get() method on it, which will fail with an AttributeError if the value is None. Change it to activation_hooks_kwargs: {} when no extra args are needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/ffn_pruning.yaml`
around lines 13 - 15, The YAML sets activation_hooks_kwargs to null which will
cause an AttributeError when the hook (e.g.,
modelopt.torch.nas.plugins.megatron_hooks.base_hooks.RankedChoiceVotingHook)
calls .get(); change activation_hooks_kwargs to an explicit empty mapping
(activation_hooks_kwargs: {}) so the hook receives a dict even when no extra
args are provided.
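The failure mode is easy to reproduce without any YAML machinery, since a YAML key with no value (`activation_hooks_kwargs:`) loads as None, while `activation_hooks_kwargs: {}` loads as an empty dict:

```python
# What `activation_hooks_kwargs:` (empty value) parses to in YAML:
activation_hooks_kwargs = None
try:
    activation_hooks_kwargs.get("some_option")
except AttributeError as err:
    print(f"null value breaks the hook: {err}")

# What `activation_hooks_kwargs: {}` parses to:
activation_hooks_kwargs = {}
print(activation_hooks_kwargs.get("some_option", "default"))  # safe: prints "default"
```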

Comment on lines +11 to +12
source_datasets_to_discard:
varlen: false
Contributor

⚠️ Potential issue | 🟠 Major

Fix YAML nesting for source_datasets_to_discard.

Line 11 starts a mapping, but Line 12 is not indented. That makes source_datasets_to_discard null and varlen a top-level key.

Proposed fix
 source_datasets_to_discard:
-varlen: false
+  varlen: false
📝 Committable suggestion


Suggested change
source_datasets_to_discard:
varlen: false
source_datasets_to_discard:
  varlen: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_model_defaults.yaml`
around lines 11 - 12, The YAML mapping for source_datasets_to_discard is broken
because its child key varlen is not indented, making source_datasets_to_discard
null and varlen a top-level key; fix by nesting varlen (and any other child
keys) under source_datasets_to_discard with proper indentation so that
source_datasets_to_discard contains the varlen entry (e.g., indent varlen one
level beneath source_datasets_to_discard).
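The parsing difference the comment describes can be confirmed directly (this sketch assumes PyYAML is available):

```python
import yaml  # PyYAML

flat = """\
source_datasets_to_discard:
varlen: false
"""
nested = """\
source_datasets_to_discard:
  varlen: false
"""

# Without indentation the mapping is empty (None) and varlen leaks to the root:
print(yaml.safe_load(flat))    # {'source_datasets_to_discard': None, 'varlen': False}
# With two-space indentation varlen is nested where the code expects it:
print(yaml.safe_load(nested))  # {'source_datasets_to_discard': {'varlen': False}}
```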

Comment on lines +5 to +10
solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false
Contributor

⚠️ Potential issue | 🟠 Major

solutions_to_validate block is not nested.

Lines 6-10 need to be indented under Line 5. Current structure makes solutions_to_validate null and moves all controls to root scope.

Proposed fix
 solutions_to_validate:
-skip_validation: false
-save_models: false
-bigger_is_better: false
-sort_solutions_by:
-calculate_full_score_ablations: false
+  skip_validation: false
+  save_models: false
+  bigger_is_better: false
+  sort_solutions_by:
+  calculate_full_score_ablations: false
📝 Committable suggestion


Suggested change
solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false
solutions_to_validate:
  skip_validation: false
  save_models: false
  bigger_is_better: false
  sort_solutions_by:
  calculate_full_score_ablations: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_solutions_defaults.yaml`
around lines 5 - 10, The YAML block "solutions_to_validate" is not nested so its
fields become root-level and the block is null; fix by indenting the keys
(skip_validation, save_models, bigger_is_better, sort_solutions_by,
calculate_full_score_ablations) under the "solutions_to_validate" mapping so
they are children of that key (ensure consistent spacing—e.g., two spaces—or
follow the repo's YAML indentation convention) while preserving the existing key
names and boolean values.

descriptor: llama
teacher_dir: ${puzzle_dir}/ckpts/teacher/
replacement_library_path: ${puzzle_dir}/replacement_library.json
dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
Contributor

⚠️ Potential issue | 🟡 Minor

Minor typo in comment.

"ppath" should be "path".

📝 Proposed fix
-dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
+dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2
📝 Committable suggestion


Suggested change
dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml`
at line 13, Fix the typo in the comment next to the config key dataset_path:
change "ppath" to "path" so the line reads a correct inline comment referring to
the path to Nemotron-Post-Training-Dataset-v2; update the comment text in the
Llama-3_1-8B.yaml entry for dataset_path accordingly.

Comment on lines +5 to +10
solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_solutions_defaults.yaml

Repository: NVIDIA/Model-Optimizer

Length of output: 332


YAML indentation error: solutions_to_validate block keys must be nested.

The keys skip_validation, save_models, bigger_is_better, sort_solutions_by, and calculate_full_score_ablations (lines 6–10) are not indented under the solutions_to_validate key, causing them to be parsed as root-level keys instead of nested properties. Add 2-space indentation to all these lines.

Proposed fix
 solutions_to_validate:
-skip_validation: false
-save_models: false
-bigger_is_better: false
-sort_solutions_by:
-calculate_full_score_ablations: false
+  skip_validation: false
+  save_models: false
+  bigger_is_better: false
+  sort_solutions_by:
+  calculate_full_score_ablations: false
📝 Committable suggestion


Suggested change
solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false
solutions_to_validate:
  skip_validation: false
  save_models: false
  bigger_is_better: false
  sort_solutions_by:
  calculate_full_score_ablations: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_solutions_defaults.yaml`
around lines 5 - 10, The YAML block for solutions_to_validate is malformed
because its child keys are not nested; indent the keys skip_validation,
save_models, bigger_is_better, sort_solutions_by, and
calculate_full_score_ablations two spaces under the parent key
solutions_to_validate so they become properties of that map (i.e., make them
children of solutions_to_validate instead of top-level keys).

Comment on lines +353 to +364
Tensor(name="prompts", shape=(-1,), dtype=bytes),
Tensor(name="max_length", shape=(-1,), dtype=np.int_, optional=True),
Tensor(name="max_batch_size", shape=(-1,), dtype=np.int_, optional=True),
Tensor(name="top_k", shape=(-1,), dtype=np.int_, optional=True),
Tensor(name="top_p", shape=(-1,), dtype=np.single, optional=True),
Tensor(name="temperature", shape=(-1,), dtype=np.single, optional=True),
Tensor(name="random_seed", shape=(-1,), dtype=np.int_, optional=True),
Tensor(name="max_length", shape=(-1,), dtype=np.int_, optional=True),
Tensor(name="output_logits", shape=(-1,), dtype=np.bool_, optional=True),
Tensor(name="output_scores", shape=(-1,), dtype=np.bool_, optional=True),
)
return inputs
Contributor

⚠️ Potential issue | 🟡 Minor

Duplicate Triton input definition for max_length.

max_length is defined twice in get_triton_input (lines 354 and 360), which will cause the second definition to shadow the first.

🐛 Proposed fix: Remove duplicate definition
         inputs = (
             Tensor(name="prompts", shape=(-1,), dtype=bytes),
             Tensor(name="max_length", shape=(-1,), dtype=np.int_, optional=True),
             Tensor(name="max_batch_size", shape=(-1,), dtype=np.int_, optional=True),
             Tensor(name="top_k", shape=(-1,), dtype=np.int_, optional=True),
             Tensor(name="top_p", shape=(-1,), dtype=np.single, optional=True),
             Tensor(name="temperature", shape=(-1,), dtype=np.single, optional=True),
             Tensor(name="random_seed", shape=(-1,), dtype=np.int_, optional=True),
-            Tensor(name="max_length", shape=(-1,), dtype=np.int_, optional=True),
             Tensor(name="output_logits", shape=(-1,), dtype=np.bool_, optional=True),
             Tensor(name="output_scores", shape=(-1,), dtype=np.bool_, optional=True),
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/evaluation/hf_deployable_anymodel.py` around lines 353 -
364, The get_triton_input function contains a duplicated Tensor input named
"max_length"; remove the redundant Tensor(name="max_length", ...) entry so each
Triton input name is unique and the intended max_length definition remains
(verify you keep the correct dtype/optional settings in get_triton_input and
update the inputs tuple accordingly).
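A quick way to catch this class of bug is to count the declared input names; the list below is transcribed from the snippet above:

```python
from collections import Counter

# Input names as declared in get_triton_input (from the snippet above)
input_names = [
    "prompts", "max_length", "max_batch_size", "top_k", "top_p",
    "temperature", "random_seed", "max_length", "output_logits", "output_scores",
]

duplicates = sorted(n for n, count in Counter(input_names).items() if count > 1)
print(duplicates)  # ['max_length']
```

A one-line assertion like this could also serve as a unit test guarding against reintroducing duplicate Triton input definitions.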

--ulimit memlock=-1 \
--rm -it \
-v ${MODELOPT_DIR}:/opt/Model-Optimizer \
-v ${MODELOPT_DIR}/modelopt:/opt/venv/lib/python3.12/site-packages/modelopt \
Contributor

⚠️ Potential issue | 🟠 Major

Hardcoded Python version may break with different containers.

The mount path /opt/venv/lib/python3.12/site-packages/modelopt assumes Python 3.12. If the NeMo container uses a different Python version, this mount will fail silently or cause import errors.

🔧 Proposed fix to make the path more robust

Consider documenting that users should verify the Python version or use a more flexible approach:

-  -v ${MODELOPT_DIR}/modelopt:/opt/venv/lib/python3.12/site-packages/modelopt \
+  -v ${MODELOPT_DIR}/modelopt:/opt/venv/lib/python$(python3 --version | cut -d' ' -f2 | cut -d'.' -f1,2)/site-packages/modelopt \

Or add a comment warning users to verify the Python version:

+  # Note: Adjust the Python version (3.12) below to match your container's Python version
   -v ${MODELOPT_DIR}/modelopt:/opt/venv/lib/python3.12/site-packages/modelopt \
📝 Committable suggestion


Suggested change
-v ${MODELOPT_DIR}/modelopt:/opt/venv/lib/python3.12/site-packages/modelopt \
# Note: Adjust the Python version (3.12) below to match your container's Python version
-v ${MODELOPT_DIR}/modelopt:/opt/venv/lib/python3.12/site-packages/modelopt \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/evaluation/nemo_evaluator_instructions.md` at line 32,
Replace the hardcoded Python version in the Docker mount path string "-v
${MODELOPT_DIR}/modelopt:/opt/venv/lib/python3.12/site-packages/modelopt" with a
robust approach: either document that users must confirm the container Python
version (mentioning ${MODELOPT_DIR} and the target path) or change the
instructions to compute/parameterize the Python minor version (e.g., use a
variable like PYTHON_VERSION or detect it at runtime) and use
"/opt/venv/lib/python${PYTHON_VERSION}/site-packages/modelopt" so the mount path
matches the container Python; update the README/example note near the mount to
explain how to set PYTHON_VERSION if you choose parameterization.
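One way to parameterize the path is to derive the major.minor version from the interpreter itself. Note this is a sketch: for the `docker -v` flag the version must come from the *container's* Python, so in practice this snippet would be run inside the container (or via `docker run`):

```python
import sys

# Derive the interpreter's major.minor version instead of hardcoding "3.12".
py_ver = f"{sys.version_info.major}.{sys.version_info.minor}"
mount_target = f"/opt/venv/lib/python{py_ver}/site-packages/modelopt"
print(mount_target)
```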

@codecov

codecov bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (67999eb) to head (6d895e8).

Additional details and impacted files
@@                 Coverage Diff                 @@
##           feature/puzzletron    #1035   +/-   ##
===================================================
  Coverage               72.12%   72.12%           
===================================================
  Files                     209      209           
  Lines                   23628    23628           
===================================================
  Hits                    17042    17042           
  Misses                   6586     6586           

☔ View full report in Codecov by Sentry.

Collaborator

Is this file copied from lm-eval-harness, as shown in the license below?

Author

I fixed the license header (after info from JohannesR)

Comment on lines +1 to +14
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Collaborator

can you move the NVIDIA Apache2 copyright and license after the lm-eval license? And also exclude this file from the auto license hook in .pre-commit-config.yaml

Author

done

Collaborator

can we just add the puzzletron imports in the existing distill.py so we don't have 2 copies of the same file?

Author

Added to TODO; we would need to test first if puzzletron mbridge distillation works OK with non-puzzletron models


**Start Docker container:**

Use the [NeMo 26.02 container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo?version=26.02):
Collaborator

Please update to 26.02.01 which will release this week and has some distillation_provider.py fixes already

Author

done

To do so, there is an additional script that takes the original and the pruned checkpoints and outputs the pruned checkpoint in _MXFP4_ format.

```bash
python -m modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_pruned_to_mxfp4 \
  --student-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/solution_0/ \
  --original-path /workspaces/source_model_checkpoints/openai_gpt-oss-20b/ \
  --output-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/mxfp4-ckpt/ \
  --num-layers 24
```
Collaborator

how do we arrive at the stats_num_params_18014757184 folder?

Author

It is computed automatically by Puzzletron; added a TODO to improve this doc.

> **Note:** NeMo containers may ship `nvidia-lm-eval` which may conflict with `lm-eval` that is used for evaluation.
> If so, run `pip uninstall nvidia-lm-eval -y` before installing requirements.

- For this example we are using 2x NVIDIA H100 80GB HBM3 to show multi-GPU steps. You can use also use s single GPU.
Collaborator

Suggested change
- For this example we are using 2x NVIDIA H100 80GB HBM3 to show multi-GPU steps. You can use also use s single GPU.
- For this example we are using 2x NVIDIA H100 80GB HBM3 to show multi-GPU steps. You can also use a single GPU.

Author

done

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
examples/puzzletron/GPTOSS.md (2)

7-8: Consider minor grammar improvements for readability.

  • Line 7: "In the pruning steps puzzle utilizes" could be rephrased to "During the pruning steps, Puzzle utilizes" or "In the pruning steps, Puzzle utilizes"
  • Line 8: "This means, during the conversion" would read better as "This means that during the conversion"
📝 Optional grammar refinements
-In the pruning steps puzzle utilizes decompressed model (back to BF16) for statistics and scores computation.
-This means, during the conversion to puzzle format we decompress the model and store it as a BF16.
+During the pruning steps, Puzzle utilizes a decompressed model (back to BF16) for statistics and scores computation.
+This means that during the conversion to Puzzle format, we decompress the model and store it as BF16.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/GPTOSS.md` around lines 7 - 8, Update the two sentences
for improved grammar and readability: change "In the pruning steps puzzle
utilizes decompressed model (back to BF16) for statistics and scores
computation." to "During the pruning steps, Puzzle utilizes the decompressed
model (back to BF16) for statistics and score computation." and change "This
means, during the conversion to puzzle format we decompress the model and store
it as a BF16." to "This means that during the conversion to Puzzle format, we
decompress the model and store it as BF16." Ensure capitalization of "Puzzle" is
consistent and remove the extraneous comma after "means".

2-4: Inconsistent model name capitalization.

The heading uses "GptOss" (line 2) while the text uses "Gpt-Oss" (line 4). Consider standardizing to one format throughout the document for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/GPTOSS.md` around lines 2 - 4, The heading "GptOss" and
the body text "Gpt-Oss" are inconsistent; pick a single canonical model name
(either "GptOss" or "Gpt-Oss") and update both the H2 heading and all
occurrences in the paragraph to match; specifically edit the heading line
containing "GptOss" and the sentence mentioning "Gpt-Oss" so they use the same
standardized spelling across the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/puzzletron/GPTOSS.md`:
- Line 13: Update the GPTOSS.md example to clarify that the shown filesystem
paths are placeholders and should be replaced by users with their actual
workspace paths, explicitly explain that the folder name
stats_num_params_18014757184 is an auto-generated output from the
pruning/statistics step (include how it encodes the total parameter count or
stats used), and add a short note describing the --num-layers parameter (that it
sets the student model layer count and should match the target architecture
after pruning or be chosen based on the original model’s layer count and desired
compression) so readers know how to derive these values for their model.

---

Nitpick comments:
In `@examples/puzzletron/GPTOSS.md`:
- Around line 7-8: Update the two sentences for improved grammar and
readability: change "In the pruning steps puzzle utilizes decompressed model
(back to BF16) for statistics and scores computation." to "During the pruning
steps, Puzzle utilizes the decompressed model (back to BF16) for statistics and
score computation." and change "This means, during the conversion to puzzle
format we decompress the model and store it as a BF16." to "This means that
during the conversion to Puzzle format, we decompress the model and store it as
BF16." Ensure capitalization of "Puzzle" is consistent and remove the extraneous
comma after "means".
- Around line 2-4: The heading "GptOss" and the body text "Gpt-Oss" are
inconsistent; pick a single canonical model name (either "GptOss" or "Gpt-Oss")
and update both the H2 heading and all occurrences in the paragraph to match;
specifically edit the heading line containing "GptOss" and the sentence
mentioning "Gpt-Oss" so they use the same standardized spelling across the file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 96574184-9858-43c1-9357-d2104b326907

📥 Commits

Reviewing files that changed from the base of the PR and between b47f846 and 500568e.

📒 Files selected for processing (1)
  • examples/puzzletron/GPTOSS.md

To do so, there is an additional script that takes the original and the pruned checkpoints and outputs the pruned checkpoint in _MXFP4_ format.

```bash
python -m modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_pruned_to_mxfp4 \
  --student-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/solution_0/ \
  --original-path /workspaces/source_model_checkpoints/openai_gpt-oss-20b/ \
  --output-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/mxfp4-ckpt/ \
  --num-layers 24
```
Contributor

⚠️ Potential issue | 🟡 Minor

Clarify example paths and explain generated folder names.

The example command uses hardcoded paths including a folder named stats_num_params_18014757184, which appears to be auto-generated. This addresses the earlier question from kevalmorabia97 about how users arrive at this folder name. Consider adding a note explaining:

  1. That these are example paths and users should replace them with their actual paths
  2. How the stats_num_params_18014757184 folder name is generated (presumably by the pruning process based on model statistics)
  3. What the --num-layers parameter represents and how users should determine its value for their model
📋 Suggested documentation addition

Add explanatory text before the code block:

 To do so, there is an additional script, that takes the original and the pruned checkpoint and outputs pruned checkpoint in _MXFP4_ format.
+
+**Note**: The example below shows sample paths. Replace them with your actual paths:
+- `--student-path`: Path to your pruned checkpoint (the folder name like `stats_num_params_18014757184` is auto-generated during pruning based on model statistics)
+- `--original-path`: Path to your original quantized model
+- `--output-path`: Desired output location for the re-quantized checkpoint
+- `--num-layers`: Number of layers in your model (24 for gpt-oss-20b)

 ```bash
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/GPTOSS.md` at line 13, Update the GPTOSS.md example to
clarify that the shown filesystem paths are placeholders and should be replaced
by users with their actual workspace paths, explicitly explain that the folder
name stats_num_params_18014757184 is an auto-generated output from the
pruning/statistics step (include how it encodes the total parameter count or
stats used), and add a short note describing the --num-layers parameter (that it
sets the student model layer count and should match the target architecture
after pruning or be chosen based on the original model’s layer count and desired
compression) so readers know how to derive these values for their model.
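As a hedged illustration of the folder-naming convention discussed above (assuming the auto-generated name simply embeds the pruned model's total parameter count — an inference from the example path, not a documented guarantee):

```python
# Assumption: the solution folder name encodes the pruned model's total
# parameter count; 18014757184 params is roughly 18.0B (a pruned gpt-oss-20b).
num_params = 18014757184
folder_name = f"stats_num_params_{num_params}"

print(folder_name)                 # stats_num_params_18014757184
print(f"{num_params / 1e9:.1f}B")  # 18.0B
```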

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)

155-161: Consider making --hf-model conditionally required.

The --hf-model argument is marked as required=True, but it's only used when --hf-export-path is provided (line 288). Users who don't want HF export are still forced to provide this argument.

Consider either:

  1. Making it optional with validation in main() when export is requested, or
  2. Adding post-parse validation similar to the data_paths check at lines 165-166.
♻️ Suggested approach
```diff
     parser.add_argument(
         "--hf-model",
         type=str,
-        required=True,
+        default=None,
         help="HuggingFace model ID to use as template for export (e.g., meta-llama/Llama-3.1-8B-Instruct). "
         "Should match the base architecture of the student model.",
     )
     args = parser.parse_args()

     # Sanity checks
     if not args.use_mock_data and not args.data_paths:
         raise ValueError("Must provide either --data_paths or set --use_mock_data.")
+    if args.hf_export_path and not args.hf_model:
+        raise ValueError("--hf-model is required when --hf-export-path is provided.")
```
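The conditional-requirement pattern the review recommends can be sketched as a minimal standalone script (the flag names mirror the diff above; everything else is illustrative, not code from the repository):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--hf-export-path", type=str, default=None)
    # Optional at parse time; its necessity depends on --hf-export-path
    parser.add_argument("--hf-model", type=str, default=None)
    return parser


def validate(args: argparse.Namespace) -> None:
    # Post-parse check: only require --hf-model when an export was requested
    if args.hf_export_path and not args.hf_model:
        raise ValueError("--hf-model is required when --hf-export-path is provided.")


# Parsing without export succeeds even though --hf-model is absent
validate(build_parser().parse_args([]))
print("no-export: ok")

# Requesting export without a template model is rejected post-parse
try:
    validate(build_parser().parse_args(["--hf-export-path", "/tmp/out"]))
except ValueError:
    print("export-without-model: rejected")
```

This keeps `argparse`'s help output accurate (the flag really is optional) while still failing fast with a clear message when the export path is given alone.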
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/distill_hf.py` around lines 155 -
161, The --hf-model argument is currently required even when not exporting to
HuggingFace; change the parser.add_argument call for "--hf-model" to not
required (remove required=True) and add a post-parse validation in main() that
checks if args.hf_export_path is set then ensure args.hf_model is present
(raise/exit with a clear message if missing) — reference the hf-export-path
check logic already used for data_paths and replicate that pattern to validate
hf_model only when export is requested.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cdf922ab-5f44-42d7-a0d9-597c2e164f8e

📥 Commits

Reviewing files that changed from the base of the PR and between 500568e and 46d7d41.

📒 Files selected for processing (2)
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/puzzletron/mbridge_distillation/README.md

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
examples/puzzletron/evaluation/hf_deployable_anymodel.py (1)

353-364: ⚠️ Potential issue | 🟡 Minor

Duplicate Triton input name max_length is still present.

max_length appears at both Line 355 and Line 361. Keep only one definition to avoid ambiguous input mapping.

💡 Proposed fix
```diff
         inputs = (
             Tensor(name="prompts", shape=(-1,), dtype=bytes),
             Tensor(name="max_length", shape=(-1,), dtype=np.int_, optional=True),
             Tensor(name="max_batch_size", shape=(-1,), dtype=np.int_, optional=True),
             Tensor(name="top_k", shape=(-1,), dtype=np.int_, optional=True),
             Tensor(name="top_p", shape=(-1,), dtype=np.single, optional=True),
             Tensor(name="temperature", shape=(-1,), dtype=np.single, optional=True),
             Tensor(name="random_seed", shape=(-1,), dtype=np.int_, optional=True),
-            Tensor(name="max_length", shape=(-1,), dtype=np.int_, optional=True),
             Tensor(name="output_logits", shape=(-1,), dtype=np.bool_, optional=True),
             Tensor(name="output_scores", shape=(-1,), dtype=np.bool_, optional=True),
         )
```
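A quick way to catch this class of bug is a uniqueness check over the declared input names. A minimal sketch using only the names from the reviewed tuple (the `Tensor` objects themselves are not needed for the check):

```python
from collections import Counter

# Stand-in for the Triton input spec above: only the "name" field matters
# for a duplicate check, so the sketch lists the names from the reviewed tuple.
input_names = [
    "prompts", "max_length", "max_batch_size", "top_k", "top_p",
    "temperature", "random_seed",
    "max_length",  # the duplicate flagged by the review
    "output_logits", "output_scores",
]

duplicates = [name for name, count in Counter(input_names).items() if count > 1]
print(duplicates)  # ['max_length']
```

An `assert not duplicates` at spec-construction time would have surfaced the ambiguous mapping before deployment.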
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/evaluation/hf_deployable_anymodel.py` around lines 353 -
364, The Triton model inputs tuple named "inputs" contains a duplicate Tensor
entry with name "max_length"; remove one of the duplicate Tensor(...) entries
for "max_length" so each input name is unique (leave the single desired
definition with the correct dtype/optional flags), and verify downstream code
that references "max_length" still matches the remaining Tensor name; update the
tuple in hf_deployable_anymodel.py where "inputs" is defined to eliminate the
duplicate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e745d78b-eb2e-41ac-b5be-26054e9c06dd

📥 Commits

Reviewing files that changed from the base of the PR and between 46d7d41 and 152d067.

📒 Files selected for processing (2)
  • .pre-commit-config.yaml
  • examples/puzzletron/evaluation/hf_deployable_anymodel.py

```python
# See the License for the specific language governing permissions and
# limitations under the License.

# mypy: ignore-errors
```

⚠️ Potential issue | 🟠 Major

Remove file-wide mypy suppression.

# mypy: ignore-errors at Line 40 disables type checking for the whole module and bypasses the repo’s Python type-checking requirement.

💡 Proposed fix
```diff
-# mypy: ignore-errors
```

As per coding guidelines, "**/*.py: Use mypy for type checking on Python code (configured in pyproject.toml)."
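The localized alternatives the review recommends can be sketched in isolation (the function below is illustrative, not code from the repository):

```python
from typing import Optional


# Preferred: fix the type at the source instead of suppressing the checker.
def first_token(text: Optional[str]) -> str:
    # Narrowing the Optional here is exactly the kind of issue a file-wide
    # "# mypy: ignore-errors" would have hidden.
    if text is None:
        return ""
    return text.split()[0]


# Fallback: when a real fix is impractical, scope the suppression to one line
# with an error-code-specific pragma rather than the whole module, e.g.:
#     result = legacy_call()  # type: ignore[no-untyped-call]

print(first_token("hello world"))  # hello
print(first_token(None))           # (empty string)
```

Error-code-specific ignores keep the rest of the module under mypy's checks and make each suppression auditable.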

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/evaluation/hf_deployable_anymodel.py` at line 40, Remove
the file-wide mypy suppression by deleting the "# mypy: ignore-errors" directive
and instead fix the underlying type issues in this module (or, if necessary,
apply targeted mypy pragmas such as "# type: ignore[<code>]" on the specific
lines or use narrow per-function decorators) so the file complies with the
repository's mypy checks; locate the top-level module comment containing "#
mypy: ignore-errors" and replace it with either concrete type fixes or localized
ignores only where required.

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
examples/puzzletron/mbridge_distillation/README.md (1)

25-35: ⚠️ Potential issue | 🟠 Major

Align the documented NeMo version with the Docker image tag.

Line 25 tells users to use 26.02.01, but Line 34 runs nvcr.io/nvidia/nemo:26.02. This mismatch makes the setup non-reproducible and can invalidate expected behavior.

Suggested doc fix
```diff
-  nvcr.io/nvidia/nemo:26.02 \
+  nvcr.io/nvidia/nemo:26.02.01 \
   /bin/bash
```
Is `nvcr.io/nvidia/nemo:26.02.01` the correct container tag for the NeMo 26.02.01 release, and if not, what exact tag should be used?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/README.md` around lines 25 - 35, The
README documents "NeMo 26.02.01" but the docker command pulls
"nvcr.io/nvidia/nemo:26.02", causing a mismatch; update the docker image tag in
the command (nvcr.io/nvidia/nemo:26.02) to the exact release tag that matches
the text (e.g., nvcr.io/nvidia/nemo:26.02.01) or change the documented version
string to match the existing image tag so both the header "NeMo 26.02.01" and
the docker reference "nvcr.io/nvidia/nemo:26.02" are consistent and
reproducible.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 10c8519f-ff89-41ad-a5db-6ecd7a0ee252

📥 Commits

Reviewing files that changed from the base of the PR and between 152d067 and 6d895e8.

📒 Files selected for processing (4)
  • .pre-commit-config.yaml
  • examples/puzzletron/README.md
  • examples/puzzletron/evaluation/lm_eval_anymodel.py
  • examples/puzzletron/mbridge_distillation/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • .pre-commit-config.yaml
