Changes from all commits
91 commits
e82164f
Add anymodel directories to feature/puzzletron
danielkorzekwa Mar 4, 2026
2099df3
Make any_model conversion working.
danielkorzekwa Mar 5, 2026
eb5cf8a
Update child_init.py with anymodel version
danielkorzekwa Mar 5, 2026
c9de41c
fix attention pruning
danielkorzekwa Mar 5, 2026
3c1bc1f
Add trust_remote_code to load_model_config (default to false)
danielkorzekwa Mar 5, 2026
8357136
Make activation scoring working
danielkorzekwa Mar 5, 2026
6cc2194
Comment all tested models aside of llama_3_1_8b_instruct
danielkorzekwa Mar 5, 2026
ee4e1e3
Delete not needed decilm test
danielkorzekwa Mar 5, 2026
449b523
Fix broken tests
danielkorzekwa Mar 5, 2026
fb27bba
Update puzzletron_nas_pluging to any_model version
danielkorzekwa Mar 5, 2026
b350f82
Correct test resources used by tests.
danielkorzekwa Mar 5, 2026
fafe5a3
Disable puzzletron tests (will be enabled after all any_model logic i…
danielkorzekwa Mar 5, 2026
e988248
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 6, 2026
c717852
Comment out not implemented models.
danielkorzekwa Mar 6, 2026
030f126
format python docs
danielkorzekwa Mar 6, 2026
8dcdfbf
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 6, 2026
70df0df
Use trust_remote_code in force_cache_dynamic_modules()
danielkorzekwa Mar 6, 2026
bb56662
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 6, 2026
ecd953e
Fix anymodel pruning
danielkorzekwa Mar 6, 2026
ee8f538
Fix buid docs issue.
danielkorzekwa Mar 6, 2026
c9b76a1
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 6, 2026
6e3af61
Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…
danielkorzekwa Mar 6, 2026
0ad6d92
Merging build_library_and_stats
danielkorzekwa Mar 6, 2026
995eb1a
Merging anymodel: calc_one_block_scores
danielkorzekwa Mar 6, 2026
34081c9
Mering any_model: calc_one_block_scores
danielkorzekwa Mar 6, 2026
ed5c00f
merge any_model: mip_and_realize_models
danielkorzekwa Mar 6, 2026
993b5ec
Add all anymodel models but gptoss
danielkorzekwa Mar 6, 2026
6e9f03b
Make nemotron-nano-12b-v2 to work (set trust_remote_code=true)
danielkorzekwa Mar 9, 2026
e8b7a7d
merge anymodel for nemotron-3-nano-30b-a3b-base-bf16
danielkorzekwa Mar 9, 2026
47414d5
Clarify readme and avoid reusing the same reference in llama_converter.
danielkorzekwa Mar 9, 2026
a8305d8
Fix tied-embedding handling before writing the safetensors index.
danielkorzekwa Mar 9, 2026
68421a5
Fix NaN ranking currently selects NaNs as “best” experts by default.
danielkorzekwa Mar 9, 2026
d6b8028
Code clean up.
danielkorzekwa Mar 9, 2026
ecd2341
Code clean up.
danielkorzekwa Mar 10, 2026
f9d845d
code clean up
danielkorzekwa Mar 10, 2026
d171b01
Merge branch 'dkorzekwa/anymodel_core' into dkorzekwa/anymodel_activa…
danielkorzekwa Mar 10, 2026
722da90
Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…
danielkorzekwa Mar 10, 2026
934ab2f
code clean up
danielkorzekwa Mar 10, 2026
0f14ec3
Merge branch 'dkorzekwa/anymodel_pruning' into dkorzekwa/anymodel_bui…
danielkorzekwa Mar 10, 2026
dcb9e02
remove not needed comment
danielkorzekwa Mar 10, 2026
0c9ea5d
Merge branch 'dkorzekwa/anymodel_build_library_and_stats' into dkorze…
danielkorzekwa Mar 10, 2026
5b310e2
Merge branch 'dkorzekwa/any_model_calc_one_block_scores' into dkorzek…
danielkorzekwa Mar 10, 2026
4f82b1c
Merge branch 'dkorzekwa/mip_and_realize_models' into dkorzekwa/any_mo…
danielkorzekwa Mar 10, 2026
176a435
Fix a broken test_puzzletron test on 2 gpus.
danielkorzekwa Mar 10, 2026
02e2c9b
Merge branch 'dkorzekwa/anymodel_activation_scoring' into dkorzekwa/a…
danielkorzekwa Mar 10, 2026
92c4419
Merge branch 'dkorzekwa/anymodel_pruning' into dkorzekwa/anymodel_bui…
danielkorzekwa Mar 10, 2026
aa1eb3e
Merge branch 'dkorzekwa/anymodel_build_library_and_stats' into dkorze…
danielkorzekwa Mar 10, 2026
2b84a96
Merge branch 'dkorzekwa/any_model_calc_one_block_scores' into dkorzek…
danielkorzekwa Mar 10, 2026
fb838c0
Merge branch 'dkorzekwa/mip_and_realize_models' into dkorzekwa/any_mo…
danielkorzekwa Mar 10, 2026
13378ff
Add gpt-oss model
danielkorzekwa Mar 11, 2026
47ca0e3
Add comments about a broken test
danielkorzekwa Mar 11, 2026
96112f7
Fix a broken gptoss test
danielkorzekwa Mar 12, 2026
cb6b182
Add mamba to puzzletron dependencies.
danielkorzekwa Mar 12, 2026
670bb34
Update mamba-ssm and casual-conv1d dependences (remove pinpoint versi…
danielkorzekwa Mar 13, 2026
0e1b591
Install mamba-ssm and causal-conv1d in testenv:cuda13-gpu-puzzletron
danielkorzekwa Mar 13, 2026
ca845ec
Fix installing dependencies in testenv:cuda13-gpu-puzzletron
danielkorzekwa Mar 13, 2026
be825bc
Fix anymodel for qwen3 8B in 2 gpus
danielkorzekwa Mar 13, 2026
7fd1afa
Fix pipeline parallelism issue for wen3-vl-30b-a3b-instruct-qwen3_vl-…
danielkorzekwa Mar 13, 2026
7d7b609
Fix multi-gpu issue for nemotron-nano-12b-v2
danielkorzekwa Mar 13, 2026
249af9d
Fix no_op in any_model
danielkorzekwa Mar 13, 2026
b80583c
Merge branch 'feature/puzzletron' into dkorzekwa/any_model_other_models
danielkorzekwa Mar 13, 2026
88b1b13
Merge any_model tutorial
danielkorzekwa Mar 13, 2026
1dd742e
Fix nemotron_h_model_descriptor.
danielkorzekwa Mar 14, 2026
4a6ebbe
Fix tox -e build-docs
danielkorzekwa Mar 14, 2026
585f0ed
pin mamba/casual-conv1d versions to fix failing assertion for test_pu…
danielkorzekwa Mar 14, 2026
7fb5d9a
Fix for installing mamba-ssm
danielkorzekwa Mar 14, 2026
75d3d69
Fix broken test for nemotron-3-nano-30b-a3b-base-bf16
danielkorzekwa Mar 14, 2026
0e5722d
code clean up
danielkorzekwa Mar 14, 2026
2dd9735
Make test_puzzletron test deterministic
danielkorzekwa Mar 15, 2026
3561de5
Comment out all models but nemotron-3-nano-30b-a3b-base-bf16 to check…
danielkorzekwa Mar 15, 2026
27866de
Implement Qwen3VLRemoveExpertsIndependentHook
danielkorzekwa Mar 15, 2026
a012fe6
Remove not needed nvidia licence header
danielkorzekwa Mar 16, 2026
52922a4
# Initialize weights to ensure all parameters are properly initialized
danielkorzekwa Mar 16, 2026
c234fb4
Fix non-deterministic test_puzzletron test
danielkorzekwa Mar 16, 2026
53dcd10
Fix for unsetting CUDA_VISIBLE_DEVICES
danielkorzekwa Mar 16, 2026
69d9648
increase numeric tolerance for test_puzzletron.py
danielkorzekwa Mar 17, 2026
4a692dc
Disable lm_loss assertion for nemotron-3-nano-30b-a3b-base-bf16 (not …
danielkorzekwa Mar 17, 2026
e795f0c
Removing incorrect licence file. gpt_oss_pruned_to_mxfp4.py was not a…
danielkorzekwa Mar 17, 2026
631306c
Fix hardcoded trust_remote_code
danielkorzekwa Mar 17, 2026
dc77be2
Merge branch 'dkorzekwa/any_model_other_models' into dkorzekwa/anymod…
danielkorzekwa Mar 17, 2026
b76e0ef
Merge branch 'dkorzekwa/anymodel_gptoss' into dkorzekwa/anymodel_tuto…
danielkorzekwa Mar 17, 2026
5cadc65
Merge branch 'feature/puzzletron' into dkorzekwa/anymodel_gptoss
danielkorzekwa Mar 17, 2026
151081c
Delete not needed yaml files for test_puzzletron.
danielkorzekwa Mar 17, 2026
36daa6d
Delete not needed mypy exclusion for removed hf_configs files.
danielkorzekwa Mar 17, 2026
960b8ce
Merge branch 'dkorzekwa/anymodel_gptoss' into dkorzekwa/anymodel_tuto…
danielkorzekwa Mar 17, 2026
b47f846
Merge branch 'feature/puzzletron' into dkorzekwa/anymodel_tutorial
danielkorzekwa Mar 17, 2026
500568e
fix typo fix
danielkorzekwa Mar 18, 2026
46d7d41
Add trust_remote_code to distill_hf cli
danielkorzekwa Mar 18, 2026
152d067
Fix licence header.
danielkorzekwa Mar 18, 2026
7f95d27
Fix license header
danielkorzekwa Mar 18, 2026
6d895e8
Improve docs.
danielkorzekwa Mar 18, 2026
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -108,6 +108,8 @@ repos:
examples/llm_eval/modeling.py|
examples/llm_qat/main.py|
examples/llm_sparsity/weight_sparsity/finetune.py|
examples/puzzletron/evaluation/hf_deployable_anymodel.py|
examples/puzzletron/evaluation/lm_eval_anymodel.py|
examples/specdec_bench/specdec_bench/models/specbench_medusa.py|
examples/speculative_decoding/main.py|
examples/speculative_decoding/medusa_utils.py|
14 changes: 14 additions & 0 deletions examples/puzzletron/GPTOSS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@

## GptOss

With this release, the Puzzle algorithm supports only expert removal for `Gpt-Oss`.

This model ships as a quantized checkpoint, i.e. the MoE expert matrices are stored in the _MXFP4_ format.
During the pruning steps, Puzzle uses the decompressed model (back in BF16) to compute statistics and scores.
This means that, during conversion to the puzzle format, we decompress the model and store it in BF16.
Once pruning is done, i.e. the experts to remove have been identified and the process has finished, you may want to restore the _MXFP4_ format of the checkpoint.
To do so, an additional script takes the original and the pruned checkpoints and outputs the pruned checkpoint in _MXFP4_ format.

```bash
python -m modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_pruned_to_mxfp4 --student-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/solution_0/ --original-path /workspaces/source_model_checkpoints/openai_gpt-oss-20b/ --output-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/mxfp4-ckpt/ --num-layers 24
```
Collaborator (kevalmorabia97):

how do we arrive at stats_num_params_18014757184 folder?

Author (danielkorzekwa):

It is computed automatically by Puzzletron; added a TODO to improve this doc.

Contributor:

⚠️ Potential issue | 🟡 Minor

Clarify example paths and explain generated folder names.

The example command uses hardcoded paths including a folder named stats_num_params_18014757184, which appears to be auto-generated. This addresses the earlier question from kevalmorabia97 about how users arrive at this folder name. Consider adding a note explaining:

  1. That these are example paths and users should replace them with their actual paths
  2. How the stats_num_params_18014757184 folder name is generated (presumably by the pruning process based on model statistics)
  3. What the --num-layers parameter represents and how users should determine its value for their model
📋 Suggested documentation addition

Add explanatory text before the code block:

 To do so, there is an additional script, that takes the original and the pruned checkpoint and outputs pruned checkpoint in _MXFP4_ format.
+
+**Note**: The example below shows sample paths. Replace them with your actual paths:
+- `--student-path`: Path to your pruned checkpoint (the folder name like `stats_num_params_18014757184` is auto-generated during pruning based on model statistics)
+- `--original-path`: Path to your original quantized model
+- `--output-path`: Desired output location for the re-quantized checkpoint
+- `--num-layers`: Number of layers in your model (24 for gpt-oss-20b)

 ```bash
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/GPTOSS.md` at line 13, Update the GPTOSS.md example to
clarify that the shown filesystem paths are placeholders and should be replaced
by users with their actual workspace paths, explicitly explain that the folder
name stats_num_params_18014757184 is an auto-generated output from the
pruning/statistics step (include how it encodes the total parameter count or
stats used), and add a short note describing the --num-layers parameter (that it
sets the student model layer count and should match the target architecture
after pruning or be chosen based on the original model’s layer count and desired
compression) so readers know how to derive these values for their model.

```
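The question raised above — where the `stats_num_params_18014757184` folder name comes from — can be illustrated with a short sketch. The helper below is hypothetical (the real naming logic lives inside Puzzletron), but it shows the apparent convention: the MIP solutions folder is keyed by the pruned model's total parameter count.

```python
# Hypothetical sketch: Puzzletron appears to name its MIP solution folders after
# model statistics such as the total parameter count. The real implementation
# lives in modelopt.torch.puzzletron; this helper only illustrates the pattern.

def stats_folder_name(num_params: int) -> str:
    """Build a folder name like 'stats_num_params_18014757184'."""
    return f"stats_num_params_{num_params}"

def solution_dir(puzzle_dir: str, num_params: int, solution_idx: int) -> str:
    # Mirrors the directory layout seen in the example command above.
    return (f"{puzzle_dir}/mip/puzzle_solutions/"
            f"{stats_folder_name(num_params)}/solutions--checkpoints/solution_{solution_idx}")

print(solution_dir("/workspaces/any_model_gpt_oss", 18_014_757_184, 0))
```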
79 changes: 54 additions & 25 deletions examples/puzzletron/README.md
Original file line number Diff line number Diff line change
@@ -9,18 +9,23 @@ The supported modifications are:

To use the Puzzle algorithm effectively, we need to specify the target number of parameters and/or the memory. The final stage is based on Mixed-Integer Programming (MIP) algorithm to find the most optimal combination of layer modifications that satisfy the target requirements.

In this example, we compress the [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model reducing GPU memory usage from 113 GiB to 96 GiB (15% reduction) with less than 1% regression in the token_accuracy_top_10 metric.
In this example, we compress the [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model, reducing GPU memory usage from 113 GiB to 96 GiB (a 15% reduction) with less than 1% regression in the token_accuracy_top_10 metric. Other supported models can be compressed in a similar way. For GptOss there is one [additional step to perform](GPTOSS.md).

> **Note:** Other models are also supported. See the [configs](./configs/) directory for additional model configurations (e.g., Llama-3.2-3B-Instruct on 1x H100, Qwen2.5-7B-Instruct on 1x H100, Qwen3-8B on 1x H100, Nemotron-Nano-12B-v2 on 1x H100, Mistral-Small-24B-Instruct-2501 on 4x H100). For information on adding support for new models, see the [AnyModel Guide](../../modelopt/torch/puzzletron/anymodel/README.md).

## Environment

- Install Model-Optimizer in editable mode with the corresponding dependencies:
- Install Model-Optimizer in editable mode with the corresponding dependencies (run from the repo root):

```bash
pip install -e .[hf,puzzletron]
pip install -r requirements.txt
pip install -r examples/puzzletron/requirements.txt
```

- For this example we are using 2x NVIDIA H100 80GB HBM3 to show multi-GPU steps. You can use also use s single GPU.
> **Note:** NeMo containers may ship `nvidia-lm-eval`, which can conflict with the `lm-eval` package used for evaluation.
> If so, run `pip uninstall nvidia-lm-eval -y` before installing requirements.

- For this example we are using 2x NVIDIA H100 80GB HBM3 to show the multi-GPU steps. You can also use a single GPU.

- To make use of [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) and [Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2), you need to accept the terms and conditions for the corresponding model and the dataset in the Huggingface Hub. Log in to the Huggingface Hub and enter your HF token.

@@ -133,7 +138,7 @@ This assumes pruning, replacement library building, NAS scoring, and subblock st
For example, let's set `target_memory: 96_000` in `llama-3_1-8B_pruneffn_memory.yaml`.

```bash
torchrun --nproc_per_node 2 examples/puzzletron/main.py --config path/to/llama-3_1-8B_pruneffn_memory.yaml --mip-only 2>&1 | tee ./log.txt | grep "Puzzletron Progress"
torchrun --nproc_per_node 2 examples/puzzletron/main.py --config examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml --mip-only 2>&1 | tee ./log.txt | grep "Puzzletron Progress"
```

This will generate the following network architecture (see `log.txt`):
@@ -195,18 +200,54 @@ block_13: attention no_op ffn intermediate_11520
block_14: attention no_op ffn intermediate_3072
```

### MIP Sweep Mode

The **MIP sweep mode** lets you explore multiple memory compression rates in a single run and compare the accuracy-memory trade-offs.

#### Quick Start

1. Enable sweep in your config YAML (e.g., `llama-3_1-8B_pruneffn_memory.yaml`):

```yaml
mip:
sweep:
enabled: true
memory_compression_rates: [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
output_csv: ${puzzle_dir}/mip_sweep_results.csv
```

2. Run the sweep:

```bash
torchrun --nproc_per_node 2 examples/puzzletron/main.py --config examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml --mip-only 2>&1 | tee ./log.txt | grep "Puzzletron Progress"
```

3. View results: The CSV file contains compression rates, memory usage, and accuracy metrics for each configuration.

#### Example Results

<img src="mip_sweep_example.png" alt="MIP Sweep Results" width="600">

The plot shows how token accuracy changes with different compression rates. Higher compression (0.5 = 50% of original memory) reduces accuracy, while lower compression maintains accuracy closer to the teacher model.
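To make the trade-off concrete, here is one way the sweep CSV might be consumed. The column names (`compression_rate`, `memory_mib`, `token_accuracy`) are assumptions, since the actual schema is not shown here; adapt them to the real `mip_sweep_results.csv` header.

```python
import csv
import io

# Hypothetical rows mimicking mip_sweep_results.csv; real column names may differ.
sample = """compression_rate,memory_mib,token_accuracy
0.5,48000,0.89
0.7,67200,0.94
0.9,86400,0.97
1.0,96000,0.98
"""

def best_under_budget(csv_text: str, budget_mib: float) -> dict:
    """Return the row with the highest accuracy whose memory fits the budget."""
    rows = [
        {k: float(v) for k, v in row.items()}
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    feasible = [r for r in rows if r["memory_mib"] <= budget_mib]
    return max(feasible, key=lambda r: r["token_accuracy"])

# Picks the 0.7 row: the most accurate configuration that fits in 70,000 MiB.
print(best_under_budget(sample, budget_mib=70_000))
```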

## Evaluation

Once the model is ready, you can evaluate it using [Language Model Evaluation Harness](https://pypi.org/project/lm-eval/). For example, run the following to evaluate the model on [Massive Multitask Language Understanding](https://huggingface.co/datasets/cais/mmlu) benchmark.
Evaluate AnyModel checkpoints using [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) directly.

```bash
lm_eval --model hf \
--model_args pretrained=path/to/model,dtype=bfloat16,trust_remote_code=true,parallelize=True \
--tasks mmlu \
--num_fewshot 5 \
--batch_size 4
python examples/puzzletron/evaluation/lm_eval_anymodel.py \
--model hf \
--model_args pretrained=path/to/checkpoint,dtype=bfloat16,parallelize=True \
--tasks mmlu \
--num_fewshot 5 \
--batch_size 4
```

For a quick smoke test, add `--limit 10`.

> **Alternative:** For server-based evaluation via an OpenAI-compatible endpoint,
> see [evaluation/nemo_evaluator_instructions.md](./evaluation/nemo_evaluator_instructions.md).

## Inference Performance Benchmarking

Now let's evaluate how much speedup we get with the compressed model in terms of throughput and latency.
@@ -234,21 +275,9 @@ vllm bench throughput --model path/to/model --input-len 2000 --output-len 100 --

## Knowledge Distillation

To recover degradation in the quality of the compressed model, we can use knowledge distillation. This allows transferring the capabilities of the original model to the pruned one. For this, we will use [NeMo framework](https://github.com/NVIDIA-NeMo/NeMo) with the [nemo:25.07](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo?version=25.07) container.

First, convert the HF model to NeMo format:
To recover degradation in the quality of the compressed model, we can use knowledge distillation. This allows transferring the capabilities of the original model to the pruned one.

```bash
python -m nemo_export/convert_hf_to_nemo --input-ckpt-path path/to/HF-model --output-ckpt-path path/to/save/model-nemo
```

Now you can utilize all the training features available in NeMo, including distillation. Please refer to the [NeMo distillation documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/distillation/distillation.html).

[Optional] Once distillation is complete, you can convert the distilled model back to the HuggingFace format.

```bash
python -m nemo_export/convert_nemo_to_hf --input-ckpt-path path/to/nemo-model --output-ckpt-path path/to/save/model-HF
```
See [mbridge_distillation/README.md](./mbridge_distillation/README.md) for instructions on using Megatron-Bridge for knowledge distillation.

## Advanced Usage

Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
defaults:
- pruning: ffn_pruning
- scoring: ../validate_solutions_defaults
- realize_model: ../validate_solutions_defaults
- bypass:
- override hydra/hydra_logging: disabled
- _self_

puzzle_dir: ???
descriptor: gpt_oss
teacher_dir: ${puzzle_dir}/ckpts/teacher/
replacement_library_path: ${puzzle_dir}/replacement_library.json
dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2

skip_realize_model: false

build_replacement_library:
add_ffn_no_ops: true
add_attention_no_ops: true

calc_subblock_stats:
batch_sizes: [64, 96, 128]
prefill_seq_len: 4096
generation_seq_len: 4096
num_active_tokens_override: # Optional override for sequence lengths
prefill_queue_size: 0
allocate_prefill_query: false
benchmark_iterations: # Set to a number (e.g., 1000) to enable runtime benchmarking
merge_with_existing_stats: false
subblock_stats_filename: "subblock_stats.json"
moe_stats_filename: "moe_stats.json"
runtime_stats:
backend: trt_torch

scoring:
descriptor: ${descriptor}
solutions_to_validate:
skip_existing_solutions: true

replacement_library_path: ${replacement_library_path}
solutions_path: ${to_path:${puzzle_dir}/single_sequence_replacement_solutions.json}
teacher_dir: ${to_path:${teacher_dir}}
output_dir: ${puzzle_dir}/single_sequence_replacement_solutions--validation

eval_samples: 128
micro_batch_size: 1
seed: 42
shuffle_seed: 444
dataset_path: ${dataset_path}

mip:
single_block_replacement_validation_dir: ${to_path:${scoring.output_dir}}
subblock_stats_path: ${to_path:${puzzle_dir}/${calc_subblock_stats.subblock_stats_filename}}
output_path: ${to_path:${puzzle_dir}/mip/puzzle_solutions}
gathered_metrics_path:
puzzle_profile:

# puzzle_profile:
objective: metrics.cosine_embedding_loss_hidden_states
bigger_is_better: false

subblock_stats_args:
- batch_size: 96
weights_dtype: torch.bfloat16
activations_dtype: torch.bfloat16
kv_cache_dtype: torch.bfloat16

report_additional_costs:
- stats.memory_mib
- stats.num_params
- stats.num_kv_heads
- stats.has_attention
- stats.has_ffn
- stats.kv_cache_memory_mib
- stats.attention_memory_mib
- stats.ffn_memory_mib
- stats.ffn_num_params
- stats.attention_num_params

human_constraints:
target_memory: 45_000
num_params: 3_000_000_000

mip_constraints:
metric_overrides:
max_seconds_per_solution: 60

realize_model:
descriptor: ${descriptor}
teacher_dir: ${to_path:${teacher_dir}}
tokenizer_name: ${to_path:${teacher_dir}}
replacement_library_path: ${replacement_library_path}
save_models: true
solutions_path: # Filled dynamically

# Validate params
skip_validation: false # set to false to enable validation of the model solutions
eval_samples: 128
micro_batch_size: 1
seed: 42
shuffle_seed: 444
dataset_path: ${dataset_path}

nccl_timeout_minutes: ${timedelta_minutes:10}

# This section redirects Hydra outputs
hydra:
run:
dir: ${puzzle_dir}/hydra_logs/${now:%Y-%m-%d}/${now:%H-%M-%S}

Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
defaults:
- gptoss-20b
- _self_

# Input Hugging Face model to compress
input_hf_model_path: /workspace/hf_models/openai/gpt-oss-20b

# Dataset path for pruning and NAS scoring
dataset_path: /workspace/datasets/Nemotron-Post-Training-Dataset-v2

# Working directory for compression outputs
puzzle_dir: /workspace/puzzle_dir

# MIP memory constraint (in MiB)
mip:
human_constraints:
target_memory: 16_000 # 45 GiB
Contributor:

⚠️ Potential issue | 🟠 Major

Align target_memory with the 45 GiB annotation.

Line 17 is inconsistent: 16_000 MiB is ~15.6 GiB, not 45 GiB. Please update either the value or the comment so the constraint is unambiguous.

Possible fix (if 16,000 MiB is intended)
-    target_memory: 16_000 # 45 GiB
+    target_memory: 16_000 # ~15.6 GiB
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
target_memory: 16_000 # 45 GiB
target_memory: 16_000 # ~15.6 GiB
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml`
at line 17, The target_memory value and its comment disagree: update the
target_memory entry (symbol name: target_memory) so the numeric MiB matches the
annotated 45 GiB or adjust the comment to the intended MiB; either set
target_memory to 46_080 (45 GiB in MiB) to keep the comment, or change the
comment to match 16_000 MiB (≈15.6 GiB) so they are consistent.
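The unit mismatch flagged above is easy to check by hand — the constraint is expressed in MiB, and with binary units 1 GiB = 1024 MiB:

```python
MIB_PER_GIB = 1024  # binary units: 1 GiB = 1024 MiB

def gib_to_mib(gib: float) -> float:
    return gib * MIB_PER_GIB

def mib_to_gib(mib: float) -> float:
    return mib / MIB_PER_GIB

# A 45 GiB budget would be target_memory: 46_080, not 16_000.
print(gib_to_mib(45))                 # 46080.0
# Conversely, 16_000 MiB is only about 15.6 GiB.
print(round(mib_to_gib(16_000), 1))   # 15.6
```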

Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
defaults:
- pruning_defaults

eval_samples: 2500
activations_log_dir: ${puzzle_dir}/pruning/pruning_scores/expert_removal/${pruning.experiment_id}

pruning_mixin:
_target_: modelopt.torch.puzzletron.pruning.expert_removal_pruning_mixin.ExpertRemovalPruningMixIn
layer_descriptor:
_target_: modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_model_descriptor.GptOssExpertRemovalLayerDescriptor
target_name: "mlp.router"

hook_class: ${get_object:modelopt.torch.nas.plugins.megatron_hooks.base_hooks.RankedChoiceVotingHook}
activation_hooks_kwargs: # Additional kwargs to pass to the hook init

Comment on lines +13 to +15
Contributor:

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify whether activation_hooks_kwargs is ever unpacked as kwargs
rg -n -C3 'activation_hooks_kwargs'
rg -n -P '\*\*\s*[^,)\s]*activation_hooks_kwargs'
```

Repository: NVIDIA/Model-Optimizer

Length of output: 50380


Use an explicit empty mapping for activation_hooks_kwargs.

Line 14 currently resolves to null in YAML. Hook classes expect activation_hooks_kwargs to be a dict and call .get() method on it, which will fail with an AttributeError if the value is None. Change to activation_hooks_kwargs: {} for no extra args.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/ffn_pruning.yaml`
around lines 13 - 15, The YAML sets activation_hooks_kwargs to null which will
cause an AttributeError when the hook (e.g.,
modelopt.torch.nas.plugins.megatron_hooks.base_hooks.RankedChoiceVotingHook)
calls .get(); change activation_hooks_kwargs to an explicit empty mapping
(activation_hooks_kwargs: {}) so the hook receives a dict even when no extra
args are provided.
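The failure mode described in this thread is plain Python behaviour, independent of the actual hook classes: a YAML key with no value loads as `None`, and calling `.get()` on `None` raises `AttributeError`. A minimal reproduction:

```python
def read_hook_kwargs(activation_hooks_kwargs):
    # The pattern the review describes: hook code calls .get() on the kwargs mapping.
    return activation_hooks_kwargs.get("some_option", "default")

# An explicit empty mapping (activation_hooks_kwargs: {}) is safe:
print(read_hook_kwargs({}))  # default

# A bare key (activation_hooks_kwargs:) loads as None and fails:
try:
    read_hook_kwargs(None)
except AttributeError as exc:
    print("AttributeError:", exc)
```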

num_experts_to_keep_list: [24, 16, 8] # num_experts in teacher is 128
mlp_init_mode: "ExpertRemoval"
mlp_init_config_yaml:
expert_scores_key: "expert_ranks"
layer_prefix_template: "model.layers.{layer_idx}.mlp.router"
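The list above keeps a fixed number of experts per layer, ranked by activation scores. The commit history also notes a fix for NaN scores being ranked as "best" experts by default; a hedged sketch of NaN-safe top-k selection (the names here are illustrative, not the actual Puzzletron API):

```python
import math

def keep_top_experts(scores, num_to_keep):
    """Return indices of the highest-scoring experts, pushing NaN scores to the bottom."""
    def key(idx):
        s = scores[idx]
        # NaN compares false with everything, which can silently float to the
        # top of a naive sort; map it below any real score instead.
        return s if not math.isnan(s) else float("-inf")
    ranked = sorted(range(len(scores)), key=key, reverse=True)
    return sorted(ranked[:num_to_keep])

scores = [0.9, float("nan"), 0.4, 0.7]
print(keep_top_experts(scores, 2))  # [0, 3] — the NaN expert is never selected
```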

Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
defaults:
- /validate_model_defaults

model_name_or_path: ${teacher_dir}
experiment_id: ${pruning.eval_samples}samples_diverse_mini
activations_log_dir: ???
activation_hooks_kwargs: ???

descriptor: ${descriptor}

# Data:
eval_samples: 10_000
micro_batch_size: 1
dataset_path: ${dataset_path}
val_dataset_name: train

# Prune ckpts
pruned_ckpts_output_dir: ${puzzle_dir}/pruning/${pruning.experiment_id}

## FFN pruning
ffn_list:
mlp_init_mode: "Truncate" # PruneByActivationsLog

## KV-heads pruning
n_heads_in_group_list:
gqa_init_mode: "AverageKV"

## Hidden dimension pruning
hidden_size_list:
hidden_size_init_mode: "PruneByChannelRanking"
linear_init_mode: "FromTeacher"

mlp_init_config_yaml:
activations_log_dir: ${pruning.activations_log_dir}
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
model_dtype: torch.bfloat16 # dtype to cast the model for validate_model
autocast_dtype: torch.bfloat16 # dtype for torch.autocast for validate_model
block_size: 8192
bos_rate: 0.5
data_column: messages
val_dataset_name: valid
shuffle_seed: 81436
seed: 42
fim_rate: 0
fim_spm_rate: 0
source_datasets_to_discard:
varlen: false
Comment on lines +11 to +12
Contributor:

⚠️ Potential issue | 🟠 Major

Fix YAML nesting for source_datasets_to_discard.

Line 11 starts a mapping, but Line 12 is not indented. That makes source_datasets_to_discard null and varlen a top-level key.

Proposed fix
 source_datasets_to_discard:
-varlen: false
+  varlen: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_model_defaults.yaml`
around lines 11 - 12, The YAML mapping for source_datasets_to_discard is broken
because its child key varlen is not indented, making source_datasets_to_discard
null and varlen a top-level key; fix by nesting varlen (and any other child
keys) under source_datasets_to_discard with proper indentation so that
source_datasets_to_discard contains the varlen entry (e.g., indent varlen one
level beneath source_datasets_to_discard).
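The indentation bug described above is easy to reproduce directly (assuming PyYAML is available, as it typically is alongside Hydra/OmegaConf configs):

```python
import yaml  # PyYAML; assumed available in the environment

flat = "source_datasets_to_discard:\nvarlen: false\n"      # child not indented
nested = "source_datasets_to_discard:\n  varlen: false\n"  # properly nested

# Unindented: the key loads as null and `varlen` escapes to the top level.
print(yaml.safe_load(flat))    # {'source_datasets_to_discard': None, 'varlen': False}

# Indented: `varlen` is a child of source_datasets_to_discard, as intended.
print(yaml.safe_load(nested))  # {'source_datasets_to_discard': {'varlen': False}}
```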

write_results: false
calc_losses_on_cpu: false
activations_log_dir:
model_name_or_path:
load_dataset_fn: ${get_object:modelopt.torch.puzzletron.utils.data.dataloaders.load_from_disk_fn}

Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
defaults:
- /validate_model_defaults
- _self_

solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false
Comment on lines +5 to +10
Contributor:

⚠️ Potential issue | 🟠 Major

solutions_to_validate block is not nested.

Lines 6-10 need to be indented under Line 5. Current structure makes solutions_to_validate null and moves all controls to root scope.

Proposed fix
 solutions_to_validate:
-skip_validation: false
-save_models: false
-bigger_is_better: false
-sort_solutions_by:
-calculate_full_score_ablations: false
+  skip_validation: false
+  save_models: false
+  bigger_is_better: false
+  sort_solutions_by:
+  calculate_full_score_ablations: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_solutions_defaults.yaml`
around lines 5 - 10, The YAML block "solutions_to_validate" is not nested so its
fields become root-level and the block is null; fix by indenting the keys
(skip_validation, save_models, bigger_is_better, sort_solutions_by,
calculate_full_score_ablations) under the "solutions_to_validate" mapping so
they are children of that key (ensure consistent spacing—e.g., two spaces—or
follow the repo's YAML indentation convention) while preserving the existing key
names and boolean values.


Original file line number Diff line number Diff line change
@@ -7,6 +7,7 @@ defaults:
- _self_

puzzle_dir: ???
descriptor: llama
teacher_dir: ${puzzle_dir}/ckpts/teacher/
replacement_library_path: ${puzzle_dir}/replacement_library.json
dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
Contributor:

⚠️ Potential issue | 🟡 Minor

Minor typo in comment.

"ppath" should be "path".

📝 Proposed fix
-dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
+dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml`
at line 13, Fix the typo in the comment next to the config key dataset_path:
change "ppath" to "path" so the line reads a correct inline comment referring to
the path to Nemotron-Post-Training-Dataset-v2; update the comment text in the
Llama-3_1-8B.yaml entry for dataset_path accordingly.

@@ -32,6 +33,7 @@ calc_subblock_stats:
backend: trt_torch

scoring:
descriptor: ${descriptor}
solutions_to_validate:
skip_existing_solutions: true

@@ -84,6 +86,7 @@ mip:
max_seconds_per_solution: 60

realize_model:
descriptor: ${descriptor}
teacher_dir: ${to_path:${teacher_dir}}
tokenizer_name: ${to_path:${teacher_dir}}
replacement_library_path: ${replacement_library_path}
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,11 @@ puzzle_dir: /workspace/puzzle_dir
mip:
human_constraints:
target_memory: 78_000 # 78 GiB
# Memory sweep configuration (optional)
sweep:
enabled: false
memory_compression_rates: [0.5, 0.6, 0.7, 0.8, 0.9]
output_csv: ${puzzle_dir}/mip_sweep_results.csv

# FFN intermediate sizes to search over (heterogeneous architecture)
pruning: