Merge any_model tutorial #1035

base: feature/puzzletron
Changes from all commits
@@ -0,0 +1,14 @@

## GptOss

With this release, the Puzzle algorithm supports only expert removal for `Gpt-Oss`.

This model ships as a quantized checkpoint, i.e. the MoE expert matrices are quantized in the _MXFP4_ format.
During the pruning steps, Puzzle uses the decompressed model (back to BF16) for statistics and score computation.
This means that during conversion to the Puzzle format we decompress the model and store it as BF16.
Once pruning is done, i.e. the experts to remove have been identified and the process is finished, the user may want to get the checkpoint back into the _MXFP4_ format.
To do so, there is an additional script that takes the original and the pruned checkpoints and outputs the pruned checkpoint in _MXFP4_ format:

```bash
python -m modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_pruned_to_mxfp4 \
    --student-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/solution_0/ \
    --original-path /workspaces/source_model_checkpoints/openai_gpt-oss-20b/ \
    --output-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/mxfp4-ckpt/ \
    --num-layers 24
```

**Contributor**

Clarify example paths and explain generated folder names. The example command uses hardcoded paths, including a folder named `stats_num_params_18014757184`.

Suggested documentation addition before the code block:

**Note**: The example below shows sample paths. Replace them with your actual paths:
- `--student-path`: Path to your pruned checkpoint (the folder name like `stats_num_params_18014757184` is auto-generated during pruning based on model statistics)
- `--original-path`: Path to your original quantized model
- `--output-path`: Desired output location for the re-quantized checkpoint
- `--num-layers`: Number of layers in your model (24 for gpt-oss-20b)
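The MXFP4 decompression mentioned above can be illustrated with a small sketch. MXFP4 here refers to the OCP microscaling format: FP4 (E2M1) elements packed two per byte in blocks of 32 that share one power-of-two (E8M0) scale. The function names and the low-nibble-first packing order below are assumptions for illustration, not the actual modelopt implementation.

```python
# Hedged sketch of MXFP4 dequantization (OCP MX format, not modelopt's code).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes for codes 0..7

def decode_fp4(code: int) -> float:
    """Decode a 4-bit E2M1 code (bit 3 = sign bit) to a float."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * E2M1_VALUES[code & 0b0111]

def dequantize_mxfp4_block(packed: bytes, scale_exp: int) -> list[float]:
    """Dequantize packed FP4 values sharing one E8M0 scale exponent.

    A full MXFP4 block is 16 packed bytes (32 elements) plus one scale byte.
    Nibble order (low nibble first) is an assumption for this sketch.
    """
    scale = 2.0 ** (scale_exp - 127)  # E8M0: biased power-of-two scale
    out = []
    for byte in packed:
        out.append(decode_fp4(byte & 0x0F) * scale)
        out.append(decode_fp4((byte >> 4) & 0x0F) * scale)
    return out

print(dequantize_mxfp4_block(bytes([0x21]), 127))  # [0.5, 1.0]
```

Re-quantizing after pruning reverses this: the script re-packs the kept experts' BF16 weights back into FP4 codes plus per-block scales.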
@@ -0,0 +1,110 @@

defaults:
  - pruning: ffn_pruning
  - scoring: ../validate_solutions_defaults
  - realize_model: ../validate_solutions_defaults
  - bypass:
  - override hydra/hydra_logging: disabled
  - _self_

puzzle_dir: ???
descriptor: gpt_oss
teacher_dir: ${puzzle_dir}/ckpts/teacher/
replacement_library_path: ${puzzle_dir}/replacement_library.json
dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2

skip_realize_model: false

build_replacement_library:
  add_ffn_no_ops: true
  add_attention_no_ops: true

calc_subblock_stats:
  batch_sizes: [64, 96, 128]
  prefill_seq_len: 4096
  generation_seq_len: 4096
  num_active_tokens_override: # Optional override for sequence lengths
  prefill_queue_size: 0
  allocate_prefill_query: false
  benchmark_iterations: # Set to a number (e.g., 1000) to enable runtime benchmarking
  merge_with_existing_stats: false
  subblock_stats_filename: "subblock_stats.json"
  moe_stats_filename: "moe_stats.json"
  runtime_stats:
    backend: trt_torch

scoring:
  descriptor: ${descriptor}
  solutions_to_validate:
  skip_existing_solutions: true

  replacement_library_path: ${replacement_library_path}
  solutions_path: ${to_path:${puzzle_dir}/single_sequence_replacement_solutions.json}
  teacher_dir: ${to_path:${teacher_dir}}
  output_dir: ${puzzle_dir}/single_sequence_replacement_solutions--validation

  eval_samples: 128
  micro_batch_size: 1
  seed: 42
  shuffle_seed: 444
  dataset_path: ${dataset_path}

mip:
  single_block_replacement_validation_dir: ${to_path:${scoring.output_dir}}
  subblock_stats_path: ${to_path:${puzzle_dir}/${calc_subblock_stats.subblock_stats_filename}}
  output_path: ${to_path:${puzzle_dir}/mip/puzzle_solutions}
  gathered_metrics_path:
  puzzle_profile:

  # puzzle_profile:
  objective: metrics.cosine_embedding_loss_hidden_states
  bigger_is_better: false

  subblock_stats_args:
    - batch_size: 96
      weights_dtype: torch.bfloat16
      activations_dtype: torch.bfloat16
      kv_cache_dtype: torch.bfloat16

  report_additional_costs:
    - stats.memory_mib
    - stats.num_params
    - stats.num_kv_heads
    - stats.has_attention
    - stats.has_ffn
    - stats.kv_cache_memory_mib
    - stats.attention_memory_mib
    - stats.ffn_memory_mib
    - stats.ffn_num_params
    - stats.attention_num_params

  human_constraints:
    target_memory: 45_000
    num_params: 3_000_000_000

  mip_constraints:
    metric_overrides:
    max_seconds_per_solution: 60

realize_model:
  descriptor: ${descriptor}
  teacher_dir: ${to_path:${teacher_dir}}
  tokenizer_name: ${to_path:${teacher_dir}}
  replacement_library_path: ${replacement_library_path}
  save_models: true
  solutions_path: # Filled dynamically

  # Validate params
  skip_validation: false # To enable validation of the model solution set `skip_validation` as False
  eval_samples: 128
  micro_batch_size: 1
  seed: 42
  shuffle_seed: 444
  dataset_path: ${dataset_path}

nccl_timeout_minutes: ${timedelta_minutes:10}

# This section redirects Hydra outputs
hydra:
  run:
    dir: ${puzzle_dir}/hydra_logs/${now:%Y-%m-%d}/${now:%H-%M-%S}
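Keys like `teacher_dir: ${puzzle_dir}/ckpts/teacher/` are OmegaConf-style interpolations that Hydra resolves at config load time. As a rough illustration of the substitution, here is a toy resolver (not Hydra's actual implementation; it handles only plain `${key}` references, not custom resolvers like `${to_path:...}`):

```python
import re

def resolve(cfg: dict, value: str) -> str:
    """Substitute ${key} references from a flat config dict until none remain."""
    pattern = re.compile(r"\$\{(\w+)\}")
    while (m := pattern.search(value)):
        value = value[:m.start()] + str(cfg[m.group(1)]) + value[m.end():]
    return value

cfg = {"puzzle_dir": "/workspace/puzzle_dir"}
print(resolve(cfg, "${puzzle_dir}/ckpts/teacher/"))  # /workspace/puzzle_dir/ckpts/teacher/
```

This is why only `puzzle_dir` and `dataset_path` (the `???` mandatory values) must be supplied by the user: the derived paths follow from them.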
@@ -0,0 +1,17 @@

defaults:
  - gptoss-20b
  - _self_

# Input Hugging Face model to compress
input_hf_model_path: /workspace/hf_models/openai/gpt-oss-20b

# Dataset path for pruning and NAS scoring
dataset_path: /workspace/datasets/Nemotron-Post-Training-Dataset-v2

# Working directory for compression outputs
puzzle_dir: /workspace/puzzle_dir

# MIP memory constraint (in MiB)
mip:
  human_constraints:
    target_memory: 16_000 # 45 GiB

**Contributor**

Align the `target_memory` comment with its value. Line 17 is inconsistent: the value is `16_000` MiB, but the comment says "45 GiB". Possible fix (if 16,000 MiB is intended):

-    target_memory: 16_000 # 45 GiB
+    target_memory: 16_000 # ~15.6 GiB
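The unit mismatch the reviewer flags is easy to check: converting MiB to GiB divides by 1024.

```python
def mib_to_gib(mib: float) -> float:
    """Convert mebibytes to gibibytes (1 GiB = 1024 MiB)."""
    return mib / 1024

print(mib_to_gib(16_000))  # 15.625 -> ~15.6 GiB, not 45 GiB
print(45 * 1024)           # 46080  -> 45 GiB would be target_memory: 46_080
```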
@@ -0,0 +1,21 @@

defaults:
  - pruning_defaults

eval_samples: 2500 #10
activations_log_dir: ${puzzle_dir}/pruning/pruning_scores/expert_removal/${pruning.experiment_id}

pruning_mixin:
  _target_: modelopt.torch.puzzletron.pruning.expert_removal_pruning_mixin.ExpertRemovalPruningMixIn
  layer_descriptor:
    _target_: modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_model_descriptor.GptOssExpertRemovalLayerDescriptor
    target_name: "mlp.router"

hook_class: ${get_object:modelopt.torch.nas.plugins.megatron_hooks.base_hooks.RankedChoiceVotingHook}
activation_hooks_kwargs: # Additional kwargs to pass to the hook init

num_experts_to_keep_list: [24, 16, 8] # num_experts in teacher is 128
mlp_init_mode: "ExpertRemoval"
mlp_init_config_yaml:
  expert_scores_key: "expert_ranks"
  layer_prefix_template: "model.layers.{layer_idx}.mlp.router"

**Contributor** (on lines +13 to +15)

Use an explicit empty mapping for `activation_hooks_kwargs`. As written, the bare key resolves to `null` rather than `{}`, which breaks call sites that unpack it as `**activation_hooks_kwargs`.
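The reviewer's point about the empty mapping can be demonstrated in isolation: unpacking `None` with `**` raises a `TypeError`, while an empty dict unpacks harmlessly to no kwargs. Here `make_hook` is a hypothetical stand-in for the hook constructor, not the actual modelopt API.

```python
def make_hook(**kwargs):
    """Hypothetical hook constructor that accepts optional keyword arguments."""
    return kwargs

activation_hooks_kwargs = None  # what a bare `activation_hooks_kwargs:` YAML key loads as

try:
    make_hook(**activation_hooks_kwargs)  # fails: argument after ** must be a mapping
except TypeError as err:
    print(f"TypeError: {err}")

print(make_hook(**{}))  # {} -- an explicit empty mapping is safe
```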
@@ -0,0 +1,34 @@

defaults:
  - /validate_model_defaults

model_name_or_path: ${teacher_dir}
experiment_id: ${pruning.eval_samples}samples_diverse_mini
activations_log_dir: ???
activation_hooks_kwargs: ???

descriptor: ${descriptor}

# Data:
eval_samples: 10_000
micro_batch_size: 1
dataset_path: ${dataset_path}
val_dataset_name: train

# Prune ckpts
pruned_ckpts_output_dir: ${puzzle_dir}/pruning/${pruning.experiment_id}

## FFN pruning
ffn_list:
mlp_init_mode: "Truncate" # PruneByActivationsLog

## KV-heads pruning
n_heads_in_group_list:
gqa_init_mode: "AverageKV"

## Hidden dimension pruning
hidden_size_list:
hidden_size_init_mode: "PruneByChannelRanking"
linear_init_mode: "FromTeacher"

mlp_init_config_yaml:
  activations_log_dir: ${pruning.activations_log_dir}
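The `PruneByChannelRanking` mode above follows the general pattern of structured pruning: rank channels by an importance score and keep the top ones, preserving their original order. A minimal sketch under assumed semantics (not the Puzzletron implementation):

```python
def prune_by_channel_ranking(weight_rows, channel_scores, keep: int):
    """Keep the `keep` highest-scoring channels (rows), preserving original order."""
    # Rank channel indices by score, highest first.
    ranked = sorted(range(len(channel_scores)),
                    key=channel_scores.__getitem__, reverse=True)
    kept = sorted(ranked[:keep])  # restore original ordering of the survivors
    return [weight_rows[i] for i in kept], kept

rows = [[1, 1], [2, 2], [3, 3], [4, 4]]
pruned, kept = prune_by_channel_ranking(rows, [0.1, 0.9, 0.2, 0.8], keep=2)
print(kept)    # [1, 3]
print(pruned)  # [[2, 2], [4, 4]]
```

The same keep-top-k idea applies to `num_experts_to_keep_list` above, with experts ranked by router statistics instead of channel activations.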
@@ -0,0 +1,18 @@

model_dtype: torch.bfloat16 # dtype to cast the model for validate_model
autocast_dtype: torch.bfloat16 # dtype for torch.autocast for validate_model
block_size: 8192
bos_rate: 0.5
data_column: messages
val_dataset_name: valid
shuffle_seed: 81436
seed: 42
fim_rate: 0
fim_spm_rate: 0
source_datasets_to_discard:
varlen: false
write_results: false
calc_losses_on_cpu: false
activations_log_dir:
model_name_or_path:
load_dataset_fn: ${get_object:modelopt.torch.puzzletron.utils.data.dataloaders.load_from_disk_fn}

**Contributor** (on lines +11 to +12)

Fix YAML nesting for `varlen`. Line 11 starts a mapping, but line 12 is not indented. That makes `source_datasets_to_discard` parse as `null` and `varlen` a top-level key. Proposed fix:

 source_datasets_to_discard:
-varlen: false
+  varlen: false
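The nesting issue the reviewer describes is easy to reproduce with PyYAML (assumed available here):

```python
import yaml

# Unindented: `varlen` becomes a sibling top-level key, and the first key is null.
flat = yaml.safe_load("source_datasets_to_discard:\nvarlen: false")
print(flat)  # {'source_datasets_to_discard': None, 'varlen': False}

# Indented: `varlen` nests under `source_datasets_to_discard`.
nested = yaml.safe_load("source_datasets_to_discard:\n  varlen: false")
print(nested)  # {'source_datasets_to_discard': {'varlen': False}}
```

Whether the indented form is the intended structure depends on what the loader expects; the reviewer's point is that the flat form silently changes the config's shape.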
@@ -0,0 +1,11 @@

defaults:
  - /validate_model_defaults
  - _self_

solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false

**Contributor** (on lines +5 to +10)

Lines 6-10 need to be indented under `solutions_to_validate` on line 5. The current structure makes `solutions_to_validate` null and the remaining keys top-level. Proposed fix:

 solutions_to_validate:
-skip_validation: false
-save_models: false
-bigger_is_better: false
-sort_solutions_by:
-calculate_full_score_ablations: false
+  skip_validation: false
+  save_models: false
+  bigger_is_better: false
+  sort_solutions_by:
+  calculate_full_score_ablations: false
@@ -7,6 +7,7 @@ defaults:
  - _self_

puzzle_dir: ???
descriptor: llama
teacher_dir: ${puzzle_dir}/ckpts/teacher/
replacement_library_path: ${puzzle_dir}/replacement_library.json
dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2

**Contributor**

Minor typo in comment: "ppath" should be "path". Proposed fix:

-dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
+dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2

@@ -32,6 +33,7 @@ calc_subblock_stats:
    backend: trt_torch

scoring:
  descriptor: ${descriptor}
  solutions_to_validate:
  skip_existing_solutions: true

@@ -84,6 +86,7 @@ mip:
    max_seconds_per_solution: 60

realize_model:
  descriptor: ${descriptor}
  teacher_dir: ${to_path:${teacher_dir}}
  tokenizer_name: ${to_path:${teacher_dir}}
  replacement_library_path: ${replacement_library_path}
**Contributor**

How do we arrive at the `stats_num_params_18014757184` folder?

**Contributor**

It is computed automatically by Puzzletron; added a TODO to improve this doc.
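To the question above: the folder name embeds the pruned solution's parameter count. A plausible sketch of the naming scheme (the exact logic is internal to Puzzletron; the function below is hypothetical):

```python
def solution_stats_dirname(num_params: int) -> str:
    """Hypothetical reconstruction: the MIP output folder name embeds
    the solution's total parameter count."""
    return f"stats_num_params_{num_params}"

print(solution_stats_dirname(18_014_757_184))  # stats_num_params_18014757184
```

So a different target constraint would yield a different parameter count and hence a different folder name.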