Merge mbridge distillation for any_model #1036
danielkorzekwa wants to merge 9 commits into dkorzekwa/anymodel_tutorial from
Conversation
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Pretty much everything in this PR seems like we should instead merge to M-Bridge. Are we confident enough to upstream these changes?
We are not confident, e.g., we would need to talk to the mbridge/megatron-lm people on that first and align with their plans for heterogeneous support. Let's think about it once puzzletron is in main.
We also have to add support for gpt-oss and mamba, so it is not the best time to merge this into mcore.
The nemo:26.04 container code freeze is in 2 weeks. Let's make sure we raise a PR with the required changes to M-Bridge before then so we can see what can and cannot be upstreamed.
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
| "--master-addr", | ||
| "127.0.0.1", # Explicitly set master address | ||
| "--master-port", | ||
| str(get_free_port()), # Pass port directly to torchrun to avoid conflicts |
is this necessary? I've never had the need to manually set these
It is needed: somehow, on an interactive cluster node, the default port is already in use.
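For reference, a helper like `get_free_port` is typically implemented by binding a socket to port 0 so the OS assigns a free ephemeral port. This is a minimal sketch, not necessarily the actual implementation in the test file:

```python
import socket

def get_free_port() -> int:
    """Ask the OS for a currently free ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]
```

Note there is a small race window: the port is released when the socket closes and could in principle be taken before torchrun binds it, but in practice this is a common and reliable way to avoid fixed-port conflicts on shared nodes.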
| "--hf-export-path", | ||
| str(hf_export_dir), | ||
| "--hf-model", | ||
| "meta-llama/Llama-3.1-8B-Instruct", |
Can we change these argparse arguments to underscore format as well, so we have consistency with the other arguments?
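For illustration: argparse already maps hyphenated flags to underscore attribute names internally (`--hf-export-path` parses into `args.hf_export_path`), so the suggested change only affects the command-line spelling. A hypothetical sketch of the underscore form (the real script's argument set may differ):

```python
import argparse

# Hypothetical parser illustrating underscore-format flags;
# not the actual argument definitions from the distillation script.
parser = argparse.ArgumentParser()
parser.add_argument("--hf_export_path", type=str)
parser.add_argument("--hf_model", type=str)

args = parser.parse_args(
    [
        "--hf_export_path", "/tmp/export",
        "--hf_model", "meta-llama/Llama-3.1-8B-Instruct",
    ]
)
# Both spellings resolve to the same attributes:
# args.hf_export_path and args.hf_model
```

Because the attribute names are identical either way, the rename is a cosmetic consistency fix with no code changes needed at the call sites.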
What does this PR do?
Merge anymodel mbridge distillation