
Merge mbridge distillation for any_model #1036

Open
danielkorzekwa wants to merge 9 commits into dkorzekwa/anymodel_tutorial from dkorzekwa/anymodel_mbridgedist

@danielkorzekwa

What does this PR do?

Merge anymodel mbridge distillation

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 13, 2026 17:51
@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • main
  • release/.*
  • feature/.*

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f3cf157a-736c-454f-9179-69b95399f03a


Collaborator

Pretty much everything in this PR looks like it should be merged into M-Bridge instead. Are we confident enough to upstream these changes?

Author

We are not confident yet, e.g., we would first need to talk to the mbridge/megatron-lm people and align with their plans for heterogeneous support. Let's think about it once puzzletron is in main.

We also still have to add support for gpt-oss and mamba, so it is not the best time to merge it into mcore.

Collaborator

The nemo:26.04 container code freeze is in 2 weeks. Let's make sure we raise a PR for the required changes to M-Bridge before then, so we can see what can and cannot be upstreamed.

Comment on lines +56 to +59
"--master-addr",
"127.0.0.1", # Explicitly set master address
"--master-port",
str(get_free_port()), # Pass port directly to torchrun to avoid conflicts
Collaborator

Is this necessary? I've never needed to set these manually.

Author

It is needed: on an interactive cluster node, the default port is somehow already in use.
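
For reference, a minimal sketch of how a free port can be picked and handed to torchrun. The body of `get_free_port` here is an assumption (the PR's own helper is not shown in this thread), and `build_torchrun_cmd` with its `script` and `nproc` parameters is a hypothetical wrapper for illustration only. torchrun's default master port is 29500, which is presumably what collides on the shared node.

```python
import socket


def get_free_port() -> int:
    """Ask the OS for a currently unused TCP port.

    Hypothetical stand-in for the helper used in the test; the real
    implementation may differ.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def build_torchrun_cmd(script: str, nproc: int = 1) -> list[str]:
    """Build a torchrun command line with an explicit rendezvous address and port."""
    return [
        "torchrun",
        "--nproc-per-node",
        str(nproc),
        "--master-addr",
        "127.0.0.1",  # explicitly set master address
        "--master-port",
        str(get_free_port()),  # avoid clashing with the default port
        script,
    ]
```

The socket is closed before torchrun starts, so another process could in principle grab the port in between; for a test on a single node that window is usually small enough to accept.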

Comment on lines +80 to +83
"--hf-export-path",
str(hf_export_dir),
"--hf-model",
"meta-llama/Llama-3.1-8B-Instruct",
Collaborator

Can we change these argparse arguments to underscore format as well, so they are consistent with the other arguments?
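
One possible way to do this without breaking existing callers, sketched under the assumption that the script uses plain `argparse`: register the underscore spelling first and keep the hyphenated one as an alias. `build_parser` is a hypothetical name; only the two flag names come from the diff above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser sketch: underscore flags, with the hyphenated spellings kept as aliases."""
    parser = argparse.ArgumentParser()
    # argparse derives `dest` from the first long option string, so both
    # spellings end up in args.hf_export_path / args.hf_model.
    parser.add_argument("--hf_export_path", "--hf-export-path", type=str, default=None)
    parser.add_argument("--hf_model", "--hf-model", type=str, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.hf_export_path, args.hf_model)
```

If backward compatibility is not a concern, the second option string on each argument can simply be dropped.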
