Skip to content

Conversation

@AAnoosheh
Copy link
Contributor

@AAnoosheh AAnoosheh commented Jan 23, 2026

What does this PR do?

Type of change: Bug Fix

Overview: Bug report appeared where this was an issue, since we save our own modelopt_run_config.yaml which has repr() issues. The main fix is happening in MCore's ProcessGroupCollection.__repr__, but this is a backup measure since we don't want to save it anyway, regardless. Megatron-LM does not have this issue since only M-Bridge's Model Provider classes extend TransformerConfig and store an extra _pg_collection attribute.

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • Improved configuration key filtering in distributed training checkpoints to better preserve relevant configuration entries while maintaining necessary prefix-based filtering rules.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
@AAnoosheh AAnoosheh requested a review from a team as a code owner January 23, 2026 17:33
@AAnoosheh AAnoosheh self-assigned this Jan 23, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

Walkthrough

The change modifies configuration key filtering logic in transformer config parsing. It replaces substring-based filtering of keys containing "tp_" with prefix-based filtering for keys starting with "tp_". This alters which config entries are preserved during processing.

Changes

Cohort / File(s) Summary
Configuration filtering logic
modelopt/torch/opt/plugins/mcore_dist_checkpointing.py
Changed config key filtering mechanism from DROP_SUBSTRINGS to DROP_STARTSWITH. Removed "tp_" from substring matching and added prefix-based check. Keys starting with "tp_" remain filtered; keys containing "tp_" in non-prefix positions are now preserved.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Title check ⚠️ Warning The title mentions dropping 'pg_collection' but the actual change is from substring-based to prefix-based filtering of 'tp' keys, which are unrelated concerns. Update the title to accurately reflect the main change: from substring-based 'tp_' filtering to prefix-based filtering, or clarify the relationship between 'pg_collection' dropping and the 'tp' filtering logic change.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@AAnoosheh AAnoosheh requested a review from ChenhanYu January 23, 2026 17:33
@codecov
Copy link

codecov bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.17%. Comparing base (4f4558a) to head (f761fa2).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #813   +/-   ##
=======================================
  Coverage   74.17%   74.17%           
=======================================
  Files         192      192           
  Lines       19246    19246           
=======================================
  Hits        14276    14276           
  Misses       4970     4970           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant