[megatron] support hybridep, add moe_flex_dispatcher_backend argument and hybridep padding fix #8393

Open
tingmingzhong wants to merge 1 commit into modelscope:main from tingmingzhong:main

Conversation

@tingmingzhong

Description:

What does this PR do?
This PR adds a new moe_flex_dispatcher_backend argument to MegatronArguments and fixes a hang issue when using the hybridep backend with the MoE flex token dispatcher.

Background

When using moe_token_dispatcher_type='flex' with the hybridep backend, the all-gather-into-tensor operation can hang due to inconsistent sequence lengths across ranks. This is resolved by padding all sequences to seq_length (or max_length) when moe_flex_dispatcher_backend='hybridep' is set.
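The fix described above can be sketched as a small pure-Python rule (function and parameter names here are illustrative, not the exact ms-swift internals): all-gather-into-tensor requires every rank to contribute a tensor of identical shape, so when the flex dispatcher runs on the hybridep backend, padding is pinned to a fixed seq_length.

```python
# Illustrative sketch of the padding rule (names are hypothetical, not the
# exact ms-swift internals). all_gather_into_tensor requires every rank to
# contribute a tensor of the same shape, so on the 'hybridep' backend we pin
# padding to a fixed seq_length instead of the per-batch maximum.
from typing import Optional


def resolve_padding_to(padding_to: Optional[int],
                       moe_flex_dispatcher_backend: Optional[str],
                       seq_length: Optional[int]) -> Optional[int]:
    if moe_flex_dispatcher_backend == 'hybridep' and seq_length is not None:
        # Every rank pads to the same fixed length -> identical all-gather shapes.
        return seq_length
    # Other backends keep whatever padding the caller computed.
    return padding_to


print(resolve_padding_to(None, 'hybridep', 4096))  # 4096
print(resolve_padding_to(64, 'deepep', 4096))      # 64
```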

Changes

  • swift/megatron/arguments/megatron_args.py: Added a moe_flex_dispatcher_backend field with options 'deepep' and 'hybridep' (default None).
  • swift/megatron/utils/utils.py: Extended get_padding_to() to set padding_to = seq_length when moe_flex_dispatcher_backend == 'hybridep', preventing the all-gather-into-tensor hang.
  • tests/megatron/test_utils.py: Added 20 unit tests covering all get_padding_to() branches, including the new hybridep logic.
Impact
Only affects the Megatron training path when moe_flex_dispatcher_backend='hybridep' is explicitly set. No impact on other configurations.
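For illustration, opting in might look like the launch below. Only moe_token_dispatcher_type and moe_flex_dispatcher_backend are the options discussed in this PR; the remaining flags mirror the experiment setup (TP=1, PP=1, EP=8) in ms-swift's usual flag style, so treat this as a hedged sketch rather than a verified command line.

```shell
# Hedged example: explicitly enable the hybridep backend for the MoE flex
# token dispatcher in an ms-swift Megatron run. Flag names other than the two
# dispatcher options are assumptions based on ms-swift conventions.
megatron sft \
    --model Qwen/Qwen3-30B-A3B \
    --tensor_model_parallel_size 1 \
    --pipeline_model_parallel_size 1 \
    --expert_model_parallel_size 8 \
    --moe_token_dispatcher_type flex \
    --moe_flex_dispatcher_backend hybridep
```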

Experiment results


When using ms-swift to train Qwen3-30B-A3B on a single-node 8-GPU B200 machine with TP=1, PP=1, and EP=8, HybridEP achieves nearly a 30% performance improvement.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical hang issue encountered when the MoE flex token dispatcher is used with the hybridep backend. The solution involves introducing a new configuration option to explicitly enable the hybridep backend and modifying the padding logic to ensure consistent sequence lengths across ranks, thereby resolving the all-gather-into-tensor hang. The changes are accompanied by extensive unit tests and have shown significant performance gains in specific training setups.

Highlights

  • New Argument for MoE Flex Dispatcher: Introduced a new argument, moe_flex_dispatcher_backend, to MegatronArguments to specify the backend for the MoE flex token dispatcher, with options 'deepep' and 'hybridep'.
  • HybridEP Padding Fix: Implemented a fix in get_padding_to() to prevent a hang issue when using the hybridep backend with the MoE flex token dispatcher. This is achieved by padding all sequences to seq_length or max_length.
  • Comprehensive Unit Tests: Added 20 new unit tests for the get_padding_to() utility function, covering various scenarios including the new hybridep padding logic, ensuring its correctness and robustness.
  • Performance Improvement: Demonstrated a nearly 30% performance improvement when using HybridEP for training Qwen3-30B-A3B on a single-node 8-GPU B200 machine with specific parallelism configurations.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the hybridep backend for the MoE flex token dispatcher in Megatron. This is achieved by adding a new moe_flex_dispatcher_backend argument and modifying the padding logic in get_padding_to to prevent hangs, which is a solid improvement. The inclusion of a comprehensive new test suite for get_padding_to is particularly commendable, as it covers the new logic thoroughly and improves the overall robustness of the utility function. My review includes a couple of minor suggestions for the new test file to improve its long-term maintainability by ensuring the tested code is an exact mirror of the production code.

```python
if args.context_parallel_size > 1:
    padding_to = (padding_to or 1) * args.context_parallel_size
origin_padding_to = padding_to
fp8_format: Optional[str] = args.fp8_format or args.fp8
```

Severity: medium

To keep the inlined function perfectly in sync with the source code in swift/megatron/utils/utils.py, it's better to use getattr for safety. This ensures the test accurately reflects the production code's behavior, improving maintainability.

Suggested change

```diff
-fp8_format: Optional[str] = args.fp8_format or args.fp8
+fp8_format = getattr(args, 'fp8_format', None) or getattr(args, 'fp8', None)
```
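The maintainability point can be seen with a tiny standalone example, where SimpleNamespace stands in for the real Megatron args object: plain attribute access raises AttributeError when a field is absent, while getattr with a default degrades gracefully to None.

```python
# Standalone illustration of the reviewer's point: getattr with a default is
# robust to args objects that lack a field, while direct access raises.
from types import SimpleNamespace

args = SimpleNamespace(fp8='e4m3')  # deliberately missing 'fp8_format'

try:
    _ = args.fp8_format or args.fp8  # direct access: raises AttributeError
    direct_ok = True
except AttributeError:
    direct_ok = False

# getattr falls back to None, so the 'or' chain still resolves correctly.
fp8_format = getattr(args, 'fp8_format', None) or getattr(args, 'fp8', None)

print(direct_ok)   # False
print(fp8_format)  # e4m3
```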

```python
# padding to max seq_length to avoid hybridep all-gather-into-tensor hang
moe_backend: Optional[str] = getattr(args, 'moe_flex_dispatcher_backend', None)
if moe_backend == 'hybridep':
    seq_length: Optional[int] = args.seq_length or args.max_length
```

Severity: medium

To keep the inlined function perfectly in sync with the source code in swift/megatron/utils/utils.py, it's better to use getattr for safety. This ensures the test accurately reflects the production code's behavior, improving maintainability.

Suggested change

```diff
-seq_length: Optional[int] = args.seq_length or args.max_length
+seq_length = getattr(args, 'seq_length', None) or getattr(args, 'max_length', None)
```
