[megatron] Support hybridep: add moe_flex_dispatcher_backend argument and hybridep padding fix #8393
tingmingzhong wants to merge 1 commit into modelscope:main
Conversation
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical hang issue encountered when the MoE flex token dispatcher is used with the hybridep backend.
Code Review
This pull request introduces support for the hybridep backend for the MoE flex token dispatcher in Megatron. This is achieved by adding a new moe_flex_dispatcher_backend argument and modifying the padding logic in get_padding_to to prevent hangs, which is a solid improvement. The inclusion of a comprehensive new test suite for get_padding_to is particularly commendable, as it covers the new logic thoroughly and improves the overall robustness of the utility function. My review includes a couple of minor suggestions for the new test file to improve its long-term maintainability by ensuring the tested code is an exact mirror of the production code.
```python
if args.context_parallel_size > 1:
    padding_to = (padding_to or 1) * args.context_parallel_size
origin_padding_to = padding_to
fp8_format: Optional[str] = args.fp8_format or args.fp8
```
To keep the inlined function perfectly in sync with the source code in swift/megatron/utils/utils.py, it's better to use getattr for safety. This ensures the test accurately reflects the production code's behavior, improving maintainability.
```diff
-fp8_format: Optional[str] = args.fp8_format or args.fp8
+fp8_format = getattr(args, 'fp8_format', None) or getattr(args, 'fp8', None)
```
```python
# padding to max seq_length to avoid hybridep all-gather-into-tensor hang
moe_backend: Optional[str] = getattr(args, 'moe_flex_dispatcher_backend', None)
if moe_backend == 'hybridep':
    seq_length: Optional[int] = args.seq_length or args.max_length
```
To keep the inlined function perfectly in sync with the source code in swift/megatron/utils/utils.py, it's better to use getattr for safety. This ensures the test accurately reflects the production code's behavior, improving maintainability.
```diff
-seq_length: Optional[int] = args.seq_length or args.max_length
+seq_length = getattr(args, 'seq_length', None) or getattr(args, 'max_length', None)
```
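Both suggestions above rely on the same point: plain attribute access raises `AttributeError` when a field is absent from the args namespace, while three-argument `getattr` degrades gracefully. A minimal standalone illustration (not code from the swift repository; the `SimpleNamespace` stand-in for the real args object is an assumption):

```python
from types import SimpleNamespace

# Hypothetical args object that lacks the optional 'fp8' field,
# as an older or minimal config might.
args = SimpleNamespace(fp8_format=None)

# Direct attribute access fails when the field is absent:
try:
    fp8_format = args.fp8_format or args.fp8
except AttributeError as e:
    print('direct access failed:', e)

# getattr with a default falls back to None instead of raising:
fp8_format = getattr(args, 'fp8_format', None) or getattr(args, 'fp8', None)
print('getattr result:', fp8_format)  # → getattr result: None
```

This is why mirroring the production code's `getattr` form in the inlined test copy matters: the two versions only behave identically on namespaces that happen to define every field.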
Description:
What does this PR do?
This PR adds a new moe_flex_dispatcher_backend argument to MegatronArguments and fixes a hang issue when using the hybridep backend with the MoE flex token dispatcher.
Background
When using moe_token_dispatcher_type='flex' with the hybridep backend, the all-gather-into-tensor operation can hang due to inconsistent sequence lengths across ranks. This is resolved by padding all sequences to seq_length (or max_length) when moe_flex_dispatcher_backend='hybridep' is set.
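The fix described above can be sketched as follows. This is an illustrative reconstruction built from the snippets quoted in this review, not the actual `swift/megatron/utils/utils.py` implementation; the initial `padding_to` handling and field names on `args` are assumptions:

```python
from types import SimpleNamespace
from typing import Optional

def get_padding_to(args) -> Optional[int]:
    """Sketch of the padding resolution logic this PR extends."""
    padding_to: Optional[int] = getattr(args, 'padding_to', None)
    if getattr(args, 'context_parallel_size', 1) > 1:
        padding_to = (padding_to or 1) * args.context_parallel_size
    # New hybridep branch: pad every batch to the full sequence length so
    # all ranks present identically shaped tensors to all-gather-into-tensor,
    # which otherwise hangs on mismatched shapes.
    moe_backend = getattr(args, 'moe_flex_dispatcher_backend', None)
    if moe_backend == 'hybridep':
        seq_length = getattr(args, 'seq_length', None) or getattr(args, 'max_length', None)
        if seq_length is not None:
            padding_to = seq_length
    return padding_to

args = SimpleNamespace(padding_to=None, context_parallel_size=1,
                       moe_flex_dispatcher_backend='hybridep', seq_length=4096)
print(get_padding_to(args))  # → 4096
```

With any other backend value, the hybridep branch is skipped and padding behaves as before.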
Changes
swift/megatron/arguments/megatron_args.py: Added moe_flex_dispatcher_backend field with options 'deepep' and 'hybridep' (default None).
swift/megatron/utils/utils.py: Extended get_padding_to() to set padding_to = seq_length when moe_flex_dispatcher_backend == 'hybridep', preventing the all-gather-into-tensor hang.
tests/megatron/test_utils.py: Added 20 unit tests covering all get_padding_to() branches including the new hybridep logic.
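The new argument's shape can be pictured with a small self-contained sketch. The real `MegatronArguments` class has many more fields and may validate choices differently; the `MoeDispatcherArgs` dataclass below is a hypothetical stand-in showing only the field and the `'deepep'`/`'hybridep'`/`None` choices described above:

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class MoeDispatcherArgs:
    # None keeps the previous behavior; 'hybridep' triggers the new padding.
    moe_flex_dispatcher_backend: Optional[Literal['deepep', 'hybridep']] = None

    def __post_init__(self):
        allowed = (None, 'deepep', 'hybridep')
        if self.moe_flex_dispatcher_backend not in allowed:
            raise ValueError(
                f'moe_flex_dispatcher_backend must be one of {allowed}, '
                f'got {self.moe_flex_dispatcher_backend!r}')

print(MoeDispatcherArgs(moe_flex_dispatcher_backend='hybridep'))
```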
Impact
Only affects the Megatron training path when moe_flex_dispatcher_backend='hybridep' is explicitly set. No impact on other configurations.
Experiment results
When using ms-swift to train Qwen3-30B-A3B on a single-node 8-GPU B200 machine with TP=1, PP=1, and EP=8, HybridEP achieves nearly a 30% performance improvement.