Feat/minimax m2.5 support by xs1997zju · Pull Request #1929 · THUDM/slime

xs1997zju · 2026-05-21T03:32:34Z

Summary

Add full integration for MiniMax-M2.5 (256 experts, top-8 routing), including:

- Model spec plugin (`slime_plugins/models/minimax_m2.py`): custom SelfAttention with full-dimension QK Norm (RMSNorm over all heads concatenated, with TP gather/scatter)
- mbridge weight bridge (`slime_plugins/mbridge/minimax_m2.py`): HF ↔ Megatron weight mapping extending Qwen2MoEBridge
- Megatron-to-HF converter (`slime/backends/megatron_utils/megatron_to_hf/minimax_m2.py`): reverse conversion for saving trained checkpoints back to HF format
- Shell scripts (`scripts/`): model architecture args, RL training launch script, and a 3-script HF ↔ Megatron weight conversion pipeline
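To illustrate why full-dimension QK Norm needs a gather/scatter under tensor parallelism, here is a minimal pure-Python sketch. It is not the code in `slime_plugins/models/minimax_m2.py`: the function names are hypothetical, TP ranks are simulated with plain lists, and sharing one `head_dim`-sized weight across heads in the per-head variant is an assumption made for the comparison.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a flat list: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def per_head_norm(q, num_heads, head_weight, eps=1e-6):
    """Per-head variant: each head is normalized on its own statistics."""
    head_dim = len(q) // num_heads
    out = []
    for h in range(num_heads):
        out.extend(rms_norm(q[h * head_dim:(h + 1) * head_dim], head_weight, eps))
    return out


def full_dim_norm_tp(q_shards, weight_shards, eps=1e-6):
    """Full-dimension variant with heads split across simulated TP ranks."""
    # "Gather": concatenate every rank's slice so the RMS covers all heads.
    full_q = [v for shard in q_shards for v in shard]
    full_w = [w for shard in weight_shards for w in shard]
    normed = rms_norm(full_q, full_w, eps)
    # "Scatter": hand each rank back its own slice of the normalized vector.
    out, start = [], 0
    for shard in q_shards:
        out.append(normed[start:start + len(shard)])
        start += len(shard)
    return out


q = [1.0, 1.0, 3.0, 3.0]  # two heads, head_dim = 2
per_head = per_head_norm(q, num_heads=2, head_weight=[1.0, 1.0])
full_dim = full_dim_norm_tp([[1.0, 1.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])
```

In this toy input the per-head variant maps both heads to roughly 1.0, while the full-dimension variant keeps their relative scale (about 0.447 for the first head and 1.342 for the second). A rank that normalized only its local heads would silently compute the per-head-style result, which is why the statistic has to be taken over the gathered vector.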

Key architecture differences from standard Qwen2MoE

| Feature | MiniMax-M2.5 | Qwen2MoE |
| --- | --- | --- |
| MoE prefix | `block_sparse_moe` (w1/w2/w3) | `mlp` |
| QK Norm | Full-dimension (all heads concatenated) | Per-head |
| Router | Sigmoid + `e_score_correction_bias` | Softmax |
| RoPE | Partial (50%) | Full |
| Experts | 256 × 62 layers | varies |
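The router row can be sketched as follows. This is an illustrative pure-Python version, not the PR's implementation, and it assumes the common convention for bias-corrected sigmoid routing: `e_score_correction_bias` affects only which experts are selected, while the mixing weights come from the unbiased sigmoid scores, renormalized over the selected experts. The function name `sigmoid_topk_route` is hypothetical.

```python
import math


def sigmoid_topk_route(logits, correction_bias, top_k):
    """Pick top_k experts by (sigmoid score + bias); weight them by the raw scores."""
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Assumption: the correction bias is used for expert selection only.
    biased = [s + b for s, b in zip(scores, correction_bias)]
    chosen = sorted(range(len(logits)), key=lambda i: biased[i], reverse=True)[:top_k]
    # Renormalize the unbiased scores of the chosen experts so they sum to 1.
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]
    return chosen, weights


# Four experts, top-2 routing; the bias promotes expert 0 over expert 2.
chosen, weights = sigmoid_topk_route(
    logits=[0.0, 0.0, 2.0, -1.0],
    correction_bias=[0.5, 0.0, 0.0, 0.0],
    top_k=2,
)
```

With these inputs expert 0 is ranked first because of its bias, yet expert 2 still receives the larger mixing weight, since the weights ignore the bias. MiniMax-M2.5 would use 256 experts and `top_k=8`.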

Add full integration for MiniMax-M2.5, a 229B MoE model with 256 experts and top-8 routing. This includes:

- Model spec plugin with custom SelfAttention for full-dimension QK Norm (RMSNorm over all heads concatenated, with TP gather/scatter)
- mbridge weight bridge (HF <-> Megatron conversion via Qwen2MoEBridge)
- Megatron-to-HF converter for saving trained checkpoints
- Shell scripts: model args, RL training launch, HF <-> Megatron weight conversion (3-script pipeline)

Key architecture differences from standard Qwen2MoE:

- `block_sparse_moe` prefix with w1/w2/w3 expert naming
- Full-dimension QK Norm (q_norm/k_norm, not per-head)
- Sigmoid router with `e_score_correction_bias`
- Partial RoPE (rotary_percent=0.5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


zhangxinsen and others added 2 commits May 21, 2026 11:25

fix: update actor-num-nodes to 16 for MiniMax-M2.5 training script

f3d74b3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

xs1997zju wants to merge 2 commits into THUDM:main from xs1997zju:feat/minimax-m2.5-support
