Feat/minimax m2.5 support #1929

Open
xs1997zju wants to merge 2 commits into THUDM:main from xs1997zju:feat/minimax-m2.5-support
xs1997zju commented May 21, 2026

Summary

Add full integration for MiniMax-M2.5 (256 experts, top-8 routing), including:

  • Model spec plugin (slime_plugins/models/minimax_m2.py): Custom SelfAttention with full-dimension QK Norm (RMSNorm over all heads concatenated, with TP gather/scatter)
  • mbridge weight bridge (slime_plugins/mbridge/minimax_m2.py): HF ↔ Megatron weight mapping extending Qwen2MoEBridge
  • Megatron-to-HF converter (slime/backends/megatron_utils/megatron_to_hf/minimax_m2.py): Reverse conversion for saving trained checkpoints back to HF format
  • Shell scripts (scripts/): Model architecture args, RL training launch script, and 3-script HF ↔ Megatron weight conversion pipeline
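
To illustrate the difference between per-head and full-dimension QK Norm mentioned above, here is a minimal pure-Python sketch. It is not the code in slime_plugins/models/minimax_m2.py: the function names are hypothetical, the learned RMSNorm weight is omitted, and tensor-parallel ranks are emulated with plain lists, with gather and scatter reduced to concatenation and slicing.

```python
import math


def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned weight: x / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def per_head_qk_norm(q, num_heads):
    """Per-head variant: each head's slice is normalized independently."""
    head_dim = len(q) // num_heads
    out = []
    for h in range(num_heads):
        out.extend(rms_norm(q[h * head_dim:(h + 1) * head_dim]))
    return out


def full_dim_qk_norm(q):
    """Full-dimension variant: one RMS statistic over all heads concatenated."""
    return rms_norm(q)


def full_dim_qk_norm_tp(q_shards):
    """Emulated tensor parallelism: each 'rank' holds a slice of the heads.

    The shards are gathered so the RMS covers the full dimension, then the
    normalized vector is scattered back to the original shard layout.
    """
    gathered = [v for shard in q_shards for v in shard]  # gather
    normed = rms_norm(gathered)
    out, start = [], 0
    for shard in q_shards:  # scatter
        out.append(normed[start:start + len(shard)])
        start += len(shard)
    return out
```

With per-head normalization, every head ends up with unit RMS, so relative magnitudes between heads are lost. The full-dimension variant keeps them, which is why the sharded case needs the statistic computed across all ranks.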

Key architecture differences from standard Qwen2MoE

| Feature | MiniMax-M2.5 | Qwen2MoE |
| --- | --- | --- |
| MoE prefix | block_sparse_moe (w1/w2/w3) | mlp |
| QK Norm | Full-dimension (all heads concat) | Per-head |
| Router | Sigmoid + e_score_correction_bias | Softmax |
| RoPE | Partial (50%) | Full |
| Experts | 256 × 62 layers | varies |
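
For the router row, the following is a rough sketch of sigmoid top-k routing with a score correction bias. It assumes the bias only influences which experts are selected and that the final weights come from the unbiased sigmoid scores, renormalized over the selected experts (the convention used by DeepSeek-V3-style routers). The function name and that convention are assumptions for illustration, not taken from this PR's code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_topk_route(logits, correction_bias, top_k):
    """Pick top_k experts by (sigmoid score + bias); weight by the raw scores.

    Assumption: the correction bias affects selection only, not the weights.
    """
    scores = [sigmoid(z) for z in logits]
    biased = [s + b for s, b in zip(scores, correction_bias)]
    # Indices of the top_k largest biased scores, highest first.
    chosen = sorted(range(len(logits)), key=lambda i: biased[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]
    return chosen, weights


# Toy example with 4 experts and top-2 routing (MiniMax-M2.5 uses 256 and top-8).
experts, weights = sigmoid_topk_route(
    logits=[2.0, 1.0, 0.0, -1.0],
    correction_bias=[0.0, 0.0, 0.0, 1.0],
    top_k=2,
)
```

In the toy call, the bias pushes expert 3 into the selection even though its raw score is the lowest, while its routing weight still reflects that low raw score.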

zhangxinsen and others added 2 commits May 21, 2026 11:25
Add full integration for MiniMax-M2.5, a 229B MoE model with 256 experts
and top-8 routing. This includes:

- Model spec plugin with custom SelfAttention for full-dimension QK Norm
  (RMSNorm over all heads concatenated, with TP gather/scatter)
- mbridge weight bridge (HF <-> Megatron conversion via Qwen2MoEBridge)
- Megatron-to-HF converter for saving trained checkpoints
- Shell scripts: model args, RL training launch, HF<->Megatron weight
  conversion (3-script pipeline)

Key architecture differences from standard Qwen2MoE:
- block_sparse_moe prefix with w1/w2/w3 expert naming
- Full-dimension QK Norm (q_norm/k_norm, not per-head)
- Sigmoid router with e_score_correction_bias
- Partial RoPE (rotary_percent=0.5)
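
As an illustration of what rotary_percent=0.5 means, here is a small sketch of partial RoPE on a single head vector. The half-split pairing of dimensions and the base of 10000 are assumptions made for the example, and the function name is hypothetical. The actual layout and base used by the model may differ.

```python
import math


def partial_rope(x, position, rotary_percent=0.5, base=10000.0):
    """Rotate only the first rotary_percent of the head dimensions.

    Assumptions: half-split pairing (dimension i is paired with i + half)
    and base=10000. The remaining dimensions pass through unchanged.
    """
    head_dim = len(x)
    rot_dim = int(head_dim * rotary_percent)
    rot_dim -= rot_dim % 2  # the rotated part must have an even size
    half = rot_dim // 2
    out = list(x)
    for i in range(half):
        theta = position * base ** (-2.0 * i / rot_dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out
```

For an 8-dimensional head, only the first 4 dimensions carry positional information. The last 4 are identical at every position.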

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>