[feature] enable improved Group Linear module for MoE #2885

@erhoo82

Description

User problem

  • Unfused expert GEMMs and cast (quantization) functions
  • Unfused activation function
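To make the unfused pattern concrete, here is a minimal pure-Python sketch of what "unfused" means for a grouped MoE MLP: each expert's GEMM and its activation run as separate passes (separate kernel launches on a GPU), which is the overhead the requested fusion would collapse into a single grouped kernel. Function names and shapes are illustrative, not Transformer Engine's actual API.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matmul(a, b):
    # Naive GEMM: a is (m, k), b is (k, n).
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def swiglu(row):
    # SwiGLU splits the hidden vector in half: swish(x1) * x2.
    h = len(row) // 2
    return [row[i] * sigmoid(row[i]) * row[h + i] for i in range(h)]

def grouped_mlp_unfused(token_groups, expert_weights):
    # One separate GEMM pass plus one separate activation pass per
    # expert -- the unfused pattern this issue wants replaced by a
    # single fused grouped kernel.
    outputs = []
    for tokens, w in zip(token_groups, expert_weights):
        hidden = matmul(tokens, w)                    # expert GEMM (own pass)
        outputs.append([swiglu(r) for r in hidden])   # activation (own pass)
    return outputs
```

In a fused implementation the loop, the activation, and any output quantization would be a single kernel over all experts, so tokens never round-trip through global memory between steps.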

Desired outcome

  • Enable all the features in the Group Linear module
  • MXFP8 fusion: GroupGEMM + (d)SwiGLU + quant + swizzle (TE/#2769, MCore#3971, Mbridge#2841)
  • Grouped MXFP8 quantization (TE/#2769, MCore#3971)
  • Use cuBLAS fused GEMMs for unfused group GEMM cases (TE/#2769, MCore#3971)

Alternatives or workarounds considered

No response

Affected area

area:perf

Urgency / use case

Important but not blocking

Environment

No response

Metadata

Assignees

Labels

26.04.01, performance, performance/release (Performance items related with NeMo release), tracking (Tracking issue for an ongoing project with smaller steps)
