
Add option to replicate attn weights on FSDP or EP #3480

Open
gobbleturk wants to merge 7 commits into main from mattdavidow-embed-module-sharding


gobbleturk (Collaborator) commented Mar 21, 2026

Not meant to be submitted.

Adds an easy CLI option for sharding the embed dimension of the attention weights (embed-attn) in one of three ways:

  • by both FSDP and EP (same as head)
  • by only EP (new default)
  • by only FSDP
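The three options can be pictured as a mapping from an option value to the mesh axes that the embed-attn dimension is split over. This is a minimal sketch under stated assumptions: the option strings ("fsdp_and_ep", "ep", "fsdp") and the mesh-axis names ("fsdp", "expert") are hypothetical and are not taken from the PR's actual config keys.

```python
from math import prod

# Hypothetical option names and mesh-axis names (assumptions, not the PR's keys).
# Each option lists the mesh axes the embed dim of the attention weights is split over.
EMBED_ATTN_SHARDING = {
    "fsdp_and_ep": ("fsdp", "expert"),  # by both FSDP and EP
    "ep": ("expert",),                  # by only EP (new default)
    "fsdp": ("fsdp",),                  # by only FSDP
}


def embed_attn_mesh_axes(option: str) -> tuple:
    """Return the mesh axes used to shard embed-attn for a given option."""
    try:
        return EMBED_ATTN_SHARDING[option]
    except KeyError:
        raise ValueError(f"unknown embed-attn sharding option: {option!r}") from None


def num_shards(option: str, mesh_shape: dict) -> int:
    """Number of pieces the embed dim is split into on a mesh of the given shape."""
    return prod(mesh_shape[axis] for axis in embed_attn_mesh_axes(option))
```

Validating the option up front means a typo on the command line fails immediately instead of silently falling back to some other sharding.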

We expose all three options and make EP-only the default because it is more performant for large-scale MoE runs: we don't want to shard the embed dimension 4096 ways, only EP=64 ways (we use EP because EP is 2D in our best configs).
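The arithmetic behind "4096 ways" versus "64 ways" can be checked with a short sketch. It assumes a hypothetical mesh of 64-way FSDP by 64-way EP (64 × 64 = 4096, matching the numbers above) and a hypothetical embed dimension of 8192. Neither size comes from the PR, and the sketch only computes shard sizes. It does not model why the EP-only option is faster.

```python
from math import prod


def shard_size(dim: int, axes: tuple, mesh_shape: dict) -> tuple:
    """Return (ways, per-device size) when `dim` is split over the given mesh axes."""
    ways = prod(mesh_shape[axis] for axis in axes)
    if dim % ways:
        raise ValueError(f"dim {dim} is not divisible by {ways} shards")
    return ways, dim // ways


# Hypothetical sizes (assumptions): 64-way FSDP x 64-way EP, embed dim 8192.
mesh_shape = {"fsdp": 64, "expert": 64}
embed_dim = 8192

both = shard_size(embed_dim, ("fsdp", "expert"), mesh_shape)  # FSDP and EP
ep_only = shard_size(embed_dim, ("expert",), mesh_shape)      # EP only
print(both, ep_only)
```

With these assumed sizes, sharding by both axes leaves each device holding 2 entries of the embed dimension, while EP-only sharding leaves 128.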

codecov bot commented Mar 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

