Conversation

@carlushuang
Collaborator

This PR adopts the idea from #22 and applies it to the unified fmha pipeline, using a 32x32x16 mfma for the 1st/2nd gemm.

Performance:
4K seqlen, 128 hdim: 131 TFLOPS
8K seqlen, 128 hdim: 135 TFLOPS
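
The PR text does not show the kernel itself, so the following is only a rough, hypothetical sketch of what a 32x32x16 f16 warp-level gemm looks like on AMD CDNA hardware. It assumes the 32x32x16 tile is issued as two chained K=8 slices of the native `__builtin_amdgcn_mfma_f32_32x32x8f16` instruction (available on gfx90a-class GPUs); the actual pipeline in this repo may instead use a different composition or a native 32x32x16 instruction on newer targets, and the function name here is invented for illustration.

```cpp
#include <hip/hip_runtime.h>

// Per-lane fragment types for a CDNA mfma:
// 4 half values of A/B input, 16 float accumulators.
using half4   = __attribute__((ext_vector_type(4))) _Float16;
using float16 = __attribute__((ext_vector_type(16))) float;

// Hypothetical sketch: each wavefront owns a 32x32 accumulator tile
// and advances K by 16 using two K=8 mfma issues.
__device__ float16 warp_gemm_32x32x16(half4 a0, half4 a1,
                                      half4 b0, half4 b1,
                                      float16 acc)
{
    // First K=8 slice; the trailing immediates (cbsz, abid, blgp) are 0.
    acc = __builtin_amdgcn_mfma_f32_32x32x8f16(a0, b0, acc, 0, 0, 0);
    // Second K=8 slice accumulates on top of the first.
    acc = __builtin_amdgcn_mfma_f32_32x32x8f16(a1, b1, acc, 0, 0, 0);
    return acc;
}
```

Relative to a 16x16-based tile, the larger 32x32 accumulator footprint amortizes more of the A/B operand traffic per mfma issue, which is the usual motivation for this instruction shape in attention gemms.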

@asroy asroy merged commit e71aa1d into main Nov 3, 2023
@carlushuang carlushuang deleted the q_persistent_unify branch November 3, 2023 09:11