Implement device_gemm_universal_preshuffle_instance for RDNA4 #3429
base: develop
Conversation
ErwinTerpstra left a comment:
One small comment, beyond that it looks good!
library/include/ck/library/tensor_operation_instance/gpu/gemm_universal_preshuffle.hpp
(force-pushed d8e7a2e to 3dee146)
Can we remove some code duplication between the examples?
(force-pushed 3dee146 to 6eb3f39)
Apart from some refactoring of the example code to remove duplication, which would be nice to have, it looks good to me.
(force-pushed 6eb3f39 to 15248f6)
I have removed the duplication in the last commit (15248f6). Thanks.
Proposed changes
Summary:
- Implementation of device_gemm_universal_preshuffle_instance for RDNA4
- FP8/FP16 WMMA examples
- WMMA instances
- WMMA instances added to existing tests
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- Ran `clang-format` on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion explaining why you chose this solution and what alternatives you considered.