@yungshengtu

Proposed changes

Summary:

  • Implementation

  • FP8/FP16 WMMA examples

  • WMMA instances

  • Add WMMA instances to existing tests

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to the REGRESSION_TESTS list defined at the top of tests/CMakeLists.txt, if the test takes more than 30 seconds to run.
  • I have added inline documentation that helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.

Contributor

@ErwinTerpstra ErwinTerpstra left a comment

One small comment, beyond that it looks good!

@yungshengtu yungshengtu self-assigned this Dec 18, 2025
@yungshengtu yungshengtu force-pushed the users/yungshengtu/implement-device_gemm_universal_preshuffle_instance-for-rdna4 branch from d8e7a2e to 3dee146 Compare December 18, 2025 09:09
@EnricoDeg EnricoDeg self-requested a review January 14, 2026 12:10
@EnricoDeg
Contributor

Can we remove some code duplications between the examples?

@yungshengtu yungshengtu force-pushed the users/yungshengtu/implement-device_gemm_universal_preshuffle_instance-for-rdna4 branch from 3dee146 to 6eb3f39 Compare January 14, 2026 13:21
@EnricoDeg
Contributor

Apart from some refactoring of the example code to remove duplication, which would be nice to have, it looks good to me.

@yungshengtu yungshengtu force-pushed the users/yungshengtu/implement-device_gemm_universal_preshuffle_instance-for-rdna4 branch from 6eb3f39 to 15248f6 Compare January 14, 2026 13:46
@yungshengtu
Author

yungshengtu commented Jan 14, 2026

Can we remove some code duplications between the examples?

I removed the duplication in the latest commit (15248f6). Thanks.

3 participants