[proof-of-concept] add MXFP8 pre-swizzling for gfx1250 by matthiasdiener · Pull Request #568 · ROCm/TransformerEngine

matthiasdiener · 2026-04-29T16:59:38Z

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes https://github.com/ROCm/frameworks-internal/issues/16428

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Change A
Change B

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

alextmagro

Hi Matthias, a few comments. I assume you are still planning to add the scale-swizzle hooks for when we're on gfx1250? I believe there were hooks in all of common, pytorch, and jax. These PRs removed them, so restoring them would be a partial revert:

#420
#424
#442

alextmagro · 2026-05-05T14:29:00Z

-    asm volatile("ds_swizzle_b32 %0, %1 offset:0x041F\n\t"
-                 "s_waitcnt lgkmcnt(0)" : "=v"(r) : "v"(v));
-    return r;
+    return __shfl_xor(v, 1);


Do we still need these helper functions now that we're just doing a __shfl_xor?

matthiasdiener · May 5, 2026

This change is only inadvertently part of this PR; it is already part of #571. Will revert here.
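For readers comparing the two sides of the diff, the following host-side sketch decodes the `ds_swizzle_b32` offset `0x041F` and checks that it selects the same source lane as `__shfl_xor(v, 1)`. It assumes the bit-mask swizzle mode (offset bit 15 clear) with the field layout `and_mask = offset[4:0]`, `or_mask = offset[9:5]`, `xor_mask = offset[14:10]`, applied within groups of 32 lanes. The function names are hypothetical and are not taken from the PR.

```cpp
#include <cstdint>

// Assumed bit-mask-mode field layout of the ds_swizzle_b32 offset
// (offset[15] == 0): and_mask = [4:0], or_mask = [9:5], xor_mask = [14:10].
constexpr uint32_t and_mask(uint32_t offset) { return offset & 0x1Fu; }
constexpr uint32_t or_mask(uint32_t offset) { return (offset >> 5) & 0x1Fu; }
constexpr uint32_t xor_mask(uint32_t offset) { return (offset >> 10) & 0x1Fu; }

// Source lane read by `lane` under the assumed bit-mask swizzle.
// The masks act on the low 5 bits, so the swizzle stays within a group of 32 lanes.
constexpr uint32_t swizzle_src_lane(uint32_t lane, uint32_t offset) {
  return (lane & ~0x1Fu) |
         ((((lane & 0x1Fu) & and_mask(offset)) | or_mask(offset)) ^ xor_mask(offset));
}

// Source lane read by `lane` for a shuffle-xor with the given lane mask.
constexpr uint32_t shfl_xor_src_lane(uint32_t lane, uint32_t lane_mask) {
  return lane ^ lane_mask;
}
```

Under these assumptions, `0x041F` decodes to `and_mask = 0x1F`, `or_mask = 0`, `xor_mask = 1`, so each lane exchanges its value with its even/odd neighbour, which is what the one-line replacement expresses.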

alextmagro · 2026-05-05T14:34:30Z

+// Col-wise: input is [K_scale, M] row-major (M contiguous), representing
+// the column-wise scale matrix logically shaped [M, K_scale].
+// Logical (m, k) maps to physical address k * original_M + m.
+__global__ void __launch_bounds__(256)


This function is almost identical to the row-wise scaling function. Can we merge them into one templated kernel? Ideally, a single thread could write both the col-wise and row-wise scales when we are doing both.
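One way to picture the suggested merge is a single index helper templated on the layout, so the row-wise and col-wise kernels differ only in how a logical (m, k) maps to a source address. This is a minimal sketch, not the PR's code: the col-wise formula `k * original_M + m` comes from the comment in the diff, while the row-wise formula, the `ScaleLayout` enum, and the `row_stride` parameter are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical tag for the two source layouts of the scale matrix.
enum class ScaleLayout { RowWise, ColWise };

// Maps a logical scale coordinate (m, k) to its offset in the source buffer.
//  - ColWise: input is [K_scale, M] with M contiguous, so offset = k * original_M + m
//    (as stated in the diff comment).
//  - RowWise: assumed to be [M, K_scale] with K contiguous, so offset = m * row_stride + k.
template <ScaleLayout Layout>
constexpr std::size_t scale_src_index(std::size_t m, std::size_t k,
                                      std::size_t original_M, std::size_t row_stride) {
  return Layout == ScaleLayout::RowWise ? m * row_stride + k
                                        : k * original_M + m;
}
```

A merged kernel could then be instantiated once per layout, or call both instantiations from the same thread when both scale tensors are requested.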

alextmagro · 2026-05-05T14:34:48Z

+  const int k = idx % K_scale;
+
+  uint8_t val = 127;
+  if (m < original_M && k < original_K) {


Could we move this check to the host side, or remove it completely?
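As background for this question: the default `val = 127` is the E8M0 encoding of 2^0 = 1.0 (biased exponent, bias 127), so out-of-range elements receive a neutral scale. The sketch below shows that decoding and one possible host-side predicate that decides whether the per-element bounds check is needed at all. `is_padding`, `needs_bounds_check`, and the padded-extent parameters are hypothetical names for illustration, not functions from the PR.

```cpp
#include <cstddef>
#include <cstdint>

// E8M0 stores only a biased exponent: value = 2^(e - 127).
constexpr int kE8M0Bias = 127;
constexpr int e8m0_exponent(uint8_t e) { return static_cast<int>(e) - kE8M0Bias; }

// Mirrors the device-side condition in the diff: true when (m, k) falls
// outside the original extents and should take the default value 127.
constexpr bool is_padding(std::size_t m, std::size_t k,
                          std::size_t original_M, std::size_t original_K) {
  return m >= original_M || k >= original_K;
}

// Hypothetical host-side decision: if the original extents already equal the
// padded extents, no element is padding and the device check could be skipped
// (for example by dispatching to a kernel variant without it).
constexpr bool needs_bounds_check(std::size_t original_M, std::size_t original_K,
                                  std::size_t padded_M, std::size_t padded_K) {
  return original_M != padded_M || original_K != padded_K;
}
```

Whether the check can be dropped entirely depends on whether callers guarantee pre-padded inputs, which the thread does not settle.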

alextmagro · 2026-05-05T14:44:02Z

+    : public ::testing::TestWithParam<
+          std::tuple<std::pair<int, int>, bool>> {};
+
+TEST_P(MxSwizzleTestSuite, TestMxSwizzle) {


I think full GEMM tests should live in test_cublaslt.cu. I also wonder whether this is already covered by the MXFP8 GEMM tests there, if we are always swizzling. We should probably limit the tests here to checking the swizzled scales, if they are needed at all.

alextmagro · 2026-05-05T14:44:19Z

 #include <cstdint>

 #include "../common.h"
+#include "../util/cuda_runtime.h"


Why is this include needed?

alextmagro · 2026-05-05T14:52:28Z

             " (got shape=", shape, ")");
 #ifdef USE_ROCM
+  // gfx1250 MX pre-swizzle (Tensile 3D) layout requires M padded to multiple of 4.
+  // Other ROCm architectures use 128x4 tiles but currently skip padding


I'm not sure it is true that we use 128x4 tiles; 128x4 scaling is an upstream requirement. We also have padding expectations in pytorch and jax, and all three test directories have padding that will probably need fixing.
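The padding rule under discussion is a round-up to a multiple. A minimal sketch follows, assuming only what the diff comment states (M padded to a multiple of 4 on gfx1250). `round_up` and `padded_M_gfx1250` are hypothetical helpers, and which dimension the upstream 128x4 requirement pads to which extent is not stated in this thread.

```cpp
#include <cstddef>

// Rounds `value` up to the next multiple of `multiple` (multiple must be > 0).
constexpr std::size_t round_up(std::size_t value, std::size_t multiple) {
  return (value + multiple - 1) / multiple * multiple;
}

// Per the diff comment: the gfx1250 pre-swizzle layout needs M padded to a multiple of 4.
constexpr std::size_t padded_M_gfx1250(std::size_t M) { return round_up(M, 4); }
```

The same helper could compute 128- or 4-aligned extents for the upstream layout, so the pytorch, jax, and test-side padding expectations could share one definition.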

matthiasdiener added 3 commits April 27, 2026 15:36

add MX scale pre-swizzling for gfx1250

bc363fa

switch to mxfp4

a6ca3af

tensile-like implementation

d1ee5bd

matthiasdiener self-assigned this Apr 29, 2026

Merge remote-tracking branch 'upstream/dev' into mdiener/mxfp8-swizzle

d1647ee

matthiasdiener added the ci-level 1 label (CI test level 1) Apr 29, 2026

matthiasdiener added 9 commits May 1, 2026 18:41

Merge remote-tracking branch 'origin/dev' into mdiener/mxfp8-swizzle

1fff6d9

gfx1250 swizzle_xor changes for FP4

d714038

change line endings to unix, trim trailing whitespace

76ca4b1

Merge branch 'mdiener/swizzle_xor-1250' into mdiener/mxfp8-swizzle

81a0a27

fix arch

2991bcf

[WIP] e2e gemm test, not working yet

8ceb89c

fix for gfx1250

167d2eb

k-tile

5d46537

extend tests

313a6b7

matthiasdiener force-pushed the mdiener/mxfp8-swizzle branch from ddf19da to 313a6b7 May 3, 2026 22:06

remove ifdef

2a8eeb5

matthiasdiener requested a review from alextmagro May 4, 2026 16:33

undo BLK32_UE8M0_32_8_EXT

c37a781

alextmagro requested changes May 5, 2026

matthiasdiener added 3 commits May 5, 2026 10:16

Merge remote-tracking branch 'upstream/dev' into mdiener/mxfp8-swizzle

5d2d38f

Revert "change line endings to unix, trim trailing whitespace"

f093f64

This reverts commit 76ca4b1.

Revert "gfx1250 swizzle_xor changes for FP4"

ecbffea

This reverts commit d714038.

matthiasdiener wants to merge 18 commits into dev from mdiener/mxfp8-swizzle
