use ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM to enable non shuffle triton gemm by zhuyuhua-v · Pull Request #1031 · ROCm/ATOM

zhuyuhua-v · 2026-06-02T07:10:52Z

Motivation

ATOM uses the preshuffled ASM GEMM as the default path for FP4 GEMM, whereas mori-sglang uses a non-shuffle GEMM for better performance. In addition, the Triton preshuffle FP4 GEMM performs worse.

Hence, this PR adds a flag, `ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM=1`, that controls whether to use the Triton non-shuffle FP4 GEMM, in order to improve performance in decode (small batch size) cases.

Test Plan

benchmark + gsm8k

Test Result

https://github.com/ROCm/ATOM/actions/runs/26651782340 for benchmark test:

for accuracy test

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

github-actions

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

ruff

⚠️ [ruff] reported by reviewdog 🐶
unindent does not match any outer indentation level

ATOM/atom/model_ops/linear.py

Line 595 in c3e9ae0

@mark_trace

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

Copilot

Pull request overview

Adds an environment-variable-controlled switch to use AITER’s non-shuffled Triton FP4 GEMM path (intended to improve small-batch/decode performance) and wires it into the FP4 GEMM dispatch and weight post-processing logic.

Changes:

Document a new env var to enable non-shuffle Triton FP4 GEMM behavior.
Add env var plumbing in atom.utils.envs.
Update FP4 GEMM selection logic and weight/scale shuffle handling in atom/model_ops/linear.py.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| docs/environment_variables.md | Documents the new env var and its precedence relative to `ATOM_USE_TRITON_GEMM`. |
| atom/utils/envs.py | Adds the new env var accessor (with suggested aliasing to match PR text). |
| atom/model_ops/linear.py | Adds non-shuffle Triton FP4 GEMM dispatch and adjusts shuffling behavior for FP4 weights/scales. |

+    "ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM": lambda: (
+        os.getenv("ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM", "0") == "1"
+    ),
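
The entry above is a zero-argument lambda, so the environment is read when the value is looked up rather than when the table is defined. The following is a minimal sketch of that pattern, not ATOM's actual `envs` module: the registry name `ENVIRONMENT_VARIABLES` and the helper `read_env` are hypothetical and exist only for illustration.

```python
import os
from typing import Any, Callable

# Hypothetical registry mirroring the entry added in this PR. Each value is a
# zero-argument callable, so os.getenv runs at lookup time.
ENVIRONMENT_VARIABLES: dict[str, Callable[[], Any]] = {
    "ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM": lambda: (
        os.getenv("ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM", "0") == "1"
    ),
}


def read_env(name: str) -> Any:
    """Evaluate the registered lambda for `name` (illustrative helper)."""
    return ENVIRONMENT_VARIABLES[name]()
```

Because the comparison is against the literal string `"1"`, values such as `true` or `yes` leave the flag disabled.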


 | **ATOM_USE_TRITON_GEMM** | bool | 0 (false) | If set to `1`, use AITER Triton FP4 weight preshuffled GEMM. Otherwise use AITER ASM FP4 weight preshuffled GEMM. |
+| **ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM** | bool | 0 (false) | If set to `1`, use AITER Triton FP4 GEMM with non-shuffled weights. Takes precedence over the FP4 preshuffled GEMM path selected by `ATOM_USE_TRITON_GEMM`. |
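
The precedence described in the two table rows can be sketched as a small selection function. This is an illustration of the documented behavior only: the function name and the returned labels are hypothetical, and the real dispatch lives in `atom/model_ops/linear.py`.

```python
def select_fp4_gemm_path(use_non_shuffle_triton: bool, use_triton: bool) -> str:
    """Return a label for the FP4 GEMM path implied by the two flags.

    use_non_shuffle_triton stands for ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM,
    use_triton stands for ATOM_USE_TRITON_GEMM. Labels are illustrative.
    """
    # The non-shuffle flag takes precedence over the preshuffled selection.
    if use_non_shuffle_triton:
        return "triton_non_shuffle"
    # Otherwise ATOM_USE_TRITON_GEMM picks between the two preshuffled paths.
    if use_triton:
        return "triton_preshuffle"
    return "asm_preshuffle"
```

With both flags set, the sketch returns the non-shuffle path, which matches the "takes precedence" wording in the documentation row.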



+def use_fp4_non_shuffle_triton_gemm() -> bool:
+    return envs.ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM
+
+
+if use_fp4_non_shuffle_triton_gemm():
+    try:
+        from aiter.ops.triton.gemm_afp4wfp4 import gemm_afp4wfp4  # noqa: E402
+    except ImportError as e:
+        logger.warning(f"Triton FP4 GEMM not available: {e}")
+        gemm_afp4wfp4 = None
+else:
+    gemm_afp4wfp4 = None
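
The snippet above imports the kernel only when the flag is on and falls back to `None` if the import fails. Since the check runs at module import time, the flag presumably has to be set before `linear.py` is first imported. A generic sketch of the same guarded-import pattern follows; the helper `optional_import` is hypothetical and uses `importlib` so that it can be tried without AITER installed.

```python
import importlib
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def optional_import(enabled: bool, module_name: str, attr_name: str) -> Optional[Any]:
    """Return `module_name.attr_name` if enabled and importable, else None."""
    if not enabled:
        # Flag is off: skip the import entirely, as in the else branch above.
        return None
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    except (ImportError, AttributeError) as e:
        # `from m import x` raises ImportError for a missing name; getattr
        # raises AttributeError, so both are treated as "not available".
        logger.warning("%s.%s not available: %s", module_name, attr_name, e)
        return None
```

Callers then need to check for `None` before using the returned function, which is the same contract the `gemm_afp4wfp4 = None` fallback imposes.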


zhuyuhua-v force-pushed the yuhua/fp4-triton-gemm branch from 5375206 to c3e9ae0 Compare June 2, 2026 07:14

github-actions Bot reviewed Jun 2, 2026

zhuyuhua-v force-pushed the yuhua/fp4-triton-gemm branch from c3e9ae0 to 5603464 Compare June 2, 2026 07:16

use ATOM_USE_FP4_TRITON_GEMM to enable non shuffle triton gemm

6d85564

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

zhuyuhua-v force-pushed the yuhua/fp4-triton-gemm branch from 5603464 to 6d85564 Compare June 2, 2026 07:26

zhuyuhua-v changed the title ~~add ATOM_USE_FP4_TRITON_GEMM=1 for 1/1024 cases~~ use ATOM_USE_FP4_TRITON_GEMM to enable non shuffle triton gemm Jun 2, 2026

update env name and add comments

f09417f

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>

XiaobingSuper previously approved these changes Jun 2, 2026

zhuyuhua-v marked this pull request as ready for review June 2, 2026 08:46

Copilot AI review requested due to automatic review settings June 2, 2026 08:46

Copilot started reviewing on behalf of zhuyuhua-v June 2, 2026 08:46

Copilot AI reviewed Jun 2, 2026

Comment thread atom/utils/envs.py

Comment on lines +131 to +133

Comment thread atom/model_ops/linear.py Outdated

Comment thread atom/model_ops/linear.py Outdated

Apply suggestions from code review

0331743

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

zhuyuhua-v dismissed XiaobingSuper’s stale review via 0331743 June 2, 2026 09:36

Copilot AI review requested due to automatic review settings June 2, 2026 09:36

Copilot started reviewing on behalf of zhuyuhua-v June 2, 2026 09:36

Copilot AI reviewed Jun 2, 2026

valarLip approved these changes Jun 2, 2026

zhuyuhua-v changed the title ~~use ATOM_USE_FP4_TRITON_GEMM to enable non shuffle triton gemm~~ use ATOM_USE_FP4_NON_SHUFFLE_TRITON_GEMM to enable non shuffle triton gemm Jun 3, 2026

zejunchen-zejun approved these changes Jun 3, 2026

zhuyuhua-v merged commit c42ab44 into main Jun 3, 2026
72 of 88 checks passed

zhuyuhua-v deleted the yuhua/fp4-triton-gemm branch June 3, 2026 13:11

zhuyuhua-v merged 3 commits into main from yuhua/fp4-triton-gemm

5 participants
