Gfx1250 bringup moe by yadaish · Pull Request #1007 · ROCm/ATOM

yadaish · 2026-06-01T10:26:36Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Route HIP kernels that fault/hang on gfx1250 to their AITER Triton or torch-native equivalents at every call site:

- layernorm: rmsnorm2d_fwd / rmsnorm2d_fwd_with_add -> Triton rmsnorm
- sampler: mixed_sample_outer_exponential -> torch greedy argmax (temp=0 bring-up)
- topK: biased_grouped_topk (module_moe_asm) -> biased_grouped_topk_torch
- attention_mla: gfx1250 routing

Also fold structural HF fields (num_hidden_layers, first_k_dense_replace, num_nextn_predict_layers) into the compile-cache key so a graph compiled for one model depth isn't silently reused for another shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
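The compile-cache key change in the commit message above can be pictured with a minimal sketch. The three field names come from the commit message; everything else (the function name `compile_cache_key`, the `base_key` argument, the dict-shaped config, and the use of SHA-256 over JSON) is a hypothetical illustration and not ATOM's actual implementation.

```python
import hashlib
import json

# Field names taken from the commit message; the tuple itself is illustrative.
STRUCTURAL_FIELDS = (
    "num_hidden_layers",
    "first_k_dense_replace",
    "num_nextn_predict_layers",
)


def compile_cache_key(base_key: str, hf_config: dict) -> str:
    """Hypothetical helper: derive a cache key that changes with model structure."""
    # Missing fields map to None so that "absent" and "present" configs differ.
    structural = {name: hf_config.get(name) for name in STRUCTURAL_FIELDS}
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    payload = json.dumps({"base": base_key, "structural": structural}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


shallow = compile_cache_key("model-x", {"num_hidden_layers": 4})
deep = compile_cache_key("model-x", {"num_hidden_layers": 61})
```

Under these assumptions, `shallow` and `deep` differ, so a graph compiled for the 4-layer configuration would not be looked up for the 61-layer one.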

Route gfx1250 MoE scale prep through the grouped a8w4 kernel's _grouped_a8w4_prepare_scale_batch preshuffle, keeping the original shuffle_scale path for all other archs. gfx1250 only supports the non-interleaved (gate|up separated) scale layout, so raise if ATOM_MOE_GU_ITLV is set on that arch. Rename the gfx12 Triton auto-enable to is_gfx1250; the ATOM_USE_TRITON_MOE override is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
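The arch-dependent routing and the ATOM_MOE_GU_ITLV guard described in the commit message above might look roughly like the sketch below. The function `select_scale_prep`, its string return values, and the rule for deciding when the variable counts as "set" (any value other than empty or "0") are assumptions made for illustration; the PR does not show how ATOM parses the variable or which exception type it raises.

```python
import os


def select_scale_prep(arch: str, env=None) -> str:
    """Hypothetical selector: name the scale-prep path to use for a given arch."""
    env = os.environ if env is None else env
    if arch == "gfx1250":
        # Assumption: empty or "0" means the interleaved layout is not requested.
        if env.get("ATOM_MOE_GU_ITLV", "0") not in ("", "0"):
            raise ValueError(
                "ATOM_MOE_GU_ITLV is not supported on gfx1250: "
                "only the non-interleaved (gate|up separated) scale layout is available"
            )
        # Stands in for the _grouped_a8w4_prepare_scale_batch preshuffle.
        return "grouped_a8w4_preshuffle"
    # All other archs keep the original shuffle_scale path.
    return "shuffle_scale"


default_path = select_scale_prep("gfx942", env={})
gfx1250_path = select_scale_prep("gfx1250", env={})
```

Passing `env` explicitly keeps the sketch testable without mutating the process environment; "gfx942" is used here only as an example of a non-gfx1250 arch.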

yadaish and others added 3 commits May 30, 2026 20:02

update ATOM

8334845

yadaish wants to merge 3 commits into main from gfx1250-bringup-moe
