Gfx1250 bringup moe #1007

Draft
yadaish wants to merge 3 commits into main from gfx1250-bringup-moe

@yadaish yadaish commented Jun 1, 2026

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

yadaish and others added 3 commits May 30, 2026 20:02
Route HIP kernels that fault/hang on gfx1250 to their AITER Triton or
torch-native equivalents at every call site:
- layernorm: rmsnorm2d_fwd / rmsnorm2d_fwd_with_add -> Triton rmsnorm
- sampler: mixed_sample_outer_exponential -> torch greedy argmax (temp=0 bring-up)
- topK: biased_grouped_topk (module_moe_asm) -> biased_grouped_topk_torch
- attention_mla: gfx1250 routing
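
The routing described in the list above could look roughly like the sketch below. It is illustrative only: `pick_impl` and `greedy_argmax` are hypothetical names, the arch string format is an assumption, and the real call sites use the AITER Triton and torch kernels named in the commit message rather than pure Python.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Assumption: archs whose HIP kernels fault or hang and need a fallback.
FALLBACK_ARCHS = {"gfx1250"}


def pick_impl(arch: str, hip_impl: T, fallback_impl: T) -> T:
    """Return the fallback implementation on affected archs, else the HIP one.

    Assumes arch strings may carry feature suffixes such as
    "gfx1250:sramecc+", so only the part before ':' is compared.
    """
    base_arch = arch.split(":", 1)[0]
    return fallback_impl if base_arch in FALLBACK_ARCHS else hip_impl


def greedy_argmax(logits: List[List[float]]) -> List[int]:
    """Temperature-0 sampling: index of the largest logit in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def hip_sampler_stub(logits: List[List[float]]) -> List[int]:
    """Stand-in for the HIP sampler; never called on gfx1250 in this sketch."""
    raise RuntimeError("HIP sampler is not available in this sketch")


sampler: Callable[[List[List[float]]], List[int]] = pick_impl(
    "gfx1250", hip_sampler_stub, greedy_argmax
)
tokens = sampler([[0.1, 2.0, -1.0], [3.0, 0.0, 1.0]])
```

Selecting the implementation once per call site keeps the arch check out of the hot path, at the cost of repeating the selection wherever a HIP kernel is invoked.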

Also fold structural HF fields (num_hidden_layers, first_k_dense_replace,
num_nextn_predict_layers) into the compile-cache key so a graph compiled
for one model depth isn't silently reused for another shape.
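
A minimal sketch of folding those fields into a cache key, assuming the key is a hash over a base key plus the HF config values. `compile_cache_key` and `base_key` are hypothetical; the actual key layout in this PR is not shown here.

```python
import hashlib
import json

# Structural HF fields named in the commit message.
STRUCTURAL_FIELDS = (
    "num_hidden_layers",
    "first_k_dense_replace",
    "num_nextn_predict_layers",
)


def compile_cache_key(base_key: str, hf_config: dict) -> str:
    """Hash the base key together with the structural fields.

    Fields missing from the config are recorded as None so that their
    absence also distinguishes one key from another.
    """
    structural = {name: hf_config.get(name) for name in STRUCTURAL_FIELDS}
    payload = json.dumps(
        {"base": base_key, "structural": structural}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_shallow = compile_cache_key("model-x", {"num_hidden_layers": 4})
key_deep = compile_cache_key("model-x", {"num_hidden_layers": 61})
```

With the depth included, `key_shallow` and `key_deep` differ, so a graph compiled for a truncated bring-up model is not picked up when the full model is loaded.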

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Route gfx1250 MoE scale prep through the grouped a8w4 kernel's
_grouped_a8w4_prepare_scale_batch preshuffle, keeping the original
shuffle_scale path for all other archs. gfx1250 only supports the
non-interleaved (gate|up separated) scale layout, so raise if
ATOM_MOE_GU_ITLV is set on that arch. Rename the gfx12 triton
auto-enable to is_gfx1250; ATOM_USE_TRITON_MOE override is unchanged.
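
The arch-dependent selection and the interleave guard might be sketched as follows. `select_scale_prep` and `env_flag` are hypothetical helpers, and how `ATOM_MOE_GU_ITLV` is parsed (here, truthy strings such as "1") is an assumption; the function returns the name of the scale-prep routine instead of calling it.

```python
import os
from typing import Mapping, Optional


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag from the environment (assumed truthy spellings)."""
    env = os.environ if environ is None else environ
    return env.get(name, "0").strip().lower() in ("1", "true", "yes", "on")


def select_scale_prep(
    arch: str, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Return the name of the scale-prep routine to use for this arch."""
    if arch == "gfx1250":
        # gfx1250 only supports the non-interleaved (gate|up separated) layout.
        if env_flag("ATOM_MOE_GU_ITLV", environ):
            raise ValueError(
                "ATOM_MOE_GU_ITLV is not supported on gfx1250: "
                "only the non-interleaved scale layout is available"
            )
        return "_grouped_a8w4_prepare_scale_batch"
    # All other archs keep the original path.
    return "shuffle_scale"
```

Raising early on the unsupported combination surfaces the misconfiguration at weight-prep time instead of as a wrong result from the kernel.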

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>