[ROCm] Add AMD Instinct MI300X/MI325X/MI350X/MI355X GPU support #3
Open
andyluo7 wants to merge 1 commit into RightNow-AI:main
- Add MI300X, MI325X, MI350X, MI355X to the _KNOWN_GPUS table with correct peak FP16 TFLOPS, memory bandwidth, and L2 cache specs
- Add gcnArchName-based GPU detection for ROCm (device name is often empty on ROCm; gcnArchName like 'gfx942' is always available)
- Guard clock_rate access behind hasattr check (ROCm devices report clock_rate=0, causing division issues in fallback estimation)
- Apply same fixes to profile.py fallback detector

Tested on AMD Instinct MI300X (gfx942) with ROCm 6.3 / PyTorch 2.9:
- GPU correctly detected as 'AMD Instinct MI300X'
- All FP16/BF16 correctness tests PASS
- Benchmark harness runs end-to-end
- PyTorch baseline: 607.9 TFLOPS on xlarge matmul (46.5% peak)
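The gcnArchName-based detection described in the commit message can be illustrated with a minimal sketch. This is not the PR's actual code: the function name `detect_gpu_name` and the table contents are hypothetical, with only the gfx942 → MI300X and gfx950 → MI350X pairings taken from this page. It assumes a properties object shaped like the one returned by `torch.cuda.get_device_properties()`, and uses a prefix match because ROCm arch strings commonly carry feature suffixes (for example `gfx942:sramecc+:xnack-`).

```python
from types import SimpleNamespace

# Hypothetical table keyed by gcnArchName prefix. Only these two pairings
# are stated on this page; the PR's real table is not shown here.
KNOWN_AMD_GPUS = {
    "gfx942": "AMD Instinct MI300X",
    "gfx950": "AMD Instinct MI350X",
}


def detect_gpu_name(props):
    """Return a GPU name, preferring props.name and falling back to gcnArchName."""
    # Use the reported device name when it is non-empty.
    name = (getattr(props, "name", "") or "").strip()
    if name:
        return name

    # On ROCm the name may be empty, so match the arch string by prefix.
    arch = getattr(props, "gcnArchName", "") or ""
    for prefix, gpu_name in KNOWN_AMD_GPUS.items():
        if arch.startswith(prefix):
            return gpu_name

    # Unknown device: the caller would fall back to estimation.
    return None


# Stand-in for a ROCm device that reports an empty name.
rocm_props = SimpleNamespace(name="", gcnArchName="gfx942:sramecc+:xnack-")
print(detect_gpu_name(rocm_props))
```

With a real device, the same function could be called on `torch.cuda.get_device_properties(0)`; a device that reports a non-empty name never reaches the arch lookup.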
Author
✅ Verified on AMD Instinct MI350X (gfx950 / CDNA4)
Tested end-to-end on 8x MI350X (ROCm 7.2, PyTorch 2.10.0+rocm7.0, Triton 3.6.0):
GPU Detection
GPU correctly identified via
Correctness
Performance (starter kernel, not optimized)
PyTorch baseline xlarge BF16: 956.5 TFLOPS (41.6% of 2300 TFLOPS peak).
Environment
Previously also verified on MI300X (gfx942, ROCm 6.3, PyTorch 2.9); see the PR description.
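The percent-of-peak figures quoted on this page can be checked with simple arithmetic. The sketch below uses only numbers stated here (956.5 of 2300 TFLOPS on MI350X; 607.9 TFLOPS at 46.5% on MI300X); the helper names are hypothetical and the implied MI300X peak is derived from those two figures, not quoted from the PR.

```python
def percent_of_peak(achieved_tflops, peak_tflops):
    """Achieved throughput as a percentage of the peak."""
    return 100.0 * achieved_tflops / peak_tflops


def implied_peak(achieved_tflops, percent):
    """Peak throughput implied by an achieved figure and its percentage."""
    return achieved_tflops / (percent / 100.0)


# MI350X BF16 baseline from the comment: 956.5 of 2300 TFLOPS.
print(f"{percent_of_peak(956.5, 2300):.1f}%")        # 41.6%

# MI300X baseline from the PR description: 607.9 TFLOPS at 46.5% of peak.
print(f"{implied_peak(607.9, 46.5):.0f} TFLOPS")     # 1307 TFLOPS
```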
Summary
Add GPU detection and performance specs for AMD Instinct GPUs (MI300X, MI325X, MI350X, MI355X) to enable AutoKernel on ROCm.
Problem
On ROCm, torch.cuda.get_device_properties() returns an empty device name and clock_rate=0, causing:
- No match in the _KNOWN_GPUS table → falls back to estimation
- clock_rate / 1e6 in fallback path → incorrect zero/near-zero TFLOPS estimates
Solution
GPU Database (bench.py + profile.py)
ROCm-aware GPU Detection (bench.py)
- _KNOWN_AMD_GPUS dict keyed by gcnArchName prefix (e.g. gfx942 → MI300X)
- Use props.gcnArchName for identification
- Guard clock_rate access behind hasattr + > 0 check
Same fixes applied to profile.py fallback detector
Testing
Tested end-to-end on AMD Instinct MI300X (gfx942, ROCm 6.3, PyTorch 2.9):
Zero NVIDIA impact
- Existing _KNOWN_GPUS entries unchanged
- clock_rate path only skipped when clock_rate is 0 or missing (never happens on NVIDIA)
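The clock_rate guard (hasattr plus a > 0 check) can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `usable_clock_rate` is a hypothetical helper, and the fallback estimation formula itself is not shown on this page, so the sketch only demonstrates when the clock-based path would be skipped.

```python
from types import SimpleNamespace


def usable_clock_rate(props):
    """Return props.clock_rate if it exists and is positive, else None.

    A None result signals the caller to skip the clock-based estimate,
    which avoids computing with a zero clock on ROCm.
    """
    if hasattr(props, "clock_rate") and props.clock_rate > 0:
        return props.clock_rate
    return None


# Stand-ins for device property objects.
rocm_like = SimpleNamespace(clock_rate=0)          # reported as 0 on ROCm
no_attr = SimpleNamespace()                        # attribute missing entirely
nvidia_like = SimpleNamespace(clock_rate=1755000)  # any positive value

for props in (rocm_like, no_attr, nvidia_like):
    clock = usable_clock_rate(props)
    print("skip clock-based estimate" if clock is None else f"use clock_rate={clock}")
```

Because the guard passes any positive value through unchanged, devices that report a real clock take the same path as before.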