- Download and install the CUDA Toolkit for your platform. For system requirements and installation instructions, refer to the CUDA Toolkit's Linux Installation Guide.
- Create a new virtual environment and install the build dependencies:
pip install -r requirements-cuda.txt
- Download and install the ROCm Toolkit for your platform. For system requirements and installation instructions, refer to ROCm's Linux Installation Guide.
- Create a new virtual environment and install the build dependencies:
pip install -r requirements-rocm.txt
- Download and install the Intel oneAPI 2025.3 Toolkit.
- Create a new virtual environment and install the build dependencies:
pip install -r requirements-xpu.txt
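The three setups above differ only in the toolkit and the requirements file. As a rough sketch, a script could pick the requirements file by checking which toolkit compiler is on `PATH`. The helper name `pick_requirements` and the choice of `nvcc`, `hipcc`, and `icpx` as probes are assumptions for illustration, not part of this repository, and the probes only work if the toolkit's `bin` directory has been added to `PATH`.

```python
import shutil
from typing import Callable, Optional

# Hypothetical mapping from a toolkit compiler to the matching requirements file.
# Assumption: the compiler of the installed toolkit is reachable on PATH.
BACKEND_PROBES = [
    ("nvcc", "requirements-cuda.txt"),   # CUDA Toolkit
    ("hipcc", "requirements-rocm.txt"),  # ROCm
    ("icpx", "requirements-xpu.txt"),    # Intel oneAPI
]


def pick_requirements(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the requirements file for the first toolkit compiler found."""
    for tool, requirements in BACKEND_PROBES:
        if which(tool) is not None:
            return requirements
    return None


if __name__ == "__main__":
    chosen = pick_requirements()
    if chosen is None:
        print("No supported GPU toolkit found on PATH")
    else:
        # Print the command rather than running it, so the user stays in control.
        print(f"pip install -r {chosen}")
```

The `which` parameter is injectable so the selection logic can be checked without any toolkit installed.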
pip install flash-attn-4
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/hopper
python setup.py install
git clone --recursive https://github.com/ROCm/flash-attention.git
cd flash-attention
MAX_JOBS=$(( $(nproc) - 1 )) pip install -v .
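The `MAX_JOBS` value above is the processor count minus one, which evaluates to 0 on a single-core machine. A minimal sketch of the same calculation with a lower bound of 1 is shown here; the function name `max_build_jobs` is hypothetical and the clamp is an added safeguard, not something the build itself documents.

```python
import os


def max_build_jobs(reserve: int = 1) -> int:
    """Return the usable CPU count minus `reserve`, never below 1."""
    try:
        # Linux: counts only the CPUs this process may run on, like `nproc`.
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity (for example macOS or Windows).
        available = os.cpu_count() or 1
    return max(1, available - reserve)


if __name__ == "__main__":
    # Prints a value that could be passed as MAX_JOBS=<n> to the pip command.
    print(max_build_jobs())
```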
python driver.py --help
The README file in each subdirectory lists the commands for running the benchmarks in that directory.
| Benchmark name | Intel B580 (16GB) | AMD MI250X (64GB) | AMD MI300A (128GB) | NVIDIA H100 (80GB) |
|---|---|---|---|---|
| attn | ❌ | ✅ | ✅ | ✅ |
| attn2 | ❌ | ✅ | ✅ | ✅ |
| attn3 | ❌ | ❌ | ❌ | ✅ |
| attn4 | ❌ | ❌ | ❌ | ✅ |
| attn_triton | ✅ | ✅ | ✅ | ✅ |
| sdpa | limited | ✅ | ✅ | ✅ |
| flex | ✅ | ✅ | ✅ | ✅ |
| fp8_gemm | ✅ | ❌ | ✅ | ✅ |
| semianalysiswork | ✅ | ❌ | ✅ | ✅ |
| bgemm | ✅ | ✅ | ✅ | ✅ |
| group-gemm-torch | ✅ | ✅ | ✅ | ✅ |
| group-gemm-triton | ✅ | ✅ | ✅ | ✅ |
| group-gemm-turbo | ❌ | ❌ | ✅ | ❌ |
| gemm | ✅ | ✅ | ✅ | ✅ |
| int8_gemm | ✅ | ✅ | ✅ | ✅ |
| aiter_gemm | ❌ | ✅ | ✅ | ❌ |
| tritonBLAS | ❌ | ✅ | ✅ | ❌ |
| vllm | ❌ | ❌ | ✅ | ✅ |
| geometrics kernel | ✅ | ✅ | ✅ | ✅ |
| neural_operators | ❌ | ✅ | ✅ | ✅ |
| tensor_ops | ✅ | ✅ | ✅ | ✅ |
| communication | ✅ | ✅ | ✅ | ✅ |
| moe | ✅ | ✅ | ✅ | ✅ |
| storeKVCache | ✅ | ✅ | ✅ | ✅ |
| mixtral-moe | ❌ | ✅ | ✅ | ✅ |
| unet | ✅ | ✅ | ✅ | ✅ |
| wan2 | ✅ | ✅ | ✅ | ✅ |
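To skip unsupported benchmarks in a script, the support matrix can be parsed straight from its Markdown form. The sketch below is illustrative only: `parse_support_table` and `supported_on` are hypothetical helpers, not part of the repository, and the embedded excerpt copies just four rows of the table above. Cells other than ✅ (including `limited`) are treated as unsupported here, which is an assumption you may want to relax.

```python
from typing import Dict, List

SUPPORTED = "✅"


def parse_support_table(markdown: str) -> Dict[str, Dict[str, str]]:
    """Parse a Markdown table into {benchmark: {platform: cell}}."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in markdown.strip().splitlines()
        if line.strip()
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator row
    platforms = header[1:]
    return {row[0]: dict(zip(platforms, row[1:])) for row in body}


def supported_on(table: Dict[str, Dict[str, str]], platform: str) -> List[str]:
    """List benchmarks whose cell for `platform` is exactly the check mark."""
    return [name for name, cells in table.items() if cells.get(platform) == SUPPORTED]


# Four rows copied from the table above; extend with the remaining rows as needed.
EXCERPT = """
| Benchmark name | Intel B580 (16GB) | AMD MI250X (64GB) | AMD MI300A (128GB) | NVIDIA H100 (80GB) |
|---|---|---|---|---|
| attn3 | ❌ | ❌ | ❌ | ✅ |
| sdpa | limited | ✅ | ✅ | ✅ |
| fp8_gemm | ✅ | ❌ | ✅ | ✅ |
| group-gemm-turbo | ❌ | ❌ | ✅ | ❌ |
"""

if __name__ == "__main__":
    table = parse_support_table(EXCERPT)
    for platform in ("Intel B580 (16GB)", "NVIDIA H100 (80GB)"):
        print(platform, supported_on(table, platform))
```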
https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html
https://github.com/pytorch/ao
https://github.com/ROCm/tritonBLAS
https://github.com/ROCm/aiter
https://github.com/vllm-project/vllm
https://github.com/Dao-AILab/flash-attention
https://github.com/pyg-team/pytorch_geometric
https://github.com/NVIDIA/physicsnemo
https://github.com/neuraloperator/neuraloperator
https://github.com/ORNL/HydraGNN