Quick Start

NVIDIA GPUs

Download and install the CUDA Toolkit for your platform. For system requirements and installation instructions, refer to NVIDIA's Linux Installation Guide.
Create a new virtual environment and install the build dependencies:

   pip install -r requirements-cuda.txt
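
The "create a new virtual env" step is not spelled out above. A minimal sketch using only the Python standard library is shown here; it is roughly equivalent to running `python -m venv .venv` and then the pip command with the environment's interpreter. The directory name `.venv` and the helper names `env_python` and `create_env_and_install` are assumptions made for illustration, not part of this repository.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_python(env_dir):
    """Path to the interpreter inside a virtual env (POSIX or Windows layout)."""
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def create_env_and_install(env_dir, requirements):
    """Create the env with pip available, then install the requirements file into it."""
    venv.create(env_dir, with_pip=True)
    subprocess.run(
        [str(env_python(env_dir)), "-m", "pip", "install", "-r", requirements],
        check=True,  # raise if pip exits with a non-zero status
    )


# Example call (hypothetical env directory; run from the repository root):
# create_env_and_install(".venv", "requirements-cuda.txt")
```

For the AMD and Intel sections, the same call would take `requirements-rocm.txt` or `requirements-xpu.txt` instead.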

AMD GPUs

Download and install the ROCm toolkit for your platform. For system requirements and installation instructions, refer to AMD's Linux Installation Guide.
Create a new virtual environment and install the build dependencies:

   pip install -r requirements-rocm.txt

Intel XPU devices

Download and install the Intel oneAPI 2025.3 Toolkit.
Create a new virtual environment and install the build dependencies:

   pip install -r requirements-xpu.txt
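
After installing the dependencies for any of the three platforms, it can help to confirm that the accelerator is actually visible from Python. The sketch below assumes the requirements files install PyTorch, which this README does not state explicitly; if PyTorch is missing, the function says so instead of failing. The function name `describe_accelerator` is hypothetical.

```python
import importlib.util


def describe_accelerator():
    """Return a one-line description of the accelerator PyTorch can see."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch

    # CUDA and ROCm builds of PyTorch both report devices through torch.cuda.
    if torch.cuda.is_available():
        return f"GPU visible via torch.cuda: {torch.cuda.get_device_name(0)}"
    # Intel GPUs are exposed through torch.xpu in recent PyTorch releases;
    # getattr guards against older releases that lack the module.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return f"XPU visible via torch.xpu: {xpu.get_device_name(0)}"
    return "PyTorch is installed, but no supported accelerator is visible"


print(describe_accelerator())
```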

Install Flash Attention 4

pip install flash-attn-4

Install Flash Attention 3

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/hopper
python setup.py install

Install Flash Attention with ROCm support

git clone --recursive https://github.com/ROCm/flash-attention.git
cd flash-attention
MAX_JOBS=$((`nproc` - 1)) pip install -v .
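
The `MAX_JOBS` expression above leaves one core free for the rest of the system, but it evaluates to 0 on a single-core machine or container. A hedged sketch of the same calculation with a floor of one job is shown below; the helper name `max_jobs` is hypothetical, and the final install call is left commented out because it must be run from the cloned repository.

```python
import os


def max_jobs(reserve=1):
    """Parallel build jobs: usable CPUs minus `reserve`, never less than 1."""
    try:
        # Linux: CPUs this process is allowed to run on, similar to `nproc`.
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the logical CPU count.
        available = os.cpu_count() or 1
    return max(1, available - reserve)


# Copy of the current environment with MAX_JOBS set for the build.
build_env = dict(os.environ, MAX_JOBS=str(max_jobs()))
print("MAX_JOBS =", build_env["MAX_JOBS"])

# To start the build from the cloned repository:
# import subprocess, sys
# subprocess.run([sys.executable, "-m", "pip", "install", "-v", "."],
#                env=build_env, check=True)
```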

Get Help

python driver.py --help

Run benchmarks

Each sub-directory has its own README with the commands to run its benchmarks.

Support matrix

Benchmark name	Intel B580 (16GB)	AMD MI250X (64GB)	AMD MI300A (128GB)	NVIDIA H100 (80GB)
attn	❌	✅	✅	✅
attn2	❌	✅	✅	✅
attn3	❌	❌	❌	✅
attn4	❌	❌	❌	✅
attn_triton	✅	✅	✅	✅
sdpa	limited	✅	✅	✅
flex	✅	✅	✅	✅
fp8_gemm	✅	❌	✅	✅
semianalysiswork	✅	❌	✅	✅
bgemm	✅	✅	✅	✅
group-gemm-torch	✅	✅	✅	✅
group-gemm-triton	✅	✅	✅	✅
group-gemm-turbo	❌	❌	✅	❌
gemm	✅	✅	✅	✅
int8_gemm	✅	✅	✅	✅
aiter_gemm	❌	✅	✅	❌
tritonBLAS	❌	✅	✅	❌
vllm	❌	❌	✅	✅
geometrics kernel	✅	✅	✅	✅
neural_operators	❌	✅	✅	✅
tensor_ops	✅	✅	✅	✅
communication	✅	✅	✅	✅
moe	✅	✅	✅	✅
storeKVCache	✅	✅	✅	✅
mixtral-moe	❌	✅	✅	✅
unet	✅	✅	✅	✅
wan2	✅	✅	✅	✅
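
To answer "which benchmarks can I run on my device?" programmatically, the matrix can be treated as tab-separated data. The sketch below embeds a few rows copied from the table above as a sample; the function names `load_matrix` and `runnable_on` are hypothetical, and entries such as `limited` are deliberately not counted as full support.

```python
import csv
import io

# A subset of the support matrix, tab-separated as in the table above.
MATRIX_TSV = """\
Benchmark name\tIntel B580 (16GB)\tAMD MI250X (64GB)\tAMD MI300A (128GB)\tNVIDIA H100 (80GB)
attn3\t❌\t❌\t❌\t✅
attn_triton\t✅\t✅\t✅\t✅
sdpa\tlimited\t✅\t✅\t✅
fp8_gemm\t✅\t❌\t✅\t✅
group-gemm-turbo\t❌\t❌\t✅\t❌
"""


def load_matrix(text):
    """Parse the TSV into {benchmark: {device column: status}}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    devices = rows[0][1:]
    return {row[0]: dict(zip(devices, row[1:])) for row in rows[1:]}


def runnable_on(matrix, device_substring):
    """Benchmarks marked ✅ on at least one device whose column name contains the substring."""
    return [
        name
        for name, cells in matrix.items()
        if any(device_substring in dev and status == "✅" for dev, status in cells.items())
    ]


matrix = load_matrix(MATRIX_TSV)
print(runnable_on(matrix, "MI300A"))
print(runnable_on(matrix, "B580"))
```

Matching on a substring keeps the lookup tolerant of the memory-size suffix in the column headers; a broad substring such as `AMD` matches both AMD columns and returns benchmarks supported on either.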

Reference

https://rocm.blogs.amd.com/artificial-intelligence/flash-attention/README.html
https://github.com/pytorch/ao
https://github.com/ROCm/tritonBLAS
https://github.com/ROCm/aiter
https://github.com/vllm-project/vllm
https://github.com/Dao-AILab/flash-attention
https://github.com/pyg-team/pytorch_geometric
https://github.com/NVIDIA/physicsnemo
https://github.com/neuraloperator/neuraloperator
https://github.com/ORNL/HydraGNN
