# FlashAttention Assignment for the LLM Course
Create and activate the conda environment:
```bash
conda env create -f environment.yml
conda activate flashattention
```
Repository structure:

```text
├── online_softmax/
│   ├── online_softmax.py        # Online softmax implementation (Triton + PyTorch)
│   └── fused_softmax.py         # Fused softmax with matrix multiplication
├── benchmarking/
│   ├── bench_softmax.py         # Benchmark for online softmax
│   └── bench_fused.py           # Benchmark for fused softmax
├── tests/
│   ├── test_online_softmax.py
│   └── test_fused_softmax.py
└── environment.yml
```
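Here, online_softmax.py implements the core technique: online softmax computes a numerically stable softmax in a single streaming pass by tracking a running maximum and a running normalizer that is rescaled whenever the maximum grows. As a reference for what the Triton kernel computes block by block, below is a minimal PyTorch sketch of that recurrence (function and variable names are illustrative, not the repo's actual API):

```python
import torch

def online_softmax_reference(x: torch.Tensor) -> torch.Tensor:
    """Row-wise softmax over the last dim via one streaming pass per row.

    Illustrative sketch only: the assignment's Triton kernel applies the same
    rescaling recurrence block by block rather than element by element.
    """
    m = torch.full(x.shape[:-1], float("-inf"))  # running max per row
    d = torch.zeros(x.shape[:-1])                # running normalizer per row
    for i in range(x.shape[-1]):
        m_new = torch.maximum(m, x[..., i])
        # Rescale the old normalizer to the new max, then add the new term.
        d = d * torch.exp(m - m_new) + torch.exp(x[..., i] - m_new)
        m = m_new
    return torch.exp(x - m.unsqueeze(-1)) / d.unsqueeze(-1)

# Sanity check against the library implementation:
x = torch.randn(4, 128)
assert torch.allclose(online_softmax_reference(x), torch.softmax(x, dim=-1), atol=1e-5)
```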
Run the tests:

```bash
pytest tests/ -v
```

Run the benchmarks:

```bash
python benchmarking/bench_softmax.py
python benchmarking/bench_fused.py
```

Results are saved to the outputs/ directory.
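The benchmark scripts' internals aren't reproduced here; one plausible shape for such a script, assuming it times the kernel against torch.softmax with triton.testing.do_bench (the function names and problem sizes below are illustrative):

```python
import torch
import triton

# Hypothetical benchmarking sketch; the repo's bench_softmax.py may differ.
def bench_softmax(n_rows: int = 4096, n_cols: int = 1024) -> None:
    x = torch.randn(n_rows, n_cols, device="cuda")
    # do_bench handles warmup and repetition, returning a time in ms.
    ms_ref = triton.testing.do_bench(lambda: torch.softmax(x, dim=-1))
    print(f"torch.softmax ({n_rows}x{n_cols}): {ms_ref:.3f} ms")
    # Swap in the assignment's kernel here for comparison,
    # e.g. triton.testing.do_bench(lambda: online_softmax(x)).

if __name__ == "__main__":
    bench_softmax()
```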
Notes:

- The fused softmax kernel uses `tl.dot()`, which requires all dimensions to be >= 16 (a tensor core constraint).
- Block sizes must be >= 16 for the fused softmax implementation.
- Numerical tolerance for the fused softmax tests is 1e-3 (vs. 1e-5 for the simple softmax) due to error accumulation from the online algorithm plus TF32 tensor core operations; see the test sketch below.
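To make the last two points concrete, here is a sketch of a tolerance-aware check comparing the fused kernel against a PyTorch reference. The fused_softmax entry point and its module path are assumptions based on the file listing above; the actual tests may be organized differently:

```python
import torch
from online_softmax.fused_softmax import fused_softmax  # assumed entry point

def test_fused_softmax_matches_reference():
    # Both dimensions are >= 16 so tl.dot() can map onto tensor cores.
    x = torch.randn(64, 64, device="cuda")
    w = torch.randn(64, 64, device="cuda")
    ref = torch.softmax(x @ w, dim=-1)
    out = fused_softmax(x, w)
    # Looser 1e-3 tolerance: online rescaling and TF32 matmuls both
    # accumulate more rounding error than a plain fp32 softmax.
    assert torch.allclose(out, ref, atol=1e-3, rtol=1e-3)
```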