Tutorial materials and hands-on exercises for kernel optimization on AMD GPUs using HIP, Triton, and AI-assisted optimization techniques.
This repository contains comprehensive materials for learning GPU kernel optimization, including:
- Low-level HIP/C++ implementations demonstrating optimization techniques
- High-level Triton kernel development tutorials
- AI-powered kernel optimization using GEAK (GPU Kernel Optimization Agent)
C++ kernel implementations with naive and optimized versions:
- 01-memory-coalescing: Optimizing memory access patterns
- 02-loop-unrolling: Comparing kernels with and without loop unrolling
- Hands_On_Kernels_and_Optmiztion.ipynb: Interactive tutorial notebook
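
To give a feel for why memory access patterns matter before opening the HIP sources, here is a minimal sketch of a simplified coalescing model. It is not taken from the repository: the function name `segments_touched` is hypothetical, and the 64-thread wavefront, 4-byte elements, and 128-byte memory segments are illustrative assumptions rather than properties of any specific GPU.

```python
def segments_touched(stride, threads=64, elem_bytes=4, segment_bytes=128):
    """Count distinct memory segments read when thread t loads element t * stride.

    Simplified model (assumed sizes): fewer segments means the hardware can
    serve the whole wavefront with fewer memory transactions.
    """
    addresses = (t * stride * elem_bytes for t in range(threads))
    return len({addr // segment_bytes for addr in addresses})


# Adjacent threads read adjacent floats: 64 * 4 = 256 bytes, so 2 segments.
coalesced = segments_touched(stride=1)

# Adjacent threads read 128 bytes apart: every thread lands in its own segment.
strided = segments_touched(stride=32)

print(coalesced, strided)  # 2 64
```

Under these assumptions the strided pattern touches 32 times as many segments for the same amount of useful data, which is the kind of gap the naive and optimized versions in 01-memory-coalescing are meant to expose.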
Python-based kernel optimization tutorials:
- Fused softmax implementation
- Layer normalization kernels
- Comprehensive Triton optimization guide
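
When working through the softmax and layer normalization tutorials, it can help to have a slow but obviously correct reference to compare kernel output against. The following is a sketch in plain Python, not the code from the notebooks: the function names are hypothetical, and the layer normalization omits the learnable scale and shift that a full implementation would typically apply.

```python
import math


def softmax_row(row):
    """Numerically stable softmax over one row (subtract the max before exp)."""
    row_max = max(row)
    exps = [math.exp(x - row_max) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm_row(row, eps=1e-5):
    """Normalize one row to zero mean and unit variance (no scale or shift)."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in row]


probs = softmax_row([1.0, 2.0, 3.0])
normed = layer_norm_row([1.0, 2.0, 3.0, 4.0])
```

A fused kernel computes the same max, exponentiation, sum, and division steps for a row without writing intermediate results back to global memory, so its output should match this reference up to floating-point tolerance.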
Agent-based kernel optimization framework for automated kernel tuning and optimization.
- Neurips_tutorial.pdf: Complete tutorial documentation
- Neurips_tutorial.pptx: Presentation slides
- HIP Examples: Navigate to src/hip/ and compile the C++ files using the ROCm toolchain
- Triton Examples: Open the Jupyter notebooks in src/triton/ (requires Triton installation)
- GEAK: Start with src/geak/Main.ipynb for agent-based optimization
- ROCm toolkit (for HIP examples)
- Python with Jupyter (for Triton and GEAK examples)
- AMD GPU with ROCm support
MIT License - see LICENSE for details.