I'm Zhaoyuan Bi, an undergraduate student in Computer Science at Peking University (Class of 2027).
I focus on Machine Learning Systems, especially GPU kernel optimization for LLM inference.
Recently, I have been working on:
- CUDA kernel development in GGML / llama.cpp
- Optimization of quantization, RoPE, and vector dot-product (vec_dot) kernels
- Performance analysis with Nsight (memory access patterns, CPI, bottleneck identification)
- Improving end-to-end inference throughput
- GPU Computing (CUDA)
- LLM Inference Optimization
- Parallel Algorithms & Memory Optimization
- Systems for Machine Learning (MLSys)
- Parallel primitives (e.g., scan, reduction)
- Performance-critical kernel design
- Memory-bound optimization in GPU workloads


