vMLX - Home of JANG_Q - No other MLX inference engine can do this: continuous batching, prefix caching, paged attention, KV cache quantization, vision-language models. Powers MLX Studio. Image generation/editing, OpenAI/Anth
Algorithm-system co-design: accurate and efficient 2-bit KV cache quantization for LLM inference.
Production-ready 2/4-bit KV cache quantization for vLLM via Triton; 70% VRAM savings and 1.8x speedup.