vMLX - Home of JANG_Q - No other MLX inference engine can do this: continuous batching, prefix caching, paged attention, KV cache quantization, vision-language models. Powers MLX Studio. Image generation/editing, OpenAI/Anth
Algorithm-system co-design: accurate and efficient 2-bit KV cache quantization for LLM inference.
Production-ready 2/4-bit KV cache quantization for vLLM via Triton; 70% VRAM savings and 1.8x speedup.