[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
[IJCV 2025] The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
CLI proxy that reduces LLM token usage by 60-90%. Declarative YAML filters for Claude Code, Cursor, Copilot, Gemini. rtk alternative in Go.
A discovery and compression tool for your Python codebase. Creates a knowledge graph for an LLM context window, efficiently outlining your project | Code structure visualization | LLM Context Window Efficiency | Static analysis for AI | Large Language Model tooling #LLM #AI #Python #CodeAnalysis #ContextWindow #DeveloperTools
AI-powered text compression library for RAG systems and API calls. Reduces token usage by 50-60% while preserving semantic meaning with advanced compression strategies.
A smart context filter that removes noise, improves responses, and reduces token usage by up to 90%.
A lightweight tool to optimize your JavaScript/TypeScript project for LLM context windows by using a knowledge graph | AI code understanding | LLM context enhancement | Code structure visualization | Static analysis for AI | Large Language Model tooling #LLM #AI #JavaScript #TypeScript #CodeAnalysis #ContextWindow #DeveloperTools
[CVPR 2025] PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
ZON → 35-70% cheaper LLM prompts than JSON/TOON. Zero overhead.
[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
97% token reduction for AI coding sessions — zero deps, 21 languages, MCP server
A lightweight tool to optimize your C# project for LLM context windows by using a knowledge graph | Code structure visualization | Static analysis for AI | Large Language Model tooling | .NET ecosystem support #LLM #AI #CSharp #DotNet #CodeAnalysis #ContextWindow #DeveloperTools
A discovery and compression tool for your Java codebase. Creates a knowledge graph for an LLM context window, efficiently outlining your project #LLM #AI #Java #CodeAnalysis #ContextWindow #DeveloperTools #StaticAnalysis #CodeVisualization
CLI proxy for coding agents that cuts noisy terminal output while preserving command behavior
⚡ Cut Claude token usage by 90%+ — free, open-source, local-first context compression for Claude Code. Hybrid RAG (BM25 + ONNX vectors), AST chunking, reranking. No API needed.
😎 Awesome papers on token redundancy reduction
The official implementation of CVPR Workshop 2025 paper: Window Token Concatenation for Efficient Visual Large Language Models.
Context-Optimized Memory Bank — Reduce AI token usage with structured documentation and cache-aware reading strategies
Script workflow management via MCP. Converts AI workflows to persistent scripts, reducing tokens & delays while minimizing hallucination risks.
Token-compression skill. An adaptation of caveman style: short common words, trust context, say just enough, be laconic.
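Most of the tools above share the same core move: strip low-signal text from the context before it reaches the model. As a minimal illustration only (this is not taken from any listed repository; the `compress_context` helper and the rough 4-characters-per-token heuristic are assumptions for the sketch), a standalone Python example:

```python
def compress_context(source: str) -> str:
    """Drop blank lines, full-line comments, and trailing whitespace
    so a code snippet spends fewer tokens in an LLM prompt."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        # Blank lines and comment-only lines cost tokens but carry
        # little signal for the model; skip them.
        if not stripped or stripped.startswith("#"):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept)


def approx_tokens(text: str) -> int:
    # Rough heuristic (an assumption, not a real tokenizer):
    # roughly 4 characters per token for English text and code.
    return max(1, len(text) // 4)


snippet = '''
# helper module
def add(a, b):
    # return the sum
    return a + b
'''

compact = compress_context(snippet)
print(approx_tokens(snippet), "->", approx_tokens(compact))
```

Real tools in this list go far beyond this — AST-aware chunking, knowledge graphs, BM25/vector retrieval — but the cost model is the same: fewer characters in, fewer tokens billed.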