Generated from `papers.txt` and `repos.txt` · 21 papers · 1 repo · 2026-03-17. Add URLs to `papers.txt` or `repos.txt` and commit; the Action regenerates this list automatically.
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models — Guangxuan Xiao, Ji Lin, Mickael Seznec et al. (2022)
- First-Order Error Matters: Accurate Compensation for Quantized Large Language Models — Xingyu Zheng, Haotong Qin, Yuye Li et al. (2025)
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference — Benoit Jacob, Skirmantas Kligys, Bo Chen et al. (2017)
- 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs — Jinheng Wang, Hansong Zhou, Ting Song et al. (2024)
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale — Tim Dettmers, Mike Lewis, Younes Belkada et al. (2022)
- QLoRA: Efficient Finetuning of Quantized LLMs — Tim Dettmers, Artidoro Pagnoni, Ari Holtzman et al. (2023)
- Evaluating the Impact of Post-Training Quantization on Reliable VQA with Multimodal LLMs — Paul Jonas Kurz, Tobias Jan Wieczorek, Mohamed A. Abdelsalam et al. (2026)
- Float8@2bits: Entropy Coding Enables Data-Free Model Compression — Patrick Putzky, Martin Genzel, Mattes Mollenhauer et al. (2026)
- CoopQ: Cooperative Game Inspired Layerwise Mixed Precision Quantization for LLMs — Junchen Zhao, Ali Derakhshan, Jayden Kana Hyman et al. (2025)
- Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model — Jakub Prejzner (2026)
- CASP: Compression of Large Multimodal Models Based on Attention Sparsity — Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al. (2025)
- Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models — Artyom Kharinaev, Viktor Moskvoretskii, Egor Shvetsov et al. (2025)
- PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression — Vladimir Malinovskii, Denis Mazur, Ivan Ilin et al. (2024)
- Extreme Compression of Large Language Models via Additive Quantization — Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al. (2024)
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration — Ji Lin, Jiaming Tang, Haotian Tang et al. (2023)
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers — Elias Frantar, Saleh Ashkboos, Torsten Hoefler et al. (2022)
- QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks — Albert Tseng, Jerry Chee, Qingyao Sun et al. (2024)
- Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression — Zhichao Xu, Ashim Gupta, Tao Li et al. (2024)
- ReALLM: A general framework for LLM compression and fine-tuning — Louis Leconte, Lisa Bedin, Van Minh Nguyen et al. (2024)
- ggml-org/llama.cpp — LLM inference in C/C++
- LLM Pruning and Distillation in Practice: The Minitron Approach — Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi et al. (2024)
- LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference — Qichen Fu, Minsik Cho, Thomas Merth et al. (2024)