Speculative Sampling Playground

Adaptive speculative decoding for LLM inference latency optimization.

This project provides:

  • Exact speculative sampling (SpS); a reference sketch of the acceptance rule follows this list.
  • AutoJudge (paper-aligned judge decoding: Algorithm 1 mining + Logistic Regression classifier).
  • Top-K lossy baseline for paper-style comparisons.
  • SpecExec (exact target sampling with draft-branch cache prefill and pruning).
  • A Hugging Face adapter with KV cache and optional quantization.
  • A benchmark harness on MT-Bench with JSONL metrics.
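
For orientation, the exact SpS step is the standard accept-or-resample rule from the speculative sampling literature. The sketch below is illustrative only; the function and variable names are not this repo's API.

import torch

# Exact speculative sampling acceptance step (illustrative sketch; the
# names are not this repo's API). Preserves the target distribution.
def accept_or_resample(p_target, q_draft, token):
    """p_target, q_draft: probability vectors for one position;
    token: the draft-sampled token id. Returns (accepted, token_id)."""
    # Accept the draft token with probability min(1, p/q) ...
    if torch.rand(()) < (p_target[token] / q_draft[token]).clamp(max=1.0):
        return True, token
    # ... otherwise resample from the normalized residual max(0, p - q),
    # which makes the overall output distribution exactly the target's.
    residual = (p_target - q_draft).clamp(min=0.0)
    return False, int(torch.multinomial(residual / residual.sum(), 1))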

Features

  1. Baseline, speculative, AutoJudge, Top-K, and SpecExec decoding in one benchmark entrypoint.
  2. MT-Bench loader (JSON/JSONL).
  3. Benchmark runner with median timing, resume support, and method-specific metrics.
  4. Preset configs for models, methods, and paired experiments.
  5. Makefile shortcuts for local and Docker workflows.
  6. Docker support for CPU and GPU.
  7. CI pipeline (GitHub Actions) for checks/tests + benchmark JSONL schema validation.

Getting Started (From Zero)

  1. Bootstrap dependencies on a clean Ubuntu host (safe mode; the script does not touch the NVIDIA driver):
bash scripts/install_dependencies.sh

The recommended Python version is 3.11 (see .python-version in the repo). Dependencies are pinned in requirements*.txt for reproducible runs. For GPU Python extras (bitsandbytes, accelerate), add the --gpu flag:

bash scripts/install_dependencies.sh --gpu

On EOL Ubuntu releases (for example Ubuntu 17), the script stops by default. Continue only if you explicitly accept the risks:

bash scripts/install_dependencies.sh --allow-eol-ubuntu
  2. Install Docker Engine (Ubuntu 24.04 example):
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo ${UBUNTU_CODENAME:-$VERSION_CODENAME}) stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl enable --now docker

For GPU runs, keep your existing NVIDIA driver and install the NVIDIA Container Toolkit:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
  3. Put the MT-Bench dataset file (JSON/JSONL) into the project's datasets/ folder, for example datasets/mt_bench.jsonl.
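The loader accepts the standard MT-Bench question schema. A minimal example of appending one record (field names follow the public MT-Bench question file and are assumptions, not checked against this repo's loader):

import json

# Append one MT-Bench-style record per line (JSONL). Field names follow
# the public MT-Bench question file and are assumed, not enforced here.
record = {
    "question_id": 1,
    "category": "writing",
    "turns": ["First-turn user prompt goes here.",
              "Optional follow-up turn goes here."],
}
with open("datasets/mt_bench.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")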
  4. Build a CPU image:
docker build -t sp-samp .
  5. Run tests (CPU):
docker run --rm sp-samp
  6. Run a CPU benchmark (toy models):
docker run --rm sp-samp \
  python -m benchmarks.bench_speculative \
  --method both \
  --runs 1 \
  --max-samples 5 \
  --max-new-tokens 32 \
  --vocab-size 2048
  7. Build a GPU image (CUDA example):
docker build -f Dockerfile.gpu \
  --build-arg BASE_IMAGE=nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04 \
  --build-arg TORCH_INDEX_URL=https://download.pytorch.org/whl/cu128 \
  --build-arg TORCH_VERSION=2.9.1 \
  -t sp-samp-gpu .
  8. Run a GPU benchmark (HF model, results saved to JSONL):
docker run --rm --gpus all -v "$(pwd)/datasets:/data" sp-samp-gpu \
  python -m benchmarks.bench_speculative \
  --dataset /data/mt_bench.jsonl \
  --hf-model RedHatAI/gpt-oss-20b \
  --device cuda \
  --use-chat-template \
  --max-samples 50 \
  --max-new-tokens 128 \
  --k 4 \
  --runs 5 \
  --out /data/results.jsonl
  9. Run all methods in one launch (baseline + speculative + autojudge + topk + specexec):
docker run --rm --gpus all -v "$(pwd)/datasets:/data" sp-samp-gpu \
  python -m benchmarks.bench_speculative \
  --dataset /data/mt_bench.jsonl \
  --hf-model Qwen/Qwen2.5-3B-Instruct \
  --hf-draft-model Qwen/Qwen2.5-0.5B-Instruct \
  --tokenizer Qwen/Qwen2.5-0.5B-Instruct \
  --draft-tokenizer Qwen/Qwen2.5-0.5B-Instruct \
  --device cuda \
  --use-chat-template \
  --method all \
  --k 4 \
  --runs 5 \
  --out /data/results_all.jsonl
  10. Run SpecExec only (branch execution parameters included):
docker run --rm --gpus all -v "$(pwd)/datasets:/data" sp-samp-gpu \
  python -m sp_samp.cli specexec \
  --config-dir configs \
  --experiment qwen25_3b_target_qwen25_0p5b_specexec_k4 \
  --dataset /data/mt_bench.jsonl \
  --parallel-branches 8 \
  --branch-prune-threshold 0.0 \
  --out /data/results_specexec.jsonl
  11. Run AutoJudge only with checkpoint reuse (paper-aligned):
docker run --rm --gpus all -v "$(pwd)/datasets:/data" sp-samp-gpu \
  python -m benchmarks.bench_speculative \
  --dataset /data/gsm8k_train.jsonl \
  --hf-model Qwen/Qwen2.5-3B-Instruct \
  --hf-draft-model Qwen/Qwen2.5-0.5B-Instruct \
  --tokenizer Qwen/Qwen2.5-0.5B-Instruct \
  --draft-tokenizer Qwen/Qwen2.5-0.5B-Instruct \
  --device cuda \
  --use-chat-template \
  --method autojudge \
  --autojudge-task gsm8k \
  --autojudge-train-dataset /data/gsm8k_train.jsonl \
  --autojudge-train-samples 4000 \
  --autojudge-recall-target 0.9 \
  --autojudge-train-split 0.9 \
  --autojudge-checkpoint /data/autojudge_llama3.pt \
  --out /data/results_autojudge.jsonl
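
For intuition about --autojudge-recall-target: the judge is a Logistic Regression classifier whose decision threshold is calibrated on a validation split. A hedged sketch of recall-targeted calibration on synthetic stand-in data (names and data are illustrative, not this repo's mining pipeline):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for mined judge features and accept/reject labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)
X_train, X_val, y_train, y_val = X[:1800], X[1800:], y[:1800], y[1800:]

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_val)[:, 1]

# Recall-targeted calibration: the (1 - target) quantile of the positive-
# class validation scores accepts ~90% of true positives at the threshold.
recall_target = 0.9
threshold = float(np.quantile(scores[y_val == 1], 1.0 - recall_target))
print(f"calibrated threshold: {threshold:.3f}")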

Make Targets

Defaults for benchmark paths:

  • DATASET=datasets/mt_bench.jsonl
  • OUT=datasets/results.jsonl
  1. Show all commands:
make help
  2. Install/upgrade dependencies in safe mode:
make setup
  3. Install/upgrade including GPU Python extras:
make setup-gpu
  4. Syntax check:
make check
  5. Validate benchmark JSONL schema:
make validate-results RESULTS=datasets/results.jsonl
  6. List presets:
make list-presets
  7. Validate config logic:
make validate-configs
  8. Quick toy benchmark (no HF models):
make bench-toy OUT=/tmp/bench_toy.jsonl
  9. Quick HF smoke run (needs torch + transformers; downloads a tiny model):
make smoke-hf OUT=/tmp/smoke_hf.jsonl

GPU host variant:

make smoke-hf-gpu OUT=/tmp/smoke_hf_gpu.jsonl
python scripts/validate_results_jsonl.py --path /tmp/smoke_hf_gpu.jsonl --strict

Expected result for the GPU variant: one run summary in the console and two JSONL records (run + summary) in the output file.

  10. Run an experiment on MT-Bench:

make bench DATASET=datasets/mt_bench.jsonl OUT=datasets/results.jsonl
  11. Run the AutoJudge preset:
make autojudge DATASET=datasets/mt_bench.jsonl OUT=datasets/results_autojudge.jsonl
  12. Run the SpecExec preset:
make specexec DATASET=datasets/mt_bench.jsonl OUT=datasets/results_specexec.jsonl
  13. Build and run the GPU Docker flow:
make docker-build-gpu
make docker-bench DATASET=datasets/mt_bench.jsonl OUT=datasets/results.jsonl
make docker-specexec DATASET=datasets/mt_bench.jsonl OUT=datasets/results_specexec.jsonl

For one-off sudo usage without docker-group setup:

make docker-gpu-check DOCKER_CMD="sudo docker"
make docker-build-gpu-safe DOCKER_CMD="sudo docker"

If your Docker host hits BuildKit snapshot/export errors, use:

make docker-build-gpu-safe

or clean builder cache:

make docker-prune-builder

Check GPU passthrough before long runs:

make docker-gpu-check
make docker-gpu-check-image

docker-gpu-check first tries nvidia-smi in a clean CUDA container. If NVML fails there, it falls back to a torch.cuda check in your built image. If the fallback reports that the image is missing, build it first with make docker-build-gpu-safe.

  14. Enforce headless GPU mode for long runs:

make bench DATASET=datasets/mt_bench.jsonl OUT=datasets/results.jsonl HEADLESS=1
  15. Short load run in Docker (gpt_oss_20b_4bit preset):
sudo docker run --rm --gpus all -v "$(pwd)/datasets:/data" sp-samp-gpu \
  python -m sp_samp.cli bench \
  --config-dir configs \
  --experiment gptoss20b_target_gptoss20b_draft_k4 \
  --dataset /data/mt_bench.jsonl \
  --runs 1 \
  --max-samples 20 \
  --max-new-tokens 128 \
  --out /data/results_gptoss20b_load.jsonl

If gpt-oss-20b fails to load because of a local bitsandbytes/triton runtime mismatch, fall back to:

sudo docker run --rm --gpus all -v "$(pwd)/datasets:/data" sp-samp-gpu \
  python -m sp_samp.cli bench \
  --config-dir configs \
  --experiment mistral_target_mistral_draft_k4 \
  --dataset /data/mt_bench.jsonl \
  --runs 1 \
  --max-samples 20 \
  --max-new-tokens 128 \
  --out /data/results_gptoss20b_load.jsonl

Validate the output:

python scripts/validate_results_jsonl.py --path datasets/results_gptoss20b_load.jsonl --strict
  16. Paper-style GSM8K AutoJudge evaluation (Qwen2.5 0.5B -> 3B, medium sweep):
mkdir -p datasets
curl -fsSL https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/train.jsonl -o datasets/gsm8k_train.jsonl
curl -fsSL https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/test.jsonl -o datasets/gsm8k_test.jsonl
make paper-eval

Raw JSONL stays in datasets/; report artifacts are written to reports/ as:

  • reports/autojudge_qwen25_paper_<date>.md
  • reports/autojudge_qwen25_paper_<date>.csv
  • reports/autojudge_qwen25_paper_<date>.json

The benchmark supports dedicated paper-eval switches:

  • --eval-task mtbench|gsm8k
  • --gsm8k-eval-mode zero_shot_cot|plain
  • --topk-rank <int|all>
  • --topk-grid <csv>
  • GSM8K output fields in JSONL: gsm8k_exact_match, gsm8k_correct, gsm8k_total

Troubleshooting (RTX 50xx / Blackwell)

  • Symptom: errors like sm_120 is not compatible with the current PyTorch installation or no kernel image is available for execution on the device.
  • Cause: a torch build without native Blackwell kernels.
  • Fix: rebuild with a cu128+ torch:
make docker-build-gpu-safe \
  CUDA_BASE_IMAGE=nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04 \
  TORCH_INDEX_URL=https://download.pytorch.org/whl/cu128 \
  TORCH_VERSION=2.9.1

Presets

  • Models: configs/models.json
  • Methods: configs/methods.json
  • Experiments (target/draft pairings): configs/experiments.json
  • Method templates (AutoJudge/Top-K/SpecExec): configs/method_templates.json

CLI Runner

  1. List presets:
python -m sp_samp.cli list-presets --config-dir configs
  2. Direct method selection:
python -m benchmarks.bench_speculative \
  --method specexec \
  --dataset datasets/mt_bench.jsonl \
  --hf-model meta-llama/Meta-Llama-3-8B-Instruct \
  --hf-draft-model meta-llama/Meta-Llama-3-8B-Instruct \
  --parallel-branches 8 \
  --branch-prune-threshold 0.0
  3. Run a benchmark using presets:
python -m sp_samp.cli bench \
  --config-dir configs \
  --model-preset gpt_oss_20b_4bit \
  --method-preset speculative_k4 \
  --dataset datasets/mt_bench.jsonl \
  --out datasets/results.jsonl
  4. Run a benchmark using an experiment preset:
python -m sp_samp.cli bench \
  --config-dir configs \
  --experiment qwen25_3b_target_qwen25_0p5b_all_methods \
  --dataset datasets/mt_bench.jsonl \
  --out datasets/results.jsonl
  5. Run the paper-method bundle only (baseline + speculative + autojudge + topk):
python -m sp_samp.cli bench \
  --config-dir configs \
  --experiment qwen25_3b_target_qwen25_0p5b_all_paper \
  --dataset datasets/gsm8k_test.jsonl \
  --eval-task gsm8k \
  --gsm8k-eval-mode zero_shot_cot \
  --out datasets/results_all_paper.jsonl
  6. Run the AutoJudge shortcut command:
python -m sp_samp.cli autojudge \
  --config-dir configs \
  --experiment qwen25_3b_target_qwen25_0p5b_autojudge_k4 \
  --dataset datasets/mt_bench.jsonl \
  --out datasets/results_autojudge.jsonl
  7. Run the SpecExec shortcut command:
python -m sp_samp.cli specexec \
  --config-dir configs \
  --experiment qwen25_3b_target_qwen25_0p5b_specexec_k4 \
  --dataset datasets/mt_bench.jsonl \
  --out datasets/results_specexec.jsonl
  8. Require headless GPU mode (fail fast when a display is active):
python -m sp_samp.cli bench \
  --config-dir configs \
  --experiment qwen25_3b_target_qwen25_0p5b_all_methods \
  --dataset datasets/mt_bench.jsonl \
  --require-headless \
  --out datasets/results.jsonl

Metrics Output

The benchmark writes JSONL records with per-run metrics and a summary record per method. Fields include:

  • status (ok/error/skipped)
  • resume_key (used to skip completed runs on re-launch)
  • tokens_per_sec
  • acceptance_rate
  • avg_tokens_per_step
  • proposed, accepted, rejections
  • judge_accept_rate (AutoJudge only)
  • target_fallback_rate (AutoJudge only)
  • autojudge_train_samples, autojudge_val_auc, autojudge_val_recall (AutoJudge only)
  • autojudge_threshold_calibrated, autojudge_threshold_used (AutoJudge only)
  • legacy aliases: autojudge_threshold_selected (= calibrated), autojudge_threshold (= used)
  • topk_accept_rate, topk_rank_effective, topk_mismatches, topk_accepted_mismatches (Top-K only)
  • branch_prune_rate (SpecExec only)
  • effective_parallelism (SpecExec only)
  • target_calls_per_token (AutoJudge, Top-K, and SpecExec)
  • draft_calls_per_token (SpecExec only)
  • cache_hit_rate (SpecExec only)
  • max_active_branches (SpecExec only)
  • gsm8k_exact_match, gsm8k_correct, gsm8k_total (when --eval-task gsm8k)
  • error_type, error_message, traceback (for failed runs)
  • System metadata: git_sha, hostname, gpu_name, gpu_driver, cuda_runtime, torch_version, transformers_version, display_active

Validate the output schema with:
python scripts/validate_results_jsonl.py --path datasets/results.jsonl --strict
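
For quick inspection beyond schema validation, a minimal reader over the documented fields (the method key is an assumption about the record layout; adjust if the schema differs):

import json

# Print per-record throughput and acceptance from the benchmark output.
with open("datasets/results.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        if rec.get("status") == "ok" and "tokens_per_sec" in rec:
            print(rec.get("method"), rec["tokens_per_sec"],
                  rec.get("acceptance_rate"))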

Project Layout

  • sp_samp/: core library, AutoJudge, SpecExec, HF adapter.
  • benchmarks/: benchmark runner.
  • configs/: preset configs.
  • tests/: tests.
  • papers/: source research papers (AutoJudge, SpecExec, Speculative Sampling).
  • file_changes/: dated change records and audit notes.
  • Dockerfile, Dockerfile.gpu: containers.

Algorithm Variants vs. Papers

The implementation is paper-aligned with the following documented deviations:

  • AutoJudge mining: the initial response is generated by the target model (matching the paper's mathematical definition of I(x)), whereas the Algorithm 1 pseudocode uses the draft model. No correctness impact; see file_changes/2026-02-25-paper-alignment.md.
  • AutoJudge decoding: uses greedy (argmax) decoding throughout, whereas Appendix A of the paper describes Gumbel-max stochastic sampling. This is a valid deterministic variant.
  • SpecExec tree search: uses a level-by-level BFS with parallel_branches / branch_prune_threshold instead of the paper's SSSP / modified-Dijkstra priority-queue algorithm. The output distribution is preserved; this is a known, intentional simplification (see the schematic below).

Full audit notes: file_changes/2026-02-25-paper-alignment.md.
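
To make the BFS deviation concrete, here is a schematic of one level of tree growth under parallel_branches and branch_prune_threshold (illustrative data structures, not the repo's implementation):

# One BFS level: expand every surviving branch by its draft continuations,
# prune by joint probability, and cap the frontier width.
def grow_level(branches, expand, parallel_branches, prune_threshold):
    """branches: list of (token_prefix, joint_prob) pairs;
    expand(prefix) yields (token_id, prob) draft continuations."""
    children = [(prefix + [tok], prob * p)
                for prefix, prob in branches
                for tok, p in expand(prefix)]
    children = [c for c in children if c[1] >= prune_threshold]
    children.sort(key=lambda c: c[1], reverse=True)
    return children[:parallel_branches]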

Notes

  • SpecExec is implemented as an exact (distribution-preserving) decoder with speculative cache prefill and configurable parallel_branches / branch_prune_threshold.
  • HF SpecExec uses KV-cache reuse and depth-wise tree passes for faster cache construction.
  • Draft and target models must share an identical tokenizer vocabulary mapping for speculative, AutoJudge, Top-K, and SpecExec correctness (a quick check follows this list).
  • AutoJudge training in this implementation is GSM8K-specific. If --autojudge-checkpoint is absent, provide a GSM8K-compatible train dataset (question + answer) via --autojudge-train-dataset or --dataset.
  • AutoJudge now fails fast with an actionable message when the train dataset path is missing or not GSM8K-compatible (for example, MT-Bench JSONL is not valid for judge training).
  • scripts/install_dependencies.sh is idempotent for apt/pip dependencies and never modifies NVIDIA driver packages.
  • scripts/install_dependencies.sh prefers python3.11 when available and warns when running with lower versions.
  • Ubuntu 17 is EOL. The script blocks by default on EOL Ubuntu unless --allow-eol-ubuntu is set explicitly.
  • Re-running a benchmark with the same OUT file automatically skips completed runs (resume mode).
  • Failed runs are written to JSONL and do not stop the whole benchmark method loop.
  • make validate-configs checks config references and tokenizer compatibility in configs/*.json.
  • make validate-results enforces JSONL schema compatibility for downstream analysis.
  • Current Make defaults are paper-aligned for compatible open models: Qwen2.5-0.5B-Instruct draft -> Qwen2.5-3B-Instruct target.
  • Qwen2.5-0.5B -> Qwen2.5-7B is kept as a legacy preset pair, but it fails the strict speculative methods because the model vocab sizes differ.
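
A quick way to verify the shared-vocabulary requirement before a long run (assumes transformers is installed; the model names are the current Make defaults):

from transformers import AutoTokenizer

# Compare draft and target vocab mappings; a mismatch breaks the strict
# speculative methods (see the Qwen2.5-7B legacy pair above).
target = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
draft = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
assert target.get_vocab() == draft.get_vocab(), "tokenizer vocabularies differ"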

Docker Troubleshooting

  • The error parent snapshot ... does not exist during docker build is typically a Docker BuildKit cache/snapshot issue on the host.
  • Recovery sequence:
make docker-prune-builder
make docker-build-gpu-safe
  • docker-build-gpu-safe first tries BuildKit and automatically retries with the legacy builder (DOCKER_BUILDKIT=0) if BuildKit fails.
  • RuntimeError: No CUDA GPUs are available inside the container means GPU runtime passthrough is not working (a host-side setup issue), not a model-code problem.
  • Could not find the bitsandbytes CUDA binary or No module named 'triton.ops' during GPT-OSS load indicates a bitsandbytes/triton mismatch in the container runtime path.
  • Quick diagnostics:
make docker-gpu-check
make docker-gpu-check-image
  • If docker-gpu-check fails, (re)configure nvidia-container-toolkit and restart Docker daemon.
  • For deterministic progress while investigating GPT-OSS runtime issues, run the short load fallback preset mistral_target_mistral_draft_k4.
