Krunch

Krunch is a neural codec for text. It runs on any NVIDIA GPU and produces output roughly 30-40% smaller than traditional compressors (such as zstd -22 --long) on natural-language text (chat, prose, code).

Run it on one machine or parallelize across a cluster with any batch system you already use.

Install + compress

Run on any host with an NVIDIA GPU + Docker:

# 1. Install (~5-10 min one-time — downloads CLI + pulls 3.5 GB image)
curl -fsSL https://raw.githubusercontent.com/dmatth1/krunch/main/install.sh | sudo bash
# For a pinned, reproducible install:
#   curl -fsSL https://raw.githubusercontent.com/dmatth1/krunch/main/install.sh | sudo KRUNCH_VERSION=v0.1.1 bash

# 2. Use it (instant — image is cached)
krunch compress   data.jsonl  -o data.krunch
krunch decompress data.krunch -o data.jsonl

# Or pipe-style (Unix idiom)
krunch compress   < data.jsonl  > data.krunch
krunch decompress < data.krunch > data.jsonl

Distributed compression

For large files / archival workloads, run krunch as parallel tasks on whatever batch system you already use. krunch plan emits a ready-to-run artifact for the target you pick.

# Compress
krunch plan --target aws-batch --mode compress \
    --source s3://… --dest s3://… --workers 16 > compress.json

# Decompress
krunch plan --target aws-batch --mode decompress \
    --source s3://… --dest s3://… --workers 16 > decompress.json

# Planned targets — same flag shape, not yet implemented
krunch plan --target k8s       --mode compress --source … --dest … --workers 16 > job.yaml
krunch plan --target modal     --mode compress --source … --dest … --workers 16 > run.py
krunch plan --target ray       --mode compress --source … --dest … --workers 16 > run.py
krunch plan --target slurm     --mode compress --source … --dest … --workers 16 > run.sbatch
krunch plan --target gcp-batch --mode compress --source … --dest … --workers 16 > job.json

Then submit with your own tooling and credentials: aws batch submit-job --cli-input-json file://compress.json, kubectl apply -f job.yaml, modal run run.py, etc.

Only --target aws-batch works today; the rest are illustrative of the intended UX. Contributions welcome — see CONTRIBUTING.md.

See deploy/aws-cdk/ for a working AWS Batch reference stack you can cdk deploy as-is.

Throughput

Measured on AWS Batch (A10G g5.xlarge, 100 MB WildChat-English). Times are elapsed time inside compress_all / decompress_all and exclude cold-start container init:

[Figure: Krunch throughput vs fleet size]

Note: cold-start overhead can lengthen the first job, but it amortizes toward zero on warm fleets and on large jobs.

Ratio comparisons

Compressed-size ratio (smaller = better) on a single A10G g5.xlarge, 1 MB chunks.

| Corpus | krunch | zstd -22 --long | krunch vs zstd |
|---|---|---|---|
| Chat: WildChat-English (100 MB) | 0.114 | 0.170 | −33% |
| Wikipedia: enwik8 (100 MB) | 0.146 | 0.253 | −42% |
| Python code: CodeParrot (100 MB) | 0.097 | 0.154 | −37% |
| Support tickets: Bitext (19 MB) | 0.099 | 0.083 | +20% |
| HTTP logs: NASA Apache (100 MB) | 0.157 | 0.061 | +158% |

krunch wins decisively on natural-language text (chat, prose, code), with output 33-42% smaller than zstd -22 --long. It loses on highly repetitive structured text (templated logs, intent labels), where its output is 20-158% larger.
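The "krunch vs zstd" column can be recomputed from the two ratio columns as krunch / zstd − 1. A minimal Python sketch, using the rounded ratios copied from the table; because those ratios are rounded to three decimals, the recomputed percentages can differ from the published column by about a point. The `relative_size` helper is a name chosen for this illustration.

```python
# Ratios copied from the table above (compressed size / original size).
RATIOS = {
    "WildChat-English": (0.114, 0.170),
    "enwik8": (0.146, 0.253),
    "CodeParrot": (0.097, 0.154),
    "Bitext": (0.099, 0.083),
    "NASA Apache": (0.157, 0.061),
}


def relative_size(krunch_ratio: float, zstd_ratio: float) -> float:
    """Return krunch's output size relative to zstd's, as a signed percentage.

    Negative means krunch's output is smaller than zstd's.
    """
    return (krunch_ratio / zstd_ratio - 1.0) * 100.0


for corpus, (krunch_ratio, zstd_ratio) in RATIOS.items():
    print(f"{corpus:<18} {relative_size(krunch_ratio, zstd_ratio):+.0f}%")
```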

What's inside the Docker image

  • RWKV-4-Pile-169M pretrained language model (Apache-2.0, BlinkDL): the next-token predictor.
  • Custom WKV CUDA kernel: a fused recurrence op, ~1000× faster than HF transformers' eval-mode fallback.
  • constriction arithmetic coder: turns the model's next-token distribution into a bitstream.
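Why a better predictor yields a smaller file: an arithmetic coder spends roughly −log2(p) bits on a token that the model assigned probability p. A minimal Python sketch of that accounting follows. The probabilities are made up for illustration and are not output from RWKV or constriction, and `ideal_bits` is a hypothetical helper.

```python
import math


def ideal_bits(probabilities):
    """Sum of -log2(p) over the probabilities the model gave each actual token.

    This is the information-theoretic code length an arithmetic coder
    approaches; a real coder adds a small overhead on top.
    """
    return sum(-math.log2(p) for p in probabilities)


# Hypothetical probabilities assigned to four observed tokens.
confident = [0.5, 0.5, 0.25, 0.25]  # model predicts the text well
uncertain = [0.125] * 4             # model is mostly guessing

print(ideal_bits(confident))  # 6.0
print(ideal_bits(uncertain))  # 12.0
```

The same four tokens cost half as many bits when the model assigns them higher probability, which is the effect the ratio table measures at scale.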

Adding a new batch target

The artifact krunch plan emits contains both the worker tasks (each computes its byte range from a framework-injected index) and a finalize task that stitches partial blobs into the final output. The container contract (KRUNCH_INPUT_URL, KRUNCH_PART_INDEX, KRUNCH_PART_COUNT, …) is documented and stable — you can wire krunch into a batch system we don't have a template for in ~30 lines.
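As an illustration of the worker side of that contract, here is a minimal Python sketch of how a task might derive its byte range from KRUNCH_PART_INDEX and KRUNCH_PART_COUNT. The zero-based index, the near-even split, the 100 MB example size, and the `byte_range` helper are assumptions for illustration; krunch's actual splitting rule (for example, any alignment to chunk boundaries) is not described here.

```python
import os


def byte_range(total_size: int, part_index: int, part_count: int) -> tuple[int, int]:
    """Return the half-open range [start, end) for one worker.

    Assumes a zero-based index and a near-even split: the first
    `total_size % part_count` parts each get one extra byte.
    """
    if part_count <= 0 or not 0 <= part_index < part_count:
        raise ValueError("need part_count > 0 and 0 <= part_index < part_count")
    base, extra = divmod(total_size, part_count)
    start = part_index * base + min(part_index, extra)
    end = start + base + (1 if part_index < extra else 0)
    return start, end


if __name__ == "__main__":
    # Env var names come from the container contract; the defaults are
    # only here so the sketch also runs locally outside a batch system.
    index = int(os.environ.get("KRUNCH_PART_INDEX", "0"))
    count = int(os.environ.get("KRUNCH_PART_COUNT", "1"))
    print(byte_range(100_000_000, index, count))
```

The ranges are contiguous and non-overlapping, so a finalize step can concatenate the partial outputs in index order.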

When not to use krunch

Krunch is a neural compressor for text. Avoid it when:

  • Your data is highly repetitive structured text (templated logs, intent labels, repeating timestamps). The 128 MB long-range window of zstd -22 --long catches that pattern far more cheaply than a 169 M-parameter language model; see the ratio table above.
  • Your data is arbitrary binary, mixed media, or already compressed. A language model has no advantage predicting data that looks random, so krunch will likely produce output larger than the input.
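One cheap way to screen for the second case before spending GPU time is to run a fast general-purpose compressor over a sample and see whether it shrinks at all. The sketch below is a heuristic using Python's standard zlib module. It is not part of krunch, and the `looks_incompressible` name and the 0.95 threshold are assumptions to tune for your data.

```python
import os
import zlib


def looks_incompressible(sample: bytes, threshold: float = 0.95) -> bool:
    """Return True if zlib barely shrinks the sample.

    Random or already-compressed data usually stays at or above its
    original size under zlib; ordinary text shrinks well below it.
    """
    if not sample:
        return False
    ratio = len(zlib.compress(sample, 6)) / len(sample)
    return ratio >= threshold


print(looks_incompressible(os.urandom(65536)))          # random bytes
print(looks_incompressible(b"hello world, " * 5000))    # repetitive text
```

A sample that passes this screen is not guaranteed to favor krunch (templated logs compress well under zlib too), so it only rules out the clearly unsuitable inputs.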

Contributing

See CONTRIBUTING.md.

License

Apache-2.0. See NOTICE for upstream attributions.
