yigitkonur/cli-batch-requester: 10K+ req/s batch API client for LLM endpoints — Rust, async, load-balanced

high-throughput batch API client for LLM workloads. load-balances across endpoints, retries with backoff, streams results to disk. written in Rust.

cbr -i requests.jsonl -o results.jsonl

that's it. 100k requests, zero babysitting.

what it does

10k+ req/sec on modest hardware (Tokio async runtime, work-stealing)
weighted load balancing across multiple endpoints/API keys
exponential backoff with jitter — respects rate limits, no thundering herd
streaming output — results hit disk as they complete, crash-safe
per-endpoint health tracking — unhealthy endpoints get cooled off automatically
connection pooling — HTTP/2 keep-alive, no repeated TCP handshakes
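the backoff bullet doesn't say which jitter variant cbr uses. a minimal sketch of one common choice, "full jitter" (the wait is drawn uniformly between zero and the exponential ceiling), assuming the parameters from the `retry` block of the endpoint config below (100ms initial, 10s max, multiplier 2.0). `backoff_ceiling` and `backoff_with_jitter` are hypothetical names, not the crate's API, and the random value is passed in so the sketch stays std-only; a real client would draw it from an RNG.

```rust
use std::time::Duration;

/// Upper bound on the wait before retry `attempt` (0-based):
/// initial * multiplier^attempt, capped at `max`. Assumes multiplier >= 1.
fn backoff_ceiling(attempt: u32, initial: Duration, max: Duration, multiplier: f64) -> Duration {
    // Clamp the exponent; past this point the cap at `max` dominates anyway.
    let exp = attempt.min(64) as i32;
    let secs = initial.as_secs_f64() * multiplier.powi(exp);
    Duration::from_secs_f64(secs.min(max.as_secs_f64()))
}

/// Full jitter: scale the ceiling by a random value in [0, 1].
/// `unit_random` would come from an RNG (hypothetical here); it is a
/// parameter so the sketch needs no external crates and is deterministic.
fn backoff_with_jitter(
    attempt: u32,
    initial: Duration,
    max: Duration,
    multiplier: f64,
    unit_random: f64,
) -> Duration {
    let ceiling = backoff_ceiling(attempt, initial, max, multiplier);
    ceiling.mul_f64(unit_random.clamp(0.0, 1.0))
}

fn main() {
    // Values taken from the "retry" block of the endpoint config example.
    let initial = Duration::from_millis(100);
    let max = Duration::from_secs(10);
    for attempt in 0..8 {
        let ceiling = backoff_ceiling(attempt, initial, max, 2.0);
        let delay = backoff_with_jitter(attempt, initial, max, 2.0, 0.5);
        println!("attempt {attempt}: ceiling {ceiling:?}, jittered (r = 0.5) {delay:?}");
    }
}
```

spreading each retry across the whole window, instead of sleeping exactly the ceiling, is what keeps many workers that hit the same 429 from retrying in lockstep.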

install

cargo install cli-batch-requester

or build from source:

git clone https://github.com/yigitkonur/cli-batch-requester.git
cd cli-batch-requester && cargo build --release

usage

basic

cbr -i requests.jsonl -o results.jsonl

# crank it up
cbr -i data.jsonl -o out.jsonl --rate 10000 --workers 200

with multiple endpoints

cbr -i requests.jsonl -o results.jsonl --config endpoints.json

via environment

export CBR_ENDPOINT_URL="https://api.openai.com/v1/chat/completions"
export CBR_API_KEY="sk-..."
export CBR_MODEL="gpt-4"
cbr -i requests.jsonl -o results.jsonl

input format

one JSON object per line:

{"input": "What is the capital of France?"}
{"input": "Explain quantum computing in simple terms."}

or with full request bodies:

{"body": {"messages": [{"role": "user", "content": "Hello!"}], "model": "gpt-4"}}
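any prompt containing quotes or newlines has to be escaped, otherwise it breaks the one-object-per-line rule. a std-only sketch of generating `requests.jsonl` in the simple `{"input": ...}` shape; `json_string` and `input_line` are hypothetical helpers, and in a real project `serde_json` would usually do the escaping.

```rust
use std::fmt::Write as _;

/// Encode `s` as a JSON string literal. JSON requires escaping quotes,
/// backslashes and control characters; everything else passes through.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters (U+0000..U+001F) use \uXXXX.
            c if (c as u32) < 0x20 => {
                write!(out, "\\u{:04x}", c as u32).expect("writing to a String cannot fail");
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// One line of requests.jsonl in the simple {"input": ...} shape.
fn input_line(prompt: &str) -> String {
    format!("{{\"input\": {}}}", json_string(prompt))
}

fn main() {
    let prompts = [
        "What is the capital of France?",
        "Explain \"quantum computing\" in simple terms.\nKeep it short.",
    ];
    // Redirect stdout to build the file, e.g. `cargo run > requests.jsonl`.
    for prompt in prompts {
        println!("{}", input_line(prompt));
    }
}
```

the second prompt comes out as a single line with `\"` and `\n` escapes, which is exactly what the input reader needs to see.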

output

successes go to your output file:

{"input": "...", "response": {"choices": [...]}, "metadata": {"endpoint": "...", "latency_ms": 234, "attempts": 1}}

failures go to errors.jsonl (or the file passed to `--errors`):

{"input": "...", "error": "HTTP 429: Rate limit exceeded", "status_code": 429, "attempts": 3}
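a sketch of the streaming behavior described above, assuming each finished request yields one JSON line that goes to either the success file or the error file. `Outcome`, `record` and `write_line` are hypothetical, not cbr internals, and opening the files in append mode is a choice of this sketch rather than documented cbr behavior. writing and flushing per line hands each record to the OS as soon as it completes, so a crashed run keeps everything finished so far; surviving power loss as well would additionally need `File::sync_data`.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};

/// Hypothetical result of one request, already serialized as a JSON line.
enum Outcome {
    Success(String),
    Failure(String),
}

/// Write one record plus its newline, then flush so nothing sits in a
/// userspace buffer (matters if the sink is wrapped in a BufWriter).
fn write_line<W: Write>(sink: &mut W, line: &str) -> io::Result<()> {
    sink.write_all(line.as_bytes())?;
    sink.write_all(b"\n")?;
    sink.flush()
}

/// Route a finished request to the success or error sink as it completes.
fn record<W: Write, E: Write>(outcome: &Outcome, ok_sink: &mut W, err_sink: &mut E) -> io::Result<()> {
    match outcome {
        Outcome::Success(line) => write_line(ok_sink, line),
        Outcome::Failure(line) => write_line(err_sink, line),
    }
}

fn main() -> io::Result<()> {
    let mut ok = OpenOptions::new().create(true).append(true).open("results.jsonl")?;
    let mut err = OpenOptions::new().create(true).append(true).open("errors.jsonl")?;

    let outcomes = vec![
        Outcome::Success(r#"{"input": "hi", "response": {"choices": []}}"#.to_string()),
        Outcome::Failure(r#"{"input": "hi", "error": "HTTP 429", "status_code": 429}"#.to_string()),
    ];
    for outcome in &outcomes {
        record(outcome, &mut ok, &mut err)?;
    }
    Ok(())
}
```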

configuration

CLI flags

OPTIONS:
    -i, --input <FILE>        JSONL input file
    -o, --output <FILE>       output file for successful responses
    -e, --errors <FILE>       error output [default: errors.jsonl]
    -r, --rate <N>            max requests/sec [default: 1000]
    -w, --workers <N>         concurrent workers [default: 50]
    -t, --timeout <SECS>      request timeout [default: 30]
    -a, --max-attempts <N>    retry attempts [default: 3]
    -c, --config <FILE>       endpoint config file (JSON)
    -v, --verbose             debug logging
        --json-logs           structured JSON logs
        --no-progress         disable progress bar
        --dry-run             validate without processing

all flags also work as env vars with CBR_ prefix.
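`--rate` is a cap, not a guarantee. assuming each worker holds at most one request in flight (the README doesn't define a worker precisely), Little's law bounds sustained throughput at workers / mean latency. a sketch of that arithmetic using the 234 ms `latency_ms` from the sample output above; all function names are hypothetical.

```rust
use std::time::Duration;

/// Little's law (L = lambda * W) rearranged: with `workers` requests in flight
/// and mean latency W, sustained throughput cannot exceed workers / W.
fn max_throughput(workers: u32, mean_latency: Duration) -> f64 {
    workers as f64 / mean_latency.as_secs_f64()
}

/// Requests that must be in flight to sustain `target_rps`, rounded up.
fn workers_needed(target_rps: f64, mean_latency: Duration) -> u32 {
    (target_rps * mean_latency.as_secs_f64()).ceil() as u32
}

/// The achievable rate is the lower of the --rate cap and the concurrency bound.
fn effective_rps(rate_cap: f64, workers: u32, mean_latency: Duration) -> f64 {
    rate_cap.min(max_throughput(workers, mean_latency))
}

fn main() {
    // 234 ms is the latency_ms value from the sample output line.
    let latency = Duration::from_millis(234);
    println!("defaults (--rate 1000 --workers 50): ~{:.0} req/s", effective_rps(1000.0, 50, latency));
    println!("--rate 10000 --workers 200: ~{:.0} req/s", effective_rps(10000.0, 200, latency));
    println!("in flight needed for 10k req/s: {}", workers_needed(10000.0, latency));
}
```

under that assumption, at 234 ms the defaults top out around 214 req/s and `--workers 200` around 855 req/s, so 10k req/s would need roughly 2,340 requests in flight. faster endpoints raise the bound proportionally.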

endpoint config

for multiple endpoints with weighted distribution:

{
  "endpoints": [
    {
      "url": "https://api.openai.com/v1/chat/completions",
      "weight": 2,
      "api_key": "sk-key-1",
      "model": "gpt-4",
      "max_concurrent": 100
    },
    {
      "url": "https://api.openai.com/v1/chat/completions",
      "weight": 1,
      "api_key": "sk-key-2",
      "model": "gpt-4",
      "max_concurrent": 50
    }
  ],
  "request": {
    "timeout": "30s",
    "rate_limit": 5000,
    "workers": 100
  },
  "retry": {
    "max_attempts": 3,
    "initial_backoff": "100ms",
    "max_backoff": "10s",
    "multiplier": 2.0
  }
}
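the README doesn't say how `weight` becomes a traffic split. one deterministic option is smooth weighted round-robin, the scheme nginx uses for upstream weights. a sketch assuming only weights matter (it ignores `max_concurrent` and health tracking); entries are labeled by key rather than URL because both example endpoints share a URL. `Balancer` and `pick` are hypothetical, not the crate's API.

```rust
/// Hypothetical balancer entry: a label (e.g. which API key) and its weight.
struct Endpoint {
    name: String,
    weight: i64,
    current: i64, // running score for smooth weighted round-robin
}

struct Balancer {
    endpoints: Vec<Endpoint>,
}

impl Balancer {
    fn new(entries: &[(&str, i64)]) -> Self {
        let endpoints = entries
            .iter()
            .map(|&(name, weight)| Endpoint { name: name.to_string(), weight, current: 0 })
            .collect();
        Balancer { endpoints }
    }

    /// Smooth weighted round-robin: add every weight to its score, pick the
    /// highest score, then subtract the total weight from the winner. Each full
    /// cycle of `total` picks selects every endpoint exactly `weight` times.
    fn pick(&mut self) -> Option<&str> {
        let total: i64 = self.endpoints.iter().map(|e| e.weight).sum();
        if total <= 0 {
            return None; // no endpoints, or no positive weights
        }
        for e in self.endpoints.iter_mut() {
            e.current += e.weight;
        }
        // The first endpoint with the highest score wins ties.
        let mut best = 0;
        for i in 1..self.endpoints.len() {
            if self.endpoints[i].current > self.endpoints[best].current {
                best = i;
            }
        }
        self.endpoints[best].current -= total;
        Some(self.endpoints[best].name.as_str())
    }
}

fn main() {
    // Mirrors the config example: key 1 has weight 2, key 2 has weight 1.
    let mut lb = Balancer::new(&[("sk-key-1", 2), ("sk-key-2", 1)]);
    let picks: Vec<String> = (0..6).map(|_| lb.pick().unwrap().to_string()).collect();
    println!("{:?}", picks);
}
```

with weights 2 and 1 the sequence is key 1, key 2, key 1, repeating: two of every three requests go to the first key, interleaved rather than in bursts.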

library usage

use cli_batch_requester::{Config, EndpointConfig, Processor};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = Config {
        endpoints: vec![EndpointConfig {
            url: "https://api.example.com/v1/completions".to_string(),
            weight: 1,
            api_key: Some("your-key".to_string()),
            model: Some("gpt-4".to_string()),
            max_concurrent: 100,
        }],
        ..Default::default()
    };

    let processor = Processor::new(config)?;
    processor.process_file(
        "requests.jsonl".into(),
        Some("results.jsonl".into()),
        "errors.jsonl".into(),
        true,
    ).await?.print_summary();

    Ok(())
}

troubleshooting

problem	fix
"too many open files"	`ulimit -n 65535`
connection timeouts	increase `--timeout` or reduce `--workers`
429 rate limit errors	lower `--rate` or add more API keys
high memory usage	reduce `--workers`
OpenSSL build errors	`apt install libssl-dev pkg-config`, or build with `--features rustls`

project structure

src/
  lib.rs        — library entry point
  main.rs       — CLI
  config.rs     — configuration
  client.rs     — HTTP client + retry logic
  endpoint.rs   — load balancer
  processor.rs  — processing orchestration
  request.rs    — request/response types
  tracker.rs    — statistics
  error.rs      — error types

contributing

cargo test && cargo fmt && cargo clippy

PRs welcome.

license

MIT
