yigitkonur/cli-batch-requester
high-throughput batch API client for LLM workloads. load-balances across endpoints, retries with backoff, streams results to disk. written in Rust.

cbr -i requests.jsonl -o results.jsonl

that's it. 100k requests, zero babysitting.

what it does

  • 10k+ req/sec on modest hardware (Tokio async runtime, work-stealing)
  • weighted load balancing across multiple endpoints/API keys
  • exponential backoff with jitter — respects rate limits, no thundering herd
  • streaming output — results hit disk as they complete, crash-safe
  • per-endpoint health tracking — unhealthy endpoints get cooled off automatically
  • connection pooling — HTTP/2 keep-alive, no repeated TCP handshakes
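the backoff behavior above can be sketched in a few lines. this is an illustrative model of exponential backoff with a cap (using the defaults from the retry config below: 100ms initial, 2.0 multiplier, 10s max), not the crate's internals — function and parameter names here are made up for the example, and a real retry loop also sleeps a random fraction of the cap ("full jitter") so concurrent workers don't retry in lockstep:

```rust
use std::time::Duration;

// illustrative: capped exponential backoff schedule
fn backoff_cap(attempt: u32, initial: Duration, max: Duration, multiplier: f64) -> Duration {
    let ms = initial.as_millis() as f64 * multiplier.powi(attempt as i32);
    Duration::from_millis(ms.min(max.as_millis() as f64) as u64)
}

fn main() {
    let (initial, max) = (Duration::from_millis(100), Duration::from_secs(10));
    // schedule: 100ms, 200ms, 400ms, 800ms, ... capped at 10s
    assert_eq!(backoff_cap(0, initial, max, 2.0).as_millis(), 100);
    assert_eq!(backoff_cap(3, initial, max, 2.0).as_millis(), 800);
    assert_eq!(backoff_cap(10, initial, max, 2.0).as_millis(), 10_000);
    // a real client sleeps rand(0..=cap) here, not the cap itself — that
    // jitter is what prevents the thundering herd after a shared 429
}
```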

install

cargo install cli-batch-requester

or build from source:

git clone https://github.com/yigitkonur/cli-batch-requester.git
cd cli-batch-requester && cargo build --release

usage

basic

cbr -i requests.jsonl -o results.jsonl

# crank it up
cbr -i data.jsonl -o out.jsonl --rate 10000 --workers 200

with multiple endpoints

cbr -i requests.jsonl -o results.jsonl --config endpoints.json

via environment

export CBR_ENDPOINT_URL="https://api.openai.com/v1/completions"
export CBR_API_KEY="sk-..."
export CBR_MODEL="gpt-4"
cbr -i requests.jsonl -o results.jsonl

input format

one JSON object per line:

{"input": "What is the capital of France?"}
{"input": "Explain quantum computing in simple terms."}

or with full request bodies:

{"body": {"messages": [{"role": "user", "content": "Hello!"}], "model": "gpt-4"}}

output

successes go to your output file:

{"input": "...", "response": {"choices": [...]}, "metadata": {"endpoint": "...", "latency_ms": 234, "attempts": 1}}

failures go to errors.jsonl:

{"input": "...", "error": "HTTP 429: Rate limit exceeded", "status_code": 429, "attempts": 3}

configuration

CLI flags

OPTIONS:
    -i, --input <FILE>        JSONL input file
    -o, --output <FILE>       output file for successful responses
    -e, --errors <FILE>       error output [default: errors.jsonl]
    -r, --rate <N>            max requests/sec [default: 1000]
    -w, --workers <N>         concurrent workers [default: 50]
    -t, --timeout <SECS>      request timeout [default: 30]
    -a, --max-attempts <N>    retry attempts [default: 3]
    -c, --config <FILE>       endpoint config file (JSON)
    -v, --verbose             debug logging
        --json-logs           structured JSON logs
        --no-progress         disable progress bar
        --dry-run             validate without processing

all flags also work as env vars with CBR_ prefix.
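a --rate cap is commonly implemented as a token bucket: the bucket refills at the target rate, and each request spends one token or waits. whether the crate uses exactly this mechanism isn't documented here — this is a minimal, clock-injected sketch of the general technique, with invented names:

```rust
// illustrative token bucket: capacity and refill rate both equal --rate
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
}

impl TokenBucket {
    fn new(rate: f64) -> Self {
        Self { capacity: rate, tokens: rate, refill_per_sec: rate }
    }

    // advance the clock by `elapsed` seconds, then try to take one token
    fn try_acquire(&mut self, elapsed: f64) -> bool {
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bucket = TokenBucket::new(2.0); // like --rate 2
    assert!(bucket.try_acquire(0.0));  // burst up to capacity...
    assert!(bucket.try_acquire(0.0));
    assert!(!bucket.try_acquire(0.0)); // ...then the bucket is empty
    assert!(bucket.try_acquire(0.5));  // half a second refills one token
}
```

passing elapsed time in explicitly (instead of reading a clock inside) is what makes the limiter trivially testable.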

endpoint config

for multiple endpoints with weighted distribution:

{
  "endpoints": [
    {
      "url": "https://api.openai.com/v1/completions",
      "weight": 2,
      "api_key": "sk-key-1",
      "model": "gpt-4",
      "max_concurrent": 100
    },
    {
      "url": "https://api.openai.com/v1/completions",
      "weight": 1,
      "api_key": "sk-key-2",
      "model": "gpt-4",
      "max_concurrent": 50
    }
  ],
  "request": {
    "timeout": "30s",
    "rate_limit": 5000,
    "workers": 100
  },
  "retry": {
    "max_attempts": 3,
    "initial_backoff": "100ms",
    "max_backoff": "10s",
    "multiplier": 2.0
  }
}
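with the weights above (2 and 1), the first key should receive two of every three requests. one deterministic way to get that split is "smooth weighted round-robin" (the algorithm nginx uses for upstreams) — shown here as an illustrative sketch with invented names, not a claim about the crate's actual balancer:

```rust
// illustrative smooth weighted round-robin over two endpoints
struct Endpoint {
    name: &'static str,
    weight: i64,
    current: i64,
}

fn pick(endpoints: &mut [Endpoint]) -> &'static str {
    let total: i64 = endpoints.iter().map(|e| e.weight).sum();
    for e in endpoints.iter_mut() {
        e.current += e.weight; // every endpoint earns credit each round
    }
    let best = endpoints.iter_mut().max_by_key(|e| e.current).unwrap();
    best.current -= total; // the winner pays back one full cycle
    best.name
}

fn main() {
    let mut eps = [
        Endpoint { name: "key-1", weight: 2, current: 0 },
        Endpoint { name: "key-2", weight: 1, current: 0 },
    ];
    let picks: Vec<&str> = (0..6).map(|_| pick(&mut eps)).collect();
    let key1 = picks.iter().filter(|p| **p == "key-1").count();
    assert_eq!(key1, 4); // exact 2:1 split over any full cycle
}
```

unlike random weighted selection, this interleaves the endpoints evenly, so the lower-weight key never sees long idle gaps followed by bursts.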

library usage

use cli_batch_requester::{Config, EndpointConfig, Processor};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = Config {
        endpoints: vec![EndpointConfig {
            url: "https://api.example.com/v1/completions".to_string(),
            weight: 1,
            api_key: Some("your-key".to_string()),
            model: Some("gpt-4".to_string()),
            max_concurrent: 100,
        }],
        ..Default::default()
    };

    let processor = Processor::new(config)?;
    processor.process_file(
        "requests.jsonl".into(),
        Some("results.jsonl".into()),
        "errors.jsonl".into(),
        true,
    ).await?.print_summary();

    Ok(())
}

troubleshooting

problem                    fix
"too many open files"      ulimit -n 65535
connection timeouts        increase --timeout or reduce --workers
429 rate limit errors      lower --rate or add more API keys
high memory usage          reduce --workers
OpenSSL build errors       apt install libssl-dev or use --features rustls

project structure

src/
  lib.rs        — library entry point
  main.rs       — CLI
  config.rs     — configuration
  client.rs     — HTTP client + retry logic
  endpoint.rs   — load balancer
  processor.rs  — processing orchestration
  request.rs    — request/response types
  tracker.rs    — statistics
  error.rs      — error types

contributing

cargo test && cargo fmt && cargo clippy

PRs welcome.

license

MIT
