
Stability Shield

ReliAPI acts as a Stability Shield between clients (especially IDEs like Cursor) and upstream LLM/HTTP providers, smoothing out bursts, handling rate limits intelligently, and managing pools of provider keys.


Overview

The Stability Shield is a set of features designed to:

  • Smooth out bursts: Prevent clients from hitting upstream rate limits
  • Handle rate limits intelligently: Automatic key rotation and retry with different keys
  • Manage key pools: Health tracking and automatic selection of best key
  • Support client profiles: Different limits and behavior for different client types

Provider Key Pool

What It Does

The Provider Key Pool Manager maintains multiple API keys per provider, tracks their health, and automatically selects the best key for each request.

Key Features

  • Multi-Key Support: Manage multiple API keys per provider
  • Health Tracking: Track error scores, consecutive errors, and key status
  • Automatic Selection: Selects the best key based on its load score (lowest load wins)
  • Status Management: Keys transition between states:
    • active: Normal operation
    • degraded: After 5 consecutive errors
    • exhausted: After 10 consecutive errors
    • banned: Manually banned keys
  • Automatic Recovery: Degraded keys recover to active once their error score drops back below 0.3

Configuration

provider_key_pools:
  openai:
    keys:
      - id: "openai-main-1"
        api_key: "env:OPENAI_KEY_1"
        qps_limit: 3
      - id: "openai-main-2"
        api_key: "env:OPENAI_KEY_2"
        qps_limit: 3
      - id: "openai-backup-1"
        api_key: "env:OPENAI_KEY_BACKUP_1"
        qps_limit: 2
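
The env: prefix means each key is read from an environment variable rather than stored in the config file. A minimal sketch of that kind of resolution (resolve_api_key is a hypothetical helper for illustration, not necessarily how ReliAPI actually loads keys):

import os

def resolve_api_key(value: str) -> str:
    # Values like "env:OPENAI_KEY_1" are looked up in the environment;
    # anything else is treated as a literal key.
    if value.startswith("env:"):
        return os.environ[value[len("env:"):]]
    return value

# Example: resolve_api_key("env:OPENAI_KEY_1") returns the value of $OPENAI_KEY_1.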

How It Works

  1. Key Selection: For each request, selects key with:

    • status == "active"
    • Lowest load_score = current_qps / qps_limit + error_penalty
  2. Health Tracking: On errors:

    • 429 errors: +0.1 to error score
    • 5xx errors: +0.05 to error score
    • Other errors: +0.02 to error score
  3. Status Transitions:

    • active → degraded: After 5 consecutive errors
    • degraded → exhausted: After 10 consecutive errors
    • degraded → active: When error score < 0.3
  4. Automatic Fallback: On 429/5xx errors, automatically retries with a different key (up to 3 key switches per request; a sketch of the selection and health logic follows this list)
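
A minimal Python sketch of this selection and health-tracking logic (the PoolKey class, its field names, and the error-score decay on success are illustrative assumptions, not ReliAPI's actual internals):

from dataclasses import dataclass

@dataclass
class PoolKey:
    id: str
    qps_limit: float
    current_qps: float = 0.0
    error_score: float = 0.0
    consecutive_errors: int = 0
    status: str = "active"

ERROR_PENALTIES = {"429": 0.1, "5xx": 0.05, "other": 0.02}

def load_score(key: PoolKey) -> float:
    # Lower is better: utilization plus an error penalty
    # (error_score serves as the error_penalty term here).
    return key.current_qps / key.qps_limit + key.error_score

def select_key(keys: list[PoolKey]) -> PoolKey | None:
    # Only active keys are eligible; pick the one with the lowest load score.
    active = [k for k in keys if k.status == "active"]
    return min(active, key=load_score) if active else None

def record_error(key: PoolKey, kind: str) -> None:
    key.error_score += ERROR_PENALTIES.get(kind, ERROR_PENALTIES["other"])
    key.consecutive_errors += 1
    if key.consecutive_errors >= 10:
        key.status = "exhausted"
    elif key.consecutive_errors >= 5:
        key.status = "degraded"

def record_success(key: PoolKey) -> None:
    key.consecutive_errors = 0
    key.error_score = max(0.0, key.error_score - 0.05)  # decay rate is an assumption
    if key.status == "degraded" and key.error_score < 0.3:
        key.status = "active"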

Metrics

Exports Prometheus metrics per provider_key_id:

  • reliapi_key_pool_requests_total: Request counts (success/error)
  • reliapi_key_pool_errors_total: Error counts by type
  • reliapi_key_pool_qps: Current QPS per key
  • reliapi_key_pool_status: Key status (0=active, 1=degraded, 2=exhausted, 3=banned)
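
As an illustration, the status gauge could be exported roughly like this with the Python prometheus_client library (the provider_key_id label follows the description above, but the exact label set is an assumption):

from prometheus_client import Gauge

STATUS_CODES = {"active": 0, "degraded": 1, "exhausted": 2, "banned": 3}

key_pool_status = Gauge(
    "reliapi_key_pool_status",
    "Provider key status (0=active, 1=degraded, 2=exhausted, 3=banned)",
    ["provider_key_id"],
)

def publish_status(key_id: str, status: str) -> None:
    # Map the textual status to the numeric encoding used by the metric.
    key_pool_status.labels(provider_key_id=key_id).set(STATUS_CODES[status])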

Rate Smoothing

What It Does

The Rate Scheduler uses a token bucket algorithm to smooth bursts and enforce per-key, per-tenant, and per-client-profile rate limits before requests reach upstream providers.

Key Features

  • Token Bucket Algorithm: Separate buckets for provider key, tenant, and client profile
  • Burst Protection: Configurable burst size for smoothing traffic spikes
  • Concurrent Limiting: Semaphore-based concurrent request limiting
  • Normalized 429: Returns stable 429 errors from ReliAPI (not upstream chaos)

Configuration

Rate limits are configured via:

  1. Provider Key Pool: qps_limit per key
  2. Client Profiles: max_qps_per_tenant, max_qps_per_provider_key
  3. Tenant Config: rate_limit_rpm (legacy, in-memory)

How It Works

  1. Token Buckets: Creates separate token buckets for:

    • Provider key: provider_key:{key_id}
    • Tenant: tenant:{tenant_name}
    • Client profile: profile:{profile_name}
  2. Rate Limiting: Before each request:

    • Checks provider key bucket
    • Checks tenant bucket
    • Checks client profile bucket
    • Returns 429 if any bucket is empty
  3. Token Refill: Tokens refill at the configured QPS rate (see the token bucket sketch after this list)

  4. Normalized Errors: Returns 429 with:

    • type: "rate_limit"
    • source: "reliapi"
    • retry_after_s: Estimated seconds until retry
    • provider_key_status: Status of provider key
    • hint: "Upstream provider is being protected"
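
A minimal sketch of one such token bucket (the class itself is illustrative; ReliAPI's Rate Scheduler keeps a separate bucket per provider key, tenant, and client profile as described above):

import time

class TokenBucket:
    def __init__(self, qps: float, burst: float):
        self.qps = qps                 # refill rate, tokens per second
        self.capacity = burst          # maximum burst size
        self.tokens = burst
        self.last_refill = time.monotonic()

    def try_acquire(self) -> tuple[bool, float]:
        # Refill based on elapsed time, then try to consume one token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.qps)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Bucket is empty: report how long until one token is available,
        # which becomes retry_after_s in the normalized 429 error.
        return False, (1 - self.tokens) / self.qps

# A request is admitted only if the provider-key, tenant,
# and client-profile buckets all grant a token.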

Example

provider_key_pools:
  openai:
    keys:
      - id: "openai-1"
        qps_limit: 3  # 3 requests per second

client_profiles:
  cursor_default:
    max_qps_per_provider_key: 2  # Override: max 2 QPS per key for Cursor

Result: Cursor clients limited to 2 QPS per key (minimum of key limit and profile limit).


Smart Retries

What It Does

ReliAPI enhances its retry logic with Retry-After header support and automatic key pool fallback on 429/5xx errors.

Key Features

  • Retry-After Support: Respects Retry-After header from upstream (capped at max_s)
  • Key Pool Fallback: On 429/5xx errors, automatically retries with different key from pool
  • Exponential Backoff: Uses exponential backoff with jitter when Retry-After not present
  • Limited Attempts: Maximum 3 key switches per request to prevent infinite loops

How It Works

  1. Retry-After: If upstream returns a Retry-After header:

    • Uses the header value for the delay (capped at max_s; see the sketch after this list)
    • Skips the exponential backoff calculation
  2. Key Pool Fallback: On 429/5xx errors:

    • Records error in key pool manager
    • Selects new key from pool (if available)
    • Retries request with new key (up to 3 switches)
    • Falls back to normal error handling if all keys fail
  3. Error Classification: Classifies errors for retry policy:

    • 429: Rate limit errors
    • 5xx: Server errors
    • net: Network/timeout errors
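
A minimal sketch of the delay calculation from step 1 (the base delay, jitter range, and numeric-only Retry-After parsing are assumptions; max_s is the cap mentioned above):

import random

def compute_retry_delay(attempt: int, retry_after_header: str | None,
                        base_s: float = 0.5, max_s: float = 30.0) -> float:
    # Prefer the upstream Retry-After value, capped at max_s.
    if retry_after_header is not None:
        try:
            return min(float(retry_after_header), max_s)
        except ValueError:
            pass  # not a plain number of seconds; fall back to backoff
    # Exponential backoff with jitter when no usable Retry-After is present.
    delay = base_s * (2 ** attempt) * random.uniform(0.5, 1.5)
    return min(delay, max_s)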

Client Profiles

What It Does

The Client Profile Manager provides different rate limits and behavior for different client types (e.g., Cursor IDE vs. API clients).

Key Features

  • Profile Detection: Priority: X-Client header → tenant.profile → default
  • Per-Profile Limits: Different rate limits per client type
  • Configurable: max_parallel_requests, max_qps_per_tenant, max_qps_per_provider_key, burst_size

Configuration

client_profiles:
  cursor_default:
    max_parallel_requests: 4
    max_qps_per_tenant: 3
    max_qps_per_provider_key: 2
    burst_size: 2
    default_timeout_s: 60

Usage

Option 1: X-Client Header (highest priority)

curl -X POST http://localhost:8000/proxy/llm \
  -H "X-Client: cursor" \
  -d '{"target": "openai", "messages": [...]}'

Option 2: Tenant Profile (fallback)

tenants:
  cursor_user:
    api_key: "sk-..."
    profile: "cursor_default"  # Used if X-Client header absent

Option 3: Default Profile (final fallback)

If neither the X-Client header nor a tenant profile is present, the default profile is used.

How It Works

  1. Profile Detection:

    • Checks X-Client header first
    • Falls back to tenant.profile if header absent
    • Uses default profile if neither present
  2. Limit Application:

    • Profile limits are combined with provider key limits (the minimum wins; see the sketch after this list)
    • Applied to the rate scheduler before the request
  3. Multiple Limits: All limits checked:

    • Provider key QPS
    • Tenant QPS
    • Client profile QPS
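
A rough sketch of the detection order and the minimum-wins combination (function names and the mapping from header value to profile name are simplified assumptions):

def detect_profile(headers: dict, tenant: dict, profiles: dict) -> dict:
    # Priority: X-Client header, then tenant.profile, then "default".
    # (How the header value maps to a configured profile name is simplified here.)
    name = headers.get("X-Client") or tenant.get("profile") or "default"
    return profiles.get(name, profiles.get("default", {}))

def effective_key_qps(key_qps_limit: float, profile: dict) -> float:
    # The stricter (smaller) of the provider key limit and the profile limit wins.
    return min(key_qps_limit, profile.get("max_qps_per_provider_key", key_qps_limit))

# With qps_limit: 3 on the key and max_qps_per_provider_key: 2 in cursor_default,
# the effective limit for Cursor traffic is min(3, 2) = 2 QPS.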

Normalized Rate Limit Errors

What It Does

ReliAPI returns stable, predictable 429 errors with metadata, not random upstream chaos.

Error Format

{
  "success": false,
  "error": {
    "type": "rate_limit",
    "code": "RATE_LIMIT_RELIAPI",
    "message": "Rate limit exceeded (provider_key)",
    "retryable": true,
    "source": "reliapi",
    "retry_after_s": 0.5,
    "target": "openai",
    "status_code": 429,
    "provider_key_status": "active",
    "hint": "Upstream provider is being protected"
  },
  "meta": {
    "target": "openai",
    "cache_hit": false,
    "idempotent_hit": false,
    "retries": 0,
    "duration_ms": 10,
    "request_id": "req_abc123"
  }
}

Fields

  • source: "reliapi": Indicates error from ReliAPI (not upstream)
  • retry_after_s: Estimated seconds until retry (from token bucket)
  • provider_key_status: Status of provider key (active/degraded/exhausted)
  • hint: Helpful message for debugging
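
Because the error shape is stable, clients can handle it generically. A sketch of client-side handling built on the documented fields (the URL, headers, and retry loop are illustrative; authentication is omitted):

import time
import requests

def call_with_retry(payload: dict, url: str = "http://localhost:8000/proxy/llm",
                    attempts: int = 3) -> dict:
    for _ in range(attempts):
        resp = requests.post(url, json=payload, headers={"X-Client": "cursor"})
        body = resp.json()
        if body.get("success"):
            return body
        err = body.get("error", {})
        if err.get("type") == "rate_limit" and err.get("retryable"):
            # source == "reliapi" means the shield throttled the request
            # before it reached the upstream provider.
            time.sleep(err.get("retry_after_s", 1.0))
            continue
        raise RuntimeError(err.get("message", "request failed"))
    raise RuntimeError("rate limited after retries")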

Complete Example

# Provider Key Pools
provider_key_pools:
  openai:
    keys:
      - id: "openai-main-1"
        api_key: "env:OPENAI_KEY_1"
        qps_limit: 3
      - id: "openai-main-2"
        api_key: "env:OPENAI_KEY_2"
        qps_limit: 3

# Client Profiles
client_profiles:
  cursor_default:
    max_parallel_requests: 4
    max_qps_per_tenant: 3
    max_qps_per_provider_key: 2
    burst_size: 2
    default_timeout_s: 60

# Tenants
tenants:
  cursor_user:
    api_key: "sk-cursor-..."
    profile: "cursor_default"

Result:

  • Cursor clients (via X-Client: cursor or tenant.profile) limited to 2 QPS per provider key
  • Automatic key rotation on 429/5xx errors
  • Health tracking and automatic recovery
  • Stable 429 errors with retry_after_s metadata
