
Stability Shield

ReliAPI acts as a Stability Shield between clients (especially IDEs like Cursor) and upstream LLM/HTTP providers, smoothing out bursts, handling rate limits intelligently, and managing pools of provider keys.


Overview

The Stability Shield is a set of features designed to:

  • Smooth out bursts: Prevent clients from hitting upstream rate limits
  • Handle rate limits intelligently: Automatic key rotation and retry with different keys
  • Manage key pools: Health tracking and automatic selection of best key
  • Support client profiles: Different limits and behavior for different client types

Provider Key Pool

What It Does

The Provider Key Pool Manager maintains multiple API keys per provider, tracks their health, and automatically selects the best key for each request.

Key Features

  • Multi-Key Support: Manage multiple API keys per provider
  • Health Tracking: Track error scores, consecutive errors, and key status
  • Automatic Selection: Selects the best key based on its load score (lowest load wins)
  • Status Management: Keys transition between states:
    • active: Normal operation
    • degraded: After 5 consecutive errors
    • exhausted: After 10 consecutive errors
    • banned: Manually banned keys
  • Automatic Recovery: Degraded keys recover to active once their error score drops back below 0.3

Configuration

provider_key_pools:
  openai:
    keys:
      - id: "openai-main-1"
        api_key: "env:OPENAI_KEY_1"
        qps_limit: 3
      - id: "openai-main-2"
        api_key: "env:OPENAI_KEY_2"
        qps_limit: 3
      - id: "openai-backup-1"
        api_key: "env:OPENAI_KEY_BACKUP_1"
        qps_limit: 2
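
The env: prefix means each key is read from an environment variable rather than stored in the config file. A minimal sketch of that kind of resolution (resolve_api_key is a hypothetical helper for illustration, not necessarily how ReliAPI actually loads keys):

import os

def resolve_api_key(value: str) -> str:
    # Values like "env:OPENAI_KEY_1" are looked up in the environment;
    # anything else is treated as a literal key.
    if value.startswith("env:"):
        return os.environ[value[len("env:"):]]
    return value

# Example: resolve_api_key("env:OPENAI_KEY_1") returns the value of $OPENAI_KEY_1.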

How It Works

  1. Key Selection: For each request, selects key with:

    • status == "active"
    • Lowest load_score = current_qps / qps_limit + error_penalty
  2. Health Tracking: On errors:

    • 429 errors: +0.1 to error score
    • 5xx errors: +0.05 to error score
    • Other errors: +0.02 to error score
  3. Status Transitions:

    • active → degraded: After 5 consecutive errors
    • degraded → exhausted: After 10 consecutive errors
    • degraded → active: When error score < 0.3
  4. Automatic Fallback: On 429/5xx errors, automatically retries with a different key (up to 3 key switches per request; a sketch of the selection and health logic follows this list)
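
A minimal Python sketch of this selection and health-tracking logic (the PoolKey class, its field names, and the error-score decay on success are illustrative assumptions, not ReliAPI's actual internals):

from dataclasses import dataclass

@dataclass
class PoolKey:
    id: str
    qps_limit: float
    current_qps: float = 0.0
    error_score: float = 0.0
    consecutive_errors: int = 0
    status: str = "active"

ERROR_PENALTIES = {"429": 0.1, "5xx": 0.05, "other": 0.02}

def load_score(key: PoolKey) -> float:
    # Lower is better: utilization plus an error penalty
    # (error_score serves as the error_penalty term here).
    return key.current_qps / key.qps_limit + key.error_score

def select_key(keys: list[PoolKey]) -> PoolKey | None:
    # Only active keys are eligible; pick the one with the lowest load score.
    active = [k for k in keys if k.status == "active"]
    return min(active, key=load_score) if active else None

def record_error(key: PoolKey, kind: str) -> None:
    key.error_score += ERROR_PENALTIES.get(kind, ERROR_PENALTIES["other"])
    key.consecutive_errors += 1
    if key.consecutive_errors >= 10:
        key.status = "exhausted"
    elif key.consecutive_errors >= 5:
        key.status = "degraded"

def record_success(key: PoolKey) -> None:
    key.consecutive_errors = 0
    key.error_score = max(0.0, key.error_score - 0.05)  # decay rate is an assumption
    if key.status == "degraded" and key.error_score < 0.3:
        key.status = "active"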

Metrics

Exports Prometheus metrics per provider_key_id:

  • reliapi_key_pool_requests_total: Request counts (success/error)
  • reliapi_key_pool_errors_total: Error counts by type
  • reliapi_key_pool_qps: Current QPS per key
  • reliapi_key_pool_status: Key status (0=active, 1=degraded, 2=exhausted, 3=banned)
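
As an illustration, the status gauge could be exported roughly like this with the Python prometheus_client library (the provider_key_id label follows the description above, but the exact label set is an assumption):

from prometheus_client import Gauge

STATUS_CODES = {"active": 0, "degraded": 1, "exhausted": 2, "banned": 3}

key_pool_status = Gauge(
    "reliapi_key_pool_status",
    "Provider key status (0=active, 1=degraded, 2=exhausted, 3=banned)",
    ["provider_key_id"],
)

def publish_status(key_id: str, status: str) -> None:
    # Map the textual status to the numeric encoding used by the metric.
    key_pool_status.labels(provider_key_id=key_id).set(STATUS_CODES[status])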

Rate Smoothing

What It Does

The Rate Scheduler uses a token bucket algorithm to smooth bursts and enforce per-key, per-tenant, and per-client-profile rate limits before requests reach upstream providers.

Key Features

  • Token Bucket Algorithm: Separate buckets for provider key, tenant, and client profile
  • Burst Protection: Configurable burst size for smoothing traffic spikes
  • Concurrent Limiting: Semaphore-based concurrent request limiting
  • Normalized 429: Returns stable 429 errors from ReliAPI (not upstream chaos)

Configuration

Rate limits are configured via:

  1. Provider Key Pool: qps_limit per key
  2. Client Profiles: max_qps_per_tenant, max_qps_per_provider_key
  3. Tenant Config: rate_limit_rpm (legacy, in-memory)

How It Works

  1. Token Buckets: Creates separate token buckets for:

    • Provider key: provider_key:{key_id}
    • Tenant: tenant:{tenant_name}
    • Client profile: profile:{profile_name}
  2. Rate Limiting: Before each request:

    • Checks provider key bucket
    • Checks tenant bucket
    • Checks client profile bucket
    • Returns 429 if any bucket is empty
  3. Token Refill: Tokens refill at the configured QPS rate (see the token bucket sketch after this list)

  4. Normalized Errors: Returns 429 with:

    • type: "rate_limit"
    • source: "reliapi"
    • retry_after_s: Estimated seconds until retry
    • provider_key_status: Status of provider key
    • hint: "Upstream provider is being protected"
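
A minimal sketch of one such token bucket (the class itself is illustrative; ReliAPI's Rate Scheduler keeps a separate bucket per provider key, tenant, and client profile as described above):

import time

class TokenBucket:
    def __init__(self, qps: float, burst: float):
        self.qps = qps                 # refill rate, tokens per second
        self.capacity = burst          # maximum burst size
        self.tokens = burst
        self.last_refill = time.monotonic()

    def try_acquire(self) -> tuple[bool, float]:
        # Refill based on elapsed time, then try to consume one token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.qps)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Bucket is empty: report how long until one token is available,
        # which becomes retry_after_s in the normalized 429 error.
        return False, (1 - self.tokens) / self.qps

# A request is admitted only if the provider-key, tenant,
# and client-profile buckets all grant a token.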

Example

provider_key_pools:
  openai:
    keys:
      - id: "openai-1"
        qps_limit: 3  # 3 requests per second

client_profiles:
  cursor_default:
    max_qps_per_provider_key: 2  # Override: max 2 QPS per key for Cursor

Result: Cursor clients limited to 2 QPS per key (minimum of key limit and profile limit).


Smart Retries

What It Does

ReliAPI enhances its retry logic with Retry-After header support and automatic key pool fallback on 429/5xx errors.

Key Features

  • Retry-After Support: Respects Retry-After header from upstream (capped at max_s)
  • Key Pool Fallback: On 429/5xx errors, automatically retries with different key from pool
  • Exponential Backoff: Uses exponential backoff with jitter when Retry-After not present
  • Limited Attempts: Maximum 3 key switches per request to prevent infinite loops

How It Works

  1. Retry-After: If upstream returns a Retry-After header:

    • Uses the header value for the delay (capped at max_s; see the sketch after this list)
    • Skips the exponential backoff calculation
  2. Key Pool Fallback: On 429/5xx errors:

    • Records error in key pool manager
    • Selects new key from pool (if available)
    • Retries request with new key (up to 3 switches)
    • Falls back to normal error handling if all keys fail
  3. Error Classification: Classifies errors for retry policy:

    • 429: Rate limit errors
    • 5xx: Server errors
    • net: Network/timeout errors
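
A minimal sketch of the delay calculation from step 1 (the base delay, jitter range, and numeric-only Retry-After parsing are assumptions; max_s is the cap mentioned above):

import random

def compute_retry_delay(attempt: int, retry_after_header: str | None,
                        base_s: float = 0.5, max_s: float = 30.0) -> float:
    # Prefer the upstream Retry-After value, capped at max_s.
    if retry_after_header is not None:
        try:
            return min(float(retry_after_header), max_s)
        except ValueError:
            pass  # not a plain number of seconds; fall back to backoff
    # Exponential backoff with jitter when no usable Retry-After is present.
    delay = base_s * (2 ** attempt) * random.uniform(0.5, 1.5)
    return min(delay, max_s)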

Client Profiles

What It Does

The Client Profile Manager provides different rate limits and behavior for different client types (e.g., Cursor IDE vs. API clients).

Key Features

  • Profile Detection: Priority: X-Client header → tenant.profile → default
  • Per-Profile Limits: Different rate limits per client type
  • Configurable: max_parallel_requests, max_qps_per_tenant, max_qps_per_provider_key, burst_size

Configuration

client_profiles:
  cursor_default:
    max_parallel_requests: 4
    max_qps_per_tenant: 3
    max_qps_per_provider_key: 2
    burst_size: 2
    default_timeout_s: 60

Usage

Option 1: X-Client Header (highest priority)

curl -X POST http://localhost:8000/proxy/llm \
  -H "X-Client: cursor" \
  -d '{"target": "openai", "messages": [...]}'

Option 2: Tenant Profile (fallback)

tenants:
  cursor_user:
    api_key: "sk-..."
    profile: "cursor_default"  # Used if X-Client header absent

Option 3: Default Profile (final fallback)

If neither the X-Client header nor a tenant profile is present, the default profile is used.

How It Works

  1. Profile Detection:

    • Checks X-Client header first
    • Falls back to tenant.profile if header absent
    • Uses default profile if neither present
  2. Limit Application:

    • Profile limits are combined with provider key limits (the minimum wins; see the sketch after this list)
    • Applied to the rate scheduler before the request
  3. Multiple Limits: All limits checked:

    • Provider key QPS
    • Tenant QPS
    • Client profile QPS
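
A rough sketch of the detection order and the minimum-wins combination (function names and the mapping from header value to profile name are simplified assumptions):

def detect_profile(headers: dict, tenant: dict, profiles: dict) -> dict:
    # Priority: X-Client header, then tenant.profile, then "default".
    # (How the header value maps to a configured profile name is simplified here.)
    name = headers.get("X-Client") or tenant.get("profile") or "default"
    return profiles.get(name, profiles.get("default", {}))

def effective_key_qps(key_qps_limit: float, profile: dict) -> float:
    # The stricter (smaller) of the provider key limit and the profile limit wins.
    return min(key_qps_limit, profile.get("max_qps_per_provider_key", key_qps_limit))

# With qps_limit: 3 on the key and max_qps_per_provider_key: 2 in cursor_default,
# the effective limit for Cursor traffic is min(3, 2) = 2 QPS.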

Normalized Rate Limit Errors

What It Does

ReliAPI returns stable, predictable 429 errors with metadata, not random upstream chaos.

Error Format

{
  "success": false,
  "error": {
    "type": "rate_limit",
    "code": "RATE_LIMIT_RELIAPI",
    "message": "Rate limit exceeded (provider_key)",
    "retryable": true,
    "source": "reliapi",
    "retry_after_s": 0.5,
    "target": "openai",
    "status_code": 429,
    "provider_key_status": "active",
    "hint": "Upstream provider is being protected"
  },
  "meta": {
    "target": "openai",
    "cache_hit": false,
    "idempotent_hit": false,
    "retries": 0,
    "duration_ms": 10,
    "request_id": "req_abc123"
  }
}

Fields

  • source: "reliapi": Indicates error from ReliAPI (not upstream)
  • retry_after_s: Estimated seconds until retry (from token bucket)
  • provider_key_status: Status of provider key (active/degraded/exhausted)
  • hint: Helpful message for debugging
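
Because the error shape is stable, clients can handle it generically. A sketch of client-side handling built on the documented fields (the URL, headers, and retry loop are illustrative; authentication is omitted):

import time
import requests

def call_with_retry(payload: dict, url: str = "http://localhost:8000/proxy/llm",
                    attempts: int = 3) -> dict:
    for _ in range(attempts):
        resp = requests.post(url, json=payload, headers={"X-Client": "cursor"})
        body = resp.json()
        if body.get("success"):
            return body
        err = body.get("error", {})
        if err.get("type") == "rate_limit" and err.get("retryable"):
            # source == "reliapi" means the shield throttled the request
            # before it reached the upstream provider.
            time.sleep(err.get("retry_after_s", 1.0))
            continue
        raise RuntimeError(err.get("message", "request failed"))
    raise RuntimeError("rate limited after retries")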

Complete Example

# Provider Key Pools
provider_key_pools:
  openai:
    keys:
      - id: "openai-main-1"
        api_key: "env:OPENAI_KEY_1"
        qps_limit: 3
      - id: "openai-main-2"
        api_key: "env:OPENAI_KEY_2"
        qps_limit: 3

# Client Profiles
client_profiles:
  cursor_default:
    max_parallel_requests: 4
    max_qps_per_tenant: 3
    max_qps_per_provider_key: 2
    burst_size: 2
    default_timeout_s: 60

# Tenants
tenants:
  cursor_user:
    api_key: "sk-cursor-..."
    profile: "cursor_default"

Result:

  • Cursor clients (via X-Client: cursor or tenant.profile) limited to 2 QPS per provider key
  • Automatic key rotation on 429/5xx errors
  • Health tracking and automatic recovery
  • Stable 429 errors with retry_after_s metadata
