# Stability Shield
ReliAPI acts as a Stability Shield between clients (especially IDEs like Cursor) and upstream LLM/HTTP providers, smoothing out bursts, handling rate limits intelligently, and managing pools of provider keys.
The Stability Shield is a set of features designed to:
- Smooth out bursts: Prevent clients from hitting upstream rate limits
- Handle rate limits intelligently: Automatic key rotation and retry with different keys
- Manage key pools: Health tracking and automatic selection of best key
- Support client profiles: Different limits and behavior for different client types
## Provider Key Pool Manager

The Provider Key Pool Manager manages multiple API keys per provider, tracks their health, and automatically selects the best key for each request.
- Multi-Key Support: Manage multiple API keys per provider
- Health Tracking: Track error scores, consecutive errors, and key status
- Automatic Selection: Select best key based on load score (lowest load = best choice)
- Status Management: Keys transition between states:
  - active: Normal operation
  - degraded: After 5 consecutive errors
  - exhausted: After 10 consecutive errors
  - banned: Manually banned keys
- Automatic Recovery: Degraded keys recover to active when the error score decreases
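The selection rule can be sketched in a few lines of Python. This is a minimal illustration of the load-score idea; the class shape and field names are assumptions, not ReliAPI's actual internals:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProviderKey:
    # Hypothetical fields mirroring the pool config below
    id: str
    qps_limit: float
    current_qps: float = 0.0
    error_penalty: float = 0.0
    status: str = "active"

def select_key(keys: list[ProviderKey]) -> Optional[ProviderKey]:
    """Pick the active key with the lowest load score (lowest load = best choice)."""
    active = [k for k in keys if k.status == "active"]
    if not active:
        return None  # no usable key in the pool
    # load_score = current_qps / qps_limit + error_penalty
    return min(active, key=lambda k: k.current_qps / k.qps_limit + k.error_penalty)
```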
```yaml
provider_key_pools:
  openai:
    keys:
      - id: "openai-main-1"
        api_key: "env:OPENAI_KEY_1"
        qps_limit: 3
      - id: "openai-main-2"
        api_key: "env:OPENAI_KEY_2"
        qps_limit: 3
      - id: "openai-backup-1"
        api_key: "env:OPENAI_KEY_BACKUP_1"
        qps_limit: 2
```

- Key Selection: For each request, selects the key with:
  - `status == "active"`
  - Lowest `load_score = current_qps / qps_limit + error_penalty`
- Health Tracking: On errors:
  - 429 errors: +0.1 to error score
  - 5xx errors: +0.05 to error score
  - Other errors: +0.02 to error score
- Status Transitions:
  - active → degraded: After 5 consecutive errors
  - degraded → exhausted: After 10 consecutive errors
  - degraded → active: When error score < 0.3
- Automatic Fallback: On 429/5xx errors, automatically retries with a different key (up to 3 key switches per request)
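The scoring increments and status transitions can be sketched as follows. The thresholds come from the text above; the class shape and the score decay on success are assumptions for illustration:

```python
# Error-score increments per error class (from the health-tracking rules)
ERROR_SCORES = {"429": 0.1, "5xx": 0.05, "other": 0.02}

class KeyHealth:
    def __init__(self) -> None:
        self.error_score = 0.0
        self.consecutive_errors = 0
        self.status = "active"

    def record_error(self, kind: str) -> None:
        self.error_score += ERROR_SCORES.get(kind, ERROR_SCORES["other"])
        self.consecutive_errors += 1
        if self.consecutive_errors >= 10:
            self.status = "exhausted"
        elif self.consecutive_errors >= 5:
            self.status = "degraded"

    def record_success(self) -> None:
        self.consecutive_errors = 0
        # Decay amount is an assumption; the doc only says the score decreases
        self.error_score = max(0.0, self.error_score - 0.05)
        if self.status == "degraded" and self.error_score < 0.3:
            self.status = "active"
```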
Exports Prometheus metrics per `provider_key_id`:

- `reliapi_key_pool_requests_total`: Request counts (success/error)
- `reliapi_key_pool_errors_total`: Error counts by type
- `reliapi_key_pool_qps`: Current QPS per key
- `reliapi_key_pool_status`: Key status (0=active, 1=degraded, 2=exhausted, 3=banned)
## Rate Scheduler

The Rate Scheduler uses a token bucket algorithm to smooth bursts and enforce per-key, per-tenant, and per-client-profile rate limits before requests reach upstream providers.
- Token Bucket Algorithm: Separate buckets for provider key, tenant, and client profile
- Burst Protection: Configurable burst size for smoothing traffic spikes
- Concurrent Limiting: Semaphore-based concurrent request limiting
- Normalized 429: Returns stable 429 errors from ReliAPI (not upstream chaos)
Rate limits are configured via:

- Provider Key Pool: `qps_limit` per key
- Client Profiles: `max_qps_per_tenant`, `max_qps_per_provider_key`
- Tenant Config: `rate_limit_rpm` (legacy, in-memory)
- Token Buckets: Creates separate token buckets for:
  - Provider key: `provider_key:{key_id}`
  - Tenant: `tenant:{tenant_name}`
  - Client profile: `profile:{profile_name}`
- Rate Limiting: Before each request:
  - Checks provider key bucket
  - Checks tenant bucket
  - Checks client profile bucket
  - Returns 429 if any bucket is empty
- Token Refill: Tokens refill at the configured QPS rate
- Normalized Errors: Returns 429 with:
  - `type: "rate_limit"`, `source: "reliapi"`
  - `retry_after_s`: Estimated seconds until retry
  - `provider_key_status`: Status of the provider key
  - `hint`: "Upstream provider is being protected"
```yaml
provider_key_pools:
  openai:
    keys:
      - id: "openai-1"
        qps_limit: 3  # 3 requests per second
client_profiles:
  cursor_default:
    max_qps_per_provider_key: 2  # Override: max 2 QPS per key for Cursor
```

Result: Cursor clients are limited to 2 QPS per key (the minimum of the key limit and the profile limit).
Retry logic is enhanced with Retry-After header support and automatic key pool fallback on 429/5xx errors.

- Retry-After Support: Respects the `Retry-After` header from upstream (capped at `max_s`)
- Key Pool Fallback: On 429/5xx errors, automatically retries with a different key from the pool
- Exponential Backoff: Uses exponential backoff with jitter when Retry-After is not present
- Limited Attempts: Maximum of 3 key switches per request to prevent infinite loops
- Retry-After: If upstream returns a `Retry-After` header:
  - Uses the header value for the delay (capped at `max_s`)
  - Skips the exponential backoff calculation
- Key Pool Fallback: On 429/5xx errors:
  - Records the error in the key pool manager
  - Selects a new key from the pool (if available)
  - Retries the request with the new key (up to 3 switches)
  - Falls back to normal error handling if all keys fail
- Error Classification: Classifies errors for the retry policy:
  - `429`: Rate limit errors
  - `5xx`: Server errors
  - `net`: Network/timeout errors
## Client Profile Manager

The Client Profile Manager provides different rate limits and behavior for different client types (e.g., Cursor IDE vs API clients).

- Profile Detection: Priority: `X-Client` header → `tenant.profile` → default
- Per-Profile Limits: Different rate limits per client type
- Configurable: `max_parallel_requests`, `max_qps_per_tenant`, `max_qps_per_provider_key`, `burst_size`
```yaml
client_profiles:
  cursor_default:
    max_parallel_requests: 4
    max_qps_per_tenant: 3
    max_qps_per_provider_key: 2
    burst_size: 2
    default_timeout_s: 60
```

Option 1: X-Client Header (highest priority)

```bash
curl -X POST http://localhost:8000/proxy/llm \
  -H "X-Client: cursor" \
  -d '{"target": "openai", "messages": [...]}'
```

Option 2: Tenant Profile (fallback)

```yaml
tenants:
  cursor_user:
    api_key: "sk-..."
    profile: "cursor_default"  # Used if X-Client header absent
```

Option 3: Default Profile (final fallback)

If no header and no tenant profile, the default profile is used.
- Profile Detection:
  - Checks the `X-Client` header first
  - Falls back to `tenant.profile` if the header is absent
  - Uses the `default` profile if neither is present
- Limit Application:
  - Profile limits override provider key limits (minimum wins)
  - Applied to the rate scheduler before the request
- Multiple Limits: All limits are checked:
  - Provider key QPS
  - Tenant QPS
  - Client profile QPS
ReliAPI returns stable, predictable 429 errors with metadata, not random upstream chaos.
```json
{
  "success": false,
  "error": {
    "type": "rate_limit",
    "code": "RATE_LIMIT_RELIAPI",
    "message": "Rate limit exceeded (provider_key)",
    "retryable": true,
    "source": "reliapi",
    "retry_after_s": 0.5,
    "target": "openai",
    "status_code": 429,
    "provider_key_status": "active",
    "hint": "Upstream provider is being protected"
  },
  "meta": {
    "target": "openai",
    "cache_hit": false,
    "idempotent_hit": false,
    "retries": 0,
    "duration_ms": 10,
    "request_id": "req_abc123"
  }
}
```

- `source: "reliapi"`: Indicates the error came from ReliAPI (not upstream)
- `retry_after_s`: Estimated seconds until retry (from the token bucket)
- `provider_key_status`: Status of the provider key (active/degraded/exhausted)
- `hint`: Helpful message for debugging
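Because the error shape is stable, a client can handle it generically. A client-side sketch, assuming the response body has been parsed into a dict and `retry` is a caller-supplied zero-argument callable (both names are illustrative):

```python
import time

def handle_response(body: dict, retry):
    """Retry once after retry_after_s on a ReliAPI-normalized 429; raise otherwise."""
    if body.get("success"):
        return body
    err = body.get("error", {})
    if (err.get("source") == "reliapi"
            and err.get("type") == "rate_limit"
            and err.get("retryable")):
        time.sleep(err.get("retry_after_s", 1.0))  # wait as advised, then retry
        return retry()
    raise RuntimeError(err.get("message", "request failed"))
```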
```yaml
# Provider Key Pools
provider_key_pools:
  openai:
    keys:
      - id: "openai-main-1"
        api_key: "env:OPENAI_KEY_1"
        qps_limit: 3
      - id: "openai-main-2"
        api_key: "env:OPENAI_KEY_2"
        qps_limit: 3

# Client Profiles
client_profiles:
  cursor_default:
    max_parallel_requests: 4
    max_qps_per_tenant: 3
    max_qps_per_provider_key: 2
    burst_size: 2
    default_timeout_s: 60

# Tenants
tenants:
  cursor_user:
    api_key: "sk-cursor-..."
    profile: "cursor_default"
```

Result:

- Cursor clients (via `X-Client: cursor` or `tenant.profile`) are limited to 2 QPS per provider key
- Automatic key rotation on 429/5xx errors
- Health tracking and automatic recovery
- Stable 429 errors with `retry_after_s` metadata
- Configuration — Configuration guide
- Reliability Features — Detailed feature explanations
- Architecture — Architecture overview