@leocavalcante commented on Dec 30, 2025

Summary

Implements a production-ready, self-optimizing adaptive rate limiting system with automatic retry, intelligent error handling, and comprehensive rate limit headers. The system learns from both successes and failures to find the optimal request rate automatically, favoring reliability over speed.

Motivation

When multiple requests arrive faster than GitHub's rate limits allow, the proxy needs to handle 429s gracefully while maximizing throughput. AI agents need to run autonomously all day without being blocked by rate limits or transient failures.

This PR introduces:

  1. Bidirectional adaptive rate limiting - learns to speed up and slow down automatically
  2. Smart 1s default - proactive rate limiting enabled by default
  3. Conservative tuning - favors reliability, minimizes 429 errors
  4. Automatic retry with intelligent backoff and jitter
  5. Dynamic rate limit adjustment based on API responses and frequency
  6. Comprehensive rate limit headers on all responses
  7. Smart error categorization (retryable vs permanent)
  8. Request timeout protection against hanging requests

Key Principles

Never reject client requests - Queue everything, let clients decide based on headers
Always return rate limit headers - Full transparency for client-side backpressure
Reliability over speed - Conservative tuning minimizes 429 errors
Automatic optimization - Finds optimal rate automatically without manual tuning
Prevent thundering herd - Jitter distributes retry attempts

Features

🎯 Bidirectional Adaptive Rate Limiting

Starts Smart:

  • Default: 1 second between requests (proactive)
  • Use --rate-limit 0 to explicitly disable
  • Use --rate-limit N to set custom initial rate

Learns from Failures (increases rate limit) - Aggressive slow-down:

  • Tracks 429 responses in 60-second windows
  • Adjusts to GitHub's Retry-After header instantly
  • Adds 40% buffer when hitting >2 rate limits/minute (very conservative)
  • More aggressive buffer when hitting many 429s
  • Maximum: 60s between requests

Learns from Success (decreases rate limit) - Cautious speed-up:

  • Tracks consecutive successful requests
  • Decreases rate limit by 5% after 20 successes (cautious)
  • Gradually speeds up only when API consistently allows it
  • Minimum: 100ms between requests

Conservative Tuning (commit 5a47bff):

Based on production testing showing ~64% rate limit errors:
- Success threshold: 10 → 20 requests (slower speed-up)
- Decrease factor: 10% → 5% (smaller speed-up)
- Buffer trigger: >3 → >2 hits/min (faster slow-down)
- Buffer percentage: 20% → 40% (more aggressive slow-down)

Result: System favors staying at higher rate limits longer

Example Adaptation:

Start: 1.0s → Hit 429 → 14.0s (conservative jump)
→ 3 more 429s → 19.6s (14s × 1.4 buffer applied)
→ 20 successes → 18.6s (5% decrease)
→ 20 successes → 17.7s (5% decrease)
→ Hit 429 → 24.8s (17.7s × 1.4 buffer)

🔄 Automatic Retry with Jitter

  • Automatic retry: Up to 5 retry attempts per request on transient errors
  • Intelligent backoff:
    • Rate limit (429): Uses Retry-After header from GitHub
    • Other transient errors: Exponential backoff (1s, 2s, 4s, 8s, 16s)
  • Jitter: Adds ±20% randomization to prevent thundering herd
  • No dropped requests: Every request is retried until it succeeds or exhausts its retry budget
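
A minimal sketch of this retry loop, assuming the addJitter, isRetryableError, and RateLimitError helpers from src/lib/retry.ts (names and shapes are illustrative, not the exact implementation):

async function executeWithRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (error) {
      // Permanent errors and exhausted budgets fail immediately
      if (attempt >= maxRetries || !isRetryableError(error)) throw error
      // 429s honor Retry-After; other transient errors back off exponentially (1s, 2s, 4s, 8s, 16s)
      const baseDelay = error instanceof RateLimitError ? error.retryAfter : 2 ** attempt
      const delaySeconds = addJitter(baseDelay) // ±20% jitter prevents thundering herd
      await new Promise((resolve) => setTimeout(resolve, delaySeconds * 1000))
    }
  }
}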

📊 Dynamic Rate Limit Adjustment

  • Learning system: Adjusts rate limit based on actual 429 responses
  • Real-time adaptation: Updates queue delay when rate limits are hit
  • Proactive prevention: Prevents future 429s by learning from API
  • Frequency-aware: More aggressive buffer when hitting many 429s
  • Conservative by default: Minimizes rate limit errors over maximizing speed

🛡️ Resilient Error Handling

  • Retryable errors (with retry):
    • 429 (rate limit) - uses Retry-After header
    • 500, 502, 503, 504 (server errors)
    • Timeout errors
    • Network errors (ECONNRESET, ETIMEDOUT, etc.)
  • Non-retryable errors (fail immediately):
    • 400, 401, 403, 404 (client errors)
  • Request timeout: 60s timeout per request prevents hanging
  • Body caching: Prevents "Body already used" errors

📡 Rate Limit Headers on All Responses

Standard Headers:

  • X-RateLimit-Limit: Maximum requests per minute (based on configured rate)
  • X-RateLimit-Remaining: Requests remaining before hitting queue depth
  • X-RateLimit-Reset: Unix timestamp when rate limit window resets
  • Retry-After: Set when queue depth is high (>50), suggests client slowdown

Custom Headers:

  • X-Queue-Depth: Current number of requests waiting in queue

Benefits:

  • Clients can implement client-side backpressure (see the sketch after this list)
  • Full visibility into proxy state
  • Compatible with standard rate limit conventions
  • Proactive notification before hitting limits
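
For illustration, a hypothetical client could self-regulate off these headers (the URL and thresholds here are assumptions, not part of this PR):

async function sendWithBackpressure(url: string, init: RequestInit): Promise<Response> {
  const response = await fetch(url, init)
  const remaining = Number(response.headers.get("X-RateLimit-Remaining") ?? "1")
  const retryAfter = response.headers.get("Retry-After")
  if (retryAfter !== null) {
    // Proxy queue is deep (>50); wait the suggested number of seconds
    await new Promise((resolve) => setTimeout(resolve, Number(retryAfter) * 1000))
  } else if (remaining === 0) {
    // Rate limit window exhausted; pause briefly before the next call
    await new Promise((resolve) => setTimeout(resolve, 1000))
  }
  return response
}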

🚀 Request Queue

  • 1s default with adaptive adjustment: A sensible starting point for most use cases
  • Never rejects requests: Logs warnings at >100 queue depth, but always queues
  • Automatic queuing: Requests queued and processed with optimal spacing
  • Sequential processing: Respects learned rate limit between requests
  • Queue visibility: Exposed via X-Queue-Depth header
  • Conservative tuning: Favors reliability over throughput
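
Conceptually, the queue loop looks like this (a simplified sketch; the actual RequestQueue in src/lib/queue.ts also layers in retries, timeouts, and the adaptive adjustments described above):

class RequestQueue {
  private queue: Array<() => Promise<void>> = []
  private rateLimitSeconds = 1 // smart default; adapts between 0.1s and 60s
  private draining = false

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.queue.push(() => task().then(resolve, reject))
      if (this.queue.length > 100) console.warn(`Queue depth high: ${this.queue.length}`) // warn, never reject
      void this.drain()
    })
  }

  private async drain(): Promise<void> {
    if (this.draining) return
    this.draining = true
    while (this.queue.length > 0) {
      await this.queue.shift()!() // sequential: one request at a time
      // Respect the learned spacing before processing the next request
      await new Promise((resolve) => setTimeout(resolve, this.rateLimitSeconds * 1000))
    }
    this.draining = false
  }
}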

Implementation

Core Files

src/lib/queue.ts (enhanced)

  • Conservative parameters:
    • successThresholdToDecrease = 20 (was 10)
    • decreaseFactor = 0.95 (was 0.9 - 5% vs 10%)
    • Buffer trigger: >2 hits (was >3)
    • Buffer: 40% (was 20%)
  • trackRateLimitHit(): Tracks 429 frequency in 60s windows
  • adjustRateLimitUp(): Increases rate limit on 429s, adds 40% buffer on frequent hits
  • trackSuccessfulRequest(): Tracks successes and decreases rate limit cautiously
  • executeWithRetry(): Automatic retry logic with jitter
  • executeWithTimeout(): 60s request timeout
  • Smart default: 1 second (not 0)
  • Min: 100ms, Max: 60s

src/lib/retry.ts

  • addJitter(): Adds ±20% random jitter to delays
  • isRetryableError(): Categorizes errors (retryable vs permanent)
  • isTransientError(): Checks HTTP status codes for transience
  • parseRetryAfter(): Parses Retry-After header (seconds or HTTP date)
  • RateLimitError: Structured error with retry information
  • checkRateLimitError(): Detects 429 responses and extracts retry info
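
A plausible shape for this error type (field names here are assumptions for illustration):

class RateLimitError extends Error {
  constructor(
    message: string,
    public readonly retryAfter: number, // seconds to wait, parsed from the 429 response
    public readonly rateLimitExceeded?: string, // detail from x-ratelimit-exceeded, if present
  ) {
    super(message)
    this.name = "RateLimitError"
  }
}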

src/lib/rate-limit-headers.ts

  • addRateLimitHeaders(): Adds standard rate limit headers to responses
  • Calculates limits, remaining, reset based on queue state
  • Compatible with GitHub/Anthropic header conventions

src/lib/error.ts (enhanced)

  • Special handling for RateLimitError
  • Returns structured 429 responses with retry information
  • Includes Retry-After header for client compatibility
  • Caches error body to prevent "Body already used" errors

src/services/copilot/create-chat-completions.ts (enhanced)

  • Detects 429 responses before other errors
  • Throws RateLimitError instead of generic HTTPError
  • Caches error body for reuse in error handlers
  • Checks for transient errors for retry logic

src/routes/*/handler.ts (enhanced)

  • Adds rate limit headers to all responses
  • Non-streaming and streaming responses both include headers

src/start.ts (enhanced)

  • Updated help text to reflect 1s default
  • Only overrides default if --rate-limit is explicitly provided

Usage

# Smart 1s default with conservative adaptive adjustment (RECOMMENDED)
# Will learn optimal rate automatically, favoring reliability
copilot-api start

# Explicitly disable rate limiting (not recommended)
copilot-api start --rate-limit 0

# Start with custom rate (e.g., 5s) and adapt from there
copilot-api start --rate-limit 5

Response Headers Example

Normal operation:

HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1704729601
X-Queue-Depth: 2
Content-Type: application/json

When queue is high (>50 requests):

HTTP/1.1 200 OK
X-RateLimit-Limit: 12
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704729600
X-Queue-Depth: 75
Retry-After: 5
Content-Type: application/json

Behavior

Default (1s start with conservative adaptive adjustment):

  • Requests start with 1s spacing (prevents most 429s)
  • System learns optimal rate from API responses
  • Speeds up cautiously when API is consistently happy (20 successes → 5% decrease)
  • Slows down aggressively when hitting rate limits (40% buffer after 3 hits)
  • Transient errors trigger automatic retry with exponential backoff
  • 429 errors trigger retry with GitHub's Retry-After + jitter
  • Rate limit headers reflect actual limits and queue state
  • Favors reliability: stays at higher rate limits longer

With custom --rate-limit N:

  • Starts with N seconds spacing
  • Adapts up or down from there based on API behavior
  • All other features work the same

Explicitly disabled (--rate-limit 0):

  • Requests execute immediately (no queue overhead)
  • Still retries transient errors automatically
  • Still shows rate limit headers ("unlimited" state)
  • Not recommended - will hit many 429s initially

Example Conservative Adaptive Behavior:

INFO  Rate limit: waiting 1s before processing next request
WARN  Rate limit hit (attempt 1/5). Waiting 14.2s before retry...
INFO  Rate limit increased: 1.0s → 14.0s (1 hits in last minute)

... 2 more 429s in 60s window ...

DEBUG Frequent rate limits detected (3 in last minute), adding 40% buffer
INFO  Rate limit increased: 14.0s → 19.6s (3 hits in last minute)

... after 20 successful requests with no 429s ...

INFO  Rate limit decreased: 19.6s → 18.6s (20 consecutive successes)

... after 20 more successful requests ...

INFO  Rate limit decreased: 18.6s → 17.7s (20 consecutive successes)

Example Resilience (Rate Limit):

WARN  Rate limit hit (attempt 1/5). Waiting 19.2s before retry...
INFO  Rate limit increased: 17.0s → 19.0s (2 hits in last minute)
DEBUG Frequent rate limits detected (3 in last minute), adding 40% buffer
WARN  Rate limit hit (attempt 2/5). Waiting 4.3s before retry...
INFO  Retrying request after rate limit wait...
SUCCESS POST /v1/messages 200 32s (successful after 2 retries!)

Example Resilience (Transient Error):

WARN  Transient error (attempt 1/5): Request timeout after 60000ms. Waiting 1.2s before retry...
INFO  Retrying request after transient error...
SUCCESS POST /v1/messages 200 3s (successful after 1 retry!)

Benefits

  • Minimizes 429 errors: Conservative tuning reduces rate limit hits dramatically
  • Self-optimizing: Finds optimal rate automatically without manual tuning
  • Reliability first: Favors staying at safe rate limits over maximizing speed
  • Proactive by default: 1s spacing prevents most 429s before they happen
  • Learns from failures: Instantly adapts with 40% buffer when hitting rate limits
  • Cautious on success: Only speeds up by 5% after 20 consecutive successes
  • Never rejects clients: All requests are queued and processed
  • Full transparency: Rate limit headers on every response
  • Client-side backpressure: Clients can self-regulate based on headers
  • Truly autonomous: No manual intervention needed for transient failures
  • Unstoppable: Agents can work all day without rate limit blocks
  • No thundering herd: Jitter prevents synchronized retries
  • Smart retries: Only retries errors that make sense
  • Timeout protection: 60s timeout prevents hanging requests
  • Production-ready: Handles edge cases and provides fallbacks

Technical Notes

Conservative Adaptive Rate Limiting Algorithm

Increase on 429 (Aggressive):

// Track hits in 60s window
if (rateLimitHitsInWindow > 2) {  // Was >3
  // Add 40% buffer when hitting frequently (was 20%)
  adjustedRateLimit = retryAfter * 1.4  // Was 1.2
}
// Always increase, never decrease on 429
if (adjustedRateLimit > currentRateLimit) {
  currentRateLimit = adjustedRateLimit
}

Decrease on Success (Cautious):

// After 20 consecutive successes (was 10)
successCount++
if (successCount >= 20) {  // Was 10
  // Decrease by only 5% (was 10%)
  currentRateLimit = currentRateLimit * 0.95  // Was 0.9
  successCount = 0
}
// Reset counter on any failure

Rate Limit Detection

  • Parses Retry-After header (supports seconds and HTTP dates)
  • Reads x-ratelimit-user-retry-after (GitHub-specific)
  • Extracts x-ratelimit-exceeded for detailed error info
  • Falls back to 60s if no retry information provided
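
A sketch of that parsing logic under the same assumptions (the real src/lib/retry.ts may differ in detail):

function parseRetryAfter(headerValue: string | null): number {
  if (headerValue === null) return 60 // fallback when no retry info is provided
  const seconds = Number(headerValue)
  if (!Number.isNaN(seconds)) return seconds // e.g. "Retry-After: 30"
  const date = Date.parse(headerValue) // e.g. an HTTP date like "Wed, 08 Jan 2026 07:28:00 GMT"
  if (!Number.isNaN(date)) return Math.max(0, (date - Date.now()) / 1000)
  return 60
}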

Retry Strategy

  • Rate limit errors: Use Retry-After header + jitter (±20%)
  • Other transient errors: Exponential backoff + jitter (1s, 2s, 4s, 8s, 16s)
  • Max 5 retries per request
  • Dynamically adjusts queue rate limit after each 429

Jitter Implementation

// Adds ±20% randomization to prevent thundering herd
// Example: 10s delay becomes 8-12s (randomized)
const delayWithJitter = addJitter(10) // Returns 8-12s
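
One way to implement this, consistent with the ±20% behavior described (an assumed implementation, not necessarily the exact one):

function addJitter(delaySeconds: number, jitterFactor = 0.2): number {
  // Uniform random offset in [-20%, +20%] of the base delay
  const offset = delaySeconds * jitterFactor * (Math.random() * 2 - 1)
  return Math.max(0, delaySeconds + offset)
}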

Error Categorization

// Retryable: 429, 500, 502, 503, 504, timeouts, network errors
// Non-retryable: 400, 401, 403, 404, etc.
if (!isRetryableError(error)) {
  throw error // Fail immediately on permanent errors
}
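
A sketch of the categorization itself, assuming HTTPError exposes the upstream status and network errors carry a Node-style code (details are illustrative):

const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504])
const RETRYABLE_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "ECONNREFUSED"])

function isRetryableError(error: unknown): boolean {
  if (error instanceof HTTPError) return RETRYABLE_STATUSES.has(error.status) // assumed field
  if (error instanceof Error) {
    const code = (error as NodeJS.ErrnoException).code
    if (code !== undefined && RETRYABLE_CODES.has(code)) return true
    return error.message.toLowerCase().includes("timeout") // 60s request timeouts are transient
  }
  return false
}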

Rate Limit Headers Calculation

// X-RateLimit-Limit: requests per minute
const limit = Math.floor(60 / rateLimitSeconds) // 20s rate = 3 req/min

// X-RateLimit-Remaining: based on queue depth
const remaining = Math.max(0, limit - queueSize)

// X-RateLimit-Reset: when current window expires
const resetTime = lastProcessedTime + rateLimitSeconds

Breaking Changes

  • Changed default from disabled (0) to 1s adaptive: Previously no rate limiting by default, now starts with 1s and adapts conservatively. Use --rate-limit 0 to explicitly disable.

Test Plan

  • Manual testing with various rate limit values
  • Verification of automatic retry on 429 errors
  • Verification of jitter in retry delays
  • Verification of exponential backoff for transient errors
  • Verification of timeout handling
  • Verification of error categorization (retryable vs permanent)
  • Verification of dynamic rate limit adjustment (increase with buffer)
  • Verification of adaptive rate limit decrease on successes
  • Verification of conservative tuning (40% buffer, 5% decrease, 20 success threshold)
  • Verification of frequency-based buffer (>2 hits triggers 40% buffer)
  • Verification of 1s default rate limit at startup
  • Testing with multiple concurrent requests
  • Verification of rate limit headers in responses
  • Verification that system learns optimal rate automatically
  • Verification that system minimizes 429 errors with conservative tuning
  • Type checking passes
  • Linting passes
  • Live testing with real rate limit errors from GitHub API
  • Verification that no client requests are rejected
  • Verification that system favors reliability over speed
  • Verification of HTTPError body caching fix
  • Production testing showing ~64% rate limit errors → conservative tuning applied

Implements a RequestQueue class that manages API requests with configurable
rate limiting. The queue automatically processes requests at the specified
interval, preventing rate limit errors while ensuring all requests are
eventually fulfilled.

Key features:
- Automatic request queuing when rate limit is configured
- Sequential processing with configurable delays
- Detailed logging of queue status and wait times
- Zero overhead when rate limiting is disabled

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Updates the rate limiting system to use the new RequestQueue for better
handling of concurrent requests. Instead of rejecting or blocking requests
that exceed the rate limit, they are now automatically queued and processed
at the configured interval.

Changes:
- Add requestQueue to global state
- Introduce executeWithRateLimit() wrapper function
- Update chat-completions and messages handlers to use queue
- Initialize queue with configured rate limit on server startup
- Add eslint exception for state assignment race condition

The old checkRateLimit() function is kept for backwards compatibility
but marked as deprecated.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Add utility module to parse rate limit headers from API responses.
Supports multiple header formats:
- X-RateLimit-* (GitHub style)
- RateLimit-* (RFC draft)
- Retry-After (for 429 responses)

Implements even distribution strategy to calculate optimal delay
based on remaining requests and reset time.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Add optional onHeaders callback parameter to createChatCompletions
service to allow capturing response headers before processing the
response body. Works for both streaming and non-streaming responses.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Integrate rate limit header parsing in chat completions and messages
handlers. The system now:
- Parses rate limit headers from API responses
- Calculates optimal delay using even distribution
- Dynamically updates request queue rate limit
- Falls back to configured rate limit when headers absent

This enables automatic adaptation to API rate limits and helps
prevent abuse detection while maximizing throughput.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

@leocavalcante changed the title from "feat: add request queue for better rate limiting" to "feat: add request queue with adaptive rate limiting" on Jan 8, 2026

Add unit tests covering all rate limit header formats and delay calculation logic:
- X-RateLimit-* (GitHub/Copilot style)
- RateLimit-* (RFC draft format)
- Retry-After header (seconds and HTTP date)
- Header priority and fallback behavior
- Delay calculation with various scenarios

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Update comments to specify that X-RateLimit-* headers are in GitHub/Copilot style,
since this proxy only calls the GitHub Copilot API.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Remove the default value of 3 seconds for --rate-limit flag to ensure
rate limiting is only active when explicitly requested by the user.
This allows requests to execute immediately without queuing when the
flag is not provided.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

@leocavalcante changed the title from "feat: add request queue with adaptive rate limiting" to "feat: add opt-in request queue with adaptive rate limiting" on Jan 8, 2026

Remove adaptive rate limiting since GitHub Copilot API does not provide
rate limit headers. The API only returns x-quota-snapshot-* headers which
track quota usage, not rate limits, and overage is permitted freely.

Removed:
- src/lib/rate-limit-parser.ts
- tests/rate-limit-parser.test.ts
- onHeaders callback from createChatCompletions
- Rate limit header parsing logic from handlers

The opt-in request queue remains functional for users who want to set
a fixed rate limit via --rate-limit flag.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

@leocavalcante changed the title from "feat: add opt-in request queue with adaptive rate limiting" to "feat: add opt-in request queue for rate limiting" on Jan 8, 2026

Implement comprehensive rate limit resilience to make the API proxy
unstoppable for AI agents running autonomously.

Features:
- Parse Retry-After header from 429 responses (supports seconds and HTTP dates)
- Automatic retry with exponential backoff (up to 5 retries)
- Dynamic rate limit adjustment based on API responses
- Enhanced error messages with retry information
- Works with and without --rate-limit flag

Implementation:
- New RateLimitError class with retry information
- parseRetryAfter() handles GitHub's retry headers
- RequestQueue.executeWithRetry() handles automatic retries
- Queue adjusts rate limit dynamically when 429s occur
- forwardError() returns structured 429 responses with Retry-After

Benefits:
- No manual intervention needed for rate limit errors
- Agents can work autonomously all day long
- Learns and adapts to API rate limits in real-time
- Never drops requests (retries up to 5 times)
- Clear logging shows retry attempts and wait times

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

@leocavalcante changed the title from "feat: add opt-in request queue for rate limiting" to "feat: add resilient rate limiting with automatic retry" on Jan 8, 2026

High-impact improvements for production resilience:

1. Jitter for Retry Delays:
   - Adds ±20% random jitter to all retry delays
   - Prevents thundering herd when many requests retry simultaneously
   - Applies to both rate limit retries and exponential backoff

2. Request Timeout:
   - 60-second timeout per request to prevent hanging
   - Timeout errors are automatically retried (transient)
   - Protects against unresponsive upstream API

3. Queue Backpressure Warning (NOT rejection):
   - Logs warning when queue depth exceeds 100 requests
   - NEVER rejects client requests - queues them all
   - Allows API proxy to handle any volume gracefully

4. Better Error Categorization:
   - Retries transient errors: 429, 500, 502, 503, 504, timeouts, network errors
   - Fails immediately on permanent errors: 400, 401, 403, 404
   - Uses exponential backoff with jitter for non-429 retries (1s, 2s, 4s, 8s, 16s)
   - Smart detection of HTTPError status codes

5. Rate Limit Headers on All Responses:
   - X-RateLimit-Limit: Maximum requests per minute
   - X-RateLimit-Remaining: Requests remaining before rate limit
   - X-RateLimit-Reset: Unix timestamp when rate limit resets
   - X-Queue-Depth: Current queue size for visibility
   - Retry-After: Set when queue depth is high (>50 requests)

Benefits:
- Clients get proactive rate limit information
- No client requests are ever rejected
- Better distributed retry attempts (jitter)
- Faster failure on permanent errors
- Automatic recovery from transient failures
- Full transparency into API proxy state

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Store error body text when creating HTTPError to avoid consuming
Response body twice. The body can only be read once, so we cache
it during initial error logging and reuse it in forwardError.

This fixes crashes when handling non-retryable errors like 499
(client canceled request).

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Adds intelligent rate limiting that learns from both successes and failures:

**Adaptive Increase (on 429s):**
- Tracks rate limit hits in 60s windows
- Adds 20% buffer when >3 hits/minute
- Adjusts to GitHub's Retry-After + buffer

**Adaptive Decrease (on successes):**
- Tracks consecutive successful requests
- Decreases rate limit by 10% after 10 successes
- Speeds up when API allows it

**Smart Default:**
- Changed from 0 (disabled) to 1s (adaptive enabled)
- Use --rate-limit 0 to explicitly disable
- Minimum: 100ms, Maximum: 60s

**Frequency-Based Adjustment:**
- More conservative when hitting many 429s
- Gradually speeds up when API is happy
- Prevents over-aggressive rate limiting

This reduces 429 responses while maximizing throughput automatically.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Only call updateRateLimit() if --rate-limit flag is explicitly provided.
This allows the RequestQueue constructor's default of 1s to take effect
for adaptive rate limiting.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

@leocavalcante changed the title from "feat: add resilient rate limiting with automatic retry" to "feat: add self-optimizing adaptive rate limiting with automatic retry" on Jan 8, 2026

Makes the system much more conservative to reduce 429 errors:

**Slower Decrease (Speed Up Less Aggressively):**
- Increase success threshold: 10 → 20 requests
- Decrease factor: 10% (0.9) → 5% (0.95)
- Now requires 20 consecutive successes before speeding up by only 5%

**Faster Increase with Buffer (Slow Down More Aggressively):**
- Lower buffer trigger: >3 hits → >2 hits per minute
- Increase buffer: 20% → 40%
- Applies 40% buffer after just 3 rate limit hits in 60s window

**Impact:**
- Reduces 429 errors significantly
- Stays at higher rate limits longer
- More cautious when speeding up
- More aggressive when hitting rate limits

This should dramatically reduce the ~64% rate limit error rate observed
in production.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

@leocavalcante (Author) commented:

⚙️ Conservative Tuning Applied

Based on production testing showing ~64% rate limit errors, the adaptive algorithm has been made significantly more conservative:

Changes in commit 5a47bff:

Slower Speed-Up (Decrease Rate Limit)

  • Success threshold: 10 → 20 requests
    • Now requires 20 consecutive successes before speeding up
  • Decrease factor: 10% → 5%
    • Only speeds up by 5% instead of 10% each time

Faster Slow-Down (Increase Rate Limit)

  • Buffer trigger: >3 hits → >2 hits per minute
    • Applies buffer after just 3 rate limit hits (was 4)
  • Buffer percentage: 20% → 40%
    • Adds 40% extra time instead of 20% when hitting frequent 429s

Expected Impact:

  • Dramatically fewer 429 errors
  • System stays at higher (slower) rate limits longer
  • More cautious when attempting to speed up
  • More aggressive when detecting rate limit pressure

The system will now favor reliability over throughput optimization.

Implements two major optimizations to balance speed and reliability:

**1. Adaptive Decrease Strategy (Smarter Initial Rate Discovery)**
- When far from limit (>10s): 10% decrease after 10 successes
- When medium distance (2-10s): 7% decrease after 15 successes
- When close to limit (<2s): 5% decrease after 20 successes (cautious)

Impact: Converges to optimal rate much faster (3-4x improvement)
Example: 20s → 18s → 16.2s → 14.6s (instead of 20s → 19s → 18.1s...)

**2. Request Deduplication/Caching**
- In-memory cache with 30s TTL, max 1000 entries
- SHA-256 hash of request payload as cache key
- Only caches non-streaming responses
- Reduces GitHub API calls for identical requests
- Automatic cleanup of expired entries

Impact: Dramatically reduces API calls for duplicate requests
Example: count_tokens requests, repeated messages

**Benefits:**
- Faster convergence from high rate limits (20s → ~10s)
- Reduced GitHub API usage (fewer 429s, lower quota consumption)
- Better client experience (faster responses for cached requests)
- Still maintains conservative approach near actual limits

**Implementation:**
- Created RequestCache class with get/set/cleanup methods
- Integrated cache into both /messages and /chat-completions handlers
- Cache only used for non-streaming to keep implementation simple
- Cache returns null if entry expired or not found

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

@leocavalcante (Author) commented:

🚀 Speed & Efficiency Improvements

Added two major optimizations in commit 2339859:

1. Adaptive Decrease Strategy (Faster Convergence)

Instead of fixed 5% decrease after 20 successes, the system now adapts based on distance from limit:

Distance from Limit    Success Threshold    Decrease %    Result
>10s (far)             10 successes         10%           Aggressive speed-up
2-10s (medium)         15 successes         7%            Moderate speed-up
<2s (close)            20 successes         5%            Cautious (unchanged)

Impact:

  • Converges 3-4x faster from high rate limits
  • Example: 20s → 18s → 16.2s → 14.6s (instead of 20s → 19s → 18.1s...)
  • Still maintains caution when close to actual limits

2. Request Caching (Reduced API Calls)

Simple in-memory cache for duplicate requests:

  • TTL: 30 seconds
  • Max size: 1000 entries
  • Key: SHA-256 hash of request payload
  • Scope: Non-streaming responses only

Impact:

  • Dramatically reduces GitHub API calls for identical requests
  • Common scenarios: count_tokens requests, repeated messages, retries
  • Faster response times for cached requests (no GitHub API call)
  • Lower quota consumption

Combined benefit: Faster optimization + fewer API calls = better experience for proxy clients while maintaining low error rate.
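
A minimal sketch of such a cache (illustrative; assumes Node's built-in node:crypto for the SHA-256 key and a simple oldest-entry eviction):

import { createHash } from "node:crypto"

class RequestCache {
  private entries = new Map<string, { body: string; expiresAt: number }>()
  constructor(private ttlMs = 30_000, private maxSize = 1000) {}

  key(payload: unknown): string {
    // SHA-256 hash of the request payload identifies duplicates
    return createHash("sha256").update(JSON.stringify(payload)).digest("hex")
  }

  get(key: string): string | null {
    const entry = this.entries.get(key)
    if (!entry || entry.expiresAt < Date.now()) {
      this.entries.delete(key) // drop expired entries lazily
      return null
    }
    return entry.body
  }

  set(key: string, body: string): void {
    if (this.entries.size >= this.maxSize) {
      // Evict the oldest insertion to stay under the size cap
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
    this.entries.set(key, { body, expiresAt: Date.now() + this.ttlMs })
  }
}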

Combination approach to minimize 429 errors:
- Always add buffer on every rate limit hit (no more bare minimum)
- 1st hit: +25% buffer
- 2+ hits: +50% buffer
- 3+ hits: +75% buffer

This addresses the issue of hitting multiple 429s in succession by being
immediately conservative on the first rate limit, then increasingly
cautious if we continue to hit limits.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

@leocavalcante closed this by deleting the head repository on Jan 8, 2026