
Add Kilocode provider support #115

Open
ShmidtS wants to merge 29 commits into Mirrowel:main from ShmidtS:main

Conversation


@ShmidtS ShmidtS commented Jan 31, 2026

Description

This PR adds support for the Kilocode LLM provider (https://kilocode.ai).

Kilocode is an OpenRouter-compatible API that routes requests to various LLM providers through a unified interface.

Changes

  • New file: src/rotator_library/providers/kilocode_provider.py

    • Implements KilocodeProvider class inheriting from ProviderInterface
    • Supports endpoint: https://kilocode.ai/api/openrouter/chat/completions
    • Handles model routing with provider prefixes (e.g., minimax/, moonshotai/, z-ai/)
  • Modified: src/proxy_app/provider_urls.py

    • Added Kilocode endpoint configuration
  • Modified: src/rotator_library/provider_config.py

    • Registered Kilocode provider in LITELLM_PROVIDERS with category popular
    • Added support for KILOCODE_API_BASE environment variable

Provider Features

  • API Endpoint: https://kilocode.ai/api/openrouter/chat/completions
  • Model Routing: Supports models with provider prefixes:
    • minimax/minimax-m2.1:free
    • moonshotai/kimi-k2.5:free
    • z-ai/glm-4.7:free
    • Other models in provider/model format
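The routing described above amounts to a prefix split: the kilocode/ prefix selects the provider, and the remainder (which may itself contain a slash) is the upstream model id. As an illustrative sketch only, not the PR's actual implementation:

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split a 'provider/model' id into (provider_prefix, upstream_model).

    The upstream part may itself contain a slash, so only the first
    separator is significant.
    """
    prefix, _, rest = model.partition("/")
    return prefix, rest


print(split_model_id("kilocode/minimax/minimax-m2.1:free"))
# ('kilocode', 'minimax/minimax-m2.1:free')
```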

Testing

  • Provider correctly registers via the plugin system
  • Endpoint resolves properly for API requests
  • Model routing with prefixes works as expected

Important

Adds support for Kilocode provider with new class and configuration updates, plus minor refactoring for readability.

  • Kilocode Provider Support:
    • Adds KilocodeProvider class in kilocode_provider.py for handling Kilocode API interactions.
    • Supports model routing with prefixes like minimax/, moonshotai/, z-ai/.
  • Configuration Changes:
    • Registers Kilocode in provider_config.py with category popular and supports KILOCODE_API_BASE env variable.
    • Adds Kilocode endpoint to provider_urls.py.
  • Miscellaneous:
    • Refactors code in client.py and usage_manager.py for better readability and consistency.
    • Fixes Windows console encoding issues in main.py.

This description was created by Ellipsis for e11ea15.

@ShmidtS ShmidtS requested a review from Mirrowel as a code owner January 31, 2026 18:24
@mirrobot-agent
Contributor

I'm starting my review of the Kilocode provider support. Diving into the provider implementation and the usage manager updates now!

Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Overall Assessment

This PR successfully adds support for the Kilocode provider and introduces some valuable general improvements, such as the Windows console encoding fix and an optimized waiting mechanism in the UsageManager. The implementation follows the existing patterns for providers.

Architectural Feedback

The addition of the kilocode provider is straightforward. However, there are a few points regarding configuration consistency and concurrency efficiency that should be addressed to ensure the proxy remains robust under high load.

Key Suggestions

  • Configuration Consistency: Ensure kilocode_provider.py honors the KILOCODE_API_BASE environment variable.
  • Concurrency Optimization: Refine the waiting logic in UsageManager to avoid unnecessary delays when multiple keys are busy.
  • Request Path Consistency: Verify the /v1 suffix for Kilocode endpoints to ensure full OpenRouter compatibility as described.
  • Incomplete Fix: Apply the nvidia provider routing fix to the streaming request path as well.

Nitpicks and Minor Points

  • The UsageManager wait logic is duplicated; refactoring this into a helper would improve maintainability.

Questions for the Author

  • Does Kilocode definitely NOT require /v1 in the base URL for the models and chat/completions endpoints? The PR description and code seem slightly inconsistent on this point.

This review was generated by an AI assistant.

        """
        try:
            response = await client.get(
                "https://kilocode.ai/api/openrouter/models",

This URL is hardcoded and ignores the KILOCODE_API_BASE environment variable defined in provider_config.py. Consider using the environment variable if available, falling back to this URL as the default.
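A minimal sketch of that suggestion, assuming the default base URL registered in provider_config.py (the helper name below is hypothetical, for illustration only):

```python
import os

# Default taken from the PR description; KILOCODE_API_BASE is the env var
# registered in provider_config.py. The helper itself is illustrative.
DEFAULT_KILOCODE_API_BASE = "https://kilocode.ai/api/openrouter"


def resolve_models_url() -> str:
    """Prefer KILOCODE_API_BASE from the environment, fall back to the default."""
    api_base = os.getenv("KILOCODE_API_BASE", DEFAULT_KILOCODE_API_BASE).rstrip("/")
    return f"{api_base}/models"
```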

Comment on lines +2399 to +2407
            wait_condition = self.key_states[available_keys[0]]["condition"]
            try:
                async with wait_condition:
                    remaining_budget = deadline - time.time()
                    if remaining_budget <= 0:
                        break
                    await asyncio.wait_for(
                        wait_condition.wait(), timeout=min(0.5, remaining_budget)
                    )

Waiting on only the first available key's condition variable is suboptimal. If any other key in available_keys is released, this task will not wake up until the 0.5s timeout. While an improvement over a hard sleep, it could still lead to unnecessary delays under high concurrency. A better approach would be to wait on a shared condition variable for the model or provider.
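A hedged sketch of the suggested shared-condition approach (class and method names here are hypothetical, not from the PR): every key release notifies one per-provider asyncio.Condition, so any waiter wakes regardless of which key freed up.

```python
import asyncio


class SharedKeyWaiter:
    """Hypothetical helper: one condition per provider instead of per key."""

    def __init__(self) -> None:
        self._conditions: dict[str, asyncio.Condition] = {}

    def condition_for(self, provider: str) -> asyncio.Condition:
        return self._conditions.setdefault(provider, asyncio.Condition())

    async def notify_key_released(self, provider: str) -> None:
        cond = self.condition_for(provider)
        async with cond:
            cond.notify_all()  # wake every task waiting on this provider

    async def wait_for_any_key(self, provider: str, timeout: float) -> bool:
        """Return True if some key for `provider` was released, False on timeout."""
        cond = self.condition_for(provider)
        async with cond:
            try:
                await asyncio.wait_for(cond.wait(), timeout=timeout)
                return True
            except asyncio.TimeoutError:
                return False
```

With this shape, the acquire loop waits on the provider's shared condition instead of `available_keys[0]`, avoiding the 0.5s polling penalty when a different key is released.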

            await asyncio.sleep(1)
            # Pick any available key to wait on (they're all locked)
            if available_keys:
                wait_condition = self.key_states[available_keys[0]]["condition"]

This block of logic for waiting on keys is duplicated from earlier in the file (around line 2399). Consider refactoring it into a helper method.
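One possible shape for that helper (hypothetical name and signature; a sketch rather than the project's actual API):

```python
import asyncio
import time


async def wait_on_condition(condition: asyncio.Condition, deadline: float,
                            max_slice: float = 0.5) -> bool:
    """Wait on `condition` until it is notified or `deadline` passes.

    Returns True if notified, False if the slice or overall deadline expired.
    Mirrors the duplicated block: waits in bounded slices so the caller's
    loop can periodically re-check key availability.
    """
    async with condition:
        remaining = deadline - time.time()
        if remaining <= 0:
            return False
        try:
            await asyncio.wait_for(condition.wait(), timeout=min(max_slice, remaining))
            return True
        except asyncio.TimeoutError:
            return False
```

Both call sites could then reduce to a single `await wait_on_condition(...)` inside their loops.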

    "cohere": "https://api.cohere.ai/v1",
    "bedrock": "https://bedrock-runtime.us-east-1.amazonaws.com",
    "openrouter": "https://openrouter.ai/api/v1",
    "kilocode": "https://kilocode.ai/api/openrouter",

The PR description mentions https://kilocode.ai/api/openrouter/v1/chat/completions as the endpoint, but here /v1 is missing. Please verify if Kilocode requires /v1 in the path for OpenRouter compatibility.

Comment on lines +1667 to +1671
        # If the provider is 'nvidia', set the custom provider to 'nvidia_nim'
        # and strip the prefix from the model name for LiteLLM.
        if provider == "nvidia":
            litellm_kwargs["custom_llm_provider"] = "nvidia_nim"
            litellm_kwargs["model"] = model.split("/", 1)[1]

This nvidia provider fix (setting custom_llm_provider and stripping the model prefix) should also be applied to _make_streaming_request (around line 2445) to ensure consistent behavior across both request types.

ShmidtS and others added 20 commits February 8, 2026 16:53
…cooldowns

- cooldown_manager: rename provider parameter to credential, update docstrings
- client: replace all start_cooldown(provider) calls with start_cooldown(current_cred)
- translator: use empty string instead of None for missing content

Allows other credentials from the same provider to be used while one is cooling down.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ders

Add _apply_provider_headers() method to strip problematic authorization
headers from client requests and apply correct provider-specific headers
from environment variables ({PROVIDER}_API_HEADERS).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…very

- Expand problematic headers list with Anthropic-specific headers
  (anthropic-version, anthropic-beta, x-anthropic-*, etc.)
- Refactor header removal with case-insensitive prefix matching
- Remove raw_request passing to LiteLLM to prevent header leakage
- Treat all errors as recoverable via credential rotation
  (BadRequestError/InvalidRequestError may come from provider issues)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r handling

- Add _normalize_model_string() and _extract_provider_from_model() helpers
- Replace direct model.split("/") calls with safe extraction methods
- Improve 400 error classification: upstream transient errors now trigger rotation (503)
- Keep strict fail-fast for policy/safety violations (invalid_request)
- Normalize model strings in request_sanitizer for reliable comparisons
- Reduce log noise: change info→debug for key release notifications

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ider support

- Add token_calculator.py module for context window-aware max_tokens adjustment
- Integrate ModelRegistry lookup to prevent context overflow errors
- Add Kilocode provider with default API base and unsupported param filtering
- Extend stream_options unsupported providers list (iflow, kilocode)
- Handle BadRequestError/InvalidRequestError as transient errors for credential rotation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Updated README to remove badge links and clarify functionality.
- Implement lazy HTTP client initialization with automatic recovery
  when client is closed, preventing "Client is closed" errors
- Add early error detection in streaming to catch provider errors
  sooner with chunk index tracking for better diagnostics
- Add get_retry_backoff() with error-type-specific backoff strategies:
  api_connection gets aggressive retry, server_error exponential backoff
- Reduce default rate limit cooldown from 60s to 10s
- Add parse_quota_error() in KilocodeProvider for OpenRouter format
  with retry_after extraction support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
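The parse_quota_error behavior described in the commit above might look roughly like this. The OpenRouter error-envelope shape and field names below are assumptions for illustration, not taken from Kilocode documentation:

```python
import json
from typing import Optional


def parse_quota_error(body: str) -> Optional[float]:
    """Extract a retry-after hint (seconds) from an OpenRouter-style error body.

    Assumed envelope: {"error": {"code": 429, "message": "...",
                                 "metadata": {"retry_after": 30}}}
    Returns None when the body is not a quota error or carries no hint.
    """
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        return None
    if error.get("code") != 429:
        return None
    retry_after = error.get("metadata", {}).get("retry_after")
    return float(retry_after) if retry_after is not None else None
```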
… handling

Performance optimizations:
- Add streaming_fast.py with orjson for 3-5x faster JSON parsing
- Add http_client_pool.py with connection warmup and HTTP/2 support
- Add credential_weight_cache.py for optimized credential selection
- Add batched_persistence.py for background batched disk writes
- Add async_locks.py with ReadWriteLock, AsyncSemaphore, RateLimitedLock

Context overflow handling:
- Add ContextOverflowError for pre-emptive request rejection
- Update sanitize_request_payload to return (payload, should_reject)
- Reject requests when input tokens exceed model context window

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ety buffers

- Increase default safety buffer from 100 to 1000 tokens
- Add MAX_INPUT_RATIO (0.90) to reject oversized inputs
- Add provider-specific safety buffers (kilocode: 2000, openrouter: 1500, etc.)
- Add get_provider_safety_buffer() for auto-detection
- calculate_max_tokens now returns None when input exceeds context
- adjust_max_tokens_in_payload returns (payload, should_reject) tuple

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ompatibility

Add precomputed_input_tokens parameter to streaming wrappers to handle
providers that don't return usage in stream (e.g., Kilocode without
stream_options). This ensures Claude Code's context management works
correctly across all providers.

- Add precomputed_input_tokens param to streaming.py and streaming_fast.py
- Pre-compute input tokens in client.py before streaming starts
- Provider-returned usage still takes precedence when available
- Increase MAX_INPUT_RATIO from 0.90 to 1.0 for full context utilization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major performance optimizations for high-throughput scenarios:

- HttpClientPool integration: Replace per-instance httpx.AsyncClient with
  singleton pool supporting HTTP/2, connection pre-warming, and automatic
  recovery. Separate pools for streaming/non-streaming requests.

- Connection pre-warming: Auto-detect provider API endpoints and establish
  TCP+TLS connections at startup, reducing first-request latency by ~10x.

- Sharded locks: Add per-provider lock infrastructure in UsageManager for
  parallel access to different providers' data, reducing lock contention.

- Credential priority caching: Cache priority/tier lookups per provider,
  eliminating repeated provider_plugin calls during request processing.

- Fast path in acquire_key: Single-credential case bypasses complex priority
  and fair cycle logic for immediate lock acquisition.

- Batch persistence option: Integrate UsagePersistenceManager for debounced
  disk writes (enabled via USAGE_BATCH_PERSISTENCE=true env var).

Configurable via environment variables:
- HTTP_MAX_KEEPALIVE (default: 50)
- HTTP_MAX_CONNECTIONS (default: 200)
- USAGE_BATCH_PERSISTENCE (default: false)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _reset_litellm_client_cache() method to clear LiteLLM's internal
async HTTP client cache when encountering "Cannot send a request, as
the client has been closed" errors. This ensures fresh client creation
on retry attempts.

Also add api_bases property to ProviderConfig for read-only access to
configured API bases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, the HTTP client warmup used hardcoded provider URLs that
could differ from the actual API_BASE configured via environment
variables. This caused connection warmup to hit wrong endpoints
(e.g., api.kilocode.ai instead of kilocode.ai/api/openrouter).

Now the warmup endpoint resolution follows this priority:
1. Custom API_BASE from environment variables
2. Hardcoded defaults as fallback

Also fixed _provider_endpoints cache to store resolved endpoints
correctly based on configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Some providers (e.g., Z.AI) return finish_reason values like 'error',
'unknown', 'abort' that don't match LiteLLM's Pydantic schema, causing
ValidationError. Add monkey-patch to normalize these values before
validation runs.

Also update .gitignore for tool directories and simplify start_proxy.bat
by removing venv activation step.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
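The normalization described in the commit above could be sketched as follows. The actual change monkey-patches LiteLLM internals not shown here; the mapping and valid-value set below are illustrative assumptions:

```python
from typing import Optional

# Non-standard finish_reason values seen from some providers (e.g. Z.AI),
# mapped onto schema-valid values. Assumed mapping, for illustration.
FINISH_REASON_ALIASES = {
    "error": "stop",
    "unknown": "stop",
    "abort": "stop",
}

VALID_FINISH_REASONS = {"stop", "length", "tool_calls", "content_filter", "function_call"}


def normalize_finish_reason(value: Optional[str]) -> Optional[str]:
    """Coerce provider-specific finish_reason values to schema-valid ones."""
    if value is None or value in VALID_FINISH_REASONS:
        return value
    return FINISH_REASON_ALIASES.get(value, "stop")
```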
Implement provider resilience mechanisms to prevent cascade exhaustion
during IP-level throttling:

- Add ProviderCircuitBreaker with CLOSED/OPEN/HALF_OPEN states
- Add IPThrottleDetector to correlate 429 errors across credentials
- Integrate circuit breaker into client for fail-fast on blocked providers
- Add configurable failure thresholds and recovery timeouts
- Support custom recovery duration from retry-after headers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Increase HTTP keepalive from 30s to 60s for better connection reuse
- Make warmup async to not block initialization
- Add memory limit (100 records) to IP throttle detector
- Minor formatting and code improvements across modules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrate OpenCode as a new provider with:
- OpencodeProvider class with auth headers (HTTP-Referer, X-Title)
- Quota/rate-limit error parsing (codes 1113, 429)
- Provider-specific header injection in RotatingClient
- Configuration in provider_config.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Mirrowel
Owner

Mirrowel commented Mar 3, 2026

@greptile

@Mirrowel
Owner

Mirrowel commented Mar 3, 2026

@ShmidtS I didn't notice before, but the base branch here is main - it should be Dev branch instead. Main is never directly modified and is outdated

@greptile-apps

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR introduces support for the Kilocode LLM provider, which acts as an OpenRouter-compatible API gateway. The implementation spans a new provider class (kilocode_provider.py), endpoint registration in provider_urls.py, provider config registration, request sanitization for Kilocode's free-model parameter restrictions, and backoff/throttle-detection tuning in error_handler.py. Minor refactoring is also applied to client.py, usage_manager.py, and main.py for readability.

Key changes:

  • KilocodeProvider implements ProviderInterface with get_models and parse_quota_error logic that understands the OpenRouter error envelope format
  • KILOCODE_UNSUPPORTED_PARAMS strips parameters that Kilocode's free models reject (e.g. frequency_penalty, top_p, response_format)
  • Kilocode is added to STREAM_OPTIONS_UNSUPPORTED_PROVIDERS in client.py so stream_options is never forwarded
  • Kilocode is added to PROXY_PROVIDERS and PROVIDER_BACKOFF_CONFIGS for appropriate retry behaviour
  • provider_config.py registers a default KILOCODE_API_BASE so the provider auto-routes as a custom OpenAI-compatible endpoint even without an explicit env var

Critical issues identified:

  • kilocode_provider.py line 31 uses wrong domain (kilo.ai instead of kilocode.ai) for the get_models call, breaking model discovery
  • get_models hardcodes the base URL instead of reading KILOCODE_API_BASE, making it inconsistent with how OpencodeProvider handles its env var
  • Only httpx.RequestError is caught in get_models; httpx.HTTPStatusError (raised by raise_for_status()) is a sibling exception and will propagate uncaught
  • provider_urls.py falls through to the f"{base_url}/v1/{action}" fallback, producing https://kilocode.ai/api/openrouter/v1/chat/completions instead of the documented /chat/completions path
  • main.py carries an unused import (from glob import glob)
  • error_handler.py docstring example uses wrong domain (kilo.ai)

Confidence Score: 1/5

  • Not safe to merge — multiple critical bugs prevent Kilocode provider from functioning: wrong domain blocks model discovery, incorrect URL path produces 404s for all requests, and missing error handling could expose HTTPStatusError.
  • Three critical bugs have been verified that will completely break Kilocode provider functionality: (1) the hardcoded URL uses wrong domain kilo.ai preventing model fetching, (2) the URL construction in provider_urls.py appends an extra /v1/ segment so every chat completion request hits a non-existent endpoint, and (3) unhandled HTTPStatusError exceptions will surface as uncaught errors. These are not edge cases but core functionality breaks.
  • src/rotator_library/providers/kilocode_provider.py (wrong domain, hardcoded URL, missing error handling), src/proxy_app/provider_urls.py (incorrect URL fallback for kilocode), src/rotator_library/error_handler.py (docstring domain typo), src/proxy_app/main.py (unused import)

Sequence Diagram

sequenceDiagram
    participant Client as API Client
    participant Proxy as proxy_app/main.py
    participant Sanitizer as request_sanitizer.py
    participant RotClient as RotatingClient (client.py)
    participant ProvConfig as provider_config.py
    participant LiteLLM as LiteLLM
    participant Kilocode as kilocode.ai API

    Client->>Proxy: POST /v1/chat/completions\n(model: kilocode/minimax/minimax-m2.1:free)
    Proxy->>Sanitizer: sanitize_request_payload(payload, model)
    Note over Sanitizer: Strips unsupported params for<br/>:free and z-ai/ models<br/>(top_p, frequency_penalty, etc.)
    Sanitizer-->>Proxy: cleaned payload

    Proxy->>RotClient: acompletion(**kwargs)
    Note over RotClient: Detects kilocode provider,<br/>removes stream_options
    RotClient->>ProvConfig: convert_for_litellm(model=kilocode/...)
    Note over ProvConfig: kilocode not in KNOWN_PROVIDERS<br/>→ custom provider path<br/>model = openai/minimax/minimax-m2.1:free<br/>api_base = https://kilocode.ai/api/openrouter<br/>custom_llm_provider = openai
    ProvConfig-->>RotClient: modified kwargs

    RotClient->>LiteLLM: acompletion(model=openai/minimax/..., api_base=...)
    LiteLLM->>Kilocode: POST /api/openrouter/chat/completions\nAuthorization: Bearer {key}
    Kilocode-->>LiteLLM: response / stream
    LiteLLM-->>RotClient: ModelResponse
    RotClient-->>Proxy: response
    Proxy-->>Client: SSE stream / JSON response

Comments Outside Diff (1)

  1. src/proxy_app/provider_urls.py, lines 33-78

    Incorrect URL path construction for Kilocode requests

    The get_provider_endpoint function constructs URLs by checking if base_url ends with "/v1" or "/v1/openai". Since Kilocode's base URL ("https://kilocode.ai/api/openrouter") matches neither suffix, the function falls through to the generic fallback (line 78):

    return f"{base_url}/v1/{action}"

    This produces https://kilocode.ai/api/openrouter/v1/chat/completions, but the correct endpoint (per provider_config.py and Kilocode documentation) is https://kilocode.ai/api/openrouter/chat/completions — no /v1/ segment.

    All Kilocode requests will hit a non-existent 404 endpoint. Either:

    1. Add a kilocode-specific case before the fallback, or
    2. Change the entry to "kilocode": "https://kilocode.ai/api/openrouter/v1" if the /v1 path is actually supported
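Option 1 could look roughly like this. This is a sketch against an assumed shape of get_provider_endpoint, not the file's exact code; the set name is hypothetical:

```python
# Providers whose base URL already contains the full API path, so no
# "/v1" segment should be appended. Assumed set; only kilocode is from this PR.
NO_V1_SUFFIX_PROVIDERS = {"kilocode"}


def get_provider_endpoint(provider: str, base_url: str, action: str) -> str:
    base_url = base_url.rstrip("/")
    if provider in NO_V1_SUFFIX_PROVIDERS:
        return f"{base_url}/{action}"  # e.g. .../api/openrouter/chat/completions
    if base_url.endswith("/v1") or base_url.endswith("/v1/openai"):
        return f"{base_url}/{action}"
    return f"{base_url}/v1/{action}"
```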

Last reviewed commit: 82bb900

@greptile-apps greptile-apps bot left a comment

31 files reviewed, 2 comments


        """
        try:
            response = await client.get(
                "https://kilo.ai/api/openrouter/models",

Wrong domain - should be kilocode.ai not kilo.ai

Suggested change:
-                "https://kilo.ai/api/openrouter/models",
+                "https://kilocode.ai/api/openrouter/models",

Comment on lines +28 to +43
    async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
        """
        Fetches the list of available models from the Kilocode API.
        """
        try:
            response = await client.get(
                "https://kilo.ai/api/openrouter/models",
                headers={"Authorization": f"Bearer {api_key}"},
            )
            response.raise_for_status()
            return [
                f"kilocode/{model['id']}" for model in response.json().get("data", [])
            ]
        except httpx.RequestError as e:
            lib_logger.error(f"Failed to fetch Kilocode models: {e}")
            return []

Inconsistent with OpencodeProvider which uses configurable OPENCODE_API_BASE. Consider using KILOCODE_API_BASE env var here for consistency:

import os
api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
response = await client.get(
    f"{api_base}/models",
    headers={"Authorization": f"Bearer {api_key}"},
)

@Mirrowel Mirrowel added labels: enhancement (New feature or request), Agent Monitored (Monitored for AI Agent to review PRs and commits) on Mar 3, 2026
Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Follow-up Review: Critical Issues Still Unaddressed

This follow-up review focuses on previously identified issues that remain unfixed in the current commits.

Critical Issues (Blocking)

  1. Kilocode Provider Domain Bug (kilocode_provider.py:34)
    • Status: Still unfixed from previous review
    • Issue: Uses https://kilo.ai/api/openrouter/models instead of https://kilocode.ai/api/openrouter/models
    • Impact: The models endpoint will fail completely with the wrong domain

High Priority Issues

  1. Missing NVIDIA Fix in Streaming Path (client.py)

    • Status: Still unfixed from previous review by mirrobot-agent
    • Issue: The nvidia provider fix (setting custom_llm_provider and stripping model prefix) is applied in the non-streaming path (line 2242-2244) but missing from the streaming path (~line 2825)
    • Impact: Inconsistent behavior between streaming and non-streaming requests for nvidia provider
  2. Configuration Inconsistency (kilocode_provider.py)

    • OpencodeProvider supports OPENCODE_API_BASE environment variable for configurable API base
    • KilocodeProvider hardcodes the URL (with the wrong domain), lacking KILOCODE_API_BASE support

Medium Priority Issues

  1. Duplicated Logic in UsageManager (usage_manager.py)

    • The waiting logic block (lines 2501-2513) is duplicated at lines 2727-2740
    • Previously suggested refactoring into a helper method remains unaddressed
  2. Suboptimal Condition Variable Waiting (usage_manager.py)

    • Waiting only on available_keys[0] means tasks won't wake up when other keys are released until the 0.5s timeout expires

Previously Fixed

  • Windows console encoding fix in main.py - appears to be in place

Summary

The critical domain bug in the Kilocode provider must be fixed before merging - it will cause immediate failures. The missing NVIDIA streaming fix should also be addressed for consistency.

This follow-up review was generated by an AI assistant.

        """
        try:
            response = await client.get(
                "https://kilo.ai/api/openrouter/models",

Critical Bug (Still Unaddressed): Wrong domain - uses kilo.ai instead of kilocode.ai. This was flagged in the previous review by greptile-apps but remains unfixed. The models endpoint will fail with this incorrect domain.

Suggested fix:

Suggested change:
-                "https://kilo.ai/api/openrouter/models",
+                "https://kilocode.ai/api/openrouter/models",

    - And other provider/model combinations
    """

    async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:

Inconsistency: Unlike OpencodeProvider which uses os.getenv("OPENCODE_API_BASE", ...), this provider hardcodes the URL (with the wrong domain). Consider adding KILOCODE_API_BASE environment variable support for consistency and flexibility.

api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
response = await client.get(
    f"{api_base}/models",
    ...
)

- Add consecutive quota failure tracking per credential in RotatingClient
- Add ClassifiedError quota_value and quota_id fields for Gemini errors
- Add Inception Labs error classification and backoff configuration
- Add AsyncResilientStateWriter for atomic async state persistence
- Add strip_provider option to endpoint configuration
- Improve provider_config exception logging for debugging

Includes refactoring in litellm_providers.py and formatting cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Follow-up Review

I've analyzed the incremental changes in this PR. While the diff shows some good improvements (quota failure tracking, Inception Labs support, better error classification), a previously identified high-priority issue remains unaddressed:

Unaddressed Issue from Previous Reviews

Missing NVIDIA Fix in Streaming Path

  • Location: src/rotator_library/client.py around line 3346-3350 (streaming request handling)
  • Issue: The nvidia provider fix (setting custom_llm_provider to nvidia_nim and stripping the model prefix) exists in the non-streaming path (lines 2320-2324) but is missing from the streaming path. Only qwen_code handling is present.
  • Impact: Inconsistent behavior between streaming and non-streaming requests for the nvidia provider
  • Required Fix: Add the following after line 3350 in the streaming path:
# If the provider is 'nvidia', set the custom provider to 'nvidia_nim'
# and strip the prefix from the model name for LiteLLM.
if provider == "nvidia":
    litellm_kwargs["custom_llm_provider"] = "nvidia_nim"
    litellm_kwargs["model"] = model.split("/", 1)[1]

# Inception Labs also requires model name without prefix
if provider == "inception":
    litellm_kwargs["model"] = model.rsplit("/", 1)[1]

New Changes Assessment

The incremental diff shows valuable additions:

  • Quota failure tracking with consecutive failure counters and automatic rotation
  • Inception Labs provider support with proper error handling
  • Enhanced error classification for Google/Gemini API errors with quota details
  • Improved streaming error handling for mid-stream failures

Previously Identified Critical Issues

Note: The critical Kilocode provider domain bug (kilo.ai vs kilocode.ai) and configuration inconsistency (missing KILOCODE_API_BASE support) were identified in earlier reviews. While not part of this incremental diff, these must still be addressed before merging.

This follow-up review was generated by an AI assistant.

@coderabbitai

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Kilocode, Trybons and Opencode provider integrations; new provider discovery
    • Batched background persistence, HTTP client pool, credential weight cache, and async locking primitives
    • Per-credential cooldowns and consolidated token manager support
  • Improvements

    • Faster Anthropic-compatible streaming and streaming normalization; improved token/max_tokens calculation and request sanitization
    • Circuit breaker and IP-throttle detection for more resilient provider handling
  • Documentation

    • README badges simplified; CONTRIBUTING removed
  • Chores

    • Updated .gitignore and added Windows start script

Walkthrough

Adds multiple new subsystems (async locks, HTTP client pool, persistence, token calculator, circuit breaker, IP-throttle detection, credential weight cache), new providers, refactors token managers, expands error/throttle handling, adjusts logging and config defaults, and updates repository docs/.gitignore.

Changes

  • Repo config & docs (.gitignore, README.md, CONTRIBUTING.md, start_proxy.bat): .gitignore updated (removed start_proxy.bat, added agentic directories); README badges trimmed; CONTRIBUTING.md deleted; start_proxy.bat added.
  • Root exports (src/rotator_library/__init__.py): Expanded lazy-loaded public exports via __getattr__ and extended __all__ to expose many new modules/classes.
  • Config defaults & exports (src/rotator_library/config/__init__.py, src/rotator_library/config/defaults.py): Added circuit-breaker and IP-throttle defaults, Kilocode backoff defaults, provider overrides, and an is_cooldown_disabled(env) helper; new constants exported.
  • Async primitives (src/rotator_library/async_locks.py): New async locking utilities: ReadWriteLock, AsyncSemaphore, RateLimitedLock, LazyLock, and lock-pool/striping helpers.
  • Batched persistence & resilient I/O (src/rotator_library/batched_persistence.py, src/rotator_library/utils/resilient_io.py): Added BatchedPersistence and UsagePersistenceManager for background batched writes; added AsyncResilientStateWriter for atomic async writes.
  • Credential weight cache (src/rotator_library/credential_weight_cache.py): New CredentialWeightCache (TTL, invalidation, warmup) with singleton accessor get_weight_cache.
  • Cooldown & sharded locks (src/rotator_library/cooldown_manager.py): CooldownManager reworked to per-credential cooldowns with per-provider lock sharding; method parameter renamed to credential.
  • Circuit breaker (src/rotator_library/circuit_breaker.py): New async ProviderCircuitBreaker with per-provider circuit states, thresholds, recovery, and query/reset APIs.
  • IP throttle detection (src/rotator_library/ip_throttle_detector.py): New IPThrottleDetector that buffers 429s, hashes error bodies, correlates across credentials, and returns a ThrottleAssessment; singleton accessors added.
  • Error handling & backoff (src/rotator_library/error_handler.py): Major expansion: unified 429 handling, ThrottleAction/types, streaming error classifiers (is_provider_abort, classify_stream_error), provider backoff configs, an AllProviders helper, an extended ClassifiedError, and get_retry_backoff.
  • HTTP client pool & timeouts (src/rotator_library/http_client_pool.py, src/rotator_library/timeout_config.py): Added an HttpClientPool singleton (streaming vs. non-streaming clients, warmup, health/recovery, stats); reduced the default pool timeout from 60s to 15s.
  • Token calculation (src/rotator_library/token_calculator.py): New token-calculation utilities: context windows, token counting, calculate_max_tokens, and adjust_max_tokens_in_payload.
  • Request sanitization (src/rotator_library/request_sanitizer.py): Sanitizer enhanced for Kilocode's unsupported params; signature changed to return (payload, should_reject).
  • Anthropic streaming & translator (src/rotator_library/anthropic_compat/streaming.py, src/rotator_library/anthropic_compat/streaming_fast.py, src/rotator_library/anthropic_compat/translator.py, src/rotator_library/anthropic_compat/__init__.py): Added a high-performance streaming_fast module (exposed as anthropic_streaming_wrapper) with an orjson fallback and a precomputed_input_tokens param; translator updated to use empty-string content instead of None.
  • Providers: new & updates (src/rotator_library/providers/*, src/proxy_app/provider_urls.py): Added KilocodeProvider, OpencodeProvider, and TrybonsProvider; added kilocode/trybons to the provider URL map; many providers updated (thinking/reasoning support for iFlow, logging-propagation removals, NVIDIA updates).
  • Provider metadata & config (src/rotator_library/litellm_providers.py, src/rotator_library/provider_config.py): Normalized/expanded provider metadata, added defaults for custom providers, and exposed api_bases, is_custom_provider, and get_api_base on ProviderConfig.
  • Background & token manager refactor (src/rotator_library/background_refresher.py, src/rotator_library/providers/base_token_manager.py, src/rotator_library/providers/{google_oauth_base,iflow_auth_base,qwen_auth_base}.py): Parallelized provider credential initialization; introduced BaseTokenManager and made the Google/Qwen/iFlow auth classes inherit from and delegate to it.
  • Utilities & patches (src/rotator_library/utils/__init__.py, src/rotator_library/utils/patch_litellm_finish_reason.py, src/rotator_library/utils/suppress_litellm_warnings.py): Exposed AsyncResilientStateWriter, added a patch to normalize LiteLLM finish_reason, and added a suppress_litellm_prints() context manager.
  • Logging cleanup (many src/rotator_library/providers/*, src/rotator_library/model_definitions.py): Removed explicit lib_logger.propagate = False and conditional NullHandler insertions across multiple modules, restoring default logger propagation.
  • Misc & small fixes (assorted files): Minor refactors: parallelized credential-file loads, qwen chunk-emission adjustment, timeout tweaks, README badge trimming, and other small edits.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Handler as ErrorHandler
    participant Detector as IPThrottleDetector
    participant CB as ProviderCircuitBreaker
    participant Cred as CooldownManager

    Client->>Handler: report 429(provider, credential, body)
    Handler->>Detector: record_429(provider, credential, body, retry_after)
    Detector-->>Handler: ThrottleAssessment(scope, confidence, cooldown)
    Handler->>CB: record_ip_throttle(...) / open_immediately(...)
    Handler->>Cred: start_cooldown(credential, duration)
    Handler-->>Client: ThrottleAction(result)

sequenceDiagram
    participant App as Application
    participant Pool as HttpClientPool
    participant Warm as WarmupTask
    participant ClientHTTP as httpx.AsyncClient

    App->>Pool: get_http_pool()/initialize(warmup_hosts)
    Pool->>Pool: create streaming & non-streaming clients
    Pool->>Warm: warmup_connections(hosts)
    Warm->>ClientHTTP: probe host
    ClientHTTP-->>Warm: response
    Warm-->>Pool: warmup complete
    App->>Pool: get_client_async(streaming=True)
    Pool-->>App: AsyncClient

sequenceDiagram
    participant SSE as OpenAI SSE stream
    participant Wrapper as anthropic_streaming_wrapper_fast
    participant Proc as StateProcessor
    participant Out as Anthropic SSE output

    SSE->>Wrapper: stream chunk
    Wrapper->>Proc: parse chunk (delta/usage/tool)
    Proc->>Out: emit Anthropic SSE events (message_start / content deltas / tool_use / final usage)
    Out-->>Client: Anthropic SSE messages

Estimated Code Review Effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I hopped through circuits, locks, and streams,
cached weights and tended async dreams,
I sniffed for 429s across the net,
wrote state in batches, warmed connections yet,
kilocode and opencode joined the team — hop, hooray!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The description covers the main changes (new KilocodeProvider class, modified provider_urls.py and provider_config.py) but lacks the Testing Done section and checklist items required by the template. Resolution: complete the Testing Done section with specific test results and check all required checklist items, including license headers, documentation updates, and issue references.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title "Add Kilocode provider support" directly and clearly describes the main objective of the PR: adding a new provider implementation.
  • Docstring Coverage (✅ Passed): Docstring coverage is 86.52%, above the required 80.00% threshold.


"""
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",

Wrong domain in endpoint URL

Line 31 has https://kilo.ai/api/openrouter/models, but the correct domain is kilocode.ai (not kilo.ai). This will cause the models endpoint to fail.

Suggested change
 response = await client.get(
-    "https://kilo.ai/api/openrouter/models",
+    "https://kilocode.ai/api/openrouter/models",
     headers={"Authorization": f"Bearer {api_key}"},
 )

Comment on lines +25 to +33
async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
"""
Fetches the list of available models from the Kilocode API.
"""
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)

Hardcoded URL ignores KILOCODE_API_BASE environment variable

The get_models method hardcodes the base URL instead of respecting the KILOCODE_API_BASE environment variable that is already configured in provider_config.py. For consistency with OpencodeProvider and to allow users to override the endpoint, use the env var with a sensible default:

Suggested change
 async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
     """
     Fetches the list of available models from the Kilocode API.
     """
+    import os
+    api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
     try:
         response = await client.get(
-            "https://kilo.ai/api/openrouter/models",
+            f"{api_base}/models",
             headers={"Authorization": f"Bearer {api_key}"},
         )

Comment on lines +29 to +40
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()
return [
f"kilocode/{model['id']}" for model in response.json().get("data", [])
]
except httpx.RequestError as e:
lib_logger.error(f"Failed to fetch Kilocode models: {e}")
return []

Uncaught httpx.HTTPStatusError exception

response.raise_for_status() (line 34) raises httpx.HTTPStatusError for non-2xx HTTP responses (401, 429, 500, etc.). However, only httpx.RequestError is caught on line 38. HTTPStatusError is not a subclass of RequestError — it descends from HTTPError via a different branch. Any HTTP error response will propagate as an unhandled exception instead of being caught and returning an empty list.

Suggested change
 try:
     response = await client.get(
-        "https://kilo.ai/api/openrouter/models",
+        "https://kilocode.ai/api/openrouter/models",
         headers={"Authorization": f"Bearer {api_key}"},
     )
     response.raise_for_status()
     return [
         f"kilocode/{model['id']}" for model in response.json().get("data", [])
     ]
-except httpx.RequestError as e:
+except (httpx.RequestError, httpx.HTTPStatusError) as e:
     lib_logger.error(f"Failed to fetch Kilocode models: {e}")
     return []

- Extract common token refresh queue state into BaseTokenManager
  (shared by GoogleOAuthBase, IFlowAuthBase, QwenAuthBase)
- Extract _prepare_request_context method in RotatingClient to
  eliminate code duplication between streaming and non-streaming paths
- Switch anthropic_compat to streaming_fast module
- Mark legacy streaming.py as deprecated
- Remove CONTRIBUTING.md (documentation cleanup)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +30 to +32
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},

Wrong domain in models endpoint

The domain is kilo.ai but should be kilocode.ai. This will cause requests to the models endpoint to fail.

Suggested change
 response = await client.get(
-    "https://kilo.ai/api/openrouter/models",
+    "https://kilocode.ai/api/openrouter/models",
     headers={"Authorization": f"Bearer {api_key}"},

Comment on lines +25 to +35
async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
"""
Fetches the list of available models from the Kilocode API.
"""
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()
return [

Hardcoded URL ignores KILOCODE_API_BASE environment variable

The get_models method hardcodes the base URL instead of respecting the KILOCODE_API_BASE environment variable that is configured in provider_config.py. For consistency with other providers like OpencodeProvider, use the env var with a sensible default:

Suggested change
 async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
     """
     Fetches the list of available models from the Kilocode API.
     """
+    import os
+    api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
     try:
         response = await client.get(
-            "https://kilo.ai/api/openrouter/models",
+            f"{api_base}/models",
             headers={"Authorization": f"Bearer {api_key}"},
         )
         response.raise_for_status()
         return [

Comment on lines +36 to +38
f"kilocode/{model['id']}" for model in response.json().get("data", [])
]
except httpx.RequestError as e:

Uncaught httpx.HTTPStatusError exception

response.raise_for_status() raises httpx.HTTPStatusError for non-2xx HTTP responses (401, 429, 500, etc.), but only httpx.RequestError is caught on line 38. HTTPStatusError is not a subclass of RequestError — they descend from different branches of the exception hierarchy. Any HTTP error response will propagate as an unhandled exception instead of being caught gracefully.

Suggested change
         f"kilocode/{model['id']}" for model in response.json().get("data", [])
     ]
-except httpx.RequestError as e:
+except (httpx.RequestError, httpx.HTTPStatusError) as e:
     lib_logger.error(f"Failed to fetch Kilocode models: {e}")
     return []

Comment on lines +71 to +72
from dotenv import load_dotenv
from glob import glob

Unused glob import

from glob import glob is imported here but never called as a function. All .env file discovery now uses _root_dir.glob("*.env"), which is Path.glob() — a method on the pathlib.Path object, not the glob module function. Remove the unused stdlib import.

Suggested change
 from dotenv import load_dotenv
-from glob import glob

model_without_prefix = normalized_model.split("/", 1)[1] if "/" in normalized_model else normalized_model

# Free models have stricter parameter requirements
if ":free" in model_without_prefix or model_without_prefix.startswith("z-ai/"):

Overly broad z-ai/ parameter stripping

The comment says "Free models have stricter parameter requirements," but the startswith("z-ai/") branch catches all z-ai/ models — including any future non-free z-ai models. This means parameters like top_p, frequency_penalty, and stop would be silently dropped for a paid kilocode/z-ai/some-model even if that model fully supports them.

Since all currently listed z-ai models already contain :free (e.g. z-ai/glm-4.7:free), the startswith("z-ai/") guard is redundant in the happy path but will silently misbehave when a non-free z-ai variant is introduced. Consider removing the overly broad prefix check and relying solely on the :free suffix to stay consistent with the comment:

Suggested change
-if ":free" in model_without_prefix or model_without_prefix.startswith("z-ai/"):
+if ":free" in model_without_prefix:

Supports custom OpenAI-compatible providers via PROVIDERNAME_API_BASE env vars.

Usage:
export KILOCODE_API_BASE=https://kilo.ai/api/openrouter

Wrong domain in docstring example

The example command uses kilo.ai instead of the correct kilocode.ai. Any user who copies this snippet verbatim will configure a non-working endpoint.

Suggested change
-export KILOCODE_API_BASE=https://kilo.ai/api/openrouter
+export KILOCODE_API_BASE=https://kilocode.ai/api/openrouter

ShmidtS and others added 4 commits March 7, 2026 00:59
Add TryBons provider (go.trybons.ai) that speaks native Anthropic
Messages API. Routes through LiteLLM's anthropic support with custom
api_base, so no message conversion is needed. Uses x-api-key auth
and balanced key rotation by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…king support

Extract constants for DeepSeek thinking models and unsupported Anthropic
fields. Add _sanitize_messages() to strip cache_control and collapse
list-format system messages. Add _strip_anthropic_fields() for top-level
Anthropic params (thinking, betas). Improve handle_thinking_parameter()
with per-model-family logic and better logging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation

Add INCEPTION_BACKOFF_BASE and INCEPTION_MAX_BACKOFF env var overrides
for inception provider backoff config. Classify inception connection
errors with 'server had an error' message as 503 with 5s retry_after
for better transient overload recovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rors

Before rotating to a new credential on transient server errors during
streaming, retry the same key with exponential backoff if budget allows.
This avoids unnecessary key rotation for recoverable mid-stream failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +25 to +40
async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
"""
Fetches the list of available models from the Kilocode API.
"""
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()
return [
f"kilocode/{model['id']}" for model in response.json().get("data", [])
]
except httpx.RequestError as e:
lib_logger.error(f"Failed to fetch Kilocode models: {e}")
return []

Wrong domain in get_models endpoint and hardcoded URL ignores KILOCODE_API_BASE env var

Line 31 uses https://kilo.ai/api/openrouter/models (wrong domain — should be kilocode.ai), and the URL is hardcoded instead of respecting the KILOCODE_API_BASE environment variable that is already configured in provider_config.py.

This breaks model discovery and ignores user overrides. For consistency with OpencodeProvider, use:

import os
api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
response = await client.get(
    f"{api_base}/models",
    headers={"Authorization": f"Bearer {api_key}"},
)

Comment on lines +29 to +40
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()
return [
f"kilocode/{model['id']}" for model in response.json().get("data", [])
]
except httpx.RequestError as e:
lib_logger.error(f"Failed to fetch Kilocode models: {e}")
return []

Uncaught httpx.HTTPStatusError exception

response.raise_for_status() (line 34) raises httpx.HTTPStatusError for non-2xx responses, but only httpx.RequestError is caught on line 38. HTTPStatusError is a sibling exception (both descend from HTTPError via different branches), so it will propagate uncaught for any API errors (401, 429, 500, etc.).

Catch httpx.HTTPError, the shared base class (httpx.RequestError already subclasses it, so listing both would be redundant):

except httpx.HTTPError as e:


# Load all .env files from root folder (main .env first, then any additional *.env files)
from dotenv import load_dotenv
from glob import glob

Unused glob import

Line 72 imports from glob import glob, but this function is never called. Line 85 uses _root_dir.glob("*.env"), which is the Path.glob() method, not the module function. Remove the unused import to clean up the code.

Supports custom OpenAI-compatible providers via PROVIDERNAME_API_BASE env vars.

Usage:
export KILOCODE_API_BASE=https://kilo.ai/api/openrouter

Wrong domain in docstring example

Line 643 shows export KILOCODE_API_BASE=https://kilo.ai/api/openrouter as an example, but the correct domain is kilocode.ai (not kilo.ai). This will mislead users who copy the snippet verbatim.

Change to:

export KILOCODE_API_BASE=https://kilocode.ai/api/openrouter

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 41

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/rotator_library/providers/qwen_code_provider.py (1)

257-277: ⚠️ Potential issue | 🟠 Major

Don’t bypass the normal chunk normalization when usage shares the final SSE event.

Line 257 returns before the <think>reasoning_content conversion below runs. If Qwen puts usage on the same chunk as the last content delta, the final chunk will leak raw <think> markers and skip any other normalization handled later in this method. Run the delta through the normal conversion path and attach usage to the last emitted normalized chunk instead of short-circuiting here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/providers/qwen_code_provider.py` around lines 257 - 277,
The current early return inside the SSE handling block bypasses the normal
delta-to-content normalization and causes raw <think> markers to leak when
usage_data arrives with the final delta; instead of yielding a combined chunk
and returning in the block that inspects choices/usage_data (where variables
like choice, delta, finish_reason, chunk_id, chunk_created, model_id are
available), run the delta through the existing normalization/conversion path
used later in this method (the same routine that converts choice["delta"] ->
reasoning_content and applies <think> → reasoning_content transforms), emit the
normalized chunk(s) as usual, and attach the usage object to the last emitted
normalized chunk (preserving finish_reason and ids) rather than short-circuiting
and returning.
src/rotator_library/providers/nvidia_provider.py (1)

39-49: ⚠️ Potential issue | 🟠 Major

Catch HTTPStatusError in get_models().

response.raise_for_status() raises httpx.HTTPStatusError for non-2xx responses, which is not a subclass of httpx.RequestError (both are siblings under httpx.HTTPError). This means 4xx/5xx responses bypass the current exception handler and leak an unhandled exception instead of logging and returning [].

Suggested fix
-        except httpx.RequestError as e:
-            lib_logger.error(f"Failed to fetch NVIDIA models: {e}")
+        except httpx.HTTPError as e:
+            lib_logger.error("Failed to fetch NVIDIA models: %s", e)
             return []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/providers/nvidia_provider.py` around lines 39 - 49, The
get_models() method currently only catches httpx.RequestError so HTTPStatusError
raised by response.raise_for_status() will escape; update the exception handling
in get_models() to also catch httpx.HTTPStatusError (or httpx.HTTPError)
alongside httpx.RequestError around the client.get()/response.raise_for_status()
call, and in that handler call lib_logger.error with the error details and
return [] to match existing behavior; reference the client.get call,
response.raise_for_status(), httpx.HTTPStatusError/httpx.HTTPError,
lib_logger.error, and the existing return [] to locate and modify the try/except
block.
♻️ Duplicate comments (1)
src/rotator_library/providers/kilocode_provider.py (1)

29-33: ⚠️ Potential issue | 🔴 Critical

Use the configured Kilocode API base here instead of hardcoding kilo.ai.

This is still pointing model discovery at the wrong host and still bypasses the KILOCODE_API_BASE override added elsewhere in the PR, so Kilocode model listing can fail or hit the wrong deployment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/providers/kilocode_provider.py` around lines 29 - 33, The
model-listing call in kilocode_provider is hardcoded to
"https://kilo.ai/api/openrouter/models" and bypasses the configured base; update
the client.get call in the code that builds the request (the try block where
response = await client.get(...)) to use the configured Kilocode base variable
(e.g., KILOCODE_API_BASE or the existing kilocode_api_base config) and append
the "/api/openrouter/models" path so the request honors the KILOCODE_API_BASE
override while still passing the Authorization header with api_key.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/proxy_app/provider_urls.py`:
- Line 34: Update the TryBons base URL in the provider URLs mapping so requests
hit the /v1 path: change the mapping value for the "trybons" key from
"https://go.trybons.ai" to "https://go.trybons.ai/v1" in the provider_urls.py
mapping (and make the same change for the other TryBons entries present later in
that file), so code in trybons_provider.py that targets /v1/models routes to the
correct endpoint.

In `@src/rotator_library/anthropic_compat/streaming_fast.py`:
- Around line 369-381: The exception handler currently starts a new text content
block without closing any previously opened content/message blocks; update the
except block in streaming_fast.py (the logic around message_started,
current_block_index and the yields of
_make_content_block_start_event/_make_text_delta_event) to first close any open
content blocks and the current message turn: yield
_make_content_block_stop_event(current_block_index) if a block was started, and
yield the appropriate _make_message_delta_event("end_turn", 0, cached_tokens)
and _make_message_stop_event() as needed before emitting the fallback text block
and the error delta; ensure you only open the fallback text block after these
stops so the SSE sequence stays valid.

In `@src/rotator_library/anthropic_compat/translator.py`:
- Around line 15-35: Extract the duplicated JSON shim (_json_dumps, _json_loads,
_json_decode_error) into a single shared helper module (e.g., json_shim or
json_utils) that encapsulates the orjson import/fallback logic and exposes the
same symbols; replace the local definitions in translator.py (functions
_json_dumps, _json_loads and symbol _json_decode_error) and in streaming.py to
import these symbols from the new helper so behavior and exception semantics
remain identical; ensure the helper returns/decodes strings the same way
(orjson.dumps(...).decode('utf-8') vs json.dumps) and re-exports the correct
JSONDecodeError class so existing code that catches _json_decode_error continues
to work.

In `@src/rotator_library/async_locks.py`:
- Around line 272-304: The acquire() method must not release self._lock before
the protected block; instead, move lock acquisition into __aenter__ so the lock
is held across the protected section and release it in __aexit__. Concretely:
remove the async with self._lock: wrapper from acquire() and make acquire() only
perform rate-limit checks/updates on _last_release, _burst_remaining,
_min_interval, _burst_size and _stats (and await sleeps as needed); implement
__aenter__ to await self._lock.acquire() then call await self.acquire() (or
inline the rate-limit logic) so the lock remains held for the critical section;
implement __aexit__ to set _last_release/time and decrement/adjust burst state
as needed and call self._lock.release() (or have release() call
self._lock.release() and update _last_release) so mutual exclusion is enforced.
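A minimal sketch of that fix (field names such as _min_interval and _last_release follow the prompt; everything else is assumed): the asyncio.Lock is taken in __aenter__, held for the whole critical section, and only the pacing logic runs in between.

```python
import asyncio
import time

class RateLimitedLock:
    """Sketch: the asyncio.Lock is held across the protected block;
    only the pacing check runs between acquisition and the body."""

    def __init__(self, min_interval: float):
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._last_release = None

    async def __aenter__(self):
        await self._lock.acquire()  # held until __aexit__
        if self._last_release is not None:
            wait = self._min_interval - (time.monotonic() - self._last_release)
            if wait > 0:
                await asyncio.sleep(wait)  # pace entries while holding the lock
        return self

    async def __aexit__(self, *exc):
        self._last_release = time.monotonic()
        self._lock.release()
```

Two back-to-back critical sections are then spaced at least min_interval apart while still being mutually exclusive.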
- Around line 186-211: The release() method must avoid incrementing _value when
directly handing off to a waiter: change logic in release() so that if _waiters
is non-empty you pop a waiter and call waiter.set() without doing self._value +=
1 and without decrementing self._stats["current_concurrent"] (the permit is
transferred), otherwise (no waiters) do self._value += 1 and decrement
self._stats["current_concurrent"] and increment self._stats["releases"]; ensure
references to release(), _value, _waiters, waiter.set(), and
self._stats["current_concurrent"] are updated accordingly so the semaphore
doesn't double-count a permit on direct handoff.
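The direct-handoff rule can be sketched as a simplified semaphore (illustrative class; the real AsyncSemaphore also tracks _stats, omitted here):

```python
import asyncio
from collections import deque

class HandoffSemaphore:
    """Sketch: release() hands the permit straight to a waiter instead
    of incrementing the counter, so a permit is never double-counted."""

    def __init__(self, value: int):
        self._value = value
        self._waiters: deque = deque()

    async def acquire(self) -> None:
        if self._value > 0 and not self._waiters:
            self._value -= 1
            return
        ev = asyncio.Event()
        self._waiters.append(ev)
        await ev.wait()  # permit was transferred in release(); no counter change

    def release(self) -> None:
        if self._waiters:
            self._waiters.popleft().set()  # direct handoff: _value untouched
        else:
            self._value += 1
```

Because the woken waiter never re-increments _value, a handoff cannot leave the counter one higher than the number of free permits.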
- Around line 61-89: The release paths deadlock because release_read() acquires
_read_ready then _write_ready while release_write() does the opposite; fix by
making both Conditions share the same underlying lock or by enforcing a single
global acquisition order so nested acquires never invert. Concretely, create one
asyncio.Lock (or one asyncio.Condition) and initialize both _read_ready and
_write_ready with that same lock (or replace both with a single Condition and
adjust logic), then update release_read, release_write and acquire_write to use
that shared lock/condition so notify_all() calls no longer require nested,
inversed acquisitions; ensure functions named release_read, release_write and
acquire_write reference the shared _lock/_condition.
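One way to sketch the shared-lock variant is with a single asyncio.Condition (one underlying lock), so no release path ever has to acquire a second, inverted lock (illustrative class name; the real implementation is ReadWriteLock in async_locks.py):

```python
import asyncio

class SharedCondRWLock:
    """Sketch: one Condition serves both readers and writers, so
    notify paths never nest two different lock acquisitions."""

    def __init__(self):
        self._cond = asyncio.Condition()
        self._readers = 0
        self._writer = False

    async def acquire_read(self):
        async with self._cond:
            while self._writer:
                await self._cond.wait()
            self._readers += 1

    async def release_read(self):
        async with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a pending writer

    async def acquire_write(self):
        async with self._cond:
            while self._writer or self._readers:
                await self._cond.wait()
            self._writer = True

    async def release_write(self):
        async with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake readers and writers
```

Every acquire/release path takes exactly one lock, so the inverted-order deadlock described above cannot occur.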

In `@src/rotator_library/batched_persistence.py`:
- Around line 142-150: The file I/O in async methods _load_from_disk and
_write_to_disk is blocking (uses open(), read(), mkdir(), safe_write_json()), so
move those blocking operations off the event loop by running them in a thread
executor (e.g., asyncio.to_thread or loop.run_in_executor); refactor
_load_from_disk to call a synchronous helper that opens/reads/parses the file
via to_thread and assign _state afterwards, and refactor _write_to_disk to
perform mkdir and safe_write_json inside to_thread (or call a sync helper via
to_thread) so the coroutine yields while disk I/O happens.
- Around line 183-192: The current code serializes state into content via
self._serializer(self._state) but then calls safe_write_json(self._file_path,
self._state if isinstance(self._state, dict) else {"data": self._state}, ...),
which ignores custom serializers and changes the on-disk shape; change the write
so the exact serialized shape is persisted: if content is a
dict/JSON-serializable mapping, pass that dict to safe_write_json, otherwise
perform an atomic write of the raw serialized string (e.g., use existing atomic
file-write helper or extend safe_write_json to accept pre-serialized JSON) so
that what _load_from_disk() reads back matches the original serialized form from
_serializer and no artificial {"data": ...} wrapper is introduced (refer to
_serializer, _state, safe_write_json, and _load_from_disk).
- Around line 165-169: The code uses self._last_change for both freshness and
max_dirty_age, so under steady updates max_dirty_age never triggers; introduce a
separate timestamp (e.g. self._first_dirty_change) that is set only when the
buffer first becomes dirty (set it where the code currently resets _last_change
on every update — the block around the update at lines ~219-221) and do not
overwrite it on subsequent updates; then change the flush check in the block
that currently reads self._last_change (the if block around lines 165-169) to
compute age_since_first = time.time() - self._first_dirty_change and compare
max_dirty_age against age_since_first while keeping the write_interval
comparison against time.time() - self._last_change; finally, reset
self._first_dirty_change = None whenever you successfully write/clear the dirty
state in _write_to_disk().
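The max-dirty-age fix described above can be sketched as follows; names like `DirtyBuffer`, `mark_updated`, and `should_flush` are illustrative stand-ins, not the real batched_persistence API. The key idea: record the first dirty timestamp once, never overwrite it on later updates, and clear it only on a successful flush.

```python
import time

class DirtyBuffer:
    """Sketch: track both recency and total dirty age separately."""

    def __init__(self, write_interval: float = 1.0, max_dirty_age: float = 5.0):
        self.write_interval = write_interval
        self.max_dirty_age = max_dirty_age
        self._last_change = 0.0
        self._first_dirty_change = None  # set once, on the first dirty update

    def mark_updated(self) -> None:
        now = time.time()
        self._last_change = now
        if self._first_dirty_change is None:
            self._first_dirty_change = now  # NOT overwritten by later updates

    def should_flush(self) -> bool:
        if self._first_dirty_change is None:
            return False  # nothing dirty
        now = time.time()
        quiet_long_enough = now - self._last_change >= self.write_interval
        dirty_too_long = now - self._first_dirty_change >= self.max_dirty_age
        return quiet_long_enough or dirty_too_long

    def mark_flushed(self) -> None:
        # Reset on successful write, mirroring _write_to_disk() clearing dirty state.
        self._first_dirty_change = None
```

Under steady updates `_last_change` keeps advancing, but `_first_dirty_change` does not, so `max_dirty_age` eventually forces a flush.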

In `@src/rotator_library/circuit_breaker.py`:
- Around line 187-205: When transitioning from OPEN to HALF_OPEN in
can_attempt(), the first probe isn't counted because you set
circuit.half_open_attempts = 0 then return True, allowing one extra probe;
change the transition to initialize circuit.half_open_attempts = 1 so the probe
that triggered the transition is counted (update both the OPEN->HALF_OPEN block
in can_attempt() where CircuitState.OPEN is handled and the equivalent block
around lines 215-224). Ensure any logging remains the same and that the rest of
the HALF_OPEN handling (checks against max_half_open_attempts) uses this
incremented count consistently.
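A minimal sketch of the probe-counting fix, using a toy `Circuit` class rather than the real circuit_breaker types: the OPEN→HALF_OPEN transition itself consumes the first probe.

```python
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class Circuit:
    def __init__(self, cooldown: float = 30.0, max_half_open_attempts: int = 2):
        self.state = CircuitState.CLOSED
        self.opened_at = 0.0
        self.cooldown = cooldown
        self.max_half_open_attempts = max_half_open_attempts
        self.half_open_attempts = 0

    def can_attempt(self) -> bool:
        if self.state is CircuitState.CLOSED:
            return True
        if self.state is CircuitState.OPEN:
            if time.time() - self.opened_at >= self.cooldown:
                self.state = CircuitState.HALF_OPEN
                # Count the probe that triggered the transition (1, not 0),
                # otherwise one extra probe slips through.
                self.half_open_attempts = 1
                return True
            return False
        # HALF_OPEN: admit probes only up to the configured budget.
        if self.half_open_attempts < self.max_half_open_attempts:
            self.half_open_attempts += 1
            return True
        return False
```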

In `@src/rotator_library/config/defaults.py`:
- Around line 203-220: The current is_cooldown_disabled function builds env var
names by uppercasing the raw provider (e.g., DISABLE_COOLDOWN_NANO-GPT) which
produces hyphens and other invalid shell identifier characters; update
is_cooldown_disabled to normalize the provider before composing the env var name
by replacing any non-alphanumeric characters with underscores and then
uppercasing (e.g., use re.sub(r'[^0-9A-Za-z]', '_', provider).upper()), add the
necessary import (re), and then call os.environ.get with the normalized key so
DISABLE_COOLDOWN_NANO_GPT and similar toggles work correctly.
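A sketch of the normalization; the env-var naming convention follows the review, while the truthiness parsing (`"1"`/`"true"`/`"yes"`) is an assumption about how the real toggle is interpreted.

```python
import os
import re

def is_cooldown_disabled(provider: str) -> bool:
    # Normalize to a valid shell identifier: non-alphanumerics become "_",
    # so "nano-gpt" maps to DISABLE_COOLDOWN_NANO_GPT instead of
    # the unusable DISABLE_COOLDOWN_NANO-GPT.
    key = "DISABLE_COOLDOWN_" + re.sub(r"[^0-9A-Za-z]", "_", provider).upper()
    return os.environ.get(key, "").lower() in ("1", "true", "yes")
```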

In `@src/rotator_library/cooldown_manager.py`:
- Around line 29-41: The current _extract_provider in CooldownManager
incorrectly takes the first segment before '_' (truncating provider names that
themselves contain underscores); update _extract_provider to derive the provider
by splitting from the right on the last underscore (e.g., parts =
credential.rsplit("_", 1) and return parts[0] if len(parts) == 2 else
credential) so credentials like "azure_ai_1" yield "azure_ai", or even
better add an optional explicit provider parameter to the CooldownManager
interface and use that when provided (falling back to the new rsplit logic) to
avoid inferring provider names from credential strings.
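The rsplit fallback plus the explicit-provider parameter can be sketched as follows (function name and signature are illustrative, not the real CooldownManager interface):

```python
from typing import Optional

def extract_provider(credential: str, provider: Optional[str] = None) -> str:
    # Prefer an explicit provider over inference from the credential string.
    if provider is not None:
        return provider
    # Split on the LAST underscore so multi-word providers like "azure_ai"
    # survive; the old split("_")[0] would truncate them to "azure".
    parts = credential.rsplit("_", 1)
    return parts[0] if len(parts) == 2 else credential
```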

In `@src/rotator_library/credential_weight_cache.py`:
- Around line 144-149: CachedWeights is being constructed with references to
mutable inputs (weights, credentials, usage_snapshot) in set(), so later
mutations by the caller will mutate the cached entry and break
check_usage_change() invalidation logic; fix it by snapshotting those inputs
before storing them (e.g., shallow copy dicts/lists or deep copy if nested) and
compute total_weight from the copied weights so CachedWeights receives immutable
copies of weights, credentials, and usage_snapshot instead of the original
objects.
- Around line 239-248: The cache-validity check misses credentials that were
removed: in the method using current_usage and cached.usage_snapshot (and fields
like cached.credentials, weights, and self._threshold), add a check that
iterates the keys of cached.usage_snapshot and returns True if any credential
present there is missing from current_usage (i.e., if any cached key not in
current_usage), so the cache is invalidated when credentials disappear.
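Both points can be sketched together — snapshot mutable inputs on write, and invalidate when a cached credential disappears. The `CachedWeights` fields mirror the review, but the exact shapes are assumptions about the real module.

```python
import copy
from dataclasses import dataclass

@dataclass
class CachedWeights:
    weights: dict
    credentials: list
    usage_snapshot: dict
    total_weight: float

def make_entry(weights: dict, credentials: list, usage_snapshot: dict) -> CachedWeights:
    # Snapshot mutable inputs so later caller mutations can't corrupt the cache.
    weights = copy.deepcopy(weights)
    return CachedWeights(
        weights=weights,
        credentials=list(credentials),
        usage_snapshot=copy.deepcopy(usage_snapshot),
        total_weight=sum(weights.values()),  # computed from the copy
    )

def usage_changed(cached: CachedWeights, current_usage: dict) -> bool:
    # Invalidate when any cached credential disappeared, not just when
    # existing counters drifted or new credentials appeared.
    if any(key not in current_usage for key in cached.usage_snapshot):
        return True
    return any(cached.usage_snapshot.get(k) != v for k, v in current_usage.items())
```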

In `@src/rotator_library/error_handler.py`:
- Around line 1265-1299: The code is incorrectly applying provider-wide (IP)
throttling for router/multiplexing providers (e.g., those in PROXY_PROVIDERS);
update the logic so that 429s from those providers are not escalated to
ThrottleScope.IP: in both handle_429_error() and
classify_429_with_throttle_detection() stop feeding events from providers in
PROXY_PROVIDERS into the shared/provider-level detector (or explicitly
ignore/downgrade an assessment with ThrottleScope.IP to per-credential scope),
and in the block that currently handles assessment.scope == ThrottleScope.IP
(the ThrottleAction with ThrottleActionType.PROVIDER_COOLDOWN and the
circuit_breaker.open_immediately call) add a guard that skips or converts this
provider-wide action when provider is in PROXY_PROVIDERS so router providers
aren’t blocked cluster-wide. Ensure checks reference PROXY_PROVIDERS,
_detect_ip_throttle(), ThrottleScope.IP, handle_429_error(),
classify_429_with_throttle_detection(), and the circuit_breaker usage.

In `@src/rotator_library/http_client_pool.py`:
- Around line 243-270: Ensure client recreation is serialized with the existing
_client_lock: wrap the closed-check and any mutation of self._streaming_client /
self._non_streaming_client inside "async with self._client_lock" in
_ensure_client(streaming: bool) so that initialize() and close() cannot race
with client creation, and increment self._stats["reconnects"] only while holding
the lock; apply the same change to the other similar routine that mutates
clients (the get_client_async/_ensure_client duplicate at the other location) so
no newly created client can be immediately overwritten to None by close().
- Around line 272-295: get_client currently returns ephemeral clients from
_get_lazy_client without storing them, causing socket/resource leaks and missing
HTTP/2 settings; change the lazy-path so that when
_get_lazy_client(streaming=True/False) is used its result is assigned to the
corresponding attribute (_streaming_client or _non_streaming_client) before
being returned, or refactor _get_lazy_client to call _create_client and set the
attribute internally; ensure close() only needs to clean up those tracked
attributes and that the created client uses _create_client so HTTP/2/config is
applied.
- Around line 227-241: _is_client_closed currently inspects the private
client._client attribute (HTTPX implementation detail); replace that check to
use the public closed-state API on httpx.AsyncClient (e.g., use getattr(client,
"is_closed", None) and treat True as closed) and, if the public attribute is
missing for the project’s httpx version, fail fast with a clear error
instructing to upgrade or handle the new API rather than relying on a private
attribute; update the function _is_client_closed to use client.is_closed when
present and stop referencing client._client.
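A sketch of the public-API check. A duck-typed stand-in is used below instead of a real client; in the actual code the object would be an `httpx.AsyncClient`, whose public `is_closed` property is what should be consulted.

```python
def is_client_closed(client) -> bool:
    """Check closed state via the public attribute, never client._client."""
    closed = getattr(client, "is_closed", None)
    if closed is None:
        # Fail fast rather than poking at private HTTPX internals.
        raise RuntimeError(
            "client lacks a public is_closed attribute; upgrade httpx"
        )
    return bool(closed)
```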

In `@src/rotator_library/ip_throttle_detector.py`:
- Around line 239-285: The current IP-level escalation fires whenever
num_unique_credentials >= self.min_credentials even if common_hash is None and
can produce confidence==1.0 for exactly min_credentials (e.g., 2) — change the
IP detection to require a shared error signature and at least two matching error
bodies: only enter the IP branch when common_hash is truthy and
common_hash_count >= 2 (use the existing common_hash and common_hash_count
variables), and when num_unique_credentials == self.min_credentials avoid
forcing confidence to 1.0 (cap the base confidence below 1.0, e.g., min(0.9,
num_unique_credentials / self.min_credentials)) before applying the +0.2/+0.1
boosts; adjust the return from ThrottleAssessment(scope=ThrottleScope.IP, ...)
accordingly so unrelated per-key 429s do not escalate into a full-confidence IP
throttle.

In `@src/rotator_library/litellm_providers.py`:
- Line 79: The Amazon Nova provider entry in SCRAPED_PROVIDERS has an endpoint
string "v1/responses" missing a leading slash; update that array element to
"/v1/responses" so it matches the other slash-prefixed endpoints (locate the
provider entry that contains "endpoints": ["/chat/completions", "v1/responses"]
in litellm_providers.py and change the second item), and also fix the generator
source in the scraper/sync pipeline so the generated file retains the corrected
"/v1/responses" endpoint.

In `@src/rotator_library/providers/iflow_provider.py`:
- Around line 362-368: The current check in the streaming loop inspects content
for complete "<think>" / "</think>" substrings and calls
_parse_thinking_content, but it will miss tags split across chunks (e.g., "<thi"
+ "nk>...") causing thinking content to be emitted as normal; to fix, implement
a small per-stream buffer (e.g., on the provider instance) that appends each
delta.get("content") before scanning, only consume/parse when a full open or
close tag is detected, and pass the reconstructed content into
_parse_thinking_content (preserving finish_reason, model_id, chunk_id,
chunk_created); update the same logic used at the other check around the same
function (the 418-424 area) to use the buffered approach or explicitly document
the limitation if you cannot buffer.
- Around line 331-332: REASONING_START_MARKER is defined mid-class between
methods which is inconsistent with the other constants (HARDCODED_MODELS,
SUPPORTED_PARAMS, THINKING_SUPPORTED_MODELS, REASONING_BUDGETS); move
REASONING_START_MARKER to the top alongside those module/class-level constants
and update any references that use self.REASONING_START_MARKER to reference the
module/class constant directly (REASONING_START_MARKER) so parsing logic for the
"<think>" tags uses the unified constant placement.
- Around line 435-438: The _parse_thinking_content method includes an unused
delta parameter; remove delta from the method signature (def
_parse_thinking_content(self, content: str, finish_reason: Any, model_id: str,
chunk_id: str, chunk_created: int)) and update every call site that passes delta
(the callers that currently call _parse_thinking_content(..., delta, ...)) to
stop passing that argument and pass the remaining parameters in the new order;
also update any type hints or docstrings referencing delta to keep signatures
consistent (search for _parse_thinking_content and adjust its callers
accordingly).
- Around line 468-476: The yielded chunk currently always includes finish_reason
which causes intermediate chunks to appear terminal; update the generator in
iflow_provider.py so that finish_reason is only included on the final yielded
chunk (use the loop index or a boolean is_last when splitting content), i.e. set
"finish_reason": finish_reason if is_last else None (or omit the key when not
last) in the dict that currently uses new_delta, model_id, chunk_id,
chunk_created; ensure new_delta is unchanged and only the finish_reason field is
conditional.
- Around line 284-287: The code sets payload["thinking"] = {"type": "enabled",
"budget_tokens": -1} using an undocumented sentinel; change this so we no longer
send budget_tokens: -1—either omit the budget_tokens field entirely to let the
upstream API use its default (i.e., set payload["thinking"] = {"type":
"enabled"}) or set budget_tokens to an explicit, documented positive integer
less than the model's max tokens if you need to enforce a budget; update the
branch around payload["thinking"] (see payload["thinking"] and lib_logger.debug
referencing model_id) accordingly and remove the -1 sentinel from all places
that construct thinking payloads.
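The buffered approach from the first point above can be sketched as a small helper that withholds any trailing text that could still grow into a `<think>`/`</think>` tag; class and method names are illustrative, and the real provider would feed it each `delta.get("content")` before scanning.

```python
class ThinkTagBuffer:
    """Sketch: reassemble '<think>'/'</think>' tags split across stream chunks."""

    TAGS = ("<think>", "</think>")

    def __init__(self):
        self._buf = ""

    def feed(self, chunk: str) -> str:
        # Return text safe to emit; hold back any suffix that is a proper
        # prefix of a tag (e.g. "<thi"), so split tags are reassembled.
        self._buf += chunk
        longest = max(len(t) for t in self.TAGS)
        hold = 0
        for k in range(1, longest):
            tail = self._buf[-k:]
            if any(t.startswith(tail) and tail != t for t in self.TAGS):
                hold = k
        if hold:
            out, self._buf = self._buf[:-hold], self._buf[-hold:]
        else:
            out, self._buf = self._buf, ""
        return out

    def flush(self) -> str:
        # Emit whatever remains at end of stream.
        out, self._buf = self._buf, ""
        return out
```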

In `@src/rotator_library/providers/kilocode_provider.py`:
- Around line 34-40: The exception handler in the Kilocode model fetch block
only catches httpx.RequestError but response.raise_for_status() raises
httpx.HTTPStatusError; update the except clause in the function that calls
response.raise_for_status() (the Kilocode model discovery block in
kilocode_provider.py) to catch httpx.HTTPError (the parent of HTTPStatusError
and RequestError) so that 4xx/5xx responses are handled, keep the existing
lib_logger.error(...) logging and return [] fallback behavior.

In `@src/rotator_library/providers/nvidia_provider.py`:
- Around line 157-170: The provider currently strips Anthropic "thinking" before
checking reasoning_effort, losing the signal; change the logic so you detect
Anthropic's thinking before calling _strip_anthropic_fields (or preserve the
original payload) and if model_name is in _NVIDIA_DEEPSEEK_THINKING_MODELS and
payload has a "thinking" object but no reasoning_effort, set
payload["reasoning_effort"] = "medium" (or another chosen default) so the later
block that sets extra_body.chat_template_kwargs.thinking=True still runs; update
the code around _strip_anthropic_fields, _sanitize_messages and the
reasoning_effort check to reflect this fallback.

In `@src/rotator_library/providers/opencode_provider.py`:
- Around line 52-63: The code calls response.raise_for_status() after client.get
which can raise httpx.HTTPStatusError (not a subclass of httpx.RequestError), so
update the exception handling around the client.get / response.raise_for_status
block to catch httpx.HTTPStatusError as well (or catch httpx.HTTPError) and log
the error via lib_logger.error including the exception (same pattern used for
httpx.RequestError), then return an empty list; target the try/except
surrounding client.get, response.raise_for_status, and the list comprehension
that builds f"opencode/{model['id']}" so non-2xx responses fall back to [].

In `@src/rotator_library/request_sanitizer.py`:
- Around line 51-76: The sanitizer is using normalized_model (which may include
an outer provider prefix like "kilocode/" or "openrouter/") for capability
checks, causing misclassification; change the logic to compute an inner_model by
stripping a single leading provider prefix (if present) — e.g., inner_model =
normalized_model.split("/",1)[1] when "/" in normalized_model else
normalized_model — and then use inner_model for the "dimensions" check, the
thinking-enabled model whitelist, and the Kilocode free-model detection (replace
existing uses of normalized_model and the ad-hoc model_without_prefix with
inner_model); keep payload, KILOCODE_UNSUPPORTED_PARAMS, auto_adjust_max_tokens
and adjust_max_tokens_in_payload behavior unchanged.
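The single-prefix strip can be sketched as follows (helper name is illustrative):

```python
def strip_provider_prefix(normalized_model: str) -> str:
    # Strip ONE leading provider prefix like "kilocode/" or "openrouter/",
    # leaving any inner "vendor/model" routing segment intact.
    if "/" in normalized_model:
        return normalized_model.split("/", 1)[1]
    return normalized_model
```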

In `@src/rotator_library/timeout_config.py`:
- Line 33: The module docstring's description of the pool timeout is
inconsistent with the runtime default: update the module-level docstring to
reflect the new default of 15 seconds so it matches the _POOL constant (symbol:
_POOL) and any env-var documentation in the docstring; ensure the text that
mentions the default timeout value is changed from "60s" to "15s" and keep
wording consistent with the _POOL assignment and any related env var
explanation.

In `@src/rotator_library/token_calculator.py`:
- Around line 116-121: The model normalization currently strips suffixes then
matches fallback patterns in an order that causes a generic key like "gpt-4" to
match before more specific aliases (e.g., "gpt-4-turbo" or "gpt-4-0125"), so
alias models get the wrong window; fix by matching explicit/specific aliases
first: in the matching logic that looks up model names (the code that iterates
over your mapping/patterns around the region handling model -> window,
referencing the variable model and the model-to-window mapping/pattern list),
either (A) check exact/explicit aliases first (look for full-name keys like
"gpt-4-turbo" or entries in an alias dict) before applying generic regexes, or
(B) sort the fallback pattern keys by specificity (e.g., descending length or
number of tokens) so longer/more specific patterns are tested first; apply the
same change in the other matching block noted (the second model->window match
region) to ensure gpt-4-turbo and other explicit aliases return their correct
token window.
- Around line 288-297: The current logic returns MIN_MAX_TOKENS even when
available_for_output is <= 0 which will create impossible requests; change the
block around available_for_output so that if available_for_output <= 0 you fail
fast (log an error via logger.error and return 0 with an error code like
"input_exceeds_context_no_budget" or raise a ValueError), and only when 0 <
available_for_output < MIN_MAX_TOKENS return the true remaining budget
(available_for_output) with the existing "input_exceeds_context_minimal_output"
code; keep the logger message but use logger.error for the no-budget case and
logger.warning for the capped-but-positive case.
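The fail-fast budget logic can be sketched as follows; `MIN_MAX_TOKENS` is an illustrative constant and the `(tokens, code)` return shape is an assumption about the real token_calculator API.

```python
import logging

logger = logging.getLogger(__name__)
MIN_MAX_TOKENS = 256  # illustrative floor; the real constant lives elsewhere

def clamp_output_budget(available_for_output: int):
    """Return (max_tokens, code); fail fast when no output budget remains."""
    if available_for_output <= 0:
        # Returning MIN_MAX_TOKENS here would create an impossible request.
        logger.error("input exceeds context window; no output budget remains")
        return 0, "input_exceeds_context_no_budget"
    if available_for_output < MIN_MAX_TOKENS:
        # Positive but below the floor: return the true remaining budget.
        logger.warning("output budget below minimum; using remaining budget")
        return available_for_output, "input_exceeds_context_minimal_output"
    return available_for_output, None
```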

In `@src/rotator_library/utils/patch_litellm_finish_reason.py`:
- Around line 130-134: Replace the bare logger.error calls that handle generic
exceptions with logger.exception so stack traces are preserved: in
src/rotator_library/utils/patch_litellm_finish_reason.py locate the except
Exception handlers around the LiteLLM patch operations (the block that logs
"Unexpected error patching LiteLLM Choices" and the similar handlers referenced
at the other occurrences around the patching logic at the noted locations) and
change logger.error(...) to logger.exception(...); do this for each identical
catch-all except block (including the ones around the Choices patch and the
other two occurrences) so the traceback is included in the logs.
- Around line 154-165: The patch replaces ChatCompletionChunk.model_validate
with a staticmethod which breaks the original classmethod descriptor and risks
incorrect unpatching; instead, save the original descriptor (store
_original_chat_completion_chunk_model_validate =
ChatCompletionChunk.model_validate), then define patched_model_validate as a
classmethod that accepts (cls, obj, *args, **kwargs), calls
_normalize_chunk_data(obj) and forwards the call to the original via the
descriptor (so it is invoked with proper class binding), assign
ChatCompletionChunk.model_validate = classmethod(patched_model_validate) and
when unpatching restore ChatCompletionChunk.model_validate =
_original_chat_completion_chunk_model_validate to preserve original behavior.
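The descriptor-preserving patch can be sketched with a toy stand-in for the pydantic `ChatCompletionChunk` (in the real code `model_validate` is pydantic's classmethod and `_normalize_chunk_data` is the project's own hook): save the original `classmethod` descriptor from `__dict__`, wrap it in a new `classmethod`, and restore the saved descriptor on unpatch.

```python
class ChatCompletionChunk:
    """Toy stand-in for the real pydantic model (illustrative only)."""

    @classmethod
    def model_validate(cls, obj):
        return (cls.__name__, obj)

def _normalize_chunk_data(obj):
    obj.setdefault("finish_reason", None)
    return obj

# Save the original descriptor (not the bound method) so it can be restored.
_original_descriptor = ChatCompletionChunk.__dict__["model_validate"]

def _patched(cls, obj, *args, **kwargs):
    _normalize_chunk_data(obj)
    # Invoke the original via the descriptor so class binding is preserved.
    bound = _original_descriptor.__get__(None, cls)
    return bound(obj, *args, **kwargs)

ChatCompletionChunk.model_validate = classmethod(_patched)

def unpatch():
    ChatCompletionChunk.model_validate = _original_descriptor
```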

In `@src/rotator_library/utils/resilient_io.py`:
- Around line 688-706: The AsyncResilientStateWriter has three issues to fix:
add the missing return type annotation "-> None" to the __init__ method; replace
the Russian docstrings on AsyncResilientStateWriter, schedule_write,
_delayed_flush (and any other Russian docstrings in that class) with English
equivalents to match the file; and narrow the bare "except Exception" in
_delayed_flush to a filesystem-related exception (e.g., except OSError or except
(OSError, IOError)) to match the sync ResilientStateWriter behavior so you don't
swallow programming errors.
- Around line 688-697: AsyncResilientStateWriter currently holds pending state
in _pending and an async _flush_task but never registers with
BufferedWriteRegistry, so if close() (or task cancellation) isn't awaited
pending writes can be lost; update AsyncResilientStateWriter (constructor and
close()/flush/failure paths) to register pending data with BufferedWriteRegistry
as a synchronous fallback when disk writes fail or when the event loop is
shutting down (e.g., on exception/cancel), ensuring _pending is handed to
BufferedWriteRegistry and _flush_task is canceled/cleared; alternatively, add
clear documentation on requiring callers to await close()—mention
AsyncResilientStateWriter, close(), _pending, _flush_task and
BufferedWriteRegistry in your changes.
- Around line 710-733: The flush() method performs blocking filesystem ops
(file_path.with_suffix(...).write_text, replace, parent.mkdir, write_text)
inside an async function, which will block the event loop; change those sync
calls to run in background threads via asyncio.to_thread: call await
asyncio.to_thread(...) for temp_path.write_text, temp_path.replace,
self.file_path.parent.mkdir, and the fallback self.file_path.write_text, and
ensure asyncio is imported; keep the same try/except recovery logic and the use
of self._lock, _pending, temp_path = self.file_path.with_suffix('.tmp') and
preserve JSON serialization behavior.
- Around line 719-733: The current ResilientStateWriter write path uses a
predictable temp path (self.file_path.with_suffix('.tmp')) and the exception
recovery writes directly to the final path, risking collisions and non-atomic
writes; change the implementation in the write routine (the block around
temp_path creation and the exception handler in ResilientStateWriter.write or
the method containing that try/except) to create a unique temp file in the same
directory using tempfile.mkstemp or
tempfile.NamedTemporaryFile(dir=self.file_path.parent, delete=False), write the
JSON to that unique temp file, fsync the file descriptor and parent directory,
then atomically replace the target using os.replace; in the exception/recovery
branch ensure the parent directory is created first and perform the same
unique-temp -> fsync -> os.replace sequence (not a direct write to the target)
so recovery remains atomic and avoids predictable-temp collisions.
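The unique-temp, fsync, `os.replace` sequence can be sketched as a standalone helper; to call it from the async writer, wrap it in `asyncio.to_thread` as the earlier flush() point suggests. The directory fsync is POSIX-only and skipped elsewhere.

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(file_path: Path, state: dict) -> None:
    """Write JSON atomically: unique temp file, fsync, then os.replace."""
    file_path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp gives a unique name in the target directory, avoiding
    # collisions between concurrent writers using a predictable '.tmp'.
    fd, tmp_name = tempfile.mkstemp(dir=file_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # data hits disk before the rename
        os.replace(tmp_name, file_path)  # atomic on POSIX and Windows
        if hasattr(os, "O_DIRECTORY"):  # POSIX only: persist the rename itself
            dir_fd = os.open(file_path.parent, os.O_DIRECTORY)
            try:
                os.fsync(dir_fd)
            finally:
                os.close(dir_fd)
    except BaseException:
        try:
            os.unlink(tmp_name)  # don't leak the temp file on failure
        except OSError:
            pass
        raise
```

Recovery paths should reuse this same helper (create the parent directory, then the same temp-write-replace sequence) rather than writing directly to the target.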

In `@start_proxy.bat`:
- Around line 2-7: The batch file contains non-ASCII Russian text (e.g., the
echo string "LLM API Key Proxy - Запуск" and other lines) and must be saved
using UTF-8 with a BOM so the console displays those characters correctly after
running chcp 65001; reopen start_proxy.bat in your editor/IDE and resave it as
"UTF-8 with BOM" (or "UTF-8 signature") to add the BOM, then verify the file
still contains the lines with chcp 65001 and the echo statements unchanged.
- Around line 1-23: Replace the direct "python src/proxy_app/main.py --host
127.0.0.1 --port 8000" invocation with a Windows-friendly launcher: use "py -3
src/proxy_app/main.py --host 127.0.0.1 --port 8000" to ensure the correct Python
installation is used; alternatively, add a pre-check using "where python" and
bail with a clear message if Python is not found before invoking the existing
"python src/proxy_app/main.py --host 127.0.0.1 --port 8000" command so the
script works on systems without python on PATH.

---

Outside diff comments:
In `@src/rotator_library/providers/nvidia_provider.py`:
- Around line 39-49: The get_models() method currently only catches
httpx.RequestError so HTTPStatusError raised by response.raise_for_status() will
escape; update the exception handling in get_models() to also catch
httpx.HTTPStatusError (or httpx.HTTPError) alongside httpx.RequestError around
the client.get()/response.raise_for_status() call, and in that handler call
lib_logger.error with the error details and return [] to match existing
behavior; reference the client.get call, response.raise_for_status(),
httpx.HTTPStatusError/httpx.HTTPError, lib_logger.error, and the existing return
[] to locate and modify the try/except block.

In `@src/rotator_library/providers/qwen_code_provider.py`:
- Around line 257-277: The current early return inside the SSE handling block
bypasses the normal delta-to-content normalization and causes raw <think>
markers to leak when usage_data arrives with the final delta; instead of
yielding a combined chunk and returning in the block that inspects
choices/usage_data (where variables like choice, delta, finish_reason, chunk_id,
chunk_created, model_id are available), run the delta through the existing
normalization/conversion path used later in this method (the same routine that
converts choice["delta"] -> reasoning_content and applies <think> →
reasoning_content transforms), emit the normalized chunk(s) as usual, and attach
the usage object to the last emitted normalized chunk (preserving finish_reason
and ids) rather than short-circuiting and returning.
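The attach-to-last-chunk behavior can be sketched generically; the dict shapes follow the OpenAI streaming format, and the names are illustrative rather than the provider's real helpers.

```python
def emit_normalized(parts, finish_reason, usage, model_id, chunk_id, chunk_created):
    # Terminal fields (finish_reason, usage) belong only on the final chunk;
    # earlier chunks must not look terminal.
    for i, new_delta in enumerate(parts):
        is_last = i == len(parts) - 1
        chunk = {
            "id": chunk_id,
            "created": chunk_created,
            "model": model_id,
            "choices": [{
                "delta": new_delta,
                "finish_reason": finish_reason if is_last else None,
            }],
        }
        if is_last and usage is not None:
            chunk["usage"] = usage
        yield chunk
```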

---

Duplicate comments:
In `@src/rotator_library/providers/kilocode_provider.py`:
- Around line 29-33: The model-listing call in kilocode_provider is hardcoded to
"https://kilo.ai/api/openrouter/models" and bypasses the configured base; update
the client.get call in the code that builds the request (the try block where
response = await client.get(...)) to use the configured Kilocode base variable
(e.g., KILOCODE_API_BASE or the existing kilocode_api_base config) and append
the "/api/openrouter/models" path so the request honors the KILOCODE_API_BASE
override while still passing the Authorization header with api_key.
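Honoring the override can be sketched as follows; the default base and env-var name follow this PR's description, but the exact path join is an assumption about the real config.

```python
import os

def kilocode_models_url() -> str:
    # Honor the KILOCODE_API_BASE override instead of a hardcoded host,
    # then append the models path.
    base = os.environ.get("KILOCODE_API_BASE", "https://kilocode.ai").rstrip("/")
    return f"{base}/api/openrouter/models"
```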

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e8a9549a-6fbd-486c-ac0b-fa5e7dea24c3

📥 Commits

Reviewing files that changed from the base of the PR and between bf7ab7e and 82bb900.

📒 Files selected for processing (53)
  • .gitignore
  • CONTRIBUTING.md
  • README.md
  • src/proxy_app/main.py
  • src/proxy_app/provider_urls.py
  • src/rotator_library/__init__.py
  • src/rotator_library/anthropic_compat/__init__.py
  • src/rotator_library/anthropic_compat/streaming.py
  • src/rotator_library/anthropic_compat/streaming_fast.py
  • src/rotator_library/anthropic_compat/translator.py
  • src/rotator_library/async_locks.py
  • src/rotator_library/background_refresher.py
  • src/rotator_library/batched_persistence.py
  • src/rotator_library/circuit_breaker.py
  • src/rotator_library/client.py
  • src/rotator_library/config/__init__.py
  • src/rotator_library/config/defaults.py
  • src/rotator_library/cooldown_manager.py
  • src/rotator_library/credential_weight_cache.py
  • src/rotator_library/error_handler.py
  • src/rotator_library/http_client_pool.py
  • src/rotator_library/ip_throttle_detector.py
  • src/rotator_library/litellm_providers.py
  • src/rotator_library/model_definitions.py
  • src/rotator_library/provider_config.py
  • src/rotator_library/providers/base_token_manager.py
  • src/rotator_library/providers/cohere_provider.py
  • src/rotator_library/providers/gemini_provider.py
  • src/rotator_library/providers/google_oauth_base.py
  • src/rotator_library/providers/groq_provider.py
  • src/rotator_library/providers/iflow_auth_base.py
  • src/rotator_library/providers/iflow_provider.py
  • src/rotator_library/providers/kilocode_provider.py
  • src/rotator_library/providers/mistral_provider.py
  • src/rotator_library/providers/nanogpt_provider.py
  • src/rotator_library/providers/nvidia_provider.py
  • src/rotator_library/providers/openai_compatible_provider.py
  • src/rotator_library/providers/openai_provider.py
  • src/rotator_library/providers/opencode_provider.py
  • src/rotator_library/providers/openrouter_provider.py
  • src/rotator_library/providers/qwen_auth_base.py
  • src/rotator_library/providers/qwen_code_provider.py
  • src/rotator_library/providers/trybons_provider.py
  • src/rotator_library/providers/utilities/gemini_credential_manager.py
  • src/rotator_library/request_sanitizer.py
  • src/rotator_library/timeout_config.py
  • src/rotator_library/token_calculator.py
  • src/rotator_library/usage_manager.py
  • src/rotator_library/utils/__init__.py
  • src/rotator_library/utils/patch_litellm_finish_reason.py
  • src/rotator_library/utils/resilient_io.py
  • src/rotator_library/utils/suppress_litellm_warnings.py
  • start_proxy.bat
💤 Files with no reviewable changes (11)
  • src/rotator_library/providers/openai_compatible_provider.py
  • src/rotator_library/providers/mistral_provider.py
  • src/rotator_library/model_definitions.py
  • src/rotator_library/providers/cohere_provider.py
  • CONTRIBUTING.md
  • src/rotator_library/providers/nanogpt_provider.py
  • src/rotator_library/providers/openai_provider.py
  • src/rotator_library/providers/groq_provider.py
  • README.md
  • src/rotator_library/providers/gemini_provider.py
  • src/rotator_library/providers/openrouter_provider.py

"bedrock": "https://bedrock-runtime.us-east-1.amazonaws.com",
"openrouter": "https://openrouter.ai/api/v1",
"kilocode": "https://kilocode.ai/api/openrouter",
"trybons": "https://go.trybons.ai",

⚠️ Potential issue | 🔴 Critical

Add /v1 to the TryBons base URL.

With the current map, chat/completions resolves to https://go.trybons.ai/messages, but src/rotator_library/providers/trybons_provider.py already talks to /v1/models. As written, TryBons chat traffic is routed to the wrong path.

Suggested fix
-    "trybons": "https://go.trybons.ai",
+    "trybons": "https://go.trybons.ai/v1",

Also applies to: 62-64

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy_app/provider_urls.py` at line 34, Update the TryBons base URL in
the provider URLs mapping so requests hit the /v1 path: change the mapping value
for the "trybons" key from "https://go.trybons.ai" to "https://go.trybons.ai/v1"
in the provider_urls.py mapping (and make the same change for the other TryBons
entries present later in that file), so code in trybons_provider.py that targets
/v1/models routes to the correct endpoint.

Comment on lines +369 to +381
    except Exception as e:
        logger.error(f"Error in Anthropic streaming wrapper: {e}")

        # Send error as visible text
        if not message_started:
            yield _make_message_start_event(request_id, original_model, input_tokens, cached_tokens)

        error_message = f"Error: {str(e)}"
        yield _make_content_block_start_event(current_block_index, "text")
        yield _make_text_delta_event(current_block_index, error_message)
        yield _make_content_block_stop_event(current_block_index)
        yield _make_message_delta_event("end_turn", 0, cached_tokens)
        yield _make_message_stop_event()

⚠️ Potential issue | 🟠 Major

Close open blocks before emitting the fallback error block.

If an exception is raised after a thinking/text/tool block has already started, this handler opens a new text block without stopping the existing one(s). That produces an invalid Anthropic SSE sequence right when the client is already in the error path.

Suggested fix
     except Exception as e:
         logger.error(f"Error in Anthropic streaming wrapper: {e}")
 
+        if thinking_block_started:
+            yield _make_content_block_stop_event(current_block_index)
+            current_block_index += 1
+            thinking_block_started = False
+
+        if content_block_started:
+            yield _make_content_block_stop_event(current_block_index)
+            current_block_index += 1
+            content_block_started = False
+
+        for tc_index in sorted(tool_block_indices.keys()):
+            yield _make_content_block_stop_event(tool_block_indices[tc_index])
+
         # Send error as visible text
         if not message_started:
             yield _make_message_start_event(request_id, original_model, input_tokens, cached_tokens)
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 369-369: Do not catch blind exception: Exception (BLE001)
[warning] 370-370: Use logging.exception instead of logging.error; replace with exception (TRY400)
[warning] 370-370: Logging statement uses f-string (G004)
[warning] 376-376: Use explicit conversion flag; replace with conversion flag (RUF010)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/anthropic_compat/streaming_fast.py` around lines 369 -
381, The exception handler currently starts a new text content block without
closing any previously opened content/message blocks; update the except block in
streaming_fast.py (the logic around message_started, current_block_index and the
yields of _make_content_block_start_event/_make_text_delta_event) to first close
any open content blocks and the current message turn: yield
_make_content_block_stop_event(current_block_index) if a block was started, and
yield the appropriate _make_message_delta_event("end_turn", 0, cached_tokens)
and _make_message_stop_event() as needed before emitting the fallback text block
and the error delta; ensure you only open the fallback text block after these
stops so the SSE sequence stays valid.

Comment on lines +15 to +35
# Try to import orjson for faster JSON handling (3-5x faster than stdlib json)
try:
import orjson

def _json_dumps(obj) -> str:
return orjson.dumps(obj).decode('utf-8')

def _json_loads(s: str):
return orjson.loads(s)

_json_decode_error = orjson.JSONDecodeError
except ImportError:
import json as _json_fallback

def _json_dumps(obj) -> str:
return _json_fallback.dumps(obj)

def _json_loads(s: str):
return _json_fallback.loads(s)

_json_decode_error = _json_fallback.JSONDecodeError

🧹 Nitpick | 🔵 Trivial

Extract the JSON shim into a shared helper.

This _json_dumps/_json_loads/_json_decode_error block duplicates the one in src/rotator_library/anthropic_compat/streaming.py. Keeping the orjson fallback logic in two places invites drift; a single shared helper keeps serialization behavior and exception handling consistent.

🧰 Tools
🪛 Ruff (0.15.4)

[warning] 22-22: Missing return type annotation for private function _json_loads

(ANN202)


[warning] 32-32: Missing return type annotation for private function _json_loads

(ANN202)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/anthropic_compat/translator.py` around lines 15 - 35,
Extract the duplicated JSON shim (_json_dumps, _json_loads, _json_decode_error)
into a single shared helper module (e.g., json_shim or json_utils) that
encapsulates the orjson import/fallback logic and exposes the same symbols;
replace the local definitions in translator.py (functions _json_dumps,
_json_loads and symbol _json_decode_error) and in streaming.py to import these
symbols from the new helper so behavior and exception semantics remain
identical; ensure the helper returns/decodes strings the same way
(orjson.dumps(...).decode('utf-8') vs json.dumps) and re-exports the correct
JSONDecodeError class so existing code that catches _json_decode_error continues
to work.
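A shared helper along the lines the comment suggests could look like this sketch (the module name json_compat and the un-prefixed symbol names are illustrative, not the repository's actual layout):

```python
# json_compat.py — illustrative shared shim for the orjson/stdlib fallback.
# Prefers orjson when installed; otherwise falls back to the stdlib json module,
# exposing the same three symbols either way.
try:
    import orjson

    def json_dumps(obj) -> str:
        # orjson returns bytes, so decode to match json.dumps semantics
        return orjson.dumps(obj).decode("utf-8")

    def json_loads(s):
        return orjson.loads(s)

    JSONDecodeError = orjson.JSONDecodeError
except ImportError:
    import json as _json

    def json_dumps(obj) -> str:
        return _json.dumps(obj)

    def json_loads(s):
        return _json.loads(s)

    JSONDecodeError = _json.JSONDecodeError
```

Both translator.py and streaming.py would then import these symbols, so code catching the decode error keeps working regardless of which backend is active.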

Comment on lines +61 to +89
async def release_read(self) -> None:
"""Release read lock."""
async with self._read_ready:
self._readers -= 1
if self._readers == 0:
# Notify waiting writers
async with self._write_ready:
self._write_ready.notify_all()

async def acquire_write(self) -> None:
"""Acquire write lock."""
async with self._write_ready:
self._writer_waiting += 1
try:
# Wait until no readers or writers active
while self._readers > 0 or self._writer_active:
await self._write_ready.wait()
self._writer_active = True
finally:
self._writer_waiting -= 1

async def release_write(self) -> None:
"""Release write lock."""
async with self._write_ready:
self._writer_active = False
# Notify both readers and writers
self._write_ready.notify_all()
async with self._read_ready:
self._read_ready.notify_all()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n src/rotator_library/async_locks.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 14098


🌐 Web query:

asyncio.Condition reentrant lock semantics Python

💡 Result:

  • asyncio.Condition is not reentrant. A Condition wraps an underlying asyncio.Lock (either the one you pass in, or a new one it creates), and its acquire()/release() are just for that lock. The lock argument must be an asyncio.Lock (or None), not an RLock. [1]
  • asyncio.Lock is non-reentrant, so if the same task tries to acquire it again (e.g., nested async with cond: / await cond.acquire()), it will block forever (deadlock) until it’s released. [1]
  • Condition.wait() semantics: you must hold the lock; wait() releases the lock while waiting and, when notified, re-acquires the lock before returning (so the caller resumes while holding it). [1][2]
  • There is no stdlib asyncio.RLock, so if you truly need reentrancy you must redesign to avoid nested acquisition or use a third-party async RLock implementation. [3]

Sources: [1] Python asyncio synchronization primitives docs, Condition/Lock sections; [2] Stack Overflow explanation of Condition.wait() releasing and re-acquiring the lock; [3] fair-async-rlock (PyPI) noting asyncio lacks an RLock.


The read/write release paths can deadlock each other.

release_read() acquires _read_ready and then _write_ready, while release_write() acquires them in the reverse order. Because asyncio.Condition is non-reentrant and each Condition here wraps its own underlying lock, two concurrent releases can deadlock through lock-order inversion: Task A holds _read_ready's lock while waiting for _write_ready's lock, and Task B holds _write_ready's lock while waiting for _read_ready's lock, so both block forever. Either both conditions must share one underlying lock, or these paths need a single acquisition order.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/async_locks.py` around lines 61 - 89, The release paths
deadlock because release_read() acquires _read_ready then _write_ready while
release_write() does the opposite; fix by making both Conditions share the same
underlying lock or by enforcing a single global acquisition order so nested
acquires never invert. Concretely, create one asyncio.Lock (or one
asyncio.Condition) and initialize both _read_ready and _write_ready with that
same lock (or replace both with a single Condition and adjust logic), then
update release_read, release_write and acquire_write to use that shared
lock/condition so notify_all() calls no longer require nested, inversed
acquisitions; ensure functions named release_read, release_write and
acquire_write reference the shared _lock/_condition.
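A sketch of the shared-lock variant (attribute names mirror the snippet above; acquire_read is included only to make the example self-contained, and this is an illustration, not the repository's actual class):

```python
import asyncio

class RWLock:
    """Reader-writer lock sketch where both Conditions share ONE underlying
    lock, so no release path ever needs to acquire a second lock."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        # Both conditions wrap the SAME lock: notifying readers or writers
        # never requires a nested, order-inverted acquisition.
        self._read_ready = asyncio.Condition(self._lock)
        self._write_ready = asyncio.Condition(self._lock)
        self._readers = 0
        self._writer_active = False

    async def acquire_read(self) -> None:
        async with self._lock:
            while self._writer_active:
                await self._read_ready.wait()
            self._readers += 1

    async def release_read(self) -> None:
        async with self._lock:
            self._readers -= 1
            if self._readers == 0:
                self._write_ready.notify_all()

    async def acquire_write(self) -> None:
        async with self._lock:
            while self._readers > 0 or self._writer_active:
                await self._write_ready.wait()
            self._writer_active = True

    async def release_write(self) -> None:
        async with self._lock:
            self._writer_active = False
            self._write_ready.notify_all()
            self._read_ready.notify_all()
```

Condition.wait() is legal here because each Condition wraps self._lock, which the `async with self._lock` block already holds; wait() releases and re-acquires that same lock.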

Comment on lines +186 to +211
async with self._lock:
self._stats["acquires"] += 1
self._stats["current_concurrent"] += 1
self._stats["peak_concurrent"] = max(
self._stats["peak_concurrent"],
self._stats["current_concurrent"]
)
return True

except Exception:
async with self._lock:
if waiter in self._waiters:
self._waiters.remove(waiter)
raise

def release(self) -> None:
"""Release the semaphore."""
# Note: This is sync for compatibility with context manager
self._value += 1
self._stats["releases"] += 1
self._stats["current_concurrent"] -= 1

# Wake up next waiter if any
if self._waiters:
next_waiter = self._waiters.pop(0)
next_waiter.set()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

wc -l src/rotator_library/async_locks.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 108


🏁 Script executed:

cat -n src/rotator_library/async_locks.py | head -220

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 8184


Fix direct handoff logic in semaphore release.

The release() method has a critical bug: it increments _value unconditionally, then wakes a waiter if one exists. The waiter resumes without decrementing _value, so the same permit is counted twice: once as the incremented _value and once as the permit now held by the waiter. This allows concurrent usage to exceed the configured limit.

The fix should only increment _value when no waiters exist; when handing off directly to a waiter, skip the increment and let the waiter consume the permit without double-counting.

Proposed fix
     def release(self) -> None:
         """Release the semaphore."""
         # Note: This is sync for compatibility with context manager
-        self._value += 1
         self._stats["releases"] += 1
         self._stats["current_concurrent"] -= 1

         # Wake up next waiter if any
         if self._waiters:
-            next_waiter = self._waiters.pop(0)
-            next_waiter.set()
+            self._waiters.pop(0).set()
+        else:
+            self._value += 1
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 193-193: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/async_locks.py` around lines 186 - 211, The release()
method must avoid incrementing _value when directly handing off to a waiter:
change logic in release() so that if _waiters is non-empty you pop a waiter and
call waiter.set() without doing self._value += 1 and without decrementing
self._stats["current_concurrent"] (the permit is transferred), otherwise (no
waiters) do self._value += 1 and decrement self._stats["current_concurrent"] and
increment self._stats["releases"]; ensure references to release(), _value,
_waiters, waiter.set(), and self._stats["current_concurrent"] are updated
accordingly so the semaphore doesn't double-count a permit on direct handoff.
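The direct-handoff rule can be illustrated with a stripped-down semaphore (the class name and Event-based waiter queue are simplifications for the sketch, not the implementation under review):

```python
import asyncio

class TrackedSemaphore:
    """Sketch of direct handoff: on release, a permit is either handed to
    the next waiter OR returned to _value, never both."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._waiters: list[asyncio.Event] = []

    async def acquire(self) -> None:
        if self._value > 0:
            self._value -= 1
            return
        waiter = asyncio.Event()
        self._waiters.append(waiter)
        # The permit is transferred directly by release(); we do NOT
        # touch _value when we wake up.
        await waiter.wait()

    def release(self) -> None:
        if self._waiters:
            # Direct handoff: the waiter consumes this permit, so _value
            # is NOT incremented (avoids the double-count).
            self._waiters.pop(0).set()
        else:
            self._value += 1
```

With this shape, the sum of free permits and in-flight holders stays equal to the configured limit, which is the invariant the buggy version violated.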

Comment on lines +710 to +733
async def flush(self) -> None:
"""Немедленная запись."""
async with self._lock:
if self._pending is None:
return

data = self._pending
self._pending = None

try:
# Atomic write: temp -> rename
temp_path = self.file_path.with_suffix('.tmp')
temp_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
temp_path.replace(self.file_path)
except Exception:
# Recovery: recreate directory if deleted
self.file_path.parent.mkdir(parents=True, exist_ok=True)
self.file_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)

⚠️ Potential issue | 🟠 Major

Blocking I/O operations in async context will block the event loop.

The flush() method uses synchronous write_text() and replace() calls. In an async class, these blocking operations will stall the entire event loop during disk I/O, negating the benefits of async design.

Consider wrapping blocking I/O with asyncio.to_thread():

♻️ Proposed fix using asyncio.to_thread()
     async def flush(self) -> None:
-        """Немедленная запись."""
+        """Immediate write."""
         async with self._lock:
             if self._pending is None:
                 return

             data = self._pending
             self._pending = None

+        # Run blocking I/O in thread pool
+        await asyncio.to_thread(self._write_to_disk, data)
+
+    def _write_to_disk(self, data: Dict[str, Any]) -> None:
+        """Synchronous disk write (called via to_thread)."""
+        try:
+            temp_path = self.file_path.with_suffix('.tmp')
+            temp_path.write_text(
+                json.dumps(data, indent=2, default=str),
+                encoding='utf-8'
+            )
+            temp_path.replace(self.file_path)
+        except OSError:
+            self.file_path.parent.mkdir(parents=True, exist_ok=True)
+            self.file_path.write_text(
+                json.dumps(data, indent=2, default=str),
+                encoding='utf-8'
+            )
-            try:
-                # Atomic write: temp -> rename
-                temp_path = self.file_path.with_suffix('.tmp')
-                temp_path.write_text(
-                    json.dumps(data, indent=2, default=str),
-                    encoding='utf-8'
-                )
-                temp_path.replace(self.file_path)
-            except Exception:
-                # Recovery: recreate directory if deleted
-                self.file_path.parent.mkdir(parents=True, exist_ok=True)
-                self.file_path.write_text(
-                    json.dumps(data, indent=2, default=str),
-                    encoding='utf-8'
-                )
📝 Committable suggestion


Suggested change
async def flush(self) -> None:
"""Немедленная запись."""
async with self._lock:
if self._pending is None:
return
data = self._pending
self._pending = None
try:
# Atomic write: temp -> rename
temp_path = self.file_path.with_suffix('.tmp')
temp_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
temp_path.replace(self.file_path)
except Exception:
# Recovery: recreate directory if deleted
self.file_path.parent.mkdir(parents=True, exist_ok=True)
self.file_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
async def flush(self) -> None:
"""Immediate write."""
async with self._lock:
if self._pending is None:
return
data = self._pending
self._pending = None
# Run blocking I/O in thread pool
await asyncio.to_thread(self._write_to_disk, data)
def _write_to_disk(self, data: Dict[str, Any]) -> None:
"""Synchronous disk write (called via to_thread)."""
try:
# Atomic write: temp -> rename
temp_path = self.file_path.with_suffix('.tmp')
temp_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
temp_path.replace(self.file_path)
except OSError:
# Recovery: recreate directory if deleted
self.file_path.parent.mkdir(parents=True, exist_ok=True)
self.file_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 727-727: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/utils/resilient_io.py` around lines 710 - 733, The
flush() method performs blocking filesystem ops
(file_path.with_suffix(...).write_text, replace, parent.mkdir, write_text)
inside an async function, which will block the event loop; change those sync
calls to run in background threads via asyncio.to_thread: call await
asyncio.to_thread(...) for temp_path.write_text, temp_path.replace,
self.file_path.parent.mkdir, and the fallback self.file_path.write_text, and
ensure asyncio is imported; keep the same try/except recovery logic and the use
of self._lock, _pending, temp_path = self.file_path.with_suffix('.tmp') and
preserve JSON serialization behavior.

Comment on lines +719 to +733
try:
# Atomic write: temp -> rename
temp_path = self.file_path.with_suffix('.tmp')
temp_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
temp_path.replace(self.file_path)
except Exception:
# Recovery: recreate directory if deleted
self.file_path.parent.mkdir(parents=True, exist_ok=True)
self.file_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)

⚠️ Potential issue | 🟠 Major

A predictable temp file path and a non-atomic recovery path put data integrity at risk.

Two issues:

  1. Line 721: Using .with_suffix('.tmp') creates a predictable temp path. If two writers target the same file simultaneously, or a stale temp file exists from a crash, this could cause collisions. The synchronous ResilientStateWriter uses tempfile.mkstemp() for unique paths.

  2. Lines 729-733: The recovery fallback writes directly to the final path without the atomic temp→rename pattern. If this write is interrupted, the file could be left corrupted.

♻️ Proposed fix for safer temp file handling
+import tempfile
+import shutil
+
 # In _write_to_disk or flush:
-            temp_path = self.file_path.with_suffix('.tmp')
-            temp_path.write_text(
-                json.dumps(data, indent=2, default=str),
-                encoding='utf-8'
-            )
-            temp_path.replace(self.file_path)
+            fd, tmp_path_str = tempfile.mkstemp(
+                dir=self.file_path.parent,
+                prefix=".tmp_",
+                suffix=".json"
+            )
+            try:
+                with os.fdopen(fd, "w", encoding="utf-8") as f:
+                    json.dump(data, f, indent=2, default=str)
+                shutil.move(tmp_path_str, self.file_path)
+            except Exception:
+                if os.path.exists(tmp_path_str):
+                    os.unlink(tmp_path_str)
+                raise
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 727-727: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/utils/resilient_io.py` around lines 719 - 733, The
current ResilientStateWriter write path uses a predictable temp path
(self.file_path.with_suffix('.tmp')) and the exception recovery writes directly
to the final path, risking collisions and non-atomic writes; change the
implementation in the write routine (the block around temp_path creation and the
exception handler in ResilientStateWriter.write or the method containing that
try/except) to create a unique temp file in the same directory using
tempfile.mkstemp or tempfile.NamedTemporaryFile(dir=self.file_path.parent,
delete=False), write the JSON to that unique temp file, fsync the file
descriptor and parent directory, then atomically replace the target using
os.replace; in the exception/recovery branch ensure the parent directory is
created first and perform the same unique-temp -> fsync -> os.replace sequence
(not a direct write to the target) so recovery remains atomic and avoids
predictable-temp collisions.
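Both points can be folded into one helper; a minimal sketch under the assumptions above (the function name is illustrative):

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    """Sketch of a collision-safe atomic JSON write: unique temp file in the
    target directory, fsync, then os.replace (atomic on POSIX and Windows)."""
    # Recovery concern folded in: ensure the parent directory exists first.
    path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp gives a unique name, so concurrent writers and stale temp
    # files from crashes cannot collide.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp_", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, default=str)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp, path)  # atomic swap into place
    except BaseException:
        # Clean up the orphaned temp file on any failure
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Because the recovery step (mkdir) runs before the temp write, the fallback path goes through the same unique-temp → fsync → os.replace sequence instead of writing the final file directly.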

Comment on lines +37 to +45
old_stdout = sys.stdout
old_stderr = sys.stderr
sys.stdout = StringIO()
sys.stderr = StringIO()
try:
yield
finally:
sys.stdout = old_stdout
sys.stderr = old_stderr

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect every call site using this context manager and show surrounding context.
python - <<'PY'
from pathlib import Path
import re

use_pat = re.compile(r'with\s+suppress_litellm_prints\s*\(')
marker_pat = re.compile(r'^\s*(async\s+def\b|threading\.Thread|ThreadPoolExecutor|FastAPI|APIRouter)\b')

for path in sorted(Path(".").rglob("*.py")):
    text = path.read_text(encoding="utf-8", errors="ignore").splitlines()
    hit_lines = [i for i, line in enumerate(text, 1) if use_pat.search(line)]
    if not hit_lines:
        continue

    print(f"\n== {path} ==")
    for ln in hit_lines:
        start = max(1, ln - 3)
        end = min(len(text), ln + 3)
        for i in range(start, end + 1):
            print(f"{i}:{text[i-1]}")
    markers = [i for i, line in enumerate(text, 1) if marker_pat.search(line)]
    if markers:
        print("-- concurrency/request markers in same file --")
        for i in markers[:20]:
            print(f"{i}:{text[i-1]}")
PY

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 1737


Avoid replacing global sys.stdout/sys.stderr here.

Lines 37–40 mutate interpreter-global streams. This context manager is called from async def record_failure() in usage_manager.py (line 3065), where multiple concurrent tasks may execute simultaneously. When one coroutine redirects stdout/stderr, any other task writing during that window gets silenced too. Redirecting stderr also risks hiding unrelated tracebacks and can break code expecting a real stream object. Filter the specific LiteLLM spam at the print boundary instead of black-holing process-wide output.
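One way to filter at the print boundary is a pass-through stream installed once at startup instead of swapped per call; a sketch (the class name and the noise markers are placeholders, not LiteLLM's actual output):

```python
import sys

class LineFilterStream:
    """Pass-through stream that drops only lines containing known noise
    markers, leaving all other output (including tracebacks) intact."""

    def __init__(self, wrapped, noise_markers):
        self._wrapped = wrapped
        self._noise = tuple(noise_markers)

    def write(self, text: str) -> int:
        if any(marker in text for marker in self._noise):
            return len(text)  # swallow only the known spam
        return self._wrapped.write(text)

    def flush(self) -> None:
        self._wrapped.flush()

    def __getattr__(self, name):
        # Delegate everything else so callers still see a real stream object
        return getattr(self._wrapped, name)

# Installed once at process startup, not per-request, so concurrent tasks
# never lose their own output (marker string is a placeholder):
# sys.stdout = LineFilterStream(sys.stdout, ["Provider List:"])
```

Because the filter is installed once and forwards everything it does not recognize, concurrent coroutines are unaffected and stderr is never touched.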

Comment on lines +1 to +23
@echo off
chcp 65001 >nul
setlocal

echo ========================================
echo LLM API Key Proxy - Запуск
echo ========================================
echo.

cd /d "%~dp0"


echo.
echo Запуск прокси-сервера на http://127.0.0.1:8000
echo.
echo.
echo Для остановки нажмите Ctrl+C
echo ========================================
echo.

python src/proxy_app/main.py --host 127.0.0.1 --port 8000

pause

🧹 Nitpick | 🔵 Trivial

Consider using the py launcher for better Windows compatibility.

On Windows, the py launcher (PEP 397) is more reliable than calling python directly, as it handles multiple Python installations and doesn't require Python to be in the system PATH.

♻️ Suggested improvement
-python src/proxy_app/main.py --host 127.0.0.1 --port 8000
+py src/proxy_app/main.py --host 127.0.0.1 --port 8000

Alternatively, check for Python availability first:

where python >nul 2>&1 || (echo Python not found in PATH & pause & exit /b 1)
python src/proxy_app/main.py --host 127.0.0.1 --port 8000
📝 Committable suggestion


Suggested change
@echo off
chcp 65001 >nul
setlocal
echo ========================================
echo LLM API Key Proxy - Запуск
echo ========================================
echo.
cd /d "%~dp0"
echo.
echo Запуск прокси-сервера на http://127.0.0.1:8000
echo.
echo.
echo Для остановки нажмите Ctrl+C
echo ========================================
echo.
python src/proxy_app/main.py --host 127.0.0.1 --port 8000
pause
@echo off
chcp 65001 >nul
setlocal
echo ========================================
echo LLM API Key Proxy - Запуск
echo ========================================
echo.
cd /d "%~dp0"
echo.
echo Запуск прокси-сервера на http://127.0.0.1:8000
echo.
echo.
echo Для остановки нажмите Ctrl+C
echo ========================================
echo.
py src/proxy_app/main.py --host 127.0.0.1 --port 8000
pause
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@start_proxy.bat` around lines 1 - 23, Replace the direct "python
src/proxy_app/main.py --host 127.0.0.1 --port 8000" invocation with a
Windows-friendly launcher: use "py -3 src/proxy_app/main.py --host 127.0.0.1
--port 8000" to ensure the correct Python installation is used; alternatively,
add a pre-check using "where python" and bail with a clear message if Python is
not found before invoking the existing "python src/proxy_app/main.py --host
127.0.0.1 --port 8000" command so the script works on systems without python on
PATH.

Comment on lines +2 to +7
chcp 65001 >nul
setlocal

echo ========================================
echo LLM API Key Proxy - Запуск
echo ========================================

🧹 Nitpick | 🔵 Trivial

Ensure the file is saved with UTF-8 encoding (with BOM recommended for Windows batch).

The Russian text on lines 6, 14, and 17 requires this file to be saved as UTF-8. For Windows batch files with non-ASCII characters, saving with a UTF-8 BOM (Byte Order Mark) is typically the more reliable way to ensure correct display in the console after chcp 65001.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@start_proxy.bat` around lines 2 - 7, The batch file contains non-ASCII
Russian text (e.g., the echo string "LLM API Key Proxy - Запуск" and other
lines) and must be saved using UTF-8 with a BOM so the console displays those
characters correctly after running chcp 65001; reopen start_proxy.bat in your
editor/IDE and resave it as "UTF-8 with BOM" (or "UTF-8 signature") to add the
BOM, then verify the file still contains the lines with chcp 65001 and the echo
statements unchanged.

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +157 to +170
# --- Step 1: Strip all Anthropic-specific top-level fields ---
self._strip_anthropic_fields(payload)

# --- Step 2: Strip cache_control and normalize messages ---
self._sanitize_messages(payload)

# --- Step 3: Handle reasoning/thinking for DeepSeek models ---
reasoning_effort = payload.get("reasoning_effort")

if model_name in deepseek_models and reasoning_effort in ["low", "medium", "high"]:
if "extra_body" not in payload:
payload["extra_body"] = {}
if "chat_template_kwargs" not in payload["extra_body"]:
payload["extra_body"]["chat_template_kwargs"] = {}

payload["extra_body"]["chat_template_kwargs"]["thinking"] = True
lib_logger.info(f"Enabled 'thinking' parameter for model: {model_name} due to reasoning_effort: '{reasoning_effort}'")
if model_name in _NVIDIA_DEEPSEEK_THINKING_MODELS:
if reasoning_effort in ("low", "medium", "high"):
payload.setdefault("extra_body", {})
payload["extra_body"].setdefault("chat_template_kwargs", {})
payload["extra_body"]["chat_template_kwargs"]["thinking"] = True

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect where the NVIDIA provider hook is invoked.
rg -nP --type=py -C2 '\bhandle_thinking_parameter\s*\(' src

echo
# Look for any normalization that converts Anthropic `thinking`
# into `reasoning_effort` before the provider hook runs.
rg -nP --type=py -C3 '\bthinking\b|\breasoning_effort\b' src

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 50383


🏁 Script executed:

# Check the context around handle_thinking_parameter calls in client.py
sed -n '2300,2340p' src/rotator_library/client.py

echo "---"

# Check the second location
sed -n '3230,3270p' src/rotator_library/client.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 4065


🏁 Script executed:

# Look at the full handle_thinking_parameter method in nvidia_provider.py
sed -n '129,190p' src/rotator_library/providers/nvidia_provider.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2994


🏁 Script executed:

# Search for any code that converts or maps thinking to reasoning_effort
rg -n 'thinking.*reasoning_effort|reasoning_effort.*thinking' --type=py src

echo "---"

# Look for any preprocessing of Anthropic payloads before provider handler
rg -n 'normalize|convert.*anthropic|anthropic.*convert' --type=py src -i | head -40

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 7221


🏁 Script executed:

# Look at the translator.py code around those lines
sed -n '60,70p' src/rotator_library/anthropic_compat/translator.py
sed -n '630,650p' src/rotator_library/anthropic_compat/translator.py

echo "---"

# Check where/how this translator is invoked
rg -n 'translator|translate_anthropic|from anthropic_compat' --type=py src | grep -v test | head -30

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2178


🏁 Script executed:

# Check around line 4436 in client.py to see the context
sed -n '4400,4450p' src/rotator_library/client.py

echo "---"

# Also check if translate_anthropic_request is called in the same code path as handle_thinking_parameter
# Look for where the Anthropic path is handled
rg -n 'translate_anthropic_request\|anthropic_compat' src/rotator_library/client.py | head -20

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2235


🏁 Script executed:

# Get more context from the Anthropic messages handler
sed -n '4430,4500p' src/rotator_library/client.py

echo "---"

# Check if there's a separate code path for Anthropic vs OpenAI
rg -n 'async def messages|async def acompletion' src/rotator_library/client.py | head -10

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 3469


🏁 Script executed:

# Check the nvidia_provider class docstring and any comments about Anthropic format
sed -n '1,120p' src/rotator_library/providers/nvidia_provider.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 4663


🏁 Script executed:

# Check if translate_anthropic_request properly converts thinking to reasoning_effort
sed -n '585,660p' src/rotator_library/anthropic_compat/translator.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2725


🏁 Script executed:

# Verify that ALL Anthropic requests go through translate_anthropic_request
# by checking if there are any other callers of acompletion with anthropic-style payloads
rg -n 'def messages\|async def messages' src/rotator_library/client.py -A 5

echo "---"

# Verify the nvidia_provider docstring says it handles "Anthropic-format requests"
# but check if it's ever called with actual Anthropic format (before translation)
sed -n '25,35p' src/rotator_library/providers/nvidia_provider.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 433


🏁 Script executed:

# Check if nvidia_provider.handle_thinking_parameter is called directly anywhere else
rg -n 'provider.*handle_thinking_parameter\|NvidiaProvider' src/rotator_library/client.py | head -20

echo "---"

# Verify all acompletion calls that go to nvidia_nim provider
rg -n 'nvidia_nim|nvidia\|' src/rotator_library/client.py | grep -E 'provider|model' | head -10

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 403


🏁 Script executed:

# Let me trace the complete flow one more time to be absolutely certain
# Check if there's any path where an OpenAI-format request could have Anthropic thinking

# First, confirm that messages() endpoint uses translate_anthropic_request
rg -n 'async def messages' src/rotator_library/client.py -A 50 | grep -E 'translate_anthropic_request|acompletion' | head -5

echo "---"

# Check the acompletion signature to see if it can receive Anthropic thinking
rg -n 'async def acompletion' src/rotator_library/client.py -A 2

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 73


🏁 Script executed:

# Let me check if there's any way an OpenAI-format request could have Anthropic thinking field
# by looking at how requests are validated or constructed

# Check the request models/schemas
rg -n 'class.*Request|thinking.*field' src/rotator_library/anthropic_compat/ --type=py | head -20

echo "---"

# Verify by looking at the actual acompletion signature
sed -n '1900,1950p' src/rotator_library/client.py | grep -A 50 'async def acompletion'

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 403


🏁 Script executed:

# Look at the _budget_to_reasoning_effort function to understand the mapping
sed -n '60,100p' src/rotator_library/anthropic_compat/translator.py

echo "---"

# Check if there's a comment explaining why thinking without budget_tokens is not converted
sed -n '636,650p' src/rotator_library/anthropic_compat/translator.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2358


🏁 Script executed:

# Check what the default thinking structure looks like from Anthropic
rg -n 'thinking.*enabled\|type.*enabled' src/rotator_library/anthropic_compat/ -B2 -A2 | head -30

echo "---"

# Verify if there's a test or example showing thinking without budget_tokens
rg -n 'budget_tokens.*None\|thinking.*{.*type' src/ --type=py -A2 | head -20

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 243


🏁 Script executed:

```shell
# Check the Anthropic model to see the structure of the thinking parameter
sed -n '80,130p' src/rotator_library/anthropic_compat/models.py

echo "---"

# Check if there are any tests or documentation showing thinking without budget_tokens
find src -name '*test*' -o -name '*example*' | xargs grep -l 'thinking' 2>/dev/null | head -5
```

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 1486


🏁 Script executed:

```shell
# Verify the exact scenario: Anthropic thinking without budget_tokens
# Check what happens in the translator when budget_tokens is None
sed -n '636,655p' src/rotator_library/anthropic_compat/translator.py

echo "---"

# Now check the nvidia provider again: does it have any fallback when reasoning_effort is missing?
sed -n '155,180p' src/rotator_library/providers/nvidia_provider.py
```

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2051


**`thinking` without `budget_tokens` loses its signal on NVIDIA DeepSeek models.**

When an Anthropic request carries `thinking: {"type": "enabled"}` but no `budget_tokens`, `translate_anthropic_request()` intentionally skips setting `reasoning_effort` (line 644: "don't set anything"). The NVIDIA provider then strips the `thinking` field at Step 1 and finds no `reasoning_effort` at Step 3, silently disabling reasoning on DeepSeek models.
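The interplay can be reproduced with a minimal sketch; the function bodies below mirror the steps in this finding but are simplified stand-ins, not the project's actual code:

```python
# Hypothetical sketch of the signal-loss path; translate_anthropic_request and
# the NVIDIA provider steps are simplified assumptions, not the real implementations.

def translate_anthropic_request(payload: dict) -> dict:
    """Anthropic -> OpenAI: maps thinking to reasoning_effort only when budget_tokens is set."""
    out = {k: v for k, v in payload.items() if k != "thinking"}
    thinking = payload.get("thinking") or {}
    if thinking.get("type") == "enabled" and thinking.get("budget_tokens"):
        out["reasoning_effort"] = "medium"  # the real code derives this from the budget
    # thinking without budget_tokens: nothing is set, so the signal is lost here
    return out

def nvidia_prepare(payload: dict) -> dict:
    """NVIDIA provider: strips 'thinking' (Step 1), then checks reasoning_effort (Step 3)."""
    payload.pop("thinking", None)
    if payload.get("reasoning_effort"):
        payload.setdefault("extra_body", {})["chat_template_kwargs"] = {"thinking": True}
    return payload

req = {"model": "deepseek-v3", "thinking": {"type": "enabled"}}
print(nvidia_prepare(translate_anthropic_request(req)))  # → {'model': 'deepseek-v3'}
```

The result carries neither `reasoning_effort` nor the `extra_body` flag, which is the silent-disable behavior described above.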

The translator assumes such cases should "let the provider decide the default," but the provider has no way to know `thinking` was intended. Either:

1. Set a default `reasoning_effort` (e.g., `"medium"`) when `thinking.type == "enabled"`, regardless of `budget_tokens`, or
2. Add fallback logic in the NVIDIA provider that enables thinking by default for DeepSeek when `thinking` was present but `reasoning_effort` is absent.
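Option 2 could look roughly like the following sketch; `_NVIDIA_DEEPSEEK_THINKING_MODELS`, the `"medium"` default, and the payload shape are assumptions taken from this finding, not verified against the provider's code:

```python
# Sketch of option 2 (assumed names): remember that Anthropic thinking was
# requested before stripping it, then fall back to a default reasoning_effort.
_NVIDIA_DEEPSEEK_THINKING_MODELS = {"deepseek-v3"}  # placeholder model set

def nvidia_prepare_with_fallback(payload: dict, model_name: str) -> dict:
    thinking = payload.pop("thinking", None)  # strip the Anthropic-only field as before
    wanted_thinking = isinstance(thinking, dict) and thinking.get("type") == "enabled"
    if (wanted_thinking
            and model_name in _NVIDIA_DEEPSEEK_THINKING_MODELS
            and not payload.get("reasoning_effort")):
        payload["reasoning_effort"] = "medium"  # preserve the intent with a default
    if payload.get("reasoning_effort"):
        payload.setdefault("extra_body", {})["chat_template_kwargs"] = {"thinking": True}
    return payload
```

Because the fallback runs before the existing `reasoning_effort` check, the later block that sets `chat_template_kwargs.thinking = True` still fires.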
🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/providers/nvidia_provider.py` around lines 157 - 170, the
provider currently strips Anthropic "thinking" before checking reasoning_effort,
losing the signal; change the logic so you detect Anthropic's thinking before
calling _strip_anthropic_fields (or preserve the original payload) and, if
model_name is in _NVIDIA_DEEPSEEK_THINKING_MODELS and the payload has a "thinking"
object but no reasoning_effort, set payload["reasoning_effort"] = "medium" (or
another chosen default) so the later block that sets
extra_body.chat_template_kwargs.thinking=True still runs; update the code around
_strip_anthropic_fields, _sanitize_messages, and the reasoning_effort check to
reflect this fallback.
```

