
Add Kilocode provider support #115

Open
ShmidtS wants to merge 29 commits into Mirrowel:main from ShmidtS:main

Conversation


@ShmidtS ShmidtS commented Jan 31, 2026

Description

This PR adds support for the Kilocode LLM provider (https://kilocode.ai).

Kilocode is an OpenRouter-compatible API that routes requests to various LLM providers through a unified interface.

Changes

  • New file: src/rotator_library/providers/kilocode_provider.py

    • Implements KilocodeProvider class inheriting from ProviderInterface
    • Supports endpoint: https://kilocode.ai/api/openrouter/chat/completions
    • Handles model routing with provider prefixes (e.g., minimax/, moonshotai/, z-ai/)
  • Modified: src/proxy_app/provider_urls.py

    • Added Kilocode endpoint configuration
  • Modified: src/rotator_library/provider_config.py

    • Registered Kilocode provider in LITELLM_PROVIDERS with category popular
    • Added support for KILOCODE_API_BASE environment variable

Provider Features

  • API Endpoint: https://kilocode.ai/api/openrouter/chat/completions
  • Model Routing: Supports models with provider prefixes:
    • minimax/minimax-m2.1:free
    • moonshotai/kimi-k2.5:free
    • z-ai/glm-4.7:free
    • Other models in provider/model format
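The routing described above amounts to a prefix split: the kilocode/ prefix selects the provider, and the remainder (which may itself contain a slash) is the upstream model id. As an illustrative sketch only, not the PR's actual implementation:

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split a 'provider/model' id into (provider_prefix, upstream_model).

    The upstream part may itself contain a slash, so only the first
    separator is significant.
    """
    prefix, _, rest = model.partition("/")
    return prefix, rest


print(split_model_id("kilocode/minimax/minimax-m2.1:free"))
# ('kilocode', 'minimax/minimax-m2.1:free')
```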

Testing

  • Provider correctly registers via the plugin system
  • Endpoint resolves properly for API requests
  • Model routing with prefixes works as expected

Important

Adds support for Kilocode provider with new class and configuration updates, plus minor refactoring for readability.

  • Kilocode Provider Support:
    • Adds KilocodeProvider class in kilocode_provider.py for handling Kilocode API interactions.
    • Supports model routing with prefixes like minimax/, moonshotai/, z-ai/.
  • Configuration Changes:
    • Registers Kilocode in provider_config.py with category popular and supports KILOCODE_API_BASE env variable.
    • Adds Kilocode endpoint to provider_urls.py.
  • Miscellaneous:
    • Refactors code in client.py and usage_manager.py for better readability and consistency.
    • Fixes Windows console encoding issues in main.py.

This description was created by Ellipsis for e11ea15.

@ShmidtS ShmidtS requested a review from Mirrowel as a code owner January 31, 2026 18:24
@mirrobot-agent
Contributor

I'm starting my review of the Kilocode provider support. Diving into the provider implementation and the usage manager updates now!

Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Overall Assessment

This PR successfully adds support for the Kilocode provider and introduces some valuable general improvements, such as the Windows console encoding fix and an optimized waiting mechanism in the UsageManager. The implementation follows the existing patterns for providers.

Architectural Feedback

The addition of the kilocode provider is straightforward. However, there are a few points regarding configuration consistency and concurrency efficiency that should be addressed to ensure the proxy remains robust under high load.

Key Suggestions

  • Configuration Consistency: Ensure kilocode_provider.py honors the KILOCODE_API_BASE environment variable.
  • Concurrency Optimization: Refine the waiting logic in UsageManager to avoid unnecessary delays when multiple keys are busy.
  • Request Path Consistency: Verify the /v1 suffix for Kilocode endpoints to ensure full OpenRouter compatibility as described.
  • Incomplete Fix: Apply the nvidia provider routing fix to the streaming request path as well.

Nitpicks and Minor Points

  • The UsageManager wait logic is duplicated; refactoring this into a helper would improve maintainability.

Questions for the Author

  • Does Kilocode definitely NOT require /v1 in the base URL for the models and chat/completions endpoints? The PR description and code seem slightly inconsistent on this point.

This review was generated by an AI assistant.

        """
        try:
            response = await client.get(
                "https://kilocode.ai/api/openrouter/models",

This URL is hardcoded and ignores the KILOCODE_API_BASE environment variable defined in provider_config.py. Consider using the environment variable if available, falling back to this URL as the default.
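A minimal sketch of that suggestion, assuming the default base URL registered in provider_config.py (the helper name below is hypothetical, for illustration only):

```python
import os

# Default taken from the PR description; KILOCODE_API_BASE is the env var
# registered in provider_config.py. The helper itself is illustrative.
DEFAULT_KILOCODE_API_BASE = "https://kilocode.ai/api/openrouter"


def resolve_models_url() -> str:
    """Prefer KILOCODE_API_BASE from the environment, fall back to the default."""
    api_base = os.getenv("KILOCODE_API_BASE", DEFAULT_KILOCODE_API_BASE).rstrip("/")
    return f"{api_base}/models"
```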

Comment on lines +2399 to +2407
            wait_condition = self.key_states[available_keys[0]]["condition"]
            try:
                async with wait_condition:
                    remaining_budget = deadline - time.time()
                    if remaining_budget <= 0:
                        break
                    await asyncio.wait_for(
                        wait_condition.wait(), timeout=min(0.5, remaining_budget)
                    )

Waiting on only the first available key's condition variable is suboptimal. If any other key in available_keys is released, this task will not wake up until the 0.5s timeout. While an improvement over a hard sleep, it could still lead to unnecessary delays under high concurrency. A better approach would be to wait on a shared condition variable for the model or provider.
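A hedged sketch of the suggested shared-condition approach (class and method names here are hypothetical, not from the PR): every key release notifies one per-provider asyncio.Condition, so any waiter wakes regardless of which key freed up.

```python
import asyncio


class SharedKeyWaiter:
    """Hypothetical helper: one condition per provider instead of per key."""

    def __init__(self) -> None:
        self._conditions: dict[str, asyncio.Condition] = {}

    def condition_for(self, provider: str) -> asyncio.Condition:
        return self._conditions.setdefault(provider, asyncio.Condition())

    async def notify_key_released(self, provider: str) -> None:
        cond = self.condition_for(provider)
        async with cond:
            cond.notify_all()  # wake every task waiting on this provider

    async def wait_for_any_key(self, provider: str, timeout: float) -> bool:
        """Return True if some key for `provider` was released, False on timeout."""
        cond = self.condition_for(provider)
        async with cond:
            try:
                await asyncio.wait_for(cond.wait(), timeout=timeout)
                return True
            except asyncio.TimeoutError:
                return False
```

With this shape, the acquire loop waits on the provider's shared condition instead of `available_keys[0]`, avoiding the 0.5s polling penalty when a different key is released.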

            await asyncio.sleep(1)
            # Pick any available key to wait on (they're all locked)
            if available_keys:
                wait_condition = self.key_states[available_keys[0]]["condition"]

This block of logic for waiting on keys is duplicated from earlier in the file (around line 2399). Consider refactoring it into a helper method.
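One possible shape for that helper (hypothetical name and signature; a sketch rather than the project's actual API):

```python
import asyncio
import time


async def wait_on_condition(condition: asyncio.Condition, deadline: float,
                            max_slice: float = 0.5) -> bool:
    """Wait on `condition` until it is notified or `deadline` passes.

    Returns True if notified, False if the slice or overall deadline expired.
    Mirrors the duplicated block: waits in bounded slices so the caller's
    loop can periodically re-check key availability.
    """
    async with condition:
        remaining = deadline - time.time()
        if remaining <= 0:
            return False
        try:
            await asyncio.wait_for(condition.wait(), timeout=min(max_slice, remaining))
            return True
        except asyncio.TimeoutError:
            return False
```

Both call sites could then reduce to a single `await wait_on_condition(...)` inside their loops.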

    "cohere": "https://api.cohere.ai/v1",
    "bedrock": "https://bedrock-runtime.us-east-1.amazonaws.com",
    "openrouter": "https://openrouter.ai/api/v1",
    "kilocode": "https://kilocode.ai/api/openrouter",

The PR description mentions https://kilocode.ai/api/openrouter/v1/chat/completions as the endpoint, but here /v1 is missing. Please verify if Kilocode requires /v1 in the path for OpenRouter compatibility.

Comment on lines +1667 to +1671
        # If the provider is 'nvidia', set the custom provider to 'nvidia_nim'
        # and strip the prefix from the model name for LiteLLM.
        if provider == "nvidia":
            litellm_kwargs["custom_llm_provider"] = "nvidia_nim"
            litellm_kwargs["model"] = model.split("/", 1)[1]

This nvidia provider fix (setting custom_llm_provider and stripping the model prefix) should also be applied to _make_streaming_request (around line 2445) to ensure consistent behavior across both request types.

ShmidtS and others added 20 commits February 8, 2026 16:53
…cooldowns

- cooldown_manager: rename provider parameter to credential, update docstrings
- client: replace all start_cooldown(provider) calls with start_cooldown(current_cred)
- translator: use empty string instead of None for missing content

Allows other credentials from the same provider to be used while one is cooling down.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ders

Add _apply_provider_headers() method to strip problematic authorization
headers from client requests and apply correct provider-specific headers
from environment variables ({PROVIDER}_API_HEADERS).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…very

- Expand problematic headers list with Anthropic-specific headers
  (anthropic-version, anthropic-beta, x-anthropic-*, etc.)
- Refactor header removal with case-insensitive prefix matching
- Remove raw_request passing to LiteLLM to prevent header leakage
- Treat all errors as recoverable via credential rotation
  (BadRequestError/InvalidRequestError may come from provider issues)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r handling

- Add _normalize_model_string() and _extract_provider_from_model() helpers
- Replace direct model.split("/") calls with safe extraction methods
- Improve 400 error classification: upstream transient errors now trigger rotation (503)
- Keep strict fail-fast for policy/safety violations (invalid_request)
- Normalize model strings in request_sanitizer for reliable comparisons
- Reduce log noise: change info→debug for key release notifications

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ider support

- Add token_calculator.py module for context window-aware max_tokens adjustment
- Integrate ModelRegistry lookup to prevent context overflow errors
- Add Kilocode provider with default API base and unsupported param filtering
- Extend stream_options unsupported providers list (iflow, kilocode)
- Handle BadRequestError/InvalidRequestError as transient errors for credential rotation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Updated README to remove badge links and clarify functionality.
- Implement lazy HTTP client initialization with automatic recovery
  when client is closed, preventing "Client is closed" errors
- Add early error detection in streaming to catch provider errors
  sooner with chunk index tracking for better diagnostics
- Add get_retry_backoff() with error-type-specific backoff strategies:
  api_connection gets aggressive retry, server_error exponential backoff
- Reduce default rate limit cooldown from 60s to 10s
- Add parse_quota_error() in KilocodeProvider for OpenRouter format
  with retry_after extraction support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
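The parse_quota_error behavior described in the commit above might look roughly like this. The OpenRouter error-envelope shape and field names below are assumptions for illustration, not taken from Kilocode documentation:

```python
import json
from typing import Optional


def parse_quota_error(body: str) -> Optional[float]:
    """Extract a retry-after hint (seconds) from an OpenRouter-style error body.

    Assumed envelope: {"error": {"code": 429, "message": "...",
                                 "metadata": {"retry_after": 30}}}
    Returns None when the body is not a quota error or carries no hint.
    """
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        return None
    if error.get("code") != 429:
        return None
    retry_after = error.get("metadata", {}).get("retry_after")
    return float(retry_after) if retry_after is not None else None
```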
… handling

Performance optimizations:
- Add streaming_fast.py with orjson for 3-5x faster JSON parsing
- Add http_client_pool.py with connection warmup and HTTP/2 support
- Add credential_weight_cache.py for optimized credential selection
- Add batched_persistence.py for background batched disk writes
- Add async_locks.py with ReadWriteLock, AsyncSemaphore, RateLimitedLock

Context overflow handling:
- Add ContextOverflowError for pre-emptive request rejection
- Update sanitize_request_payload to return (payload, should_reject)
- Reject requests when input tokens exceed model context window

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ety buffers

- Increase default safety buffer from 100 to 1000 tokens
- Add MAX_INPUT_RATIO (0.90) to reject oversized inputs
- Add provider-specific safety buffers (kilocode: 2000, openrouter: 1500, etc.)
- Add get_provider_safety_buffer() for auto-detection
- calculate_max_tokens now returns None when input exceeds context
- adjust_max_tokens_in_payload returns (payload, should_reject) tuple

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ompatibility

Add precomputed_input_tokens parameter to streaming wrappers to handle
providers that don't return usage in stream (e.g., Kilocode without
stream_options). This ensures Claude Code's context management works
correctly across all providers.

- Add precomputed_input_tokens param to streaming.py and streaming_fast.py
- Pre-compute input tokens in client.py before streaming starts
- Provider-returned usage still takes precedence when available
- Increase MAX_INPUT_RATIO from 0.90 to 1.0 for full context utilization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major performance optimizations for high-throughput scenarios:

- HttpClientPool integration: Replace per-instance httpx.AsyncClient with
  singleton pool supporting HTTP/2, connection pre-warming, and automatic
  recovery. Separate pools for streaming/non-streaming requests.

- Connection pre-warming: Auto-detect provider API endpoints and establish
  TCP+TLS connections at startup, reducing first-request latency by ~10x.

- Sharded locks: Add per-provider lock infrastructure in UsageManager for
  parallel access to different providers' data, reducing lock contention.

- Credential priority caching: Cache priority/tier lookups per provider,
  eliminating repeated provider_plugin calls during request processing.

- Fast path in acquire_key: Single-credential case bypasses complex priority
  and fair cycle logic for immediate lock acquisition.

- Batch persistence option: Integrate UsagePersistenceManager for debounced
  disk writes (enabled via USAGE_BATCH_PERSISTENCE=true env var).

Configurable via environment variables:
- HTTP_MAX_KEEPALIVE (default: 50)
- HTTP_MAX_CONNECTIONS (default: 200)
- USAGE_BATCH_PERSISTENCE (default: false)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _reset_litellm_client_cache() method to clear LiteLLM's internal
async HTTP client cache when encountering "Cannot send a request, as
the client has been closed" errors. This ensures fresh client creation
on retry attempts.

Also add api_bases property to ProviderConfig for read-only access to
configured API bases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, the HTTP client warmup used hardcoded provider URLs that
could differ from the actual API_BASE configured via environment
variables. This caused connection warmup to hit wrong endpoints
(e.g., api.kilocode.ai instead of kilocode.ai/api/openrouter).

Now the warmup endpoint resolution follows this priority:
1. Custom API_BASE from environment variables
2. Hardcoded defaults as fallback

Also fixed _provider_endpoints cache to store resolved endpoints
correctly based on configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Some providers (e.g., Z.AI) return finish_reason values like 'error',
'unknown', 'abort' that don't match LiteLLM's Pydantic schema, causing
ValidationError. Add monkey-patch to normalize these values before
validation runs.

Also update .gitignore for tool directories and simplify start_proxy.bat
by removing venv activation step.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
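The normalization described in the commit above could be sketched as follows. The actual change monkey-patches LiteLLM internals not shown here; the mapping and valid-value set below are illustrative assumptions:

```python
from typing import Optional

# Non-standard finish_reason values seen from some providers (e.g. Z.AI),
# mapped onto schema-valid values. Assumed mapping, for illustration.
FINISH_REASON_ALIASES = {
    "error": "stop",
    "unknown": "stop",
    "abort": "stop",
}

VALID_FINISH_REASONS = {"stop", "length", "tool_calls", "content_filter", "function_call"}


def normalize_finish_reason(value: Optional[str]) -> Optional[str]:
    """Coerce provider-specific finish_reason values to schema-valid ones."""
    if value is None or value in VALID_FINISH_REASONS:
        return value
    return FINISH_REASON_ALIASES.get(value, "stop")
```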
Implement provider resilience mechanisms to prevent cascade exhaustion
during IP-level throttling:

- Add ProviderCircuitBreaker with CLOSED/OPEN/HALF_OPEN states
- Add IPThrottleDetector to correlate 429 errors across credentials
- Integrate circuit breaker into client for fail-fast on blocked providers
- Add configurable failure thresholds and recovery timeouts
- Support custom recovery duration from retry-after headers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Increase HTTP keepalive from 30s to 60s for better connection reuse
- Make warmup async to not block initialization
- Add memory limit (100 records) to IP throttle detector
- Minor formatting and code improvements across modules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrate OpenCode as a new provider with:
- OpencodeProvider class with auth headers (HTTP-Referer, X-Title)
- Quota/rate-limit error parsing (codes 1113, 429)
- Provider-specific header injection in RotatingClient
- Configuration in provider_config.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Mirrowel
Owner

Mirrowel commented Mar 3, 2026

@greptile

@Mirrowel
Owner

Mirrowel commented Mar 3, 2026

@ShmidtS I didn't notice before, but the base branch here is main - it should be Dev branch instead. Main is never directly modified and is outdated

@greptile-apps

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR introduces support for the Kilocode LLM provider, which acts as an OpenRouter-compatible API gateway. The implementation spans a new provider class (kilocode_provider.py), endpoint registration in provider_urls.py, provider config registration, request sanitization for Kilocode's free-model parameter restrictions, and backoff/throttle-detection tuning in error_handler.py. Minor refactoring is also applied to client.py, usage_manager.py, and main.py for readability.

Key changes:

  • KilocodeProvider implements ProviderInterface with get_models and parse_quota_error logic that understands the OpenRouter error envelope format
  • KILOCODE_UNSUPPORTED_PARAMS strips parameters that Kilocode's free models reject (e.g. frequency_penalty, top_p, response_format)
  • Kilocode is added to STREAM_OPTIONS_UNSUPPORTED_PROVIDERS in client.py so stream_options is never forwarded
  • Kilocode is added to PROXY_PROVIDERS and PROVIDER_BACKOFF_CONFIGS for appropriate retry behaviour
  • provider_config.py registers a default KILOCODE_API_BASE so the provider auto-routes as a custom OpenAI-compatible endpoint even without an explicit env var

Critical issues identified:

  • kilocode_provider.py line 31 uses wrong domain (kilo.ai instead of kilocode.ai) for the get_models call, breaking model discovery
  • get_models hardcodes the base URL instead of reading KILOCODE_API_BASE, making it inconsistent with how OpencodeProvider handles its env var
  • Only httpx.RequestError is caught in get_models; httpx.HTTPStatusError (raised by raise_for_status()) is a sibling exception and will propagate uncaught
  • provider_urls.py falls through to the f"{base_url}/v1/{action}" fallback, producing https://kilocode.ai/api/openrouter/v1/chat/completions instead of the documented /chat/completions path
  • main.py carries an unused import (from glob import glob)
  • error_handler.py docstring example uses wrong domain (kilo.ai)

Confidence Score: 1/5

  • Not safe to merge — multiple critical bugs prevent Kilocode provider from functioning: wrong domain blocks model discovery, incorrect URL path produces 404s for all requests, and missing error handling could expose HTTPStatusError.
  • Three critical bugs have been verified that will completely break Kilocode provider functionality: (1) the hardcoded URL uses wrong domain kilo.ai preventing model fetching, (2) the URL construction in provider_urls.py appends an extra /v1/ segment so every chat completion request hits a non-existent endpoint, and (3) unhandled HTTPStatusError exceptions will surface as uncaught errors. These are not edge cases but core functionality breaks.
  • src/rotator_library/providers/kilocode_provider.py (wrong domain, hardcoded URL, missing error handling), src/proxy_app/provider_urls.py (incorrect URL fallback for kilocode), src/rotator_library/error_handler.py (docstring domain typo), src/proxy_app/main.py (unused import)

Sequence Diagram

sequenceDiagram
    participant Client as API Client
    participant Proxy as proxy_app/main.py
    participant Sanitizer as request_sanitizer.py
    participant RotClient as RotatingClient (client.py)
    participant ProvConfig as provider_config.py
    participant LiteLLM as LiteLLM
    participant Kilocode as kilocode.ai API

    Client->>Proxy: POST /v1/chat/completions\n(model: kilocode/minimax/minimax-m2.1:free)
    Proxy->>Sanitizer: sanitize_request_payload(payload, model)
    Note over Sanitizer: Strips unsupported params for<br/>:free and z-ai/ models<br/>(top_p, frequency_penalty, etc.)
    Sanitizer-->>Proxy: cleaned payload

    Proxy->>RotClient: acompletion(**kwargs)
    Note over RotClient: Detects kilocode provider,<br/>removes stream_options
    RotClient->>ProvConfig: convert_for_litellm(model=kilocode/...)
    Note over ProvConfig: kilocode not in KNOWN_PROVIDERS<br/>→ custom provider path<br/>model = openai/minimax/minimax-m2.1:free<br/>api_base = https://kilocode.ai/api/openrouter<br/>custom_llm_provider = openai
    ProvConfig-->>RotClient: modified kwargs

    RotClient->>LiteLLM: acompletion(model=openai/minimax/..., api_base=...)
    LiteLLM->>Kilocode: POST /api/openrouter/chat/completions\nAuthorization: Bearer {key}
    Kilocode-->>LiteLLM: response / stream
    LiteLLM-->>RotClient: ModelResponse
    RotClient-->>Proxy: response
    Proxy-->>Client: SSE stream / JSON response

Comments Outside Diff (1)

  1. src/proxy_app/provider_urls.py, lines 33-78

    Incorrect URL path construction for Kilocode requests

    The get_provider_endpoint function constructs URLs by checking if base_url ends with "/v1" or "/v1/openai". Since Kilocode's base URL ("https://kilocode.ai/api/openrouter") matches neither suffix, the function falls through to the generic fallback (line 78):

    return f"{base_url}/v1/{action}"

    This produces https://kilocode.ai/api/openrouter/v1/chat/completions, but the correct endpoint (per provider_config.py and Kilocode documentation) is https://kilocode.ai/api/openrouter/chat/completions — no /v1/ segment.

    All Kilocode requests will hit a non-existent 404 endpoint. Either:

    1. Add a kilocode-specific case before the fallback, or
    2. Change the entry to "kilocode": "https://kilocode.ai/api/openrouter/v1" if the /v1 path is actually supported
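Option 1 could look roughly like this. This is a sketch against an assumed shape of get_provider_endpoint, not the file's exact code; the set name is hypothetical:

```python
# Providers whose base URL already contains the full API path, so no
# "/v1" segment should be appended. Assumed set; only kilocode is from this PR.
NO_V1_SUFFIX_PROVIDERS = {"kilocode"}


def get_provider_endpoint(provider: str, base_url: str, action: str) -> str:
    base_url = base_url.rstrip("/")
    if provider in NO_V1_SUFFIX_PROVIDERS:
        return f"{base_url}/{action}"  # e.g. .../api/openrouter/chat/completions
    if base_url.endswith("/v1") or base_url.endswith("/v1/openai"):
        return f"{base_url}/{action}"
    return f"{base_url}/v1/{action}"
```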

Last reviewed commit: 82bb900

@greptile-apps greptile-apps bot left a comment

31 files reviewed, 2 comments


        """
        try:
            response = await client.get(
                "https://kilo.ai/api/openrouter/models",

Wrong domain - should be kilocode.ai not kilo.ai

Suggested change:
-                "https://kilo.ai/api/openrouter/models",
+                "https://kilocode.ai/api/openrouter/models",

Comment on lines +28 to +43
    async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
        """
        Fetches the list of available models from the Kilocode API.
        """
        try:
            response = await client.get(
                "https://kilo.ai/api/openrouter/models",
                headers={"Authorization": f"Bearer {api_key}"},
            )
            response.raise_for_status()
            return [
                f"kilocode/{model['id']}" for model in response.json().get("data", [])
            ]
        except httpx.RequestError as e:
            lib_logger.error(f"Failed to fetch Kilocode models: {e}")
            return []

Inconsistent with OpencodeProvider which uses configurable OPENCODE_API_BASE. Consider using KILOCODE_API_BASE env var here for consistency:

import os
api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
response = await client.get(
    f"{api_base}/models",
    headers={"Authorization": f"Bearer {api_key}"},
)

@Mirrowel Mirrowel added labels: enhancement (New feature or request), Agent Monitored (Monitored for AI Agent to review PRs and commits) on Mar 3, 2026
Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Follow-up Review: Critical Issues Still Unaddressed

This follow-up review focuses on previously identified issues that remain unfixed in the current commits.

Critical Issues (Blocking)

  1. Kilocode Provider Domain Bug (kilocode_provider.py:34)
    • Status: Still unfixed from previous review
    • Issue: Uses https://kilo.ai/api/openrouter/models instead of https://kilocode.ai/api/openrouter/models
    • Impact: The models endpoint will fail completely with the wrong domain

High Priority Issues

  1. Missing NVIDIA Fix in Streaming Path (client.py)

    • Status: Still unfixed from previous review by mirrobot-agent
    • Issue: The nvidia provider fix (setting custom_llm_provider and stripping model prefix) is applied in the non-streaming path (line 2242-2244) but missing from the streaming path (~line 2825)
    • Impact: Inconsistent behavior between streaming and non-streaming requests for nvidia provider
  2. Configuration Inconsistency (kilocode_provider.py)

    • OpencodeProvider supports OPENCODE_API_BASE environment variable for configurable API base
    • KilocodeProvider hardcodes the URL (with the wrong domain), lacking KILOCODE_API_BASE support

Medium Priority Issues

  1. Duplicated Logic in UsageManager (usage_manager.py)

    • The waiting logic block (lines 2501-2513) is duplicated at lines 2727-2740
    • Previously suggested refactoring into a helper method remains unaddressed
  2. Suboptimal Condition Variable Waiting (usage_manager.py)

    • Waiting only on available_keys[0] means tasks won't wake up when other keys are released until the 0.5s timeout expires

Previously Fixed

  • Windows console encoding fix in main.py - appears to be in place

Summary

The critical domain bug in the Kilocode provider must be fixed before merging - it will cause immediate failures. The missing NVIDIA streaming fix should also be addressed for consistency.

This follow-up review was generated by an AI assistant.

        """
        try:
            response = await client.get(
                "https://kilo.ai/api/openrouter/models",

Critical Bug (Still Unaddressed): Wrong domain - uses kilo.ai instead of kilocode.ai. This was flagged in the previous review by greptile-apps but remains unfixed. The models endpoint will fail with this incorrect domain.

Suggested fix:

Suggested change:
-                "https://kilo.ai/api/openrouter/models",
+                "https://kilocode.ai/api/openrouter/models",

    - And other provider/model combinations
    """

    async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:

Inconsistency: Unlike OpencodeProvider which uses os.getenv("OPENCODE_API_BASE", ...), this provider hardcodes the URL (with the wrong domain). Consider adding KILOCODE_API_BASE environment variable support for consistency and flexibility.

api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
response = await client.get(
    f"{api_base}/models",
    ...
)

- Add consecutive quota failure tracking per credential in RotatingClient
- Add ClassifiedError quota_value and quota_id fields for Gemini errors
- Add Inception Labs error classification and backoff configuration
- Add AsyncResilientStateWriter for atomic async state persistence
- Add strip_provider option to endpoint configuration
- Improve provider_config exception logging for debugging

Includes refactoring in litellm_providers.py and formatting cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Follow-up Review

I've analyzed the incremental changes in this PR. While the diff shows some good improvements (quota failure tracking, Inception Labs support, better error classification), a previously identified high-priority issue remains unaddressed:

Unaddressed Issue from Previous Reviews

Missing NVIDIA Fix in Streaming Path

  • Location: src/rotator_library/client.py around line 3346-3350 (streaming request handling)
  • Issue: The nvidia provider fix (setting custom_llm_provider to nvidia_nim and stripping the model prefix) exists in the non-streaming path (lines 2320-2324) but is missing from the streaming path. Only qwen_code handling is present.
  • Impact: Inconsistent behavior between streaming and non-streaming requests for the nvidia provider
  • Required Fix: Add the following after line 3350 in the streaming path:
# If the provider is 'nvidia', set the custom provider to 'nvidia_nim'
# and strip the prefix from the model name for LiteLLM.
if provider == "nvidia":
    litellm_kwargs["custom_llm_provider"] = "nvidia_nim"
    litellm_kwargs["model"] = model.split("/", 1)[1]

# Inception Labs also requires model name without prefix
if provider == "inception":
    litellm_kwargs["model"] = model.rsplit("/", 1)[1]

New Changes Assessment

The incremental diff shows valuable additions:

  • Quota failure tracking with consecutive failure counters and automatic rotation
  • Inception Labs provider support with proper error handling
  • Enhanced error classification for Google/Gemini API errors with quota details
  • Improved streaming error handling for mid-stream failures

Previously Identified Critical Issues

Note: The critical Kilocode provider domain bug (kilo.ai vs kilocode.ai) and configuration inconsistency (missing KILOCODE_API_BASE support) were identified in earlier reviews. While not part of this incremental diff, these must still be addressed before merging.

This follow-up review was generated by an AI assistant.

@coderabbitai

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Kilocode, Trybons and Opencode provider integrations; new provider discovery
    • Batched background persistence, HTTP client pool, credential weight cache, and async locking primitives
    • Per-credential cooldowns and consolidated token manager support
  • Improvements

    • Faster Anthropic-compatible streaming and streaming normalization; improved token/max_tokens calculation and request sanitization
    • Circuit breaker and IP-throttle detection for more resilient provider handling
  • Documentation

    • README badges simplified; CONTRIBUTING removed
  • Chores

    • Updated .gitignore and added Windows start script

Walkthrough

Adds multiple new subsystems (async locks, HTTP client pool, persistence, token calculator, circuit breaker, IP-throttle detection, credential weight cache), new providers, refactors token managers, expands error/throttle handling, adjusts logging and config defaults, and updates repository docs/.gitignore.

Changes

  • Repo config & docs (.gitignore, README.md, CONTRIBUTING.md, start_proxy.bat): .gitignore updated (removed start_proxy.bat, added agentic directories); README badges trimmed; CONTRIBUTING.md deleted; start_proxy.bat added.
  • Root exports (src/rotator_library/__init__.py): Expanded lazy-loaded public exports via __getattr__ and extended __all__ to expose many new modules/classes.
  • Config defaults & exports (src/rotator_library/config/__init__.py, src/rotator_library/config/defaults.py): Added circuit-breaker and IP-throttle defaults, Kilocode backoff defaults, provider overrides, and an is_cooldown_disabled(env) helper; new constants exported.
  • Async primitives (src/rotator_library/async_locks.py): New async locking utilities: ReadWriteLock, AsyncSemaphore, RateLimitedLock, LazyLock, and lock-pool/striping helpers.
  • Batched persistence & resilient I/O (src/rotator_library/batched_persistence.py, src/rotator_library/utils/resilient_io.py): Added BatchedPersistence and UsagePersistenceManager for background batched writes; added AsyncResilientStateWriter for atomic async writes.
  • Credential weight cache (src/rotator_library/credential_weight_cache.py): New CredentialWeightCache (TTL, invalidation, warmup) with singleton accessor get_weight_cache.
  • Cooldown & sharded locks (src/rotator_library/cooldown_manager.py): CooldownManager reworked to per-credential cooldowns with per-provider lock sharding; method parameter renamed to credential.
  • Circuit breaker (src/rotator_library/circuit_breaker.py): New async ProviderCircuitBreaker with per-provider circuit states, thresholds, recovery, and query/reset APIs.
  • IP throttle detection (src/rotator_library/ip_throttle_detector.py): New IPThrottleDetector that buffers 429s, hashes error bodies, correlates across credentials, and returns a ThrottleAssessment; singleton accessors added.
  • Error handling & backoff (src/rotator_library/error_handler.py): Major expansion: unified 429 handling, ThrottleAction/types, streaming error classifiers (is_provider_abort, classify_stream_error), provider backoff configs, an AllProviders helper, an extended ClassifiedError, and get_retry_backoff.
  • HTTP client pool & timeouts (src/rotator_library/http_client_pool.py, src/rotator_library/timeout_config.py): Added an HttpClientPool singleton (streaming vs. non-streaming clients, warmup, health/recovery, stats); reduced the default pool timeout from 60s to 15s.
  • Token calculation (src/rotator_library/token_calculator.py): New token-calculation utilities: context windows, token counting, calculate_max_tokens, and adjust_max_tokens_in_payload.
  • Request sanitization (src/rotator_library/request_sanitizer.py): Sanitizer enhanced for Kilocode's unsupported params; signature changed to return (payload, should_reject).
  • Anthropic streaming & translator (src/rotator_library/anthropic_compat/streaming.py, src/rotator_library/anthropic_compat/streaming_fast.py, src/rotator_library/anthropic_compat/translator.py, src/rotator_library/anthropic_compat/__init__.py): Added a high-performance streaming_fast module (exposed as anthropic_streaming_wrapper) with an orjson fallback and a precomputed_input_tokens param; translator updated to use empty-string content instead of None.
  • Providers: new & updates (src/rotator_library/providers/*, src/proxy_app/provider_urls.py): Added KilocodeProvider, OpencodeProvider, and TrybonsProvider; added kilocode/trybons to the provider URL map; many providers updated (thinking/reasoning support for iFlow, logging-propagation removals, NVIDIA updates).
  • Provider metadata & config (src/rotator_library/litellm_providers.py, src/rotator_library/provider_config.py): Normalized/expanded provider metadata, added defaults for custom providers, and exposed api_bases, is_custom_provider, and get_api_base on ProviderConfig.
  • Background & token manager refactor (src/rotator_library/background_refresher.py, src/rotator_library/providers/base_token_manager.py, src/rotator_library/providers/{google_oauth_base,iflow_auth_base,qwen_auth_base}.py): Parallelized provider credential initialization; introduced BaseTokenManager and made the Google/Qwen/iFlow auth classes inherit from and delegate to it.
  • Utilities & patches (src/rotator_library/utils/__init__.py, src/rotator_library/utils/patch_litellm_finish_reason.py, src/rotator_library/utils/suppress_litellm_warnings.py): Exposed AsyncResilientStateWriter, added a patch to normalize LiteLLM finish_reason, and added a suppress_litellm_prints() context manager.
  • Logging cleanup (many src/rotator_library/providers/*, src/rotator_library/model_definitions.py): Removed explicit lib_logger.propagate = False and conditional NullHandler insertions across multiple modules, restoring default logger propagation.
  • Misc & small fixes (assorted files): Minor refactors: parallelized credential-file loads, qwen chunk-emission adjustment, timeout tweaks, README badge trimming, and other small edits.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Handler as ErrorHandler
    participant Detector as IPThrottleDetector
    participant CB as ProviderCircuitBreaker
    participant Cred as CooldownManager

    Client->>Handler: report 429(provider, credential, body)
    Handler->>Detector: record_429(provider, credential, body, retry_after)
    Detector-->>Handler: ThrottleAssessment(scope, confidence, cooldown)
    Handler->>CB: record_ip_throttle(...) / open_immediately(...)
    Handler->>Cred: start_cooldown(credential, duration)
    Handler-->>Client: ThrottleAction(result)

sequenceDiagram
    participant App as Application
    participant Pool as HttpClientPool
    participant Warm as WarmupTask
    participant ClientHTTP as httpx.AsyncClient

    App->>Pool: get_http_pool()/initialize(warmup_hosts)
    Pool->>Pool: create streaming & non-streaming clients
    Pool->>Warm: warmup_connections(hosts)
    Warm->>ClientHTTP: probe host
    ClientHTTP-->>Warm: response
    Warm-->>Pool: warmup complete
    App->>Pool: get_client_async(streaming=True)
    Pool-->>App: AsyncClient

sequenceDiagram
    participant SSE as OpenAI SSE stream
    participant Wrapper as anthropic_streaming_wrapper_fast
    participant Proc as StateProcessor
    participant Out as Anthropic SSE output

    SSE->>Wrapper: stream chunk
    Wrapper->>Proc: parse chunk (delta/usage/tool)
    Proc->>Out: emit Anthropic SSE events (message_start / content deltas / tool_use / final usage)
    Out-->>Client: Anthropic SSE messages

Estimated Code Review Effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I hopped through circuits, locks, and streams,
cached weights and tended async dreams,
I sniffed for 429s across the net,
wrote state in batches, warmed connections yet,
kilocode and opencode joined the team — hop, hooray!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The description covers the main changes (new KilocodeProvider class, modified provider_urls.py and provider_config.py) but lacks the Testing Done section and checklist items required by the template. Resolution: complete the Testing Done section with specific test results and check all required checklist items, including license headers, documentation updates, and issue references.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): The title "Add Kilocode provider support" directly and clearly describes the main objective of the PR: adding a new provider implementation.
  • Docstring Coverage (✅ Passed): Docstring coverage is 86.52%, above the required 80.00% threshold.


"""
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",

Wrong domain in endpoint URL

Line 31 has https://kilo.ai/api/openrouter/models, but the correct domain is kilocode.ai (not kilo.ai). This will cause the models endpoint to fail.

Suggested change
 response = await client.get(
-    "https://kilo.ai/api/openrouter/models",
+    "https://kilocode.ai/api/openrouter/models",
     headers={"Authorization": f"Bearer {api_key}"},
 )

Comment on lines +25 to +33
async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
"""
Fetches the list of available models from the Kilocode API.
"""
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)

Hardcoded URL ignores KILOCODE_API_BASE environment variable

The get_models method hardcodes the base URL instead of respecting the KILOCODE_API_BASE environment variable that is already configured in provider_config.py. For consistency with OpencodeProvider and to allow users to override the endpoint, use the env var with a sensible default:

Suggested change
 async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
     """
     Fetches the list of available models from the Kilocode API.
     """
+    import os
+    api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
     try:
         response = await client.get(
-            "https://kilo.ai/api/openrouter/models",
+            f"{api_base}/models",
             headers={"Authorization": f"Bearer {api_key}"},
         )

Comment on lines +29 to +40
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()
return [
f"kilocode/{model['id']}" for model in response.json().get("data", [])
]
except httpx.RequestError as e:
lib_logger.error(f"Failed to fetch Kilocode models: {e}")
return []

Uncaught httpx.HTTPStatusError exception

response.raise_for_status() (line 34) raises httpx.HTTPStatusError for non-2xx HTTP responses (401, 429, 500, etc.). However, only httpx.RequestError is caught on line 38. HTTPStatusError is not a subclass of RequestError — it descends from HTTPError via a different branch. Any HTTP error response will propagate as an unhandled exception instead of being caught and returning an empty list.

Suggested change
 try:
     response = await client.get(
-        "https://kilo.ai/api/openrouter/models",
+        "https://kilocode.ai/api/openrouter/models",
         headers={"Authorization": f"Bearer {api_key}"},
     )
     response.raise_for_status()
     return [
         f"kilocode/{model['id']}" for model in response.json().get("data", [])
     ]
-except httpx.RequestError as e:
+except (httpx.RequestError, httpx.HTTPStatusError) as e:
     lib_logger.error(f"Failed to fetch Kilocode models: {e}")
     return []

- Extract common token refresh queue state into BaseTokenManager
  (shared by GoogleOAuthBase, IFlowAuthBase, QwenAuthBase)
- Extract _prepare_request_context method in RotatingClient to
  eliminate code duplication between streaming and non-streaming paths
- Switch anthropic_compat to streaming_fast module
- Mark legacy streaming.py as deprecated
- Remove CONTRIBUTING.md (documentation cleanup)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +30 to +32
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},

Wrong domain in models endpoint

The domain is kilo.ai but should be kilocode.ai. This will cause requests to the models endpoint to fail.

Suggested change
 response = await client.get(
-    "https://kilo.ai/api/openrouter/models",
+    "https://kilocode.ai/api/openrouter/models",
     headers={"Authorization": f"Bearer {api_key}"},

Comment on lines +25 to +35
async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
"""
Fetches the list of available models from the Kilocode API.
"""
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()
return [

Hardcoded URL ignores KILOCODE_API_BASE environment variable

The get_models method hardcodes the base URL instead of respecting the KILOCODE_API_BASE environment variable that is configured in provider_config.py. For consistency with other providers like OpencodeProvider, use the env var with a sensible default:

Suggested change
 async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
     """
     Fetches the list of available models from the Kilocode API.
     """
+    import os
+    api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
     try:
         response = await client.get(
-            "https://kilo.ai/api/openrouter/models",
+            f"{api_base}/models",
             headers={"Authorization": f"Bearer {api_key}"},
         )
         response.raise_for_status()
         return [

Comment on lines +36 to +38
f"kilocode/{model['id']}" for model in response.json().get("data", [])
]
except httpx.RequestError as e:

Uncaught httpx.HTTPStatusError exception

response.raise_for_status() raises httpx.HTTPStatusError for non-2xx HTTP responses (401, 429, 500, etc.), but only httpx.RequestError is caught on line 38. HTTPStatusError is not a subclass of RequestError — they descend from different branches of the exception hierarchy. Any HTTP error response will propagate as an unhandled exception instead of being caught gracefully.

Suggested change
         f"kilocode/{model['id']}" for model in response.json().get("data", [])
     ]
-except httpx.RequestError as e:
+except (httpx.RequestError, httpx.HTTPStatusError) as e:
     lib_logger.error(f"Failed to fetch Kilocode models: {e}")
     return []

Comment on lines +71 to +72
from dotenv import load_dotenv
from glob import glob

Unused glob import

from glob import glob is imported here but never called as a function. All .env file discovery now uses _root_dir.glob("*.env"), which is Path.glob() — a method on the pathlib.Path object, not the glob module function. Remove the unused stdlib import.

Suggested change
 from dotenv import load_dotenv
-from glob import glob

model_without_prefix = normalized_model.split("/", 1)[1] if "/" in normalized_model else normalized_model

# Free models have stricter parameter requirements
if ":free" in model_without_prefix or model_without_prefix.startswith("z-ai/"):

Overly broad z-ai/ parameter stripping

The comment says "Free models have stricter parameter requirements," but the startswith("z-ai/") branch catches all z-ai/ models — including any future non-free z-ai models. This means parameters like top_p, frequency_penalty, and stop would be silently dropped for a paid kilocode/z-ai/some-model even if that model fully supports them.

Since all currently listed z-ai models already contain :free (e.g. z-ai/glm-4.7:free), the startswith("z-ai/") guard is redundant in the happy path but will silently misbehave when a non-free z-ai variant is introduced. Consider removing the overly broad prefix check and relying solely on the :free suffix to stay consistent with the comment:

Suggested change
-if ":free" in model_without_prefix or model_without_prefix.startswith("z-ai/"):
+if ":free" in model_without_prefix:

Supports custom OpenAI-compatible providers via PROVIDERNAME_API_BASE env vars.

Usage:
export KILOCODE_API_BASE=https://kilo.ai/api/openrouter

Wrong domain in docstring example

The example command uses kilo.ai instead of the correct kilocode.ai. Any user who copies this snippet verbatim will configure a non-working endpoint.

Suggested change
-export KILOCODE_API_BASE=https://kilo.ai/api/openrouter
+export KILOCODE_API_BASE=https://kilocode.ai/api/openrouter

ShmidtS and others added 4 commits March 7, 2026 00:59
Add TryBons provider (go.trybons.ai) that speaks native Anthropic
Messages API. Routes through LiteLLM's anthropic support with custom
api_base, so no message conversion is needed. Uses x-api-key auth
and balanced key rotation by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…king support

Extract constants for DeepSeek thinking models and unsupported Anthropic
fields. Add _sanitize_messages() to strip cache_control and collapse
list-format system messages. Add _strip_anthropic_fields() for top-level
Anthropic params (thinking, betas). Improve handle_thinking_parameter()
with per-model-family logic and better logging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation

Add INCEPTION_BACKOFF_BASE and INCEPTION_MAX_BACKOFF env var overrides
for inception provider backoff config. Classify inception connection
errors with 'server had an error' message as 503 with 5s retry_after
for better transient overload recovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rors

Before rotating to a new credential on transient server errors during
streaming, retry the same key with exponential backoff if budget allows.
This avoids unnecessary key rotation for recoverable mid-stream failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +25 to +40
async def get_models(self, api_key: str, client: httpx.AsyncClient) -> List[str]:
"""
Fetches the list of available models from the Kilocode API.
"""
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()
return [
f"kilocode/{model['id']}" for model in response.json().get("data", [])
]
except httpx.RequestError as e:
lib_logger.error(f"Failed to fetch Kilocode models: {e}")
return []

Wrong domain in get_models endpoint and hardcoded URL ignores KILOCODE_API_BASE env var

Line 31 uses https://kilo.ai/api/openrouter/models (wrong domain — should be kilocode.ai), and the URL is hardcoded instead of respecting the KILOCODE_API_BASE environment variable that is already configured in provider_config.py.

This breaks model discovery and ignores user overrides. For consistency with OpencodeProvider, use:

import os
api_base = os.getenv("KILOCODE_API_BASE", "https://kilocode.ai/api/openrouter").rstrip("/")
response = await client.get(
    f"{api_base}/models",
    headers={"Authorization": f"Bearer {api_key}"},
)

Comment on lines +29 to +40
try:
response = await client.get(
"https://kilo.ai/api/openrouter/models",
headers={"Authorization": f"Bearer {api_key}"},
)
response.raise_for_status()
return [
f"kilocode/{model['id']}" for model in response.json().get("data", [])
]
except httpx.RequestError as e:
lib_logger.error(f"Failed to fetch Kilocode models: {e}")
return []

Uncaught httpx.HTTPStatusError exception

response.raise_for_status() (line 34) raises httpx.HTTPStatusError for non-2xx responses, but only httpx.RequestError is caught on line 38. HTTPStatusError is a sibling exception (both descend from HTTPError via different branches), so it will propagate uncaught for any API errors (401, 429, 500, etc.).

Catch httpx.HTTPError, the shared base class (httpx.RequestError already subclasses it, so listing both would be redundant):

except httpx.HTTPError as e:


# Load all .env files from root folder (main .env first, then any additional *.env files)
from dotenv import load_dotenv
from glob import glob

Unused glob import

Line 72 imports from glob import glob, but this function is never called. Line 85 uses _root_dir.glob("*.env"), which is the Path.glob() method, not the module function. Remove the unused import to clean up the code.

Supports custom OpenAI-compatible providers via PROVIDERNAME_API_BASE env vars.

Usage:
export KILOCODE_API_BASE=https://kilo.ai/api/openrouter

Wrong domain in docstring example

Line 643 shows export KILOCODE_API_BASE=https://kilo.ai/api/openrouter as an example, but the correct domain is kilocode.ai (not kilo.ai). This will mislead users who copy the snippet verbatim.

Change to:

export KILOCODE_API_BASE=https://kilocode.ai/api/openrouter

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 41

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/rotator_library/providers/qwen_code_provider.py (1)

257-277: ⚠️ Potential issue | 🟠 Major

Don’t bypass the normal chunk normalization when usage shares the final SSE event.

Line 257 returns before the <think>reasoning_content conversion below runs. If Qwen puts usage on the same chunk as the last content delta, the final chunk will leak raw <think> markers and skip any other normalization handled later in this method. Run the delta through the normal conversion path and attach usage to the last emitted normalized chunk instead of short-circuiting here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/providers/qwen_code_provider.py` around lines 257 - 277,
The current early return inside the SSE handling block bypasses the normal
delta-to-content normalization and causes raw <think> markers to leak when
usage_data arrives with the final delta; instead of yielding a combined chunk
and returning in the block that inspects choices/usage_data (where variables
like choice, delta, finish_reason, chunk_id, chunk_created, model_id are
available), run the delta through the existing normalization/conversion path
used later in this method (the same routine that converts choice["delta"] ->
reasoning_content and applies <think> → reasoning_content transforms), emit the
normalized chunk(s) as usual, and attach the usage object to the last emitted
normalized chunk (preserving finish_reason and ids) rather than short-circuiting
and returning.
src/rotator_library/providers/nvidia_provider.py (1)

39-49: ⚠️ Potential issue | 🟠 Major

Catch HTTPStatusError in get_models().

response.raise_for_status() raises httpx.HTTPStatusError for non-2xx responses, which is not a subclass of httpx.RequestError (both are siblings under httpx.HTTPError). This means 4xx/5xx responses bypass the current exception handler and leak an unhandled exception instead of logging and returning [].

Suggested fix
-        except httpx.RequestError as e:
-            lib_logger.error(f"Failed to fetch NVIDIA models: {e}")
+        except httpx.HTTPError as e:
+            lib_logger.error("Failed to fetch NVIDIA models: %s", e)
             return []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/providers/nvidia_provider.py` around lines 39 - 49, The
get_models() method currently only catches httpx.RequestError so HTTPStatusError
raised by response.raise_for_status() will escape; update the exception handling
in get_models() to also catch httpx.HTTPStatusError (or httpx.HTTPError)
alongside httpx.RequestError around the client.get()/response.raise_for_status()
call, and in that handler call lib_logger.error with the error details and
return [] to match existing behavior; reference the client.get call,
response.raise_for_status(), httpx.HTTPStatusError/httpx.HTTPError,
lib_logger.error, and the existing return [] to locate and modify the try/except
block.
♻️ Duplicate comments (1)
src/rotator_library/providers/kilocode_provider.py (1)

29-33: ⚠️ Potential issue | 🔴 Critical

Use the configured Kilocode API base here instead of hardcoding kilo.ai.

This is still pointing model discovery at the wrong host and still bypasses the KILOCODE_API_BASE override added elsewhere in the PR, so Kilocode model listing can fail or hit the wrong deployment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/providers/kilocode_provider.py` around lines 29 - 33, The
model-listing call in kilocode_provider is hardcoded to
"https://kilo.ai/api/openrouter/models" and bypasses the configured base; update
the client.get call in the code that builds the request (the try block where
response = await client.get(...)) to use the configured Kilocode base variable
(e.g., KILOCODE_API_BASE or the existing kilocode_api_base config) and append
the "/api/openrouter/models" path so the request honors the KILOCODE_API_BASE
override while still passing the Authorization header with api_key.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/proxy_app/provider_urls.py`:
- Line 34: Update the TryBons base URL in the provider URLs mapping so requests
hit the /v1 path: change the mapping value for the "trybons" key from
"https://go.trybons.ai" to "https://go.trybons.ai/v1" in the provider_urls.py
mapping (and make the same change for the other TryBons entries present later in
that file), so code in trybons_provider.py that targets /v1/models routes to the
correct endpoint.

In `@src/rotator_library/anthropic_compat/streaming_fast.py`:
- Around line 369-381: The exception handler currently starts a new text content
block without closing any previously opened content/message blocks; update the
except block in streaming_fast.py (the logic around message_started,
current_block_index and the yields of
_make_content_block_start_event/_make_text_delta_event) to first close any open
content blocks and the current message turn: yield
_make_content_block_stop_event(current_block_index) if a block was started, and
yield the appropriate _make_message_delta_event("end_turn", 0, cached_tokens)
and _make_message_stop_event() as needed before emitting the fallback text block
and the error delta; ensure you only open the fallback text block after these
stops so the SSE sequence stays valid.

In `@src/rotator_library/anthropic_compat/translator.py`:
- Around line 15-35: Extract the duplicated JSON shim (_json_dumps, _json_loads,
_json_decode_error) into a single shared helper module (e.g., json_shim or
json_utils) that encapsulates the orjson import/fallback logic and exposes the
same symbols; replace the local definitions in translator.py (functions
_json_dumps, _json_loads and symbol _json_decode_error) and in streaming.py to
import these symbols from the new helper so behavior and exception semantics
remain identical; ensure the helper returns/decodes strings the same way
(orjson.dumps(...).decode('utf-8') vs json.dumps) and re-exports the correct
JSONDecodeError class so existing code that catches _json_decode_error continues
to work.

In `@src/rotator_library/async_locks.py`:
- Around line 272-304: The acquire() method must not release self._lock before
the protected block; instead, move lock acquisition into __aenter__ so the lock
is held across the protected section and release it in __aexit__. Concretely:
remove the async with self._lock: wrapper from acquire() and make acquire() only
perform rate-limit checks/updates on _last_release, _burst_remaining,
_min_interval, _burst_size and _stats (and await sleeps as needed); implement
__aenter__ to await self._lock.acquire() then call await self.acquire() (or
inline the rate-limit logic) so the lock remains held for the critical section;
implement __aexit__ to set _last_release/time and decrement/adjust burst state
as needed and call self._lock.release() (or have release() call
self._lock.release() and update _last_release) so mutual exclusion is enforced.
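A minimal sketch of that fix (field names such as _min_interval and _last_release follow the prompt; everything else is assumed): the asyncio.Lock is taken in __aenter__, held for the whole critical section, and only the pacing logic runs in between.

```python
import asyncio
import time

class RateLimitedLock:
    """Sketch: the asyncio.Lock is held across the protected block;
    only the pacing check runs between acquisition and the body."""

    def __init__(self, min_interval: float):
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._last_release = None

    async def __aenter__(self):
        await self._lock.acquire()  # held until __aexit__
        if self._last_release is not None:
            wait = self._min_interval - (time.monotonic() - self._last_release)
            if wait > 0:
                await asyncio.sleep(wait)  # pace entries while holding the lock
        return self

    async def __aexit__(self, *exc):
        self._last_release = time.monotonic()
        self._lock.release()
```

Two back-to-back critical sections are then spaced at least min_interval apart while still being mutually exclusive.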
- Around line 186-211: The release() method must avoid incrementing _value when
directly handing off to a waiter: change logic in release() so that if _waiters
is non-empty you pop a waiter and call waiter.set() without doing self._value +=
1 and without decrementing self._stats["current_concurrent"] (the permit is
transferred), otherwise (no waiters) do self._value += 1 and decrement
self._stats["current_concurrent"] and increment self._stats["releases"]; ensure
references to release(), _value, _waiters, waiter.set(), and
self._stats["current_concurrent"] are updated accordingly so the semaphore
doesn't double-count a permit on direct handoff.
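The direct-handoff rule can be sketched as a simplified semaphore (illustrative class; the real AsyncSemaphore also tracks _stats, omitted here):

```python
import asyncio
from collections import deque

class HandoffSemaphore:
    """Sketch: release() hands the permit straight to a waiter instead
    of incrementing the counter, so a permit is never double-counted."""

    def __init__(self, value: int):
        self._value = value
        self._waiters: deque = deque()

    async def acquire(self) -> None:
        if self._value > 0 and not self._waiters:
            self._value -= 1
            return
        ev = asyncio.Event()
        self._waiters.append(ev)
        await ev.wait()  # permit was transferred in release(); no counter change

    def release(self) -> None:
        if self._waiters:
            self._waiters.popleft().set()  # direct handoff: _value untouched
        else:
            self._value += 1
```

Because the woken waiter never re-increments _value, a handoff cannot leave the counter one higher than the number of free permits.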
- Around line 61-89: The release paths deadlock because release_read() acquires
_read_ready then _write_ready while release_write() does the opposite; fix by
making both Conditions share the same underlying lock or by enforcing a single
global acquisition order so nested acquires never invert. Concretely, create one
asyncio.Lock (or one asyncio.Condition) and initialize both _read_ready and
_write_ready with that same lock (or replace both with a single Condition and
adjust logic), then update release_read, release_write and acquire_write to use
that shared lock/condition so notify_all() calls no longer require nested,
inversed acquisitions; ensure functions named release_read, release_write and
acquire_write reference the shared _lock/_condition.
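One way to sketch the shared-lock variant is with a single asyncio.Condition (one underlying lock), so no release path ever has to acquire a second, inverted lock (illustrative class name; the real implementation is ReadWriteLock in async_locks.py):

```python
import asyncio

class SharedCondRWLock:
    """Sketch: one Condition serves both readers and writers, so
    notify paths never nest two different lock acquisitions."""

    def __init__(self):
        self._cond = asyncio.Condition()
        self._readers = 0
        self._writer = False

    async def acquire_read(self):
        async with self._cond:
            while self._writer:
                await self._cond.wait()
            self._readers += 1

    async def release_read(self):
        async with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a pending writer

    async def acquire_write(self):
        async with self._cond:
            while self._writer or self._readers:
                await self._cond.wait()
            self._writer = True

    async def release_write(self):
        async with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake readers and writers
```

Every acquire/release path takes exactly one lock, so the inverted-order deadlock described above cannot occur.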

In `@src/rotator_library/batched_persistence.py`:
- Around line 142-150: The file I/O in async methods _load_from_disk and
_write_to_disk is blocking (uses open(), read(), mkdir(), safe_write_json()), so
move those blocking operations off the event loop by running them in a thread
executor (e.g., asyncio.to_thread or loop.run_in_executor); refactor
_load_from_disk to call a synchronous helper that opens/reads/parses the file
via to_thread and assign _state afterwards, and refactor _write_to_disk to
perform mkdir and safe_write_json inside to_thread (or call a sync helper via
to_thread) so the coroutine yields while disk I/O happens.
- Around line 183-192: The current code serializes state into content via
self._serializer(self._state) but then calls safe_write_json(self._file_path,
self._state if isinstance(self._state, dict) else {"data": self._state}, ...),
which ignores custom serializers and changes the on-disk shape; change the write
so the exact serialized shape is persisted: if content is a
dict/JSON-serializable mapping, pass that dict to safe_write_json, otherwise
perform an atomic write of the raw serialized string (e.g., use existing atomic
file-write helper or extend safe_write_json to accept pre-serialized JSON) so
that what _load_from_disk() reads back matches the original serialized form from
_serializer and no artificial {"data": ...} wrapper is introduced (refer to
_serializer, _state, safe_write_json, and _load_from_disk).
- Around line 165-169: The code uses self._last_change for both freshness and
max_dirty_age, so under steady updates max_dirty_age never triggers; introduce a
separate timestamp (e.g. self._first_dirty_change) that is set only when the
buffer first becomes dirty (set it where the code currently resets _last_change
on every update — the block around the update at lines ~219-221) and do not
overwrite it on subsequent updates; then change the flush check in the block
that currently reads self._last_change (the if block around lines 165-169) to
compute age_since_first = time.time() - self._first_dirty_change and compare
max_dirty_age against age_since_first while keeping the write_interval
comparison against time.time() - self._last_change; finally, reset
self._first_dirty_change = None whenever you successfully write/clear the dirty
state in _write_to_disk().
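The max-dirty-age fix described above can be sketched as follows; names like `DirtyBuffer`, `mark_updated`, and `should_flush` are illustrative stand-ins, not the real batched_persistence API. The key idea: record the first dirty timestamp once, never overwrite it on later updates, and clear it only on a successful flush.

```python
import time

class DirtyBuffer:
    """Sketch: track both recency and total dirty age separately."""

    def __init__(self, write_interval: float = 1.0, max_dirty_age: float = 5.0):
        self.write_interval = write_interval
        self.max_dirty_age = max_dirty_age
        self._last_change = 0.0
        self._first_dirty_change = None  # set once, on the first dirty update

    def mark_updated(self) -> None:
        now = time.time()
        self._last_change = now
        if self._first_dirty_change is None:
            self._first_dirty_change = now  # NOT overwritten by later updates

    def should_flush(self) -> bool:
        if self._first_dirty_change is None:
            return False  # nothing dirty
        now = time.time()
        quiet_long_enough = now - self._last_change >= self.write_interval
        dirty_too_long = now - self._first_dirty_change >= self.max_dirty_age
        return quiet_long_enough or dirty_too_long

    def mark_flushed(self) -> None:
        # Reset on successful write, mirroring _write_to_disk() clearing dirty state.
        self._first_dirty_change = None
```

Under steady updates `_last_change` keeps advancing, but `_first_dirty_change` does not, so `max_dirty_age` eventually forces a flush.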

In `@src/rotator_library/circuit_breaker.py`:
- Around line 187-205: When transitioning from OPEN to HALF_OPEN in
can_attempt(), the first probe isn't counted because you set
circuit.half_open_attempts = 0 then return True, allowing one extra probe;
change the transition to initialize circuit.half_open_attempts = 1 so the probe
that triggered the transition is counted (update both the OPEN->HALF_OPEN block
in can_attempt() where CircuitState.OPEN is handled and the equivalent block
around lines 215-224). Ensure any logging remains the same and that the rest of
the HALF_OPEN handling (checks against max_half_open_attempts) uses this
incremented count consistently.
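A minimal sketch of the probe-counting fix, using a toy `Circuit` class rather than the real circuit_breaker types: the OPEN→HALF_OPEN transition itself consumes the first probe.

```python
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class Circuit:
    def __init__(self, cooldown: float = 30.0, max_half_open_attempts: int = 2):
        self.state = CircuitState.CLOSED
        self.opened_at = 0.0
        self.cooldown = cooldown
        self.max_half_open_attempts = max_half_open_attempts
        self.half_open_attempts = 0

    def can_attempt(self) -> bool:
        if self.state is CircuitState.CLOSED:
            return True
        if self.state is CircuitState.OPEN:
            if time.time() - self.opened_at >= self.cooldown:
                self.state = CircuitState.HALF_OPEN
                # Count the probe that triggered the transition (1, not 0),
                # otherwise one extra probe slips through.
                self.half_open_attempts = 1
                return True
            return False
        # HALF_OPEN: admit probes only up to the configured budget.
        if self.half_open_attempts < self.max_half_open_attempts:
            self.half_open_attempts += 1
            return True
        return False
```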

In `@src/rotator_library/config/defaults.py`:
- Around line 203-220: The current is_cooldown_disabled function builds env var
names by uppercasing the raw provider (e.g., DISABLE_COOLDOWN_NANO-GPT) which
produces hyphens and other invalid shell identifier characters; update
is_cooldown_disabled to normalize the provider before composing the env var name
by replacing any non-alphanumeric characters with underscores and then
uppercasing (e.g., use re.sub(r'[^0-9A-Za-z]', '_', provider).upper()), add the
necessary import (re), and then call os.environ.get with the normalized key so
DISABLE_COOLDOWN_NANO_GPT and similar toggles work correctly.
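A sketch of the normalization; the env-var naming convention follows the review, while the truthiness parsing (`"1"`/`"true"`/`"yes"`) is an assumption about how the real toggle is interpreted.

```python
import os
import re

def is_cooldown_disabled(provider: str) -> bool:
    # Normalize to a valid shell identifier: non-alphanumerics become "_",
    # so "nano-gpt" maps to DISABLE_COOLDOWN_NANO_GPT instead of
    # the unusable DISABLE_COOLDOWN_NANO-GPT.
    key = "DISABLE_COOLDOWN_" + re.sub(r"[^0-9A-Za-z]", "_", provider).upper()
    return os.environ.get(key, "").lower() in ("1", "true", "yes")
```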

In `@src/rotator_library/cooldown_manager.py`:
- Around line 29-41: The current _extract_provider in CooldownManager
incorrectly takes the first segment before '_' (truncating provider names that
themselves contain underscores); update _extract_provider to derive the provider
by splitting from the right on the last underscore (e.g., parts =
credential.rsplit("_", 1) and return parts[0] if len(parts) == 2 else
credential) so credentials like "azure_ai_1" yield "azure_ai", or even
better add an optional explicit provider parameter to the CooldownManager
interface and use that when provided (falling back to the new rsplit logic) to
avoid inferring provider names from credential strings.
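The rsplit fallback plus the explicit-provider parameter can be sketched as follows (function name and signature are illustrative, not the real CooldownManager interface):

```python
from typing import Optional

def extract_provider(credential: str, provider: Optional[str] = None) -> str:
    # Prefer an explicit provider over inference from the credential string.
    if provider is not None:
        return provider
    # Split on the LAST underscore so multi-word providers like "azure_ai"
    # survive; the old split("_")[0] would truncate them to "azure".
    parts = credential.rsplit("_", 1)
    return parts[0] if len(parts) == 2 else credential
```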

In `@src/rotator_library/credential_weight_cache.py`:
- Around line 144-149: CachedWeights is being constructed with references to
mutable inputs (weights, credentials, usage_snapshot) in set(), so later
mutations by the caller will mutate the cached entry and break
check_usage_change() invalidation logic; fix it by snapshotting those inputs
before storing them (e.g., shallow copy dicts/lists or deep copy if nested) and
compute total_weight from the copied weights so CachedWeights receives immutable
copies of weights, credentials, and usage_snapshot instead of the original
objects.
- Around line 239-248: The cache-validity check misses credentials that were
removed: in the method using current_usage and cached.usage_snapshot (and fields
like cached.credentials, weights, and self._threshold), add a check that
iterates the keys of cached.usage_snapshot and returns True if any credential
present there is missing from current_usage (i.e., if any cached key not in
current_usage), so the cache is invalidated when credentials disappear.
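Both points can be sketched together — snapshot mutable inputs on write, and invalidate when a cached credential disappears. The `CachedWeights` fields mirror the review, but the exact shapes are assumptions about the real module.

```python
import copy
from dataclasses import dataclass

@dataclass
class CachedWeights:
    weights: dict
    credentials: list
    usage_snapshot: dict
    total_weight: float

def make_entry(weights: dict, credentials: list, usage_snapshot: dict) -> CachedWeights:
    # Snapshot mutable inputs so later caller mutations can't corrupt the cache.
    weights = copy.deepcopy(weights)
    return CachedWeights(
        weights=weights,
        credentials=list(credentials),
        usage_snapshot=copy.deepcopy(usage_snapshot),
        total_weight=sum(weights.values()),  # computed from the copy
    )

def usage_changed(cached: CachedWeights, current_usage: dict) -> bool:
    # Invalidate when any cached credential disappeared, not just when
    # existing counters drifted or new credentials appeared.
    if any(key not in current_usage for key in cached.usage_snapshot):
        return True
    return any(cached.usage_snapshot.get(k) != v for k, v in current_usage.items())
```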

In `@src/rotator_library/error_handler.py`:
- Around line 1265-1299: The code is incorrectly applying provider-wide (IP)
throttling for router/multiplexing providers (e.g., those in PROXY_PROVIDERS);
update the logic so that 429s from those providers are not escalated to
ThrottleScope.IP: in both handle_429_error() and
classify_429_with_throttle_detection() stop feeding events from providers in
PROXY_PROVIDERS into the shared/provider-level detector (or explicitly
ignore/downgrade an assessment with ThrottleScope.IP to per-credential scope),
and in the block that currently handles assessment.scope == ThrottleScope.IP
(the ThrottleAction with ThrottleActionType.PROVIDER_COOLDOWN and the
circuit_breaker.open_immediately call) add a guard that skips or converts this
provider-wide action when provider is in PROXY_PROVIDERS so router providers
aren’t blocked cluster-wide. Ensure checks reference PROXY_PROVIDERS,
_detect_ip_throttle(), ThrottleScope.IP, handle_429_error(),
classify_429_with_throttle_detection(), and the circuit_breaker usage.

In `@src/rotator_library/http_client_pool.py`:
- Around line 243-270: Ensure client recreation is serialized with the existing
_client_lock: wrap the closed-check and any mutation of self._streaming_client /
self._non_streaming_client inside "async with self._client_lock" in
_ensure_client(streaming: bool) so that initialize() and close() cannot race
with client creation, and increment self._stats["reconnects"] only while holding
the lock; apply the same change to the other similar routine that mutates
clients (the get_client_async/_ensure_client duplicate at the other location) so
no newly created client can be immediately overwritten to None by close().
- Around line 272-295: get_client currently returns ephemeral clients from
_get_lazy_client without storing them, causing socket/resource leaks and missing
HTTP/2 settings; change the lazy-path so that when
_get_lazy_client(streaming=True/False) is used its result is assigned to the
corresponding attribute (_streaming_client or _non_streaming_client) before
being returned, or refactor _get_lazy_client to call _create_client and set the
attribute internally; ensure close() only needs to clean up those tracked
attributes and that the created client uses _create_client so HTTP/2/config is
applied.
- Around line 227-241: _is_client_closed currently inspects the private
client._client attribute (HTTPX implementation detail); replace that check to
use the public closed-state API on httpx.AsyncClient (e.g., use getattr(client,
"is_closed", None) and treat True as closed) and, if the public attribute is
missing for the project’s httpx version, fail fast with a clear error
instructing to upgrade or handle the new API rather than relying on a private
attribute; update the function _is_client_closed to use client.is_closed when
present and stop referencing client._client.
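A sketch of the public-API check. A duck-typed stand-in is used below instead of a real client; in the actual code the object would be an `httpx.AsyncClient`, whose public `is_closed` property is what should be consulted.

```python
def is_client_closed(client) -> bool:
    """Check closed state via the public attribute, never client._client."""
    closed = getattr(client, "is_closed", None)
    if closed is None:
        # Fail fast rather than poking at private HTTPX internals.
        raise RuntimeError(
            "client lacks a public is_closed attribute; upgrade httpx"
        )
    return bool(closed)
```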

In `@src/rotator_library/ip_throttle_detector.py`:
- Around line 239-285: The current IP-level escalation fires whenever
num_unique_credentials >= self.min_credentials even if common_hash is None and
can produce confidence==1.0 for exactly min_credentials (e.g., 2) — change the
IP detection to require a shared error signature and at least two matching error
bodies: only enter the IP branch when common_hash is truthy and
common_hash_count >= 2 (use the existing common_hash and common_hash_count
variables), and when num_unique_credentials == self.min_credentials avoid
forcing confidence to 1.0 (cap the base confidence below 1.0, e.g., min(0.9,
num_unique_credentials / self.min_credentials)) before applying the +0.2/+0.1
boosts; adjust the return from ThrottleAssessment(scope=ThrottleScope.IP, ...)
accordingly so unrelated per-key 429s do not escalate into a full-confidence IP
throttle.

In `@src/rotator_library/litellm_providers.py`:
- Line 79: The Amazon Nova provider entry in SCRAPED_PROVIDERS has an endpoint
string "v1/responses" missing a leading slash; update that array element to
"/v1/responses" so it matches the other slash-prefixed endpoints (locate the
provider entry that contains "endpoints": ["/chat/completions", "v1/responses"]
in litellm_providers.py and change the second item), and also fix the generator
source in the scraper/sync pipeline so the generated file retains the corrected
"/v1/responses" endpoint.

In `@src/rotator_library/providers/iflow_provider.py`:
- Around line 362-368: The current check in the streaming loop inspects content
for complete "<think>" / "</think>" substrings and calls
_parse_thinking_content, but it will miss tags split across chunks (e.g., "<thi"
+ "nk>...") causing thinking content to be emitted as normal; to fix, implement
a small per-stream buffer (e.g., on the provider instance) that appends each
delta.get("content") before scanning, only consume/parse when a full open or
close tag is detected, and pass the reconstructed content into
_parse_thinking_content (preserving finish_reason, model_id, chunk_id,
chunk_created); update the same logic used at the other check around the same
function (the 418-424 area) to use the buffered approach or explicitly document
the limitation if you cannot buffer.
- Around line 331-332: REASONING_START_MARKER is defined mid-class between
methods which is inconsistent with the other constants (HARDCODED_MODELS,
SUPPORTED_PARAMS, THINKING_SUPPORTED_MODELS, REASONING_BUDGETS); move
REASONING_START_MARKER to the top alongside those module/class-level constants
and update any references that use self.REASONING_START_MARKER to reference the
module/class constant directly (REASONING_START_MARKER) so parsing logic for the
"<think>" tags uses the unified constant placement.
- Around line 435-438: The _parse_thinking_content method includes an unused
delta parameter; remove delta from the method signature (def
_parse_thinking_content(self, content: str, finish_reason: Any, model_id: str,
chunk_id: str, chunk_created: int)) and update every call site that passes delta
(the callers that currently call _parse_thinking_content(..., delta, ...)) to
stop passing that argument and pass the remaining parameters in the new order;
also update any type hints or docstrings referencing delta to keep signatures
consistent (search for _parse_thinking_content and adjust its callers
accordingly).
- Around line 468-476: The yielded chunk currently always includes finish_reason
which causes intermediate chunks to appear terminal; update the generator in
iflow_provider.py so that finish_reason is only included on the final yielded
chunk (use the loop index or a boolean is_last when splitting content), i.e. set
"finish_reason": finish_reason if is_last else None (or omit the key when not
last) in the dict that currently uses new_delta, model_id, chunk_id,
chunk_created; ensure new_delta is unchanged and only the finish_reason field is
conditional.
- Around line 284-287: The code sets payload["thinking"] = {"type": "enabled",
"budget_tokens": -1} using an undocumented sentinel; change this so we no longer
send budget_tokens: -1—either omit the budget_tokens field entirely to let the
upstream API use its default (i.e., set payload["thinking"] = {"type":
"enabled"}) or set budget_tokens to an explicit, documented positive integer
less than the model's max tokens if you need to enforce a budget; update the
branch around payload["thinking"] (see payload["thinking"] and lib_logger.debug
referencing model_id) accordingly and remove the -1 sentinel from all places
that construct thinking payloads.
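The buffered approach from the first point above can be sketched as a small helper that withholds any trailing text that could still grow into a `<think>`/`</think>` tag; class and method names are illustrative, and the real provider would feed it each `delta.get("content")` before scanning.

```python
class ThinkTagBuffer:
    """Sketch: reassemble '<think>'/'</think>' tags split across stream chunks."""

    TAGS = ("<think>", "</think>")

    def __init__(self):
        self._buf = ""

    def feed(self, chunk: str) -> str:
        # Return text safe to emit; hold back any suffix that is a proper
        # prefix of a tag (e.g. "<thi"), so split tags are reassembled.
        self._buf += chunk
        longest = max(len(t) for t in self.TAGS)
        hold = 0
        for k in range(1, longest):
            tail = self._buf[-k:]
            if any(t.startswith(tail) and tail != t for t in self.TAGS):
                hold = k
        if hold:
            out, self._buf = self._buf[:-hold], self._buf[-hold:]
        else:
            out, self._buf = self._buf, ""
        return out

    def flush(self) -> str:
        # Emit whatever remains at end of stream.
        out, self._buf = self._buf, ""
        return out
```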

In `@src/rotator_library/providers/kilocode_provider.py`:
- Around line 34-40: The exception handler in the Kilocode model fetch block
only catches httpx.RequestError but response.raise_for_status() raises
httpx.HTTPStatusError; update the except clause in the function that calls
response.raise_for_status() (the Kilocode model discovery block in
kilocode_provider.py) to catch httpx.HTTPError (the parent of HTTPStatusError
and RequestError) so that 4xx/5xx responses are handled, keep the existing
lib_logger.error(...) logging and return [] fallback behavior.

In `@src/rotator_library/providers/nvidia_provider.py`:
- Around line 157-170: The provider currently strips Anthropic "thinking" before
checking reasoning_effort, losing the signal; change the logic so you detect
Anthropic's thinking before calling _strip_anthropic_fields (or preserve the
original payload) and if model_name is in _NVIDIA_DEEPSEEK_THINKING_MODELS and
payload has a "thinking" object but no reasoning_effort, set
payload["reasoning_effort"] = "medium" (or another chosen default) so the later
block that sets extra_body.chat_template_kwargs.thinking=True still runs; update
the code around _strip_anthropic_fields, _sanitize_messages and the
reasoning_effort check to reflect this fallback.

In `@src/rotator_library/providers/opencode_provider.py`:
- Around line 52-63: The code calls response.raise_for_status() after client.get
which can raise httpx.HTTPStatusError (not a subclass of httpx.RequestError), so
update the exception handling around the client.get / response.raise_for_status
block to catch httpx.HTTPStatusError as well (or catch httpx.HTTPError) and log
the error via lib_logger.error including the exception (same pattern used for
httpx.RequestError), then return an empty list; target the try/except
surrounding client.get, response.raise_for_status, and the list comprehension
that builds f"opencode/{model['id']}" so non-2xx responses fall back to [].

In `@src/rotator_library/request_sanitizer.py`:
- Around line 51-76: The sanitizer is using normalized_model (which may include
an outer provider prefix like "kilocode/" or "openrouter/") for capability
checks, causing misclassification; change the logic to compute an inner_model by
stripping a single leading provider prefix (if present) — e.g., inner_model =
normalized_model.split("/",1)[1] when "/" in normalized_model else
normalized_model — and then use inner_model for the "dimensions" check, the
thinking-enabled model whitelist, and the Kilocode free-model detection (replace
existing uses of normalized_model and the ad-hoc model_without_prefix with
inner_model); keep payload, KILOCODE_UNSUPPORTED_PARAMS, auto_adjust_max_tokens
and adjust_max_tokens_in_payload behavior unchanged.
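The single-prefix strip can be sketched as follows (helper name is illustrative):

```python
def strip_provider_prefix(normalized_model: str) -> str:
    # Strip ONE leading provider prefix like "kilocode/" or "openrouter/",
    # leaving any inner "vendor/model" routing segment intact.
    if "/" in normalized_model:
        return normalized_model.split("/", 1)[1]
    return normalized_model
```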

In `@src/rotator_library/timeout_config.py`:
- Line 33: The module docstring's description of the pool timeout is
inconsistent with the runtime default: update the module-level docstring to
reflect the new default of 15 seconds so it matches the _POOL constant (symbol:
_POOL) and any env-var documentation in the docstring; ensure the text that
mentions the default timeout value is changed from "60s" to "15s" and keep
wording consistent with the _POOL assignment and any related env var
explanation.

In `@src/rotator_library/token_calculator.py`:
- Around line 116-121: The model normalization currently strips suffixes then
matches fallback patterns in an order that causes a generic key like "gpt-4" to
match before more specific aliases (e.g., "gpt-4-turbo" or "gpt-4-0125"), so
alias models get the wrong window; fix by matching explicit/specific aliases
first: in the matching logic that looks up model names (the code that iterates
over your mapping/patterns around the region handling model -> window,
referencing the variable model and the model-to-window mapping/pattern list),
either (A) check exact/explicit aliases first (look for full-name keys like
"gpt-4-turbo" or entries in an alias dict) before applying generic regexes, or
(B) sort the fallback pattern keys by specificity (e.g., descending length or
number of tokens) so longer/more specific patterns are tested first; apply the
same change in the other matching block noted (the second model->window match
region) to ensure gpt-4-turbo and other explicit aliases return their correct
token window.
- Around line 288-297: The current logic returns MIN_MAX_TOKENS even when
available_for_output is <= 0 which will create impossible requests; change the
block around available_for_output so that if available_for_output <= 0 you fail
fast (log an error via logger.error and return 0 with an error code like
"input_exceeds_context_no_budget" or raise a ValueError), and only when 0 <
available_for_output < MIN_MAX_TOKENS return the true remaining budget
(available_for_output) with the existing "input_exceeds_context_minimal_output"
code; keep the logger message but use logger.error for the no-budget case and
logger.warning for the capped-but-positive case.
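The fail-fast budget logic can be sketched as follows; `MIN_MAX_TOKENS` is an illustrative constant and the `(tokens, code)` return shape is an assumption about the real token_calculator API.

```python
import logging

logger = logging.getLogger(__name__)
MIN_MAX_TOKENS = 256  # illustrative floor; the real constant lives elsewhere

def clamp_output_budget(available_for_output: int):
    """Return (max_tokens, code); fail fast when no output budget remains."""
    if available_for_output <= 0:
        # Returning MIN_MAX_TOKENS here would create an impossible request.
        logger.error("input exceeds context window; no output budget remains")
        return 0, "input_exceeds_context_no_budget"
    if available_for_output < MIN_MAX_TOKENS:
        # Positive but below the floor: return the true remaining budget.
        logger.warning("output budget below minimum; using remaining budget")
        return available_for_output, "input_exceeds_context_minimal_output"
    return available_for_output, None
```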

In `@src/rotator_library/utils/patch_litellm_finish_reason.py`:
- Around line 130-134: Replace the bare logger.error calls that handle generic
exceptions with logger.exception so stack traces are preserved: in
src/rotator_library/utils/patch_litellm_finish_reason.py locate the except
Exception handlers around the LiteLLM patch operations (the block that logs
"Unexpected error patching LiteLLM Choices" and the similar handlers referenced
at the other occurrences around the patching logic at the noted locations) and
change logger.error(...) to logger.exception(...); do this for each identical
catch-all except block (including the ones around the Choices patch and the
other two occurrences) so the traceback is included in the logs.
- Around line 154-165: The patch replaces ChatCompletionChunk.model_validate
with a staticmethod which breaks the original classmethod descriptor and risks
incorrect unpatching; instead, save the original descriptor (store
_original_chat_completion_chunk_model_validate =
ChatCompletionChunk.model_validate), then define patched_model_validate as a
classmethod that accepts (cls, obj, *args, **kwargs), calls
_normalize_chunk_data(obj) and forwards the call to the original via the
descriptor (so it is invoked with proper class binding), assign
ChatCompletionChunk.model_validate = classmethod(patched_model_validate) and
when unpatching restore ChatCompletionChunk.model_validate =
_original_chat_completion_chunk_model_validate to preserve original behavior.
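The descriptor-preserving patch can be sketched with a toy stand-in for the pydantic `ChatCompletionChunk` (in the real code `model_validate` is pydantic's classmethod and `_normalize_chunk_data` is the project's own hook): save the original `classmethod` descriptor from `__dict__`, wrap it in a new `classmethod`, and restore the saved descriptor on unpatch.

```python
class ChatCompletionChunk:
    """Toy stand-in for the real pydantic model (illustrative only)."""

    @classmethod
    def model_validate(cls, obj):
        return (cls.__name__, obj)

def _normalize_chunk_data(obj):
    obj.setdefault("finish_reason", None)
    return obj

# Save the original descriptor (not the bound method) so it can be restored.
_original_descriptor = ChatCompletionChunk.__dict__["model_validate"]

def _patched(cls, obj, *args, **kwargs):
    _normalize_chunk_data(obj)
    # Invoke the original via the descriptor so class binding is preserved.
    bound = _original_descriptor.__get__(None, cls)
    return bound(obj, *args, **kwargs)

ChatCompletionChunk.model_validate = classmethod(_patched)

def unpatch():
    ChatCompletionChunk.model_validate = _original_descriptor
```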

In `@src/rotator_library/utils/resilient_io.py`:
- Around line 688-706: The AsyncResilientStateWriter has three issues to fix:
add the missing return type annotation "-> None" to the __init__ method; replace
the Russian docstrings on AsyncResilientStateWriter, schedule_write,
_delayed_flush (and any other Russian docstrings in that class) with English
equivalents to match the file; and narrow the bare "except Exception" in
_delayed_flush to a filesystem-related exception (e.g., except OSError or except
(OSError, IOError)) to match the sync ResilientStateWriter behavior so you don't
swallow programming errors.
- Around line 688-697: AsyncResilientStateWriter currently holds pending state
in _pending and an async _flush_task but never registers with
BufferedWriteRegistry, so if close() (or task cancellation) isn't awaited
pending writes can be lost; update AsyncResilientStateWriter (constructor and
close()/flush/failure paths) to register pending data with BufferedWriteRegistry
as a synchronous fallback when disk writes fail or when the event loop is
shutting down (e.g., on exception/cancel), ensuring _pending is handed to
BufferedWriteRegistry and _flush_task is canceled/cleared; alternatively, add
clear documentation on requiring callers to await close()—mention
AsyncResilientStateWriter, close(), _pending, _flush_task and
BufferedWriteRegistry in your changes.
- Around line 710-733: The flush() method performs blocking filesystem ops
(file_path.with_suffix(...).write_text, replace, parent.mkdir, write_text)
inside an async function, which will block the event loop; change those sync
calls to run in background threads via asyncio.to_thread: call await
asyncio.to_thread(...) for temp_path.write_text, temp_path.replace,
self.file_path.parent.mkdir, and the fallback self.file_path.write_text, and
ensure asyncio is imported; keep the same try/except recovery logic and the use
of self._lock, _pending, temp_path = self.file_path.with_suffix('.tmp') and
preserve JSON serialization behavior.
- Around line 719-733: The current ResilientStateWriter write path uses a
predictable temp path (self.file_path.with_suffix('.tmp')) and the exception
recovery writes directly to the final path, risking collisions and non-atomic
writes; change the implementation in the write routine (the block around
temp_path creation and the exception handler in ResilientStateWriter.write or
the method containing that try/except) to create a unique temp file in the same
directory using tempfile.mkstemp or
tempfile.NamedTemporaryFile(dir=self.file_path.parent, delete=False), write the
JSON to that unique temp file, fsync the file descriptor and parent directory,
then atomically replace the target using os.replace; in the exception/recovery
branch ensure the parent directory is created first and perform the same
unique-temp -> fsync -> os.replace sequence (not a direct write to the target)
so recovery remains atomic and avoids predictable-temp collisions.
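The unique-temp, fsync, `os.replace` sequence can be sketched as a standalone helper; to call it from the async writer, wrap it in `asyncio.to_thread` as the earlier flush() point suggests. The directory fsync is POSIX-only and skipped elsewhere.

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(file_path: Path, state: dict) -> None:
    """Write JSON atomically: unique temp file, fsync, then os.replace."""
    file_path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp gives a unique name in the target directory, avoiding
    # collisions between concurrent writers using a predictable '.tmp'.
    fd, tmp_name = tempfile.mkstemp(dir=file_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # data hits disk before the rename
        os.replace(tmp_name, file_path)  # atomic on POSIX and Windows
        if hasattr(os, "O_DIRECTORY"):  # POSIX only: persist the rename itself
            dir_fd = os.open(file_path.parent, os.O_DIRECTORY)
            try:
                os.fsync(dir_fd)
            finally:
                os.close(dir_fd)
    except BaseException:
        try:
            os.unlink(tmp_name)  # don't leak the temp file on failure
        except OSError:
            pass
        raise
```

Recovery paths should reuse this same helper (create the parent directory, then the same temp-write-replace sequence) rather than writing directly to the target.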

In `@start_proxy.bat`:
- Around line 2-7: The batch file contains non-ASCII Russian text (e.g., the
echo string "LLM API Key Proxy - Запуск" and other lines) and must be saved
using UTF-8 with a BOM so the console displays those characters correctly after
running chcp 65001; reopen start_proxy.bat in your editor/IDE and resave it as
"UTF-8 with BOM" (or "UTF-8 signature") to add the BOM, then verify the file
still contains the lines with chcp 65001 and the echo statements unchanged.
- Around line 1-23: Replace the direct "python src/proxy_app/main.py --host
127.0.0.1 --port 8000" invocation with a Windows-friendly launcher: use "py -3
src/proxy_app/main.py --host 127.0.0.1 --port 8000" to ensure the correct Python
installation is used; alternatively, add a pre-check using "where python" and
bail with a clear message if Python is not found before invoking the existing
"python src/proxy_app/main.py --host 127.0.0.1 --port 8000" command so the
script works on systems without python on PATH.

---

Outside diff comments:
In `@src/rotator_library/providers/nvidia_provider.py`:
- Around line 39-49: The get_models() method currently only catches
httpx.RequestError so HTTPStatusError raised by response.raise_for_status() will
escape; update the exception handling in get_models() to also catch
httpx.HTTPStatusError (or httpx.HTTPError) alongside httpx.RequestError around
the client.get()/response.raise_for_status() call, and in that handler call
lib_logger.error with the error details and return [] to match existing
behavior; reference the client.get call, response.raise_for_status(),
httpx.HTTPStatusError/httpx.HTTPError, lib_logger.error, and the existing return
[] to locate and modify the try/except block.

In `@src/rotator_library/providers/qwen_code_provider.py`:
- Around line 257-277: The current early return inside the SSE handling block
bypasses the normal delta-to-content normalization and causes raw <think>
markers to leak when usage_data arrives with the final delta; instead of
yielding a combined chunk and returning in the block that inspects
choices/usage_data (where variables like choice, delta, finish_reason, chunk_id,
chunk_created, model_id are available), run the delta through the existing
normalization/conversion path used later in this method (the same routine that
converts choice["delta"] -> reasoning_content and applies <think> →
reasoning_content transforms), emit the normalized chunk(s) as usual, and attach
the usage object to the last emitted normalized chunk (preserving finish_reason
and ids) rather than short-circuiting and returning.
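The attach-to-last-chunk behavior can be sketched generically; the dict shapes follow the OpenAI streaming format, and the names are illustrative rather than the provider's real helpers.

```python
def emit_normalized(parts, finish_reason, usage, model_id, chunk_id, chunk_created):
    # Terminal fields (finish_reason, usage) belong only on the final chunk;
    # earlier chunks must not look terminal.
    for i, new_delta in enumerate(parts):
        is_last = i == len(parts) - 1
        chunk = {
            "id": chunk_id,
            "created": chunk_created,
            "model": model_id,
            "choices": [{
                "delta": new_delta,
                "finish_reason": finish_reason if is_last else None,
            }],
        }
        if is_last and usage is not None:
            chunk["usage"] = usage
        yield chunk
```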

---

Duplicate comments:
In `@src/rotator_library/providers/kilocode_provider.py`:
- Around line 29-33: The model-listing call in kilocode_provider is hardcoded to
"https://kilo.ai/api/openrouter/models" and bypasses the configured base; update
the client.get call in the code that builds the request (the try block where
response = await client.get(...)) to use the configured Kilocode base variable
(e.g., KILOCODE_API_BASE or the existing kilocode_api_base config) and append
the "/api/openrouter/models" path so the request honors the KILOCODE_API_BASE
override while still passing the Authorization header with api_key.
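Honoring the override can be sketched as follows; the default base and env-var name follow this PR's description, but the exact path join is an assumption about the real config.

```python
import os

def kilocode_models_url() -> str:
    # Honor the KILOCODE_API_BASE override instead of a hardcoded host,
    # then append the models path.
    base = os.environ.get("KILOCODE_API_BASE", "https://kilocode.ai").rstrip("/")
    return f"{base}/api/openrouter/models"
```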

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e8a9549a-6fbd-486c-ac0b-fa5e7dea24c3

📥 Commits

Reviewing files that changed from the base of the PR and between bf7ab7e and 82bb900.

📒 Files selected for processing (53)
  • .gitignore
  • CONTRIBUTING.md
  • README.md
  • src/proxy_app/main.py
  • src/proxy_app/provider_urls.py
  • src/rotator_library/__init__.py
  • src/rotator_library/anthropic_compat/__init__.py
  • src/rotator_library/anthropic_compat/streaming.py
  • src/rotator_library/anthropic_compat/streaming_fast.py
  • src/rotator_library/anthropic_compat/translator.py
  • src/rotator_library/async_locks.py
  • src/rotator_library/background_refresher.py
  • src/rotator_library/batched_persistence.py
  • src/rotator_library/circuit_breaker.py
  • src/rotator_library/client.py
  • src/rotator_library/config/__init__.py
  • src/rotator_library/config/defaults.py
  • src/rotator_library/cooldown_manager.py
  • src/rotator_library/credential_weight_cache.py
  • src/rotator_library/error_handler.py
  • src/rotator_library/http_client_pool.py
  • src/rotator_library/ip_throttle_detector.py
  • src/rotator_library/litellm_providers.py
  • src/rotator_library/model_definitions.py
  • src/rotator_library/provider_config.py
  • src/rotator_library/providers/base_token_manager.py
  • src/rotator_library/providers/cohere_provider.py
  • src/rotator_library/providers/gemini_provider.py
  • src/rotator_library/providers/google_oauth_base.py
  • src/rotator_library/providers/groq_provider.py
  • src/rotator_library/providers/iflow_auth_base.py
  • src/rotator_library/providers/iflow_provider.py
  • src/rotator_library/providers/kilocode_provider.py
  • src/rotator_library/providers/mistral_provider.py
  • src/rotator_library/providers/nanogpt_provider.py
  • src/rotator_library/providers/nvidia_provider.py
  • src/rotator_library/providers/openai_compatible_provider.py
  • src/rotator_library/providers/openai_provider.py
  • src/rotator_library/providers/opencode_provider.py
  • src/rotator_library/providers/openrouter_provider.py
  • src/rotator_library/providers/qwen_auth_base.py
  • src/rotator_library/providers/qwen_code_provider.py
  • src/rotator_library/providers/trybons_provider.py
  • src/rotator_library/providers/utilities/gemini_credential_manager.py
  • src/rotator_library/request_sanitizer.py
  • src/rotator_library/timeout_config.py
  • src/rotator_library/token_calculator.py
  • src/rotator_library/usage_manager.py
  • src/rotator_library/utils/__init__.py
  • src/rotator_library/utils/patch_litellm_finish_reason.py
  • src/rotator_library/utils/resilient_io.py
  • src/rotator_library/utils/suppress_litellm_warnings.py
  • start_proxy.bat
💤 Files with no reviewable changes (11)
  • src/rotator_library/providers/openai_compatible_provider.py
  • src/rotator_library/providers/mistral_provider.py
  • src/rotator_library/model_definitions.py
  • src/rotator_library/providers/cohere_provider.py
  • CONTRIBUTING.md
  • src/rotator_library/providers/nanogpt_provider.py
  • src/rotator_library/providers/openai_provider.py
  • src/rotator_library/providers/groq_provider.py
  • README.md
  • src/rotator_library/providers/gemini_provider.py
  • src/rotator_library/providers/openrouter_provider.py

"bedrock": "https://bedrock-runtime.us-east-1.amazonaws.com",
"openrouter": "https://openrouter.ai/api/v1",
"kilocode": "https://kilocode.ai/api/openrouter",
"trybons": "https://go.trybons.ai",

⚠️ Potential issue | 🔴 Critical

Add /v1 to the TryBons base URL.

With the current map, chat/completions resolves to https://go.trybons.ai/messages, but src/rotator_library/providers/trybons_provider.py already talks to /v1/models. As written, TryBons chat traffic is routed to the wrong path.

Suggested fix
-    "trybons": "https://go.trybons.ai",
+    "trybons": "https://go.trybons.ai/v1",

Also applies to: 62-64

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy_app/provider_urls.py` at line 34, Update the TryBons base URL in
the provider URLs mapping so requests hit the /v1 path: change the mapping value
for the "trybons" key from "https://go.trybons.ai" to "https://go.trybons.ai/v1"
in the provider_urls.py mapping (and make the same change for the other TryBons
entries present later in that file), so code in trybons_provider.py that targets
/v1/models routes to the correct endpoint.

Comment on lines +369 to +381
    except Exception as e:
        logger.error(f"Error in Anthropic streaming wrapper: {e}")

        # Send error as visible text
        if not message_started:
            yield _make_message_start_event(request_id, original_model, input_tokens, cached_tokens)

        error_message = f"Error: {str(e)}"
        yield _make_content_block_start_event(current_block_index, "text")
        yield _make_text_delta_event(current_block_index, error_message)
        yield _make_content_block_stop_event(current_block_index)
        yield _make_message_delta_event("end_turn", 0, cached_tokens)
        yield _make_message_stop_event()

⚠️ Potential issue | 🟠 Major

Close open blocks before emitting the fallback error block.

If an exception is raised after a thinking/text/tool block has already started, this handler opens a new text block without stopping the existing one(s). That produces an invalid Anthropic SSE sequence right when the client is already in the error path.

Suggested fix
     except Exception as e:
         logger.error(f"Error in Anthropic streaming wrapper: {e}")
 
+        if thinking_block_started:
+            yield _make_content_block_stop_event(current_block_index)
+            current_block_index += 1
+            thinking_block_started = False
+
+        if content_block_started:
+            yield _make_content_block_stop_event(current_block_index)
+            current_block_index += 1
+            content_block_started = False
+
+        for tc_index in sorted(tool_block_indices.keys()):
+            yield _make_content_block_stop_event(tool_block_indices[tc_index])
+
         # Send error as visible text
         if not message_started:
             yield _make_message_start_event(request_id, original_model, input_tokens, cached_tokens)
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 369-369: Do not catch blind exception: Exception (BLE001)
[warning] 370-370: Use logging.exception instead of logging.error; replace with exception (TRY400)
[warning] 370-370: Logging statement uses f-string (G004)
[warning] 376-376: Use explicit conversion flag; replace with conversion flag (RUF010)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/anthropic_compat/streaming_fast.py` around lines 369 -
381, The exception handler currently starts a new text content block without
closing any previously opened content/message blocks; update the except block in
streaming_fast.py (the logic around message_started, current_block_index and the
yields of _make_content_block_start_event/_make_text_delta_event) to first close
any open content blocks and the current message turn: yield
_make_content_block_stop_event(current_block_index) if a block was started, and
yield the appropriate _make_message_delta_event("end_turn", 0, cached_tokens)
and _make_message_stop_event() as needed before emitting the fallback text block
and the error delta; ensure you only open the fallback text block after these
stops so the SSE sequence stays valid.

Comment on lines +15 to +35
# Try to import orjson for faster JSON handling (3-5x faster than stdlib json)
try:
import orjson

def _json_dumps(obj) -> str:
return orjson.dumps(obj).decode('utf-8')

def _json_loads(s: str):
return orjson.loads(s)

_json_decode_error = orjson.JSONDecodeError
except ImportError:
import json as _json_fallback

def _json_dumps(obj) -> str:
return _json_fallback.dumps(obj)

def _json_loads(s: str):
return _json_fallback.loads(s)

_json_decode_error = _json_fallback.JSONDecodeError

🧹 Nitpick | 🔵 Trivial

Extract the JSON shim into a shared helper.

This _json_dumps/_json_loads/_json_decode_error block duplicates the one in src/rotator_library/anthropic_compat/streaming.py. Keeping the orjson fallback logic in two places invites drift; a single shared helper keeps serialization behavior and exception handling consistent.

🧰 Tools
🪛 Ruff (0.15.4)

[warning] 22-22: Missing return type annotation for private function _json_loads

(ANN202)


[warning] 32-32: Missing return type annotation for private function _json_loads

(ANN202)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/anthropic_compat/translator.py` around lines 15 - 35,
Extract the duplicated JSON shim (_json_dumps, _json_loads, _json_decode_error)
into a single shared helper module (e.g., json_shim or json_utils) that
encapsulates the orjson import/fallback logic and exposes the same symbols;
replace the local definitions in translator.py (functions _json_dumps,
_json_loads and symbol _json_decode_error) and in streaming.py to import these
symbols from the new helper so behavior and exception semantics remain
identical; ensure the helper returns/decodes strings the same way
(orjson.dumps(...).decode('utf-8') vs json.dumps) and re-exports the correct
JSONDecodeError class so existing code that catches _json_decode_error continues
to work.
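A shared helper along the lines the comment suggests could look like this sketch (the module name json_compat and the un-prefixed symbol names are illustrative, not the repository's actual layout):

```python
# json_compat.py — illustrative shared shim for the orjson/stdlib fallback.
# Prefers orjson when installed; otherwise falls back to the stdlib json module,
# exposing the same three symbols either way.
try:
    import orjson

    def json_dumps(obj) -> str:
        # orjson returns bytes, so decode to match json.dumps semantics
        return orjson.dumps(obj).decode("utf-8")

    def json_loads(s):
        return orjson.loads(s)

    JSONDecodeError = orjson.JSONDecodeError
except ImportError:
    import json as _json

    def json_dumps(obj) -> str:
        return _json.dumps(obj)

    def json_loads(s):
        return _json.loads(s)

    JSONDecodeError = _json.JSONDecodeError
```

Both translator.py and streaming.py would then import these symbols, so code catching the decode error keeps working regardless of which backend is active.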

Comment on lines +61 to +89
async def release_read(self) -> None:
"""Release read lock."""
async with self._read_ready:
self._readers -= 1
if self._readers == 0:
# Notify waiting writers
async with self._write_ready:
self._write_ready.notify_all()

async def acquire_write(self) -> None:
"""Acquire write lock."""
async with self._write_ready:
self._writer_waiting += 1
try:
# Wait until no readers or writers active
while self._readers > 0 or self._writer_active:
await self._write_ready.wait()
self._writer_active = True
finally:
self._writer_waiting -= 1

async def release_write(self) -> None:
"""Release write lock."""
async with self._write_ready:
self._writer_active = False
# Notify both readers and writers
self._write_ready.notify_all()
async with self._read_ready:
self._read_ready.notify_all()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n src/rotator_library/async_locks.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 14098


🌐 Web query:

asyncio.Condition reentrant lock semantics Python

💡 Result:

  • asyncio.Condition is not reentrant. A Condition wraps an underlying asyncio.Lock (either the one you pass in, or a new one it creates), and its acquire()/release() are just for that lock. The lock argument must be an asyncio.Lock (or None), not an RLock. [1]
  • asyncio.Lock is non-reentrant, so if the same task tries to acquire it again (e.g., nested async with cond: / await cond.acquire()), it will block forever (deadlock) until it’s released. [1]
  • Condition.wait() semantics: you must hold the lock; wait() releases the lock while waiting and, when notified, re-acquires the lock before returning (so the caller resumes while holding it). [1][2]
  • There is no stdlib asyncio.RLock, so if you truly need reentrancy you must redesign to avoid nested acquisition or use a third-party async RLock implementation. [3]

Sources: [1] Python asyncio synchronization primitives docs, Condition/Lock sections; [2] Stack Overflow explanation of Condition.wait() releasing and re-acquiring the lock; [3] fair-async-rlock (PyPI) noting asyncio lacks an RLock.


The read/write release paths can deadlock each other.

release_read() acquires _read_ready and then _write_ready, while release_write() acquires them in the reverse order. Because asyncio.Condition is non-reentrant and each Condition here wraps its own underlying lock, two concurrent releases can deadlock through lock-order inversion: Task A holds _read_ready's lock while waiting for _write_ready's lock, and Task B holds _write_ready's lock while waiting for _read_ready's lock, so both block forever. Either both conditions must share one underlying lock, or these paths need a single acquisition order.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/async_locks.py` around lines 61 - 89, The release paths
deadlock because release_read() acquires _read_ready then _write_ready while
release_write() does the opposite; fix by making both Conditions share the same
underlying lock or by enforcing a single global acquisition order so nested
acquires never invert. Concretely, create one asyncio.Lock (or one
asyncio.Condition) and initialize both _read_ready and _write_ready with that
same lock (or replace both with a single Condition and adjust logic), then
update release_read, release_write and acquire_write to use that shared
lock/condition so notify_all() calls no longer require nested, inversed
acquisitions; ensure functions named release_read, release_write and
acquire_write reference the shared _lock/_condition.
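A sketch of the shared-lock variant (attribute names mirror the snippet above; acquire_read is included only to make the example self-contained, and this is an illustration, not the repository's actual class):

```python
import asyncio

class RWLock:
    """Reader-writer lock sketch where both Conditions share ONE underlying
    lock, so no release path ever needs to acquire a second lock."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        # Both conditions wrap the SAME lock: notifying readers or writers
        # never requires a nested, order-inverted acquisition.
        self._read_ready = asyncio.Condition(self._lock)
        self._write_ready = asyncio.Condition(self._lock)
        self._readers = 0
        self._writer_active = False

    async def acquire_read(self) -> None:
        async with self._lock:
            while self._writer_active:
                await self._read_ready.wait()
            self._readers += 1

    async def release_read(self) -> None:
        async with self._lock:
            self._readers -= 1
            if self._readers == 0:
                self._write_ready.notify_all()

    async def acquire_write(self) -> None:
        async with self._lock:
            while self._readers > 0 or self._writer_active:
                await self._write_ready.wait()
            self._writer_active = True

    async def release_write(self) -> None:
        async with self._lock:
            self._writer_active = False
            self._write_ready.notify_all()
            self._read_ready.notify_all()
```

Condition.wait() is legal here because each Condition wraps self._lock, which the `async with self._lock` block already holds; wait() releases and re-acquires that same lock.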

Comment on lines +186 to +211
async with self._lock:
self._stats["acquires"] += 1
self._stats["current_concurrent"] += 1
self._stats["peak_concurrent"] = max(
self._stats["peak_concurrent"],
self._stats["current_concurrent"]
)
return True

except Exception:
async with self._lock:
if waiter in self._waiters:
self._waiters.remove(waiter)
raise

def release(self) -> None:
"""Release the semaphore."""
# Note: This is sync for compatibility with context manager
self._value += 1
self._stats["releases"] += 1
self._stats["current_concurrent"] -= 1

# Wake up next waiter if any
if self._waiters:
next_waiter = self._waiters.pop(0)
next_waiter.set()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

wc -l src/rotator_library/async_locks.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 108


🏁 Script executed:

cat -n src/rotator_library/async_locks.py | head -220

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 8184


Fix direct handoff logic in semaphore release.

The release() method has a critical bug: it increments _value unconditionally, then wakes a waiter if one exists. The waiter resumes without decrementing _value, so the same permit is counted twice: once as the incremented _value and once as the permit now held by the waiter. This allows concurrent usage to exceed the configured limit.

The fix should only increment _value when no waiters exist; when handing off directly to a waiter, skip the increment and let the waiter consume the permit without double-counting.

Proposed fix
     def release(self) -> None:
         """Release the semaphore."""
         # Note: This is sync for compatibility with context manager
-        self._value += 1
         self._stats["releases"] += 1
         self._stats["current_concurrent"] -= 1

         # Wake up next waiter if any
         if self._waiters:
-            next_waiter = self._waiters.pop(0)
-            next_waiter.set()
+            self._waiters.pop(0).set()
+        else:
+            self._value += 1
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 193-193: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/async_locks.py` around lines 186 - 211, The release()
method must avoid incrementing _value when directly handing off to a waiter:
change logic in release() so that if _waiters is non-empty you pop a waiter and
call waiter.set() without doing self._value += 1 and without decrementing
self._stats["current_concurrent"] (the permit is transferred), otherwise (no
waiters) do self._value += 1 and decrement self._stats["current_concurrent"] and
increment self._stats["releases"]; ensure references to release(), _value,
_waiters, waiter.set(), and self._stats["current_concurrent"] are updated
accordingly so the semaphore doesn't double-count a permit on direct handoff.
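The direct-handoff rule can be illustrated with a stripped-down semaphore (the class name and Event-based waiter queue are simplifications for the sketch, not the implementation under review):

```python
import asyncio

class TrackedSemaphore:
    """Sketch of direct handoff: on release, a permit is either handed to
    the next waiter OR returned to _value, never both."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._waiters: list[asyncio.Event] = []

    async def acquire(self) -> None:
        if self._value > 0:
            self._value -= 1
            return
        waiter = asyncio.Event()
        self._waiters.append(waiter)
        # The permit is transferred directly by release(); we do NOT
        # touch _value when we wake up.
        await waiter.wait()

    def release(self) -> None:
        if self._waiters:
            # Direct handoff: the waiter consumes this permit, so _value
            # is NOT incremented (avoids the double-count).
            self._waiters.pop(0).set()
        else:
            self._value += 1
```

With this shape, the sum of free permits and in-flight holders stays equal to the configured limit, which is the invariant the buggy version violated.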

Comment on lines +710 to +733
async def flush(self) -> None:
"""Немедленная запись."""
async with self._lock:
if self._pending is None:
return

data = self._pending
self._pending = None

try:
# Atomic write: temp -> rename
temp_path = self.file_path.with_suffix('.tmp')
temp_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
temp_path.replace(self.file_path)
except Exception:
# Recovery: recreate directory if deleted
self.file_path.parent.mkdir(parents=True, exist_ok=True)
self.file_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)

⚠️ Potential issue | 🟠 Major

Blocking I/O operations in async context will block the event loop.

The flush() method uses synchronous write_text() and replace() calls. In an async class, these blocking operations will stall the entire event loop during disk I/O, negating the benefits of async design.

Consider wrapping blocking I/O with asyncio.to_thread():

♻️ Proposed fix using asyncio.to_thread()
     async def flush(self) -> None:
-        """Немедленная запись."""
+        """Immediate write."""
         async with self._lock:
             if self._pending is None:
                 return

             data = self._pending
             self._pending = None

+        # Run blocking I/O in thread pool
+        await asyncio.to_thread(self._write_to_disk, data)
+
+    def _write_to_disk(self, data: Dict[str, Any]) -> None:
+        """Synchronous disk write (called via to_thread)."""
+        try:
+            temp_path = self.file_path.with_suffix('.tmp')
+            temp_path.write_text(
+                json.dumps(data, indent=2, default=str),
+                encoding='utf-8'
+            )
+            temp_path.replace(self.file_path)
+        except OSError:
+            self.file_path.parent.mkdir(parents=True, exist_ok=True)
+            self.file_path.write_text(
+                json.dumps(data, indent=2, default=str),
+                encoding='utf-8'
+            )
-            try:
-                # Atomic write: temp -> rename
-                temp_path = self.file_path.with_suffix('.tmp')
-                temp_path.write_text(
-                    json.dumps(data, indent=2, default=str),
-                    encoding='utf-8'
-                )
-                temp_path.replace(self.file_path)
-            except Exception:
-                # Recovery: recreate directory if deleted
-                self.file_path.parent.mkdir(parents=True, exist_ok=True)
-                self.file_path.write_text(
-                    json.dumps(data, indent=2, default=str),
-                    encoding='utf-8'
-                )
📝 Committable suggestion


Suggested change
async def flush(self) -> None:
"""Немедленная запись."""
async with self._lock:
if self._pending is None:
return
data = self._pending
self._pending = None
try:
# Atomic write: temp -> rename
temp_path = self.file_path.with_suffix('.tmp')
temp_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
temp_path.replace(self.file_path)
except Exception:
# Recovery: recreate directory if deleted
self.file_path.parent.mkdir(parents=True, exist_ok=True)
self.file_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
async def flush(self) -> None:
"""Immediate write."""
async with self._lock:
if self._pending is None:
return
data = self._pending
self._pending = None
# Run blocking I/O in thread pool
await asyncio.to_thread(self._write_to_disk, data)
def _write_to_disk(self, data: Dict[str, Any]) -> None:
"""Synchronous disk write (called via to_thread)."""
try:
# Atomic write: temp -> rename
temp_path = self.file_path.with_suffix('.tmp')
temp_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
temp_path.replace(self.file_path)
except OSError:
# Recovery: recreate directory if deleted
self.file_path.parent.mkdir(parents=True, exist_ok=True)
self.file_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 727-727: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/utils/resilient_io.py` around lines 710 - 733, The
flush() method performs blocking filesystem ops
(file_path.with_suffix(...).write_text, replace, parent.mkdir, write_text)
inside an async function, which will block the event loop; change those sync
calls to run in background threads via asyncio.to_thread: call await
asyncio.to_thread(...) for temp_path.write_text, temp_path.replace,
self.file_path.parent.mkdir, and the fallback self.file_path.write_text, and
ensure asyncio is imported; keep the same try/except recovery logic and the use
of self._lock, _pending, temp_path = self.file_path.with_suffix('.tmp') and
preserve JSON serialization behavior.

Comment on lines +719 to +733
try:
# Atomic write: temp -> rename
temp_path = self.file_path.with_suffix('.tmp')
temp_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)
temp_path.replace(self.file_path)
except Exception:
# Recovery: recreate directory if deleted
self.file_path.parent.mkdir(parents=True, exist_ok=True)
self.file_path.write_text(
json.dumps(data, indent=2, default=str),
encoding='utf-8'
)

⚠️ Potential issue | 🟠 Major

A predictable temp file path and a non-atomic recovery path put data integrity at risk.

Two issues:

  1. Line 721: Using .with_suffix('.tmp') creates a predictable temp path. If two writers target the same file simultaneously, or a stale temp file exists from a crash, this could cause collisions. The synchronous ResilientStateWriter uses tempfile.mkstemp() for unique paths.

  2. Lines 729-733: The recovery fallback writes directly to the final path without the atomic temp→rename pattern. If this write is interrupted, the file could be left corrupted.

♻️ Proposed fix for safer temp file handling
+import tempfile
+import shutil
+
 # In _write_to_disk or flush:
-            temp_path = self.file_path.with_suffix('.tmp')
-            temp_path.write_text(
-                json.dumps(data, indent=2, default=str),
-                encoding='utf-8'
-            )
-            temp_path.replace(self.file_path)
+            fd, tmp_path_str = tempfile.mkstemp(
+                dir=self.file_path.parent,
+                prefix=".tmp_",
+                suffix=".json"
+            )
+            try:
+                with os.fdopen(fd, "w", encoding="utf-8") as f:
+                    json.dump(data, f, indent=2, default=str)
+                shutil.move(tmp_path_str, self.file_path)
+            except Exception:
+                if os.path.exists(tmp_path_str):
+                    os.unlink(tmp_path_str)
+                raise
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 727-727: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/utils/resilient_io.py` around lines 719 - 733, The
current ResilientStateWriter write path uses a predictable temp path
(self.file_path.with_suffix('.tmp')) and the exception recovery writes directly
to the final path, risking collisions and non-atomic writes; change the
implementation in the write routine (the block around temp_path creation and the
exception handler in ResilientStateWriter.write or the method containing that
try/except) to create a unique temp file in the same directory using
tempfile.mkstemp or tempfile.NamedTemporaryFile(dir=self.file_path.parent,
delete=False), write the JSON to that unique temp file, fsync the file
descriptor and parent directory, then atomically replace the target using
os.replace; in the exception/recovery branch ensure the parent directory is
created first and perform the same unique-temp -> fsync -> os.replace sequence
(not a direct write to the target) so recovery remains atomic and avoids
predictable-temp collisions.
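Both points can be folded into one helper; a minimal sketch under the assumptions above (the function name is illustrative):

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    """Sketch of a collision-safe atomic JSON write: unique temp file in the
    target directory, fsync, then os.replace (atomic on POSIX and Windows)."""
    # Recovery concern folded in: ensure the parent directory exists first.
    path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp gives a unique name, so concurrent writers and stale temp
    # files from crashes cannot collide.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp_", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, default=str)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp, path)  # atomic swap into place
    except BaseException:
        # Clean up the orphaned temp file on any failure
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Because the recovery step (mkdir) runs before the temp write, the fallback path goes through the same unique-temp → fsync → os.replace sequence instead of writing the final file directly.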

Comment on lines +37 to +45
old_stdout = sys.stdout
old_stderr = sys.stderr
sys.stdout = StringIO()
sys.stderr = StringIO()
try:
yield
finally:
sys.stdout = old_stdout
sys.stderr = old_stderr

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect every call site using this context manager and show surrounding context.
python - <<'PY'
from pathlib import Path
import re

use_pat = re.compile(r'with\s+suppress_litellm_prints\s*\(')
marker_pat = re.compile(r'^\s*(async\s+def\b|threading\.Thread|ThreadPoolExecutor|FastAPI|APIRouter)\b')

for path in sorted(Path(".").rglob("*.py")):
    text = path.read_text(encoding="utf-8", errors="ignore").splitlines()
    hit_lines = [i for i, line in enumerate(text, 1) if use_pat.search(line)]
    if not hit_lines:
        continue

    print(f"\n== {path} ==")
    for ln in hit_lines:
        start = max(1, ln - 3)
        end = min(len(text), ln + 3)
        for i in range(start, end + 1):
            print(f"{i}:{text[i-1]}")
    markers = [i for i, line in enumerate(text, 1) if marker_pat.search(line)]
    if markers:
        print("-- concurrency/request markers in same file --")
        for i in markers[:20]:
            print(f"{i}:{text[i-1]}")
PY

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 1737


Avoid replacing global sys.stdout/sys.stderr here.

Lines 37–40 mutate interpreter-global streams. This context manager is called from async def record_failure() in usage_manager.py (line 3065), where multiple concurrent tasks may execute simultaneously. When one coroutine redirects stdout/stderr, any other task writing during that window gets silenced too. Redirecting stderr also risks hiding unrelated tracebacks and can break code expecting a real stream object. Filter the specific LiteLLM spam at the print boundary instead of black-holing process-wide output.
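One way to filter at the print boundary is a pass-through stream installed once at startup instead of swapped per call; a sketch (the class name and the noise markers are placeholders, not LiteLLM's actual output):

```python
import sys

class LineFilterStream:
    """Pass-through stream that drops only lines containing known noise
    markers, leaving all other output (including tracebacks) intact."""

    def __init__(self, wrapped, noise_markers):
        self._wrapped = wrapped
        self._noise = tuple(noise_markers)

    def write(self, text: str) -> int:
        if any(marker in text for marker in self._noise):
            return len(text)  # swallow only the known spam
        return self._wrapped.write(text)

    def flush(self) -> None:
        self._wrapped.flush()

    def __getattr__(self, name):
        # Delegate everything else so callers still see a real stream object
        return getattr(self._wrapped, name)

# Installed once at process startup, not per-request, so concurrent tasks
# never lose their own output (marker string is a placeholder):
# sys.stdout = LineFilterStream(sys.stdout, ["Provider List:"])
```

Because the filter is installed once and forwards everything it does not recognize, concurrent coroutines are unaffected and stderr is never touched.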

Comment on lines +1 to +23
@echo off
chcp 65001 >nul
setlocal

echo ========================================
echo LLM API Key Proxy - Запуск
echo ========================================
echo.

cd /d "%~dp0"


echo.
echo Запуск прокси-сервера на http://127.0.0.1:8000
echo.
echo.
echo Для остановки нажмите Ctrl+C
echo ========================================
echo.

python src/proxy_app/main.py --host 127.0.0.1 --port 8000

pause

🧹 Nitpick | 🔵 Trivial

Consider using the py launcher for better Windows compatibility.

On Windows, the py launcher (PEP 397) is more reliable than calling python directly, as it handles multiple Python installations and doesn't require Python to be in the system PATH.

♻️ Suggested improvement
-python src/proxy_app/main.py --host 127.0.0.1 --port 8000
+py src/proxy_app/main.py --host 127.0.0.1 --port 8000

Alternatively, check for Python availability first:

where python >nul 2>&1 || (echo Python not found in PATH & pause & exit /b 1)
python src/proxy_app/main.py --host 127.0.0.1 --port 8000
📝 Committable suggestion


Suggested change
@echo off
chcp 65001 >nul
setlocal
echo ========================================
echo LLM API Key Proxy - Запуск
echo ========================================
echo.
cd /d "%~dp0"
echo.
echo Запуск прокси-сервера на http://127.0.0.1:8000
echo.
echo.
echo Для остановки нажмите Ctrl+C
echo ========================================
echo.
python src/proxy_app/main.py --host 127.0.0.1 --port 8000
pause
@echo off
chcp 65001 >nul
setlocal
echo ========================================
echo LLM API Key Proxy - Запуск
echo ========================================
echo.
cd /d "%~dp0"
echo.
echo Запуск прокси-сервера на http://127.0.0.1:8000
echo.
echo.
echo Для остановки нажмите Ctrl+C
echo ========================================
echo.
py src/proxy_app/main.py --host 127.0.0.1 --port 8000
pause
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@start_proxy.bat` around lines 1 - 23, Replace the direct "python
src/proxy_app/main.py --host 127.0.0.1 --port 8000" invocation with a
Windows-friendly launcher: use "py -3 src/proxy_app/main.py --host 127.0.0.1
--port 8000" to ensure the correct Python installation is used; alternatively,
add a pre-check using "where python" and bail with a clear message if Python is
not found before invoking the existing "python src/proxy_app/main.py --host
127.0.0.1 --port 8000" command so the script works on systems without python on
PATH.

Comment on lines +2 to +7
chcp 65001 >nul
setlocal

echo ========================================
echo LLM API Key Proxy - Запуск
echo ========================================

🧹 Nitpick | 🔵 Trivial

Ensure the file is saved with UTF-8 encoding (with BOM recommended for Windows batch).

The Russian text on lines 6, 14, and 17 requires this file to be saved as UTF-8. For Windows batch files with non-ASCII characters, saving with a UTF-8 BOM (Byte Order Mark) is typically the more reliable way to ensure correct display in the console after chcp 65001.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@start_proxy.bat` around lines 2 - 7, The batch file contains non-ASCII
Russian text (e.g., the echo string "LLM API Key Proxy - Запуск" and other
lines) and must be saved using UTF-8 with a BOM so the console displays those
characters correctly after running chcp 65001; reopen start_proxy.bat in your
editor/IDE and resave it as "UTF-8 with BOM" (or "UTF-8 signature") to add the
BOM, then verify the file still contains the lines with chcp 65001 and the echo
statements unchanged.

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +157 to +170
# --- Step 1: Strip all Anthropic-specific top-level fields ---
self._strip_anthropic_fields(payload)

# --- Step 2: Strip cache_control and normalize messages ---
self._sanitize_messages(payload)

# --- Step 3: Handle reasoning/thinking for DeepSeek models ---
reasoning_effort = payload.get("reasoning_effort")

if model_name in deepseek_models and reasoning_effort in ["low", "medium", "high"]:
if "extra_body" not in payload:
payload["extra_body"] = {}
if "chat_template_kwargs" not in payload["extra_body"]:
payload["extra_body"]["chat_template_kwargs"] = {}

payload["extra_body"]["chat_template_kwargs"]["thinking"] = True
lib_logger.info(f"Enabled 'thinking' parameter for model: {model_name} due to reasoning_effort: '{reasoning_effort}'")
if model_name in _NVIDIA_DEEPSEEK_THINKING_MODELS:
if reasoning_effort in ("low", "medium", "high"):
payload.setdefault("extra_body", {})
payload["extra_body"].setdefault("chat_template_kwargs", {})
payload["extra_body"]["chat_template_kwargs"]["thinking"] = True

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect where the NVIDIA provider hook is invoked.
rg -nP --type=py -C2 '\bhandle_thinking_parameter\s*\(' src

echo
# Look for any normalization that converts Anthropic `thinking`
# into `reasoning_effort` before the provider hook runs.
rg -nP --type=py -C3 '\bthinking\b|\breasoning_effort\b' src

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 50383


🏁 Script executed:

# Check the context around handle_thinking_parameter calls in client.py
sed -n '2300,2340p' src/rotator_library/client.py

echo "---"

# Check the second location
sed -n '3230,3270p' src/rotator_library/client.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 4065


🏁 Script executed:

# Look at the full handle_thinking_parameter method in nvidia_provider.py
sed -n '129,190p' src/rotator_library/providers/nvidia_provider.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2994


🏁 Script executed:

# Search for any code that converts or maps thinking to reasoning_effort
rg -n 'thinking.*reasoning_effort|reasoning_effort.*thinking' --type=py src

echo "---"

# Look for any preprocessing of Anthropic payloads before provider handler
rg -n 'normalize|convert.*anthropic|anthropic.*convert' --type=py src -i | head -40

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 7221


🏁 Script executed:

# Look at the translator.py code around those lines
sed -n '60,70p' src/rotator_library/anthropic_compat/translator.py
sed -n '630,650p' src/rotator_library/anthropic_compat/translator.py

echo "---"

# Check where/how this translator is invoked
rg -n 'translator|translate_anthropic|from anthropic_compat' --type=py src | grep -v test | head -30

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2178


🏁 Script executed:

# Check around line 4436 in client.py to see the context
sed -n '4400,4450p' src/rotator_library/client.py

echo "---"

# Also check if translate_anthropic_request is called in the same code path as handle_thinking_parameter
# Look for where the Anthropic path is handled
rg -n 'translate_anthropic_request\|anthropic_compat' src/rotator_library/client.py | head -20

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2235


🏁 Script executed:

# Get more context from the Anthropic messages handler
sed -n '4430,4500p' src/rotator_library/client.py

echo "---"

# Check if there's a separate code path for Anthropic vs OpenAI
rg -n 'async def messages|async def acompletion' src/rotator_library/client.py | head -10

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 3469


🏁 Script executed:

# Check the nvidia_provider class docstring and any comments about Anthropic format
sed -n '1,120p' src/rotator_library/providers/nvidia_provider.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 4663


🏁 Script executed:

# Check if translate_anthropic_request properly converts thinking to reasoning_effort
sed -n '585,660p' src/rotator_library/anthropic_compat/translator.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2725


🏁 Script executed:

# Verify that ALL Anthropic requests go through translate_anthropic_request
# by checking if there are any other callers of acompletion with anthropic-style payloads
rg -n 'def messages\|async def messages' src/rotator_library/client.py -A 5

echo "---"

# Verify the nvidia_provider docstring says it handles "Anthropic-format requests"
# but check if it's ever called with actual Anthropic format (before translation)
sed -n '25,35p' src/rotator_library/providers/nvidia_provider.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 433


🏁 Script executed:

# Check if nvidia_provider.handle_thinking_parameter is called directly anywhere else
rg -n 'provider.*handle_thinking_parameter\|NvidiaProvider' src/rotator_library/client.py | head -20

echo "---"

# Verify all acompletion calls that go to nvidia_nim provider
rg -n 'nvidia_nim|nvidia\|' src/rotator_library/client.py | grep -E 'provider|model' | head -10

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 403


🏁 Script executed:

# Let me trace the complete flow one more time to be absolutely certain
# Check if there's any path where an OpenAI-format request could have Anthropic thinking

# First, confirm that messages() endpoint uses translate_anthropic_request
rg -n 'async def messages' src/rotator_library/client.py -A 50 | grep -E 'translate_anthropic_request|acompletion' | head -5

echo "---"

# Check the acompletion signature to see if it can receive Anthropic thinking
rg -n 'async def acompletion' src/rotator_library/client.py -A 2

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 73


🏁 Script executed:

# Let me check if there's any way an OpenAI-format request could have Anthropic thinking field
# by looking at how requests are validated or constructed

# Check the request models/schemas
rg -n 'class.*Request|thinking.*field' src/rotator_library/anthropic_compat/ --type=py | head -20

echo "---"

# Verify by looking at the actual acompletion signature
sed -n '1900,1950p' src/rotator_library/client.py | grep -A 50 'async def acompletion'

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 403


🏁 Script executed:

# Look at the _budget_to_reasoning_effort function to understand the mapping
sed -n '60,100p' src/rotator_library/anthropic_compat/translator.py

echo "---"

# Check if there's a comment explaining why thinking without budget_tokens is not converted
sed -n '636,650p' src/rotator_library/anthropic_compat/translator.py

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2358


🏁 Script executed:

# Check what the default thinking structure looks like from Anthropic
rg -n 'thinking.*enabled\|type.*enabled' src/rotator_library/anthropic_compat/ -B2 -A2 | head -30

echo "---"

# Verify if there's a test or example showing thinking without budget_tokens
rg -n 'budget_tokens.*None\|thinking.*{.*type' src/ --type=py -A2 | head -20

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 243


🏁 Script executed:

```shell
# Check the Anthropic model to see the structure of the thinking parameter
sed -n '80,130p' src/rotator_library/anthropic_compat/models.py

echo "---"

# Check if there are any tests or documentation showing thinking without budget_tokens
find src -name '*test*' -o -name '*example*' | xargs grep -l 'thinking' 2>/dev/null | head -5
```

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 1486


🏁 Script executed:

```shell
# Verify the exact scenario: Anthropic thinking without budget_tokens
# Check what happens in the translator when budget_tokens is None
sed -n '636,655p' src/rotator_library/anthropic_compat/translator.py

echo "---"

# Now check the nvidia provider again: does it have any fallback when reasoning_effort is missing?
sed -n '155,180p' src/rotator_library/providers/nvidia_provider.py
```

Repository: Mirrowel/LLM-API-Key-Proxy

Length of output: 2051


**`thinking` without `budget_tokens` loses its signal on NVIDIA DeepSeek models.**

When an Anthropic request carries `thinking: {"type": "enabled"}` but no `budget_tokens`, `translate_anthropic_request()` intentionally skips setting `reasoning_effort` (line 644: "don't set anything"). The NVIDIA provider then strips the `thinking` field at Step 1 and finds no `reasoning_effort` at Step 3, silently disabling reasoning on DeepSeek models.
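The interplay can be reproduced with a minimal sketch; the function bodies below mirror the steps in this finding but are simplified stand-ins, not the project's actual code:

```python
# Hypothetical sketch of the signal-loss path; translate_anthropic_request and
# the NVIDIA provider steps are simplified assumptions, not the real implementations.

def translate_anthropic_request(payload: dict) -> dict:
    """Anthropic -> OpenAI: maps thinking to reasoning_effort only when budget_tokens is set."""
    out = {k: v for k, v in payload.items() if k != "thinking"}
    thinking = payload.get("thinking") or {}
    if thinking.get("type") == "enabled" and thinking.get("budget_tokens"):
        out["reasoning_effort"] = "medium"  # the real code derives this from the budget
    # thinking without budget_tokens: nothing is set, so the signal is lost here
    return out

def nvidia_prepare(payload: dict) -> dict:
    """NVIDIA provider: strips 'thinking' (Step 1), then checks reasoning_effort (Step 3)."""
    payload.pop("thinking", None)
    if payload.get("reasoning_effort"):
        payload.setdefault("extra_body", {})["chat_template_kwargs"] = {"thinking": True}
    return payload

req = {"model": "deepseek-v3", "thinking": {"type": "enabled"}}
print(nvidia_prepare(translate_anthropic_request(req)))  # → {'model': 'deepseek-v3'}
```

The result carries neither `reasoning_effort` nor the `extra_body` flag, which is the silent-disable behavior described above.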

The translator assumes such cases should "let the provider decide the default," but the provider has no way to know `thinking` was intended. Either:

1. Set a default `reasoning_effort` (e.g., `"medium"`) when `thinking.type == "enabled"`, regardless of `budget_tokens`, or
2. Add fallback logic in the NVIDIA provider that enables thinking by default for DeepSeek when `thinking` was present but `reasoning_effort` is absent.
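Option 2 could look roughly like the following sketch; `_NVIDIA_DEEPSEEK_THINKING_MODELS`, the `"medium"` default, and the payload shape are assumptions taken from this finding, not verified against the provider's code:

```python
# Sketch of option 2 (assumed names): remember that Anthropic thinking was
# requested before stripping it, then fall back to a default reasoning_effort.
_NVIDIA_DEEPSEEK_THINKING_MODELS = {"deepseek-v3"}  # placeholder model set

def nvidia_prepare_with_fallback(payload: dict, model_name: str) -> dict:
    thinking = payload.pop("thinking", None)  # strip the Anthropic-only field as before
    wanted_thinking = isinstance(thinking, dict) and thinking.get("type") == "enabled"
    if (wanted_thinking
            and model_name in _NVIDIA_DEEPSEEK_THINKING_MODELS
            and not payload.get("reasoning_effort")):
        payload["reasoning_effort"] = "medium"  # preserve the intent with a default
    if payload.get("reasoning_effort"):
        payload.setdefault("extra_body", {})["chat_template_kwargs"] = {"thinking": True}
    return payload
```

Because the fallback runs before the existing `reasoning_effort` check, the later block that sets `chat_template_kwargs.thinking = True` still fires.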
🤖 Prompt for AI Agents

```
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/providers/nvidia_provider.py` around lines 157 - 170, the
provider currently strips Anthropic "thinking" before checking reasoning_effort,
losing the signal; change the logic so you detect Anthropic's thinking before
calling _strip_anthropic_fields (or preserve the original payload) and, if
model_name is in _NVIDIA_DEEPSEEK_THINKING_MODELS and the payload has a "thinking"
object but no reasoning_effort, set payload["reasoning_effort"] = "medium" (or
another chosen default) so the later block that sets
extra_body.chat_template_kwargs.thinking=True still runs; update the code around
_strip_anthropic_fields, _sanitize_messages, and the reasoning_effort check to
reflect this fallback.
```

