

@b3nw commented Jan 16, 2026

Implement quota tracking for Chutes provider using a simple standalone mixin pattern:

Core Implementation:

  • ChutesQuotaTracker: Standalone mixin (no complex base class inheritance)
  • Tracks credential-level quota (1 request = 1 credit consumed)
  • Daily quota reset at 00:00 UTC
  • Automatic tier detection (Legacy=200, Base=300, Plus=2000, Pro=5000)
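The tier table and the 00:00 UTC daily reset can be illustrated with a small sketch. This is not the code in chutes_quota_tracker.py: the names `TIER_BY_QUOTA`, `detect_tier` and `next_daily_reset` are hypothetical, and returning `None` for an unrecognised quota value is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tier names keyed by daily quota, as listed in the PR description.
TIER_BY_QUOTA = {200: "Legacy", 300: "Base", 2000: "Plus", 5000: "Pro"}


def detect_tier(quota: int) -> Optional[str]:
    """Map a daily quota value to a tier name; None if the value is unknown."""
    return TIER_BY_QUOTA.get(quota)


def next_daily_reset(now: datetime) -> datetime:
    """Return the next 00:00 UTC boundary strictly after `now`."""
    if now.tzinfo is None:
        # Assumption of this sketch: naive datetimes are already in UTC.
        now = now.replace(tzinfo=timezone.utc)
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)
```

Converting to UTC before truncating matters: a caller in UTC-5 at 20:00 on Jan 15 is already past 00:00 UTC on Jan 16, so the next reset is Jan 17.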

Features:

  • fetch_quota_usage(): Queries Chutes API for quota/used credits
  • Background refresh job: Periodic quota updates via run_background_job()
  • Integration with UsageManager using virtual model 'chutes/_quota'
  • Configurable refresh interval via CHUTES_QUOTA_REFRESH_INTERVAL env var
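Reading the refresh interval could look like the following sketch. The `"300"` default matches the `os.environ.get("CHUTES_QUOTA_REFRESH_INTERVAL", "300")` call visible in the reviewed diff; interpreting it as seconds, and falling back to the default for non-numeric or non-positive values, are assumptions. `get_refresh_interval` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional

# Default taken from the diff excerpt; the unit (seconds) is an assumption.
DEFAULT_REFRESH_INTERVAL = 300


def get_refresh_interval(env: Optional[Mapping[str, str]] = None) -> int:
    """Read CHUTES_QUOTA_REFRESH_INTERVAL, falling back to the default on bad input."""
    source = os.environ if env is None else env
    raw = source.get("CHUTES_QUOTA_REFRESH_INTERVAL", str(DEFAULT_REFRESH_INTERVAL))
    try:
        value = int(raw)
    except ValueError:
        # A typo in the env var should not crash the provider at startup.
        return DEFAULT_REFRESH_INTERVAL
    # Reject zero/negative values so the background loop never busy-spins.
    return value if value > 0 else DEFAULT_REFRESH_INTERVAL
```

A background loop would then simply `await asyncio.sleep(get_refresh_interval())` between refresh passes.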

Files:

  • NEW: chutes_quota_tracker.py (343 lines) - Standalone quota mixin
  • MODIFIED: chutes_provider.py - Add quota tracking + background job
  • MODIFIED: usage_manager.py - Add 'chutes' to _REQUEST_COUNT_PROVIDERS

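API Integration:

  • Endpoint: GET https://api.chutes.ai/users/me/quota_usage/me
  • Auth: Raw API key in Authorization header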

Architecture: Simple standalone mixin for credential-level quota tracking.
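The "standalone mixin" shape can be sketched as follows: the quota state lives entirely in the mixin, which needs no cooperating `__init__` or shared base class. This is an illustration under assumptions, not the PR's code; `update_quota`, `remaining_quota` and `ProviderBase` are hypothetical names, and `remaining = quota - used` follows from "1 request = 1 credit consumed".

```python
from typing import Dict, Optional


class ChutesQuotaTracker:
    """Standalone mixin: per-credential quota snapshots, no base class required."""

    def _quota_state(self) -> Dict[str, Dict[str, float]]:
        # Created lazily so the host class never has to call a mixin __init__.
        if not hasattr(self, "_chutes_quota"):
            self._chutes_quota = {}
        return self._chutes_quota

    def update_quota(self, api_key: str, quota: int, used: float) -> None:
        self._quota_state()[api_key] = {"quota": quota, "used": used}

    def remaining_quota(self, api_key: str) -> Optional[float]:
        snap = self._quota_state().get(api_key)
        if snap is None:
            return None  # never fetched for this credential
        # 1 request = 1 credit; clamp so over-use never reports negative.
        return max(snap["quota"] - snap["used"], 0.0)


class ProviderBase:
    """Hypothetical stand-in for the project's provider base class."""

    name = "base"


class ChutesProvider(ChutesQuotaTracker, ProviderBase):
    name = "chutes"
```

Because the mixin owns its state and exposes plain methods, it can be listed first in the bases without touching the provider's own inheritance chain.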


Important

Adds quota tracking for Chutes provider using a mixin pattern, with daily resets and tier detection, integrated into UsageManager.

  • Behavior:
    • Introduces ChutesQuotaTracker mixin for quota tracking in chutes_quota_tracker.py.
    • Tracks credential-level quota with daily reset at 00:00 UTC.
    • Automatic tier detection based on quota values.
    • Integrates with UsageManager using virtual model chutes/_quota.
  • Features:
    • fetch_quota_usage(): Fetches quota usage from Chutes API.
    • Background job for periodic quota updates in chutes_provider.py.
    • Configurable refresh interval via CHUTES_QUOTA_REFRESH_INTERVAL.
  • Files:
    • NEW: chutes_quota_tracker.py for quota tracking mixin.
    • MODIFIED: chutes_provider.py to include quota tracking and background job.
    • MODIFIED: usage_manager.py to add 'chutes' to _REQUEST_COUNT_PROVIDERS.
  • API Integration:
    • Endpoint: GET https://api.chutes.ai/users/me/quota_usage/me.
    • Auth: Raw API key in Authorization header.
  • Architecture:
    • Simple standalone mixin for credential-level quota tracking.

This description was created by Ellipsis for 556658e.

@b3nw requested a review from Mirrowel as a code owner on January 16, 2026 02:50
@mirrobot-agent

I'm starting my review of the Chutes quota tracking implementation. It looks like a solid addition with the new mixin pattern. Diving into the details now!

@mirrobot-agent bot left a comment

Overall Assessment

This is a well-structured PR that implements quota tracking for the Chutes provider using a clean mixin pattern. The addition of the background refresh job and the integration with UsageManager via a virtual model are solid architectural choices.

However, there are a few points regarding performance and a potentially critical missing piece (quota grouping) that should be addressed to ensure the tracking works correctly across all models.

Architectural Feedback

The use of a mixin (ChutesQuotaTracker) is excellent for keeping the main provider class clean. The primary concern is the missing quota grouping. Without implementing get_model_quota_group in ChutesProvider, the UsageManager will treat each model as having its own independent quota, which contradicts the "credential-level" nature of Chutes quotas.

Key Suggestions

  • Implement Quota Grouping: Add get_model_quota_group to ChutesProvider to ensure all models share the same usage stats.
  • Parallelize Background Job: Use asyncio.gather in run_background_job to refresh multiple credentials efficiently.
  • Client Reuse: Allow passing an existing httpx.AsyncClient to fetch_quota_usage to avoid repeated overhead.

Questions for the Author

  • Is the authentication difference between get_models (Bearer) and fetch_quota_usage (Raw) intentional?
  • Does the Chutes API return any reset-related headers that we could use instead of the local 00:00 UTC calculation?

This review was generated by an AI assistant.

Implement quota tracking for Chutes provider using a simple standalone mixin pattern:

Core Implementation:
- ChutesQuotaTracker: Standalone mixin (no complex base class inheritance)
- Tracks credential-level quota (1 request = 1 credit consumed)
- Daily quota reset at 00:00 UTC
- Automatic tier detection (Legacy=200, Base=300, Plus=2000, Pro=5000)

Features:
- fetch_quota_usage(): Queries Chutes API for quota/used credits
- Background refresh job: Parallel quota updates via asyncio.gather
- Integration with UsageManager using virtual model 'chutes/_quota'
- get_model_quota_group(): Returns 'chutes_global' for shared quota pool
- Configurable refresh interval via CHUTES_QUOTA_REFRESH_INTERVAL env var
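Why the shared group matters can be shown with a toy counter. In this sketch, `usage_key` is a hypothetical stand-in for however UsageManager actually keys usage (its real logic is not shown in this thread), and the model names are made up. With a group, three requests across two models land in one bucket; without it they would split per model.

```python
from collections import Counter
from typing import Optional


class ChutesProvider:
    def get_model_quota_group(self, model: str) -> Optional[str]:
        # Every Chutes model draws from the same credential-level pool.
        return "chutes_global"


def usage_key(provider, credential: str, model: str) -> tuple:
    """Hypothetical keying: use the provider's quota group if any, else the model."""
    getter = getattr(provider, "get_model_quota_group", None)
    group = getter(model) if getter is not None else None
    return (credential, group or model)


counts: Counter = Counter()
provider = ChutesProvider()
for model in ["chutes/model-a", "chutes/model-b", "chutes/model-a"]:
    counts[usage_key(provider, "key-1", model)] += 1
```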

Performance:
- Shared httpx.AsyncClient for connection reuse
- Parallel credential fetching with semaphore (max 5 concurrent)
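A semaphore-bounded `asyncio.gather` can be sketched like this. The limit of 5 comes from the commit message; `refresh_all`, `bounded_fetch` and the injected `fetch` callable are hypothetical (the PR's `refresh_single_credential(api_key, client)` takes the shared client directly, whereas here the client is assumed to be captured inside `fetch`).

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

MAX_CONCURRENT = 5  # "max 5 concurrent" per the commit message


async def refresh_all(
    credentials: List[str],
    fetch: Callable[[str], Awaitable[dict]],
) -> Dict[str, object]:
    """Refresh every credential in parallel, at most MAX_CONCURRENT at a time."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def bounded_fetch(api_key: str) -> dict:
        # Only MAX_CONCURRENT coroutines can be inside this block at once.
        async with semaphore:
            return await fetch(api_key)

    results = await asyncio.gather(
        *(bounded_fetch(k) for k in credentials),
        return_exceptions=True,  # one failing key must not cancel the others
    )
    # gather preserves input order, so zip pairs each key with its result/error.
    return dict(zip(credentials, results))
```

With `return_exceptions=True`, a failed credential shows up as an exception object in the result map instead of aborting the whole refresh pass.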

Files:
- NEW: chutes_quota_tracker.py - Standalone quota mixin
- MODIFIED: chutes_provider.py - Add quota tracking + background job
- MODIFIED: usage_manager.py - Add 'chutes' to _REQUEST_COUNT_PROVIDERS

API Integration:
- Endpoint: GET https://api.chutes.ai/users/me/quota_usage/me
- Returns: { quota: int, used: float }
- Auth: Bearer token in Authorization header
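A call against this endpoint could be sketched as below. The URL, Bearer auth and `{ quota, used }` shape come from the commit message; the `QuotaUsage` dataclass and the duck-typed `client` parameter are assumptions. `client` is expected to behave like `httpx.AsyncClient` (awaitable `.get(url, headers=...)` returning a response with `raise_for_status()` and `json()`), so one shared client can be passed for every credential.

```python
from dataclasses import dataclass
from typing import Any

QUOTA_URL = "https://api.chutes.ai/users/me/quota_usage/me"


@dataclass
class QuotaUsage:
    quota: int
    used: float

    @property
    def remaining(self) -> float:
        # 1 request = 1 credit, floored at zero.
        return max(self.quota - self.used, 0.0)


async def fetch_quota_usage(api_key: str, client: Any) -> QuotaUsage:
    """GET the quota endpoint with a Bearer token and parse {quota, used}."""
    response = await client.get(
        QUOTA_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    response.raise_for_status()  # surface 401/429 etc. to the caller
    data = response.json()
    return QuotaUsage(quota=int(data["quota"]), used=float(data["used"]))
```

Taking the client as a parameter, rather than opening one per call, is what lets the background job reuse pooled connections across credentials.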

Architecture: Simple standalone mixin for credential-level quota tracking.
No OAuth complexity, no cost learning algorithms, no per-model quotas.

@b3nw force-pushed the feature/chutes-quota-system branch from 556658e to 8105c20 on January 16, 2026 03:07

@mirrobot-agent bot left a comment

Follow-up Review

The author has done an excellent job of addressing the feedback from the previous review. The architecture is now more robust and efficient.

Assessment of New Changes:

  • Quota Grouping: Successfully implemented get_model_quota_group, ensuring shared usage tracking across all models.
  • Performance: Parallelized the background quota refresh job and implemented HTTP client reuse, which will reduce latency and overhead.
  • Consistency: Unified the authentication scheme to use Bearer tokens across all endpoints.
  • Integration: Correctly integrated 'chutes' into the UsageManager's request-count tracking.

Overall Status:
The PR is now in a high-quality state and ready for merging.

This review was generated by an AI assistant.

os.environ.get("CHUTES_QUOTA_REFRESH_INTERVAL", "300")
)

def get_model_quota_group(self, model: str) -> Optional[str]:

Fix verified: Implementing get_model_quota_group correctly ensures that all Chutes models share the same credential-level quota pool in the UsageManager. This addresses the grouping issue raised in the previous review.

tasks = [
    refresh_single_credential(api_key, client) for api_key in credentials
]
await asyncio.gather(*tasks, return_exceptions=True)

Fix verified: The background job now correctly parallelizes quota fetches using asyncio.gather and a shared httpx.AsyncClient. This significantly improves efficiency when managing multiple credentials, as suggested.
