feat: add token usage tracking and Prometheus metrics endpoint #416
zangxin75 wants to merge 2 commits into
Parse token usage from upstream API responses (OpenAI, Anthropic, Gemini) and store in request_logs for cost analytics. Expose aggregated metrics at GET /metrics in Prometheus exposition format.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Walkthrough
This PR adds end-to-end token usage tracking and metrics exposure. The proxy captures token counts from LLM provider responses, persists them in a new database schema, and serves aggregated metrics in Prometheus format via a metrics endpoint.
Changes
Token Usage Tracking and Metrics Exposure
Sequence Diagram
sequenceDiagram
participant Client
participant ProxyServer
participant Upstream as Upstream LLM
participant Database
participant MetricsHandler
Client->>ProxyServer: request
ProxyServer->>Upstream: forward request
Upstream-->>ProxyServer: response with usage JSON
ProxyServer->>ProxyServer: extractTokenUsage from response body
ProxyServer->>Database: logRequest with captured TokenUsage
Database->>Database: store PromptTokens, CompletionTokens, TotalTokens
ProxyServer-->>Client: response body
Client->>MetricsHandler: GET /metrics
MetricsHandler->>Database: query RequestLog, aggregate by group_name/model
MetricsHandler->>MetricsHandler: build Prometheus format (counts, tokens, cost)
MetricsHandler-->>Client: text/plain metrics
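The last step of the diagram (building Prometheus output from aggregated rows) can be sketched as follows. This is a minimal illustration, not the PR's handler: the `groupModelStats` type, the `renderMetrics` and `escapeLabel` helpers, and the metric names are all hypothetical, and the rows are assumed to have been aggregated by group_name and model already.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// groupModelStats is a hypothetical aggregate row, one per (group_name, model).
type groupModelStats struct {
	Group, Model     string
	Requests         int64
	PromptTokens     int64
	CompletionTokens int64
	CostUSD          float64
}

// escapeLabel escapes backslash, double quote, and newline, which the
// exposition format requires inside label values.
func escapeLabel(s string) string {
	return strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`).Replace(s)
}

// renderMetrics writes one HELP/TYPE header per metric followed by one
// sample per row. Metric names here are illustrative assumptions.
func renderMetrics(rows []groupModelStats) string {
	// Sort a copy so the output is deterministic and the caller's slice is untouched.
	sorted := append([]groupModelStats(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Group != sorted[j].Group {
			return sorted[i].Group < sorted[j].Group
		}
		return sorted[i].Model < sorted[j].Model
	})

	var b strings.Builder
	emit := func(name, help string, value func(groupModelStats) string) {
		fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s counter\n", name, help, name)
		for _, r := range sorted {
			fmt.Fprintf(&b, "%s{group=\"%s\",model=\"%s\"} %s\n",
				name, escapeLabel(r.Group), escapeLabel(r.Model), value(r))
		}
	}
	emit("proxy_requests_total", "Logged requests.", func(r groupModelStats) string {
		return strconv.FormatInt(r.Requests, 10)
	})
	emit("proxy_tokens_total", "Prompt plus completion tokens.", func(r groupModelStats) string {
		return strconv.FormatInt(r.PromptTokens+r.CompletionTokens, 10)
	})
	emit("proxy_token_cost_usd_total", "Estimated token cost in USD.", func(r groupModelStats) string {
		return strconv.FormatFloat(r.CostUSD, 'f', -1, 64)
	})
	return b.String()
}

func main() {
	fmt.Print(renderMetrics([]groupModelStats{
		{Group: "g", Model: "m", Requests: 2, PromptTokens: 10, CompletionTokens: 5, CostUSD: 0.5},
	}))
}
```

Whether these series should be typed as counters or gauges depends on whether old request_logs rows are ever pruned; the sketch assumes they are not.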
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/handler/metrics_handler.go`:
- Around line 41-43: The query currently uses Where("is_success = ?", true) and
thereby includes streaming requests; update the DB query in metrics_handler.go
(the code that builds the query with Where("is_success = ?", true),
Group("group_name, model").Scan(&results)) to also filter out streams by adding
a non-stream condition (e.g., Where("is_stream = ?", false) or equivalent
predicate) so that the computed total_requests reflects only successful
non-streaming requests; keep the same Group/Scan/result handling and ensure
variable names like results / total_requests remain unchanged.
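The predicate this comment asks for can be illustrated without the ORM. The sketch below is an in-memory stand-in for the grouped query, assuming only the column names mentioned above; `requestLog`, `groupKey`, and `countNonStreamSuccess` are hypothetical names, not the project's code.

```go
package main

import "fmt"

// requestLog mirrors only the columns named in the review comment;
// the real model has more fields.
type requestLog struct {
	GroupName string
	Model     string
	IsSuccess bool
	IsStream  bool
}

// groupKey stands in for GROUP BY group_name, model.
type groupKey struct{ GroupName, Model string }

// countNonStreamSuccess counts rows matching
// is_success = true AND is_stream = false per (group_name, model).
func countNonStreamSuccess(logs []requestLog) map[groupKey]int64 {
	totals := make(map[groupKey]int64)
	for _, l := range logs {
		if !l.IsSuccess || l.IsStream {
			continue // failed or streaming requests do not count
		}
		totals[groupKey{l.GroupName, l.Model}]++
	}
	return totals
}

func main() {
	logs := []requestLog{
		{"g", "m", true, false},  // counted
		{"g", "m", true, true},   // streaming: skipped
		{"g", "m", false, false}, // failed: skipped
	}
	fmt.Println(countNonStreamSuccess(logs)[groupKey{"g", "m"}])
}
```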
In `@internal/proxy/response_handlers.go`:
- Around line 48-63: Currently the handler unconditionally reads the entire
resp.Body into memory (body, err := io.ReadAll(resp.Body)) which can OOM for
large responses; change the logic in the response handling code to only fully
buffer the body when usage extraction is needed (the path that calls
handleGzipCompression and inspects content for usage), and for all other
non-stream responses use streaming copy (io.Copy) from resp.Body to c.Writer to
avoid buffering. Keep existing error logging via logUpstreamError, ensure gzip
handling still occurs on the buffered branch (use handleGzipCompression only
when buffering), and use io.Copy for the non-buffered branch so large payloads
are streamed directly to c.Writer.
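A minimal sketch of the two branches, using only the standard library. `relayBody` and `relayString` are hypothetical helpers; gzip handling and error logging via logUpstreamError are left out, and a plain `io.Writer` stands in for c.Writer.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// relayBody copies an upstream body to dst.
// When needUsage is true it buffers the whole body and returns it so the
// caller can inspect it for token usage. Otherwise it streams with io.Copy
// and returns nil, so memory use does not grow with the payload size.
func relayBody(dst io.Writer, src io.Reader, needUsage bool) ([]byte, error) {
	if !needUsage {
		_, err := io.Copy(dst, src)
		return nil, err
	}
	body, err := io.ReadAll(src)
	if err != nil {
		return nil, err
	}
	_, err = dst.Write(body)
	return body, err
}

// relayString is a small demo helper that runs relayBody over an
// in-memory payload and reports what was written and what was buffered.
func relayString(payload string, needUsage bool) (written string, buffered []byte, err error) {
	var out bytes.Buffer
	buffered, err = relayBody(&out, strings.NewReader(payload), needUsage)
	return out.String(), buffered, err
}

func main() {
	written, buffered, err := relayString(`{"usage":{"total_tokens":3}}`, true)
	fmt.Println(written, len(buffered), err)
}
```

The client receives the same bytes on both paths; only the buffered path hands the body back for usage extraction.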
In `@internal/proxy/server.go`:
- Around line 342-347: The request logs set token counts on logEntry
(PromptTokens, CompletionTokens, TotalTokens) but never set TokenCostUSD, so
cost metrics remain zero; inside the same nil check for usage (the block that
assigns logEntry.PromptTokens etc.) add an assignment to set
logEntry.TokenCostUSD from the usage object (e.g., logEntry.TokenCostUSD =
usage.TokenCostUSD or usage.CostUSD depending on the usage field name), so the
TokenCostUSD field on logEntry is populated for downstream /metrics aggregation.
In `@internal/proxy/usage.go`:
- Around line 70-86: The current guards reject valid usage when one side is zero
because they only accept OpenAI when oai.TotalTokens>0 and Anthropic when
anthro.InputTokens>0; update the checks in internal/proxy/usage.go (around the
json.Unmarshal into oai and anthropicUsage) to accept a payload if any of its
token fields are present/non-zero (e.g., for oai: err==nil && (oai.TotalTokens>0
|| oai.PromptTokens>0 || oai.CompletionTokens>0); for anthro: err==nil &&
(anthro.InputTokens>0 || anthro.OutputTokens>0)), and compute TotalTokens as
oai.TotalTokens if present else sum of PromptTokens+CompletionTokens (and
similarly sum anthro.InputTokens+OutputTokens) so zero on one side is still
recorded instead of dropping the usage.
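The relaxed guards might look like the sketch below. It assumes the input is the provider's usage object itself and uses the documented OpenAI (`prompt_tokens`, `completion_tokens`, `total_tokens`) and Anthropic (`input_tokens`, `output_tokens`) field names; the `parseUsage` function and the struct shapes are illustrative, not the code in internal/proxy/usage.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TokenUsage is a simplified stand-in for the PR's type.
type TokenUsage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

type openAIUsage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

type anthropicUsage struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
}

// parseUsage accepts a usage object when any token field is non-zero,
// so a payload with zero on one side is still recorded.
func parseUsage(raw []byte) *TokenUsage {
	var oai openAIUsage
	if err := json.Unmarshal(raw, &oai); err == nil &&
		(oai.TotalTokens > 0 || oai.PromptTokens > 0 || oai.CompletionTokens > 0) {
		total := oai.TotalTokens
		if total == 0 {
			// Fall back to the sum when the provider omits the total.
			total = oai.PromptTokens + oai.CompletionTokens
		}
		return &TokenUsage{
			PromptTokens:     oai.PromptTokens,
			CompletionTokens: oai.CompletionTokens,
			TotalTokens:      total,
		}
	}

	var anthro anthropicUsage
	if err := json.Unmarshal(raw, &anthro); err == nil &&
		(anthro.InputTokens > 0 || anthro.OutputTokens > 0) {
		return &TokenUsage{
			PromptTokens:     anthro.InputTokens,
			CompletionTokens: anthro.OutputTokens,
			TotalTokens:      anthro.InputTokens + anthro.OutputTokens,
		}
	}
	return nil
}

func main() {
	u := parseUsage([]byte(`{"input_tokens":0,"output_tokens":7}`))
	fmt.Println(u.PromptTokens, u.CompletionTokens, u.TotalTokens)
}
```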
In `@internal/router/router.go`:
- Line 78: The /metrics route is currently exposed without protection via
router.GET("/metrics", serverHandler.Metrics); wrap this endpoint with an
authentication check or allowlist before registering it. Implement or reuse an
auth middleware (e.g., metricsAuthMiddleware or existing authMiddleware) that
validates a dedicated metrics token from config/env or checks the requester IP
against an allowlist, return 401/403 on failure, and register the route with
that middleware (e.g., router.GET("/metrics",
metricsAuthMiddleware(serverHandler.Metrics))). Ensure the token/allowlist is
configurable and documented so only authorized systems can access
serverHandler.Metrics.
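One way to gate the endpoint is a bearer-token check, sketched here with net/http rather than the project's router so it stays self-contained. `metricsAuth` and `statusFor` are hypothetical names, and the "Authorization: Bearer" header format is an assumption; an empty token disables the check, which matches an optional setting.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// metricsAuth requires "Authorization: Bearer <token>" before calling next.
// An empty token means the check is disabled.
func metricsAuth(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if token != "" {
			got := r.Header.Get("Authorization")
			want := "Bearer " + token
			// Constant-time comparison avoids leaking the token through timing.
			if subtle.ConstantTimeCompare([]byte(got), []byte(want)) != 1 {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor is a demo helper: it sends one GET /metrics request through
// the middleware and returns the resulting HTTP status code.
func statusFor(token, header string) int {
	h := metricsAuth(token, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# metrics")
	}))
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	if header != "" {
		req.Header.Set("Authorization", header)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("s3cret", ""))              // missing header
	fmt.Println(statusFor("s3cret", "Bearer s3cret")) // matching token
}
```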
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 75760d10-1702-434c-80a6-d8ca908c9aab
📒 Files selected for processing (8)
- internal/db/migrations/migration.go
- internal/db/migrations/v1_2_0_AddTokenColumns.go
- internal/handler/metrics_handler.go
- internal/models/types.go
- internal/proxy/response_handlers.go
- internal/proxy/server.go
- internal/proxy/usage.go
- internal/router/router.go
- Filter out streaming requests in /metrics query (is_stream = false)
- Stream non-usage responses via io.Copy instead of buffering
- Add CostUSD field to TokenUsage and assign in logRequest
- Accept usage when any token field is non-zero, not just total
- Add optional METRICS_TOKEN env var for /metrics endpoint auth
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
🧹 Nitpick comments (1)
internal/proxy/server.go (1)
342-348: 💤 Low value
Fix inconsistent indentation on line 347.
Line 347 has extra indentation compared to the surrounding assignments. This should align with lines 344-346.
🧹 Proposed fix
 if usage != nil {
     logEntry.PromptTokens = usage.PromptTokens
     logEntry.CompletionTokens = usage.CompletionTokens
     logEntry.TotalTokens = usage.TotalTokens
-        logEntry.TokenCostUSD = usage.CostUSD
+    logEntry.TokenCostUSD = usage.CostUSD
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/proxy/server.go` around lines 342 - 348, Fix the inconsistent indentation in the block that sets token usage on logEntry: within the if usage != nil { ... } block (where PromptTokens, CompletionTokens, TotalTokens are assigned), align the assignment to logEntry.TokenCostUSD with the other assignments so it uses the same indentation level as logEntry.PromptTokens, logEntry.CompletionTokens, and logEntry.TotalTokens in the code that sets token usage from usage.CostUSD.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4c11f2c4-4122-4ce2-abd7-03521404f0a2
📒 Files selected for processing (5)
- internal/handler/metrics_handler.go
- internal/proxy/response_handlers.go
- internal/proxy/server.go
- internal/proxy/usage.go
- internal/router/router.go
Summary
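- Parse token usage from upstream API responses (OpenAI, Anthropic, Gemini) and store it in request_logs
- New columns on request_logs: prompt_tokens, completion_tokens, total_tokens, token_cost_usd
- Expose aggregated metrics at GET /metrics in Prometheus exposition format
Changes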
- internal/proxy/usage.go
- internal/handler/metrics_handler.go
- internal/db/migrations/v1_2_0_AddTokenColumns.go
- internal/proxy/response_handlers.go
- internal/proxy/server.go: passes *TokenUsage through logRequest into RequestLog
- internal/models/types.go
- internal/router/router.go
- internal/db/migrations/migration.go
Why
gpt-load currently proxies LLM API calls but has no visibility into token consumption or cost. These changes enable operators to:
Test Plan
- Verify prompt_tokens/completion_tokens/total_tokens are populated in request_logs
- Call GET /metrics and verify Prometheus format
Summary by CodeRabbit
New Features
Chores