
feat: add token usage tracking and Prometheus metrics endpoint #416

Open
zangxin75 wants to merge 2 commits into tbphp:main from zangxin75:feat/token-usage-tracking

Conversation


@zangxin75 zangxin75 commented May 20, 2026

Summary

  • Parse token usage from upstream API responses (OpenAI, Anthropic, Gemini formats) and store in request_logs
  • Add 4 new columns to request_logs: prompt_tokens, completion_tokens, total_tokens, token_cost_usd
  • Expose aggregated metrics at GET /metrics in Prometheus exposition format
  • Database migration v1.2.0 for the new columns

Changes

| File | Change |
| --- | --- |
| `internal/proxy/usage.go` | New: token usage parsing from OpenAI/Anthropic/Gemini response bodies |
| `internal/handler/metrics_handler.go` | New: Prometheus /metrics endpoint (requests, tokens, cost by group+model) |
| `internal/db/migrations/v1_2_0_AddTokenColumns.go` | New: add token columns to request_logs table |
| `internal/proxy/response_handlers.go` | Buffer response body, extract token usage before forwarding to client |
| `internal/proxy/server.go` | Thread *TokenUsage through logRequest into RequestLog |
| `internal/models/types.go` | Add PromptTokens, CompletionTokens, TotalTokens, TokenCostUSD to RequestLog |
| `internal/router/router.go` | Register GET /metrics route |
| `internal/db/migrations/migration.go` | Chain v1.2.0 migration |

Why

gpt-load currently proxies LLM API calls but has no visibility into token consumption or cost. These changes enable operators to:

  1. Track per-group, per-model token usage over time
  2. Export cost metrics to Prometheus/Grafana
  3. Build usage dashboards and billing reports

Test Plan

  • Run migration against a dev database (columns created successfully)
  • Send a chat completion request through the proxy, verify prompt_tokens/completion_tokens/total_tokens are populated in request_logs
  • Scrape GET /metrics and verify Prometheus format

Summary by CodeRabbit

  • New Features

    • Added a Prometheus-compatible /metrics endpoint (optional token protection) exposing request counts, token breakdowns (prompt/completion/total) and USD costs per model/group.
    • Server now captures token usage from responses to populate metrics and request records.
  • Chores

    • Database migrations updated to run the v1.2.0 token columns migration after v1.1.0.


Parse token usage from upstream API responses (OpenAI, Anthropic, Gemini)
and store in request_logs for cost analytics. Expose aggregated metrics
at GET /metrics in Prometheus exposition format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai Bot commented May 20, 2026

Walkthrough

This PR adds end-to-end token usage tracking and metrics exposure. The proxy captures token counts from LLM provider responses, persists them in a new database schema, and serves aggregated metrics in Prometheus format via a /metrics endpoint.

Changes

Token Usage Tracking and Metrics Exposure

| Layer / File(s) | Summary |
| --- | --- |
| Data model and database schema: `internal/models/types.go`, `internal/db/migrations/migration.go`, `internal/db/migrations/v1_2_0_AddTokenColumns.go` | RequestLog extended with PromptTokens, CompletionTokens, TotalTokens, and TokenCostUSD fields; migration orchestration updated to run V1_1_0 then V1_2_0, and V1_2_0 adds the four token columns if missing. |
| Token usage extraction utilities: `internal/proxy/usage.go` | New TokenUsage type and extractTokenUsage parse JSON responses to extract token counts from OpenAI and Anthropic usage objects and from Gemini usageMetadata; isChatCompletionPath detects relevant endpoints. |
| Response handling and token capture: `internal/proxy/response_handlers.go` | handleNormalResponse now conditionally streams or buffers the upstream response; buffered successful chat/completion responses are decompressed if needed and passed to extractTokenUsage, with the extracted *TokenUsage returned. |
| Request logging with token persistence: `internal/proxy/server.go` | Internal logRequest signature extended to accept optional *TokenUsage; normal non-stream success path captures usage from handleNormalResponse and passes it to logging; error/streaming/retry paths pass nil; RequestLog population includes token and cost fields only when usage is present. |
| Metrics aggregation and HTTP endpoint: `internal/handler/metrics_handler.go`, `internal/router/router.go` | New Metrics handler aggregates successful non-stream request totals by group_name and model and emits Prometheus exposition-format metrics: gpt_load_requests_total, gpt_load_tokens_total (prompt/completion/total), and gpt_load_cost_total; route registered at GET /metrics with optional METRICS_TOKEN Bearer auth. |

Sequence Diagram

sequenceDiagram
  participant Client
  participant ProxyServer
  participant Upstream as Upstream LLM
  participant Database
  participant MetricsHandler
  
  Client->>ProxyServer: request
  ProxyServer->>Upstream: forward request
  Upstream-->>ProxyServer: response with usage JSON
  ProxyServer->>ProxyServer: extractTokenUsage from response body
  ProxyServer->>Database: logRequest with captured TokenUsage
  Database->>Database: store PromptTokens, CompletionTokens, TotalTokens
  ProxyServer-->>Client: response body
  
  Client->>MetricsHandler: GET /metrics
  MetricsHandler->>Database: query RequestLog, aggregate by group_name/model
  MetricsHandler->>MetricsHandler: build Prometheus format (counts, tokens, cost)
  MetricsHandler-->>Client: text/plain metrics

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

enhancement

Poem

🐰 I nibble bytes where tokens bloom,

I count each prompt and completion's doom,
Metrics hum in Prometheus song,
Logs now hold the token throng,
Hop, tally, report — the numbers long.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The description includes a summary, detailed file changes, context (why), and test plan. However, the description template requires a Related Issue number and a self-check checklist with specific boxes to mark, which are missing. | Fill in the Related Issue field (Closes #) and mark the self-check boxes to confirm local testing and documentation updates were completed. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main changes: adding token usage tracking and a Prometheus metrics endpoint. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/handler/metrics_handler.go`:
- Around line 41-43: The query currently uses Where("is_success = ?", true) and
thereby includes streaming requests; update the DB query in metrics_handler.go
(the code that builds the query with Where("is_success = ?", true),
Group("group_name, model").Scan(&results)) to also filter out streams by adding
a non-stream condition (e.g., Where("is_stream = ?", false) or equivalent
predicate) so that the computed total_requests reflects only successful
non-streaming requests; keep the same Group/Scan/result handling and ensure
variable names like results / total_requests remain unchanged.
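
The intended predicate can be illustrated in memory, without a database. The sketch below mirrors the filter and the `Group("group_name, model")` aggregation described above. The `requestLog`, `aggKey`, and `aggTotals` types and the `aggregate` function are hypothetical stand-ins, not the PR's GORM query.

```go
package main

// requestLog is a hypothetical, trimmed-down view of a request_logs row.
type requestLog struct {
	GroupName, Model    string
	IsSuccess, IsStream bool
	PromptTokens        int64
	CompletionTokens    int64
	TotalTokens         int64
}

// aggKey identifies one (group_name, model) bucket.
type aggKey struct{ GroupName, Model string }

// aggTotals holds the sums for one bucket.
type aggTotals struct {
	Requests, PromptTokens, CompletionTokens, TotalTokens int64
}

// aggregate counts only successful, non-streaming requests, which is the
// in-memory equivalent of "is_success = true AND is_stream = false".
func aggregate(logs []requestLog) map[aggKey]aggTotals {
	out := make(map[aggKey]aggTotals)
	for _, l := range logs {
		if !l.IsSuccess || l.IsStream {
			continue // failed and streaming requests carry no parsed usage
		}
		k := aggKey{GroupName: l.GroupName, Model: l.Model}
		sum := out[k]
		sum.Requests++
		sum.PromptTokens += l.PromptTokens
		sum.CompletionTokens += l.CompletionTokens
		sum.TotalTokens += l.TotalTokens
		out[k] = sum
	}
	return out
}
```

Without the stream filter, streaming requests would raise the request count while contributing zero tokens, which skews any tokens-per-request figure derived from the two metrics.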

In `@internal/proxy/response_handlers.go`:
- Around line 48-63: Currently the handler unconditionally reads the entire
resp.Body into memory (body, err := io.ReadAll(resp.Body)) which can OOM for
large responses; change the logic in the response handling code to only fully
buffer the body when usage extraction is needed (the path that calls
handleGzipCompression and inspects content for usage), and for all other
non-stream responses use streaming copy (io.Copy) from resp.Body to c.Writer to
avoid buffering. Keep existing error logging via logUpstreamError, ensure gzip
handling still occurs on the buffered branch (use handleGzipCompression only
when buffering), and use io.Copy for the non-buffered branch so large payloads
are streamed directly to c.Writer.

In `@internal/proxy/server.go`:
- Around line 342-347: The request logs set token counts on logEntry
(PromptTokens, CompletionTokens, TotalTokens) but never set TokenCostUSD, so
cost metrics remain zero; inside the same nil check for usage (the block that
assigns logEntry.PromptTokens etc.) add an assignment to set
logEntry.TokenCostUSD from the usage object (e.g., logEntry.TokenCostUSD =
usage.TokenCostUSD or usage.CostUSD depending on the usage field name), so the
TokenCostUSD field on logEntry is populated for downstream /metrics aggregation.

In `@internal/proxy/usage.go`:
- Around line 70-86: The current guards reject valid usage when one side is zero
because they only accept OpenAI when oai.TotalTokens>0 and Anthropic when
anthro.InputTokens>0; update the checks in internal/proxy/usage.go (around the
json.Unmarshal into oai and anthropicUsage) to accept a payload if any of its
token fields are present/non-zero (e.g., for oai: err==nil && (oai.TotalTokens>0
|| oai.PromptTokens>0 || oai.CompletionTokens>0); for anthro: err==nil &&
(anthro.InputTokens>0 || anthro.OutputTokens>0)), and compute TotalTokens as
oai.TotalTokens if present else sum of PromptTokens+CompletionTokens (and
similarly sum anthro.InputTokens+OutputTokens) so zero on one side is still
recorded instead of dropping the usage.
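
The relaxed guards can be sketched as follows. This is an illustration, not the code in `internal/proxy/usage.go`: it decodes both shapes through one hypothetical `usageEnvelope` struct instead of the separate `oai` and `anthro` values named above, the `extractUsage` name is made up, and Gemini's `usageMetadata` is left out for brevity.

```go
package main

import "encoding/json"

// TokenUsage holds normalized token counts.
type TokenUsage struct {
	PromptTokens     int64
	CompletionTokens int64
	TotalTokens      int64
}

// usageEnvelope is a hypothetical combined shape covering the OpenAI-style
// and Anthropic-style "usage" objects.
type usageEnvelope struct {
	Usage *struct {
		PromptTokens     int64 `json:"prompt_tokens"`
		CompletionTokens int64 `json:"completion_tokens"`
		TotalTokens      int64 `json:"total_tokens"`
		InputTokens      int64 `json:"input_tokens"`
		OutputTokens     int64 `json:"output_tokens"`
	} `json:"usage"`
}

// extractUsage accepts a payload when any token field is non-zero, so a
// zero on one side no longer causes the usage to be dropped.
func extractUsage(body []byte) *TokenUsage {
	var env usageEnvelope
	if err := json.Unmarshal(body, &env); err != nil || env.Usage == nil {
		return nil
	}
	u := env.Usage

	// OpenAI-style fields.
	if u.PromptTokens > 0 || u.CompletionTokens > 0 || u.TotalTokens > 0 {
		total := u.TotalTokens
		if total == 0 {
			total = u.PromptTokens + u.CompletionTokens
		}
		return &TokenUsage{
			PromptTokens:     u.PromptTokens,
			CompletionTokens: u.CompletionTokens,
			TotalTokens:      total,
		}
	}

	// Anthropic-style fields; the total is always derived.
	if u.InputTokens > 0 || u.OutputTokens > 0 {
		return &TokenUsage{
			PromptTokens:     u.InputTokens,
			CompletionTokens: u.OutputTokens,
			TotalTokens:      u.InputTokens + u.OutputTokens,
		}
	}
	return nil
}
```

For example, a body of `{"usage":{"input_tokens":0,"output_tokens":3}}` yields a usage with three completion tokens and a total of three, where the original `InputTokens > 0` guard would have returned nothing.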

In `@internal/router/router.go`:
- Line 78: The /metrics route is currently exposed without protection via
router.GET("/metrics", serverHandler.Metrics); wrap this endpoint with an
authentication check or allowlist before registering it. Implement or reuse an
auth middleware (e.g., metricsAuthMiddleware or existing authMiddleware) that
validates a dedicated metrics token from config/env or checks the requester IP
against an allowlist, return 401/403 on failure, and register the route with
that middleware (e.g., router.GET("/metrics",
metricsAuthMiddleware(serverHandler.Metrics))). Ensure the token/allowlist is
configurable and documented so only authorized systems can access
serverHandler.Metrics.
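
One way to express such a check, sketched with only the standard library. The project's router appears to use a Gin-style API (`router.GET`, `c.Writer`), so this `net/http` version is an assumption made to keep the example self-contained. `bearerMatches` and `metricsAuth` are hypothetical names, and treating an empty token as "auth disabled" is one possible reading of "optional".

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"strings"
)

// bearerMatches reports whether an Authorization header value carries the
// expected bearer token. An empty expected token disables the check.
func bearerMatches(header, token string) bool {
	if token == "" {
		return true
	}
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	// Constant-time comparison avoids leaking how many bytes matched.
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// metricsAuth wraps a handler and rejects requests without a valid token.
func metricsAuth(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !bearerMatches(r.Header.Get("Authorization"), token) {
			w.Header().Set("WWW-Authenticate", "Bearer")
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

The token would typically be read once at startup, for example from an environment variable, and passed in as `token`. A Prometheus scrape job can then send it in its authorization settings.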

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 75760d10-1702-434c-80a6-d8ca908c9aab

📥 Commits

Reviewing files that changed from the base of the PR and between 275cc50 and ef931af.

📒 Files selected for processing (8)
  • internal/db/migrations/migration.go
  • internal/db/migrations/v1_2_0_AddTokenColumns.go
  • internal/handler/metrics_handler.go
  • internal/models/types.go
  • internal/proxy/response_handlers.go
  • internal/proxy/server.go
  • internal/proxy/usage.go
  • internal/router/router.go

Comment thread internal/handler/metrics_handler.go Outdated
Comment thread internal/proxy/response_handlers.go
Comment thread internal/proxy/server.go
Comment thread internal/proxy/usage.go Outdated
Comment thread internal/router/router.go Outdated
- Filter out streaming requests in /metrics query (is_stream = false)
- Stream non-usage responses via io.Copy instead of buffering
- Add CostUSD field to TokenUsage and assign in logRequest
- Accept usage when any token field is non-zero, not just total
- Add optional METRICS_TOKEN env var for /metrics endpoint auth

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
internal/proxy/server.go (1)

342-348: 💤 Low value

Fix inconsistent indentation on line 347.

Line 347 has extra indentation compared to the surrounding assignments. This should align with lines 344-346.

🧹 Proposed fix
 	if usage != nil {
 		logEntry.PromptTokens = usage.PromptTokens
 		logEntry.CompletionTokens = usage.CompletionTokens
 		logEntry.TotalTokens = usage.TotalTokens
-			logEntry.TokenCostUSD = usage.CostUSD
+		logEntry.TokenCostUSD = usage.CostUSD
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/proxy/server.go` around lines 342 - 348, Fix the inconsistent
indentation in the block that sets token usage on logEntry: within the if usage
!= nil { ... } block (where PromptTokens, CompletionTokens, TotalTokens are
assigned), align the assignment to logEntry.TokenCostUSD with the other
assignments so it uses the same indentation level as logEntry.PromptTokens,
logEntry.CompletionTokens, and logEntry.TotalTokens in the code that sets token
usage from usage.CostUSD.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4c11f2c4-4122-4ce2-abd7-03521404f0a2

📥 Commits

Reviewing files that changed from the base of the PR and between ef931af and 77f11ff.

📒 Files selected for processing (5)
  • internal/handler/metrics_handler.go
  • internal/proxy/response_handlers.go
  • internal/proxy/server.go
  • internal/proxy/usage.go
  • internal/router/router.go
