feat: add token usage tracking and Prometheus metrics endpoint by zangxin75 · Pull Request #416 · tbphp/gpt-load

zangxin75 · 2026-05-20T03:15:46Z

Summary

- Parse token usage from upstream API responses (OpenAI, Anthropic, Gemini formats) and store in request_logs
- Add 4 new columns to request_logs: prompt_tokens, completion_tokens, total_tokens, token_cost_usd
- Expose aggregated metrics at GET /metrics in Prometheus exposition format
- Database migration v1.2.0 for the new columns

Changes

File	Change
`internal/proxy/usage.go`	New — Token usage parsing from OpenAI/Anthropic/Gemini response bodies
`internal/handler/metrics_handler.go`	New — Prometheus /metrics endpoint (requests, tokens, cost by group+model)
`internal/db/migrations/v1_2_0_AddTokenColumns.go`	New — Add token columns to request_logs table
`internal/proxy/response_handlers.go`	Buffer response body, extract token usage before forwarding to client
`internal/proxy/server.go`	Thread `*TokenUsage` through `logRequest` into `RequestLog`
`internal/models/types.go`	Add PromptTokens, CompletionTokens, TotalTokens, TokenCostUSD to RequestLog
`internal/router/router.go`	Register GET /metrics route
`internal/db/migrations/migration.go`	Chain v1.2.0 migration

Why

gpt-load currently proxies LLM API calls but has no visibility into token consumption or cost. These changes enable operators to:

- Track per-group, per-model token usage over time
- Export cost metrics to Prometheus/Grafana
- Build usage dashboards and billing reports

Test Plan

- Run migration against a dev database (columns created successfully)
- Send a chat completion request through the proxy, verify prompt_tokens/completion_tokens/total_tokens are populated in request_logs
- Scrape GET /metrics and verify Prometheus format

Summary by CodeRabbit

New Features
- Added a Prometheus-compatible /metrics endpoint (optional token protection) exposing request counts, token breakdowns (prompt/completion/total) and USD costs per model/group.
- Server now captures token usage from responses to populate metrics and request records.
Chores
- Database migrations updated to run the v1.2.0 token columns migration after v1.1.0.

Parse token usage from upstream API responses (OpenAI, Anthropic, Gemini) and store in request_logs for cost analytics. Expose aggregated metrics at GET /metrics in Prometheus exposition format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-05-20T03:15:59Z

Walkthrough

This PR adds end-to-end token usage tracking and metrics exposure. The proxy captures token counts from LLM provider responses, persists them in a new database schema, and serves aggregated metrics in Prometheus format via a /metrics endpoint.

Changes

Token Usage Tracking and Metrics Exposure

Layer / File(s)	Summary
Data model and database schema `internal/models/types.go`, `internal/db/migrations/migration.go`, `internal/db/migrations/v1_2_0_AddTokenColumns.go`	`RequestLog` extended with `PromptTokens`, `CompletionTokens`, `TotalTokens`, and `TokenCostUSD` fields; migration orchestration updated to run V1_1_0 then V1_2_0, and V1_2_0 adds the four token columns if missing.
Token usage extraction utilities `internal/proxy/usage.go`	New `TokenUsage` type and `extractTokenUsage` function parse JSON responses to extract token counts from OpenAI and Anthropic `usage` objects and from Gemini `usageMetadata`; `isChatCompletionPath` detects relevant endpoints.
Response handling and token capture `internal/proxy/response_handlers.go`	`handleNormalResponse` now conditionally streams or buffers the upstream response; buffered successful chat/completion responses are decompressed if needed and passed to `extractTokenUsage`, with the extracted `*TokenUsage` returned.
Request logging with token persistence `internal/proxy/server.go`	Internal `logRequest` signature extended to accept optional `*TokenUsage`; normal non-stream success path captures usage from `handleNormalResponse` and passes it to logging; error/streaming/retry paths pass `nil`; `RequestLog` population includes token and cost fields only when usage is present.
Metrics aggregation and HTTP endpoint `internal/handler/metrics_handler.go`, `internal/router/router.go`	New `Metrics` handler aggregates successful non-stream request totals by `group_name` and `model` and emits Prometheus exposition-format metrics: `gpt_load_requests_total`, `gpt_load_tokens_total` (prompt/completion/total), and `gpt_load_cost_total`; route registered at `GET /metrics` with optional `METRICS_TOKEN` Bearer auth.
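
To make the output shape concrete, here is a minimal sketch of how per-group, per-model aggregates might be rendered in the Prometheus text exposition format. It is not the PR's handler: the `row` type, the `render` and `escapeLabel` helpers, the label names (`group`, `model`, `type`) and the `counter` type declaration are all assumptions; only the three metric names come from the walkthrough above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// row is a hypothetical aggregate for one (group, model) pair, standing in
// for whatever the real handler scans out of request_logs.
type row struct {
	Group, Model              string
	Requests                  int64
	Prompt, Completion, Total int64
	CostUSD                   float64
}

// escapeLabel escapes backslash, double quote and newline, which the
// Prometheus text format requires inside label values.
func escapeLabel(v string) string {
	return strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`).Replace(v)
}

// render emits one block per metric family so that all samples of a family
// stay together. Label names and the counter type are assumptions.
func render(rows []row) string {
	sorted := append([]row(nil), rows...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Group != sorted[j].Group {
			return sorted[i].Group < sorted[j].Group
		}
		return sorted[i].Model < sorted[j].Model
	})
	labels := func(r row) string {
		return fmt.Sprintf(`group="%s",model="%s"`, escapeLabel(r.Group), escapeLabel(r.Model))
	}

	var b strings.Builder
	b.WriteString("# TYPE gpt_load_requests_total counter\n")
	for _, r := range sorted {
		fmt.Fprintf(&b, "gpt_load_requests_total{%s} %d\n", labels(r), r.Requests)
	}
	b.WriteString("# TYPE gpt_load_tokens_total counter\n")
	for _, r := range sorted {
		fmt.Fprintf(&b, "gpt_load_tokens_total{%s,type=\"prompt\"} %d\n", labels(r), r.Prompt)
		fmt.Fprintf(&b, "gpt_load_tokens_total{%s,type=\"completion\"} %d\n", labels(r), r.Completion)
		fmt.Fprintf(&b, "gpt_load_tokens_total{%s,type=\"total\"} %d\n", labels(r), r.Total)
	}
	b.WriteString("# TYPE gpt_load_cost_total counter\n")
	for _, r := range sorted {
		fmt.Fprintf(&b, "gpt_load_cost_total{%s} %s\n", labels(r),
			strconv.FormatFloat(r.CostUSD, 'f', -1, 64))
	}
	return b.String()
}
```

Sorting the rows keeps the scrape output stable between requests, which makes diffs and tests easier. Because `type="total"` sits in the same family as `prompt` and `completion`, a dashboard query that sums the family without filtering on `type` would count tokens twice.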

Sequence Diagram

sequenceDiagram
  participant Client
  participant ProxyServer
  participant Upstream as Upstream LLM
  participant Database
  participant MetricsHandler
  
  Client->>ProxyServer: request
  ProxyServer->>Upstream: forward request
  Upstream-->>ProxyServer: response with usage JSON
  ProxyServer->>ProxyServer: extractTokenUsage from response body
  ProxyServer->>Database: logRequest with captured TokenUsage
  Database->>Database: store PromptTokens, CompletionTokens, TotalTokens
  ProxyServer-->>Client: response body
  
  Client->>MetricsHandler: GET /metrics
  MetricsHandler->>Database: query RequestLog, aggregate by group_name/model
  MetricsHandler->>MetricsHandler: build Prometheus format (counts, tokens, cost)
  MetricsHandler-->>Client: text/plain metrics

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

enhancement

Poem

🐰 I nibble bytes where tokens bloom,

I count each prompt and completion's doom,
Metrics hum in Prometheus song,
Logs now hold the token throng,
Hop, tally, report — the numbers long.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The description includes a summary, detailed file changes, context (why), and test plan. However, the description template requires a Related Issue number and a self-check checklist with specific boxes to mark, which are missing.	Fill in the Related Issue field (Closes #) and mark the self-check boxes to confirm local testing and documentation updates were completed.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main changes: adding token usage tracking and a Prometheus metrics endpoint.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/handler/metrics_handler.go`:
- Around line 41-43: The query currently uses Where("is_success = ?", true) and
thereby includes streaming requests; update the DB query in metrics_handler.go
(the code that builds the query with Where("is_success = ?", true),
Group("group_name, model").Scan(&results)) to also filter out streams by adding
a non-stream condition (e.g., Where("is_stream = ?", false) or equivalent
predicate) so that the computed total_requests reflects only successful
non-streaming requests; keep the same Group/Scan/result handling and ensure
variable names like results / total_requests remain unchanged.

In `@internal/proxy/response_handlers.go`:
- Around line 48-63: Currently the handler unconditionally reads the entire
resp.Body into memory (body, err := io.ReadAll(resp.Body)) which can OOM for
large responses; change the logic in the response handling code to only fully
buffer the body when usage extraction is needed (the path that calls
handleGzipCompression and inspects content for usage), and for all other
non-stream responses use streaming copy (io.Copy) from resp.Body to c.Writer to
avoid buffering. Keep existing error logging via logUpstreamError, ensure gzip
handling still occurs on the buffered branch (use handleGzipCompression only
when buffering), and use io.Copy for the non-buffered branch so large payloads
are streamed directly to c.Writer.
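
The buffering decision described in this comment can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `response_handlers.go`: `shouldBuffer`, `forward` and `demo` are hypothetical names, the path suffixes are guesses at what a chat/completion check might match, and a plain `io.Writer` stands in for `c.Writer`.

```go
package main

import (
	"bytes"
	"io"
	"strings"
)

// shouldBuffer reports whether a response body is worth holding in memory.
// Only successful, non-streaming responses on chat/completion-style paths
// can carry a usage object. The suffix list is an illustrative assumption.
func shouldBuffer(path string, status int, isStream bool) bool {
	if isStream || status < 200 || status >= 300 {
		return false
	}
	for _, suffix := range []string{"/chat/completions", "/completions", "/messages", ":generateContent"} {
		if strings.HasSuffix(path, suffix) {
			return true
		}
	}
	return false
}

// forward copies an upstream body to the client. When buffer is true it
// reads the whole body first and returns it so the caller can extract
// usage; otherwise it streams with io.Copy and returns nil.
func forward(dst io.Writer, src io.Reader, buffer bool) ([]byte, error) {
	if !buffer {
		_, err := io.Copy(dst, src)
		return nil, err
	}
	body, err := io.ReadAll(src)
	if err != nil {
		return nil, err
	}
	_, err = dst.Write(body)
	return body, err
}

// demo runs forward over an in-memory body and returns what the client
// received and what was captured for usage extraction.
func demo(body string, buffer bool) (string, string) {
	var client bytes.Buffer
	captured, _ := forward(&client, strings.NewReader(body), buffer)
	return client.String(), string(captured)
}
```

Even on the buffered branch, wrapping the source in `io.LimitReader` with a generous cap would bound memory use if an upstream returns an unexpectedly large JSON body.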

In `@internal/proxy/server.go`:
- Around line 342-347: The request logs set token counts on logEntry
(PromptTokens, CompletionTokens, TotalTokens) but never set TokenCostUSD, so
cost metrics remain zero; inside the same nil check for usage (the block that
assigns logEntry.PromptTokens etc.) add an assignment to set
logEntry.TokenCostUSD from the usage object (e.g., logEntry.TokenCostUSD =
usage.TokenCostUSD or usage.CostUSD depending on the usage field name), so the
TokenCostUSD field on logEntry is populated for downstream /metrics aggregation.

In `@internal/proxy/usage.go`:
- Around line 70-86: The current guards reject valid usage when one side is zero
because they only accept OpenAI when oai.TotalTokens>0 and Anthropic when
anthro.InputTokens>0; update the checks in internal/proxy/usage.go (around the
json.Unmarshal into oai and anthropicUsage) to accept a payload if any of its
token fields are present/non-zero (e.g., for oai: err==nil && (oai.TotalTokens>0
|| oai.PromptTokens>0 || oai.CompletionTokens>0); for anthro: err==nil &&
(anthro.InputTokens>0 || anthro.OutputTokens>0)), and compute TotalTokens as
oai.TotalTokens if present else sum of PromptTokens+CompletionTokens (and
similarly sum anthro.InputTokens+OutputTokens) so zero on one side is still
recorded instead of dropping the usage.
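
A minimal sketch of the relaxed guards follows, assuming the publicly documented field names of the three providers (`prompt_tokens`/`completion_tokens`/`total_tokens`, `input_tokens`/`output_tokens`, and `promptTokenCount`/`candidatesTokenCount`/`totalTokenCount`). It decodes all shapes in a single pass rather than using the separate `oai` and `anthro` unmarshals the comment refers to, and `extractUsage`, `payload` and this `TokenUsage` definition are hypothetical stand-ins for the PR's own types.

```go
package main

import "encoding/json"

// TokenUsage mirrors the fields named in the PR; the real struct in
// internal/proxy/usage.go may differ.
type TokenUsage struct {
	PromptTokens     int64
	CompletionTokens int64
	TotalTokens      int64
}

// payload covers the three provider shapes in one decode.
type payload struct {
	Usage *struct {
		PromptTokens     int64 `json:"prompt_tokens"`     // OpenAI
		CompletionTokens int64 `json:"completion_tokens"` // OpenAI
		TotalTokens      int64 `json:"total_tokens"`      // OpenAI
		InputTokens      int64 `json:"input_tokens"`      // Anthropic
		OutputTokens     int64 `json:"output_tokens"`     // Anthropic
	} `json:"usage"`
	UsageMetadata *struct {
		PromptTokenCount     int64 `json:"promptTokenCount"`
		CandidatesTokenCount int64 `json:"candidatesTokenCount"`
		TotalTokenCount      int64 `json:"totalTokenCount"`
	} `json:"usageMetadata"` // Gemini
}

// build accepts usage when any field is non-zero and derives the total
// from the two sides when the provider did not report one.
func build(prompt, completion, total int64) *TokenUsage {
	if total == 0 {
		total = prompt + completion
	}
	if prompt == 0 && completion == 0 && total == 0 {
		return nil
	}
	return &TokenUsage{PromptTokens: prompt, CompletionTokens: completion, TotalTokens: total}
}

// extractUsage returns nil when the body is not JSON or carries no counts.
func extractUsage(body []byte) *TokenUsage {
	var p payload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil
	}
	if u := p.Usage; u != nil {
		prompt, completion := u.PromptTokens, u.CompletionTokens
		if prompt == 0 && completion == 0 {
			// Fall back to Anthropic naming.
			prompt, completion = u.InputTokens, u.OutputTokens
		}
		if usage := build(prompt, completion, u.TotalTokens); usage != nil {
			return usage
		}
	}
	if m := p.UsageMetadata; m != nil {
		return build(m.PromptTokenCount, m.CandidatesTokenCount, m.TotalTokenCount)
	}
	return nil
}
```

With this arrangement a response that reports only `output_tokens`, or only `prompt_tokens` with no total, is still recorded, which is the behavior the comment asks for.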

In `@internal/router/router.go`:
- Line 78: The /metrics route is currently exposed without protection via
router.GET("/metrics", serverHandler.Metrics); wrap this endpoint with an
authentication check or allowlist before registering it. Implement or reuse an
auth middleware (e.g., metricsAuthMiddleware or existing authMiddleware) that
validates a dedicated metrics token from config/env or checks the requester IP
against an allowlist, return 401/403 on failure, and register the route with
that middleware (e.g., router.GET("/metrics",
metricsAuthMiddleware(serverHandler.Metrics))). Ensure the token/allowlist is
configurable and documented so only authorized systems can access
serverHandler.Metrics.
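
One way to gate the endpoint is a small Bearer-token wrapper, sketched here against `net/http` so that it runs without third-party packages; the project's router uses its own handler type, so the wiring would differ there. `bearerMatches`, `metricsAuth` and `newMux` are hypothetical names, and treating an empty `METRICS_TOKEN` as "no auth" is an assumption based on the endpoint being described as optionally protected.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"os"
	"strings"
)

// bearerMatches reports whether an Authorization header value carries the
// expected token. The comparison is constant-time to avoid leaking how
// many leading bytes matched.
func bearerMatches(header, token string) bool {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// metricsAuth wraps next with the token check. An empty token leaves the
// endpoint open (assumed meaning of "optional" protection).
func metricsAuth(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if token != "" && !bearerMatches(r.Header.Get("Authorization"), token) {
			w.Header().Set("WWW-Authenticate", `Bearer realm="metrics"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newMux registers the protected route, reading the token from the
// METRICS_TOKEN environment variable once at startup.
func newMux(metrics http.Handler) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/metrics", metricsAuth(os.Getenv("METRICS_TOKEN"), metrics))
	return mux
}
```

On the Prometheus side, the scrape job would then need matching credentials in its authorization settings so that scrapes keep succeeding once the token is set.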


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 75760d10-1702-434c-80a6-d8ca908c9aab

📥 Commits

Reviewing files that changed from the base of the PR and between 275cc50 and ef931af.

📒 Files selected for processing (8)

internal/db/migrations/migration.go
internal/db/migrations/v1_2_0_AddTokenColumns.go
internal/handler/metrics_handler.go
internal/models/types.go
internal/proxy/response_handlers.go
internal/proxy/server.go
internal/proxy/usage.go
internal/router/router.go

- Filter out streaming requests in /metrics query (is_stream = false)
- Stream non-usage responses via io.Copy instead of buffering
- Add CostUSD field to TokenUsage and assign in logRequest
- Accept usage when any token field is non-zero, not just total
- Add optional METRICS_TOKEN env var for /metrics endpoint auth

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

internal/proxy/server.go (1)

342-348: 💤 Low value

Fix inconsistent indentation on line 347.

Line 347 has extra indentation compared to the surrounding assignments. This should align with lines 344-346.

🧹 Proposed fix

 	if usage != nil {
 		logEntry.PromptTokens = usage.PromptTokens
 		logEntry.CompletionTokens = usage.CompletionTokens
 		logEntry.TotalTokens = usage.TotalTokens
-			logEntry.TokenCostUSD = usage.CostUSD
+		logEntry.TokenCostUSD = usage.CostUSD
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/proxy/server.go` around lines 342 - 348, Fix the inconsistent
indentation in the block that sets token usage on logEntry: within the if usage
!= nil { ... } block (where PromptTokens, CompletionTokens, TotalTokens are
assigned), align the assignment to logEntry.TokenCostUSD with the other
assignments so it uses the same indentation level as logEntry.PromptTokens,
logEntry.CompletionTokens, and logEntry.TotalTokens in the code that sets token
usage from usage.CostUSD.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4c11f2c4-4122-4ce2-abd7-03521404f0a2

📥 Commits

Reviewing files that changed from the base of the PR and between ef931af and 77f11ff.

📒 Files selected for processing (5)

internal/handler/metrics_handler.go
internal/proxy/response_handlers.go
internal/proxy/server.go
internal/proxy/usage.go
internal/router/router.go

coderabbitai Bot reviewed May 20, 2026


Comment thread internal/handler/metrics_handler.go Outdated

Comment thread internal/proxy/response_handlers.go

Comment thread internal/proxy/server.go

Comment thread internal/proxy/usage.go Outdated

Comment thread internal/router/router.go Outdated

zangxin75 wants to merge 2 commits into tbphp:main from zangxin75:feat/token-usage-tracking
