11 changes: 11 additions & 0 deletions .changeset/openrouter-cost-tracking.md
@@ -0,0 +1,11 @@
---
'@tanstack/ai-openrouter': minor
'@tanstack/ai': minor
---

- Add OpenRouter cost tracking. The `OpenRouterTextAdapter` now attaches OpenRouter's authoritative per-request cost amount to the `RUN_FINISHED` event under `usage.cost`, along with numeric/null fields from `usage.cost_details` under `usage.costDetails`. Cost is sourced from OpenRouter's chat completion response itself (the field arrives in the trailing SSE chunk), so there is no extra HTTP request and no added end-of-stream latency.
- Cost is always sourced from OpenRouter — never computed locally from token counts and a price table — because OpenRouter routes the same model id to different upstream providers (primary, fallback, BYOK) with different pricing, and may expose provider-specific cost breakdowns the SDK cannot reconstruct.
- Cost is captured via the SDK's public `HTTPClient` response hook. The hook calls `Response.clone()` and parses the cloned body out-of-band to recover `usage.cost` and `usage.cost_details`, which the `@openrouter/sdk` chat-completion parser would otherwise strip. The SDK's stream consumer is unaffected — both clones are read independently.
- Custom `httpClient` values passed into the adapter are preserved: the adapter clones the caller's client (inheriting their fetcher, retries, tracing, and any pre-registered hooks) and appends the cost-capture hook to the clone, so the caller's original instance is never mutated and cost tracking still works when callers supply their own transport.
- Defer the OpenRouter `RUN_FINISHED` emission until after the upstream stream fully drains, so token usage that arrives in a trailing usage-only chunk (the common case for OpenAI-compatible providers, where the final chunk has empty `choices`) is included in `usage` instead of being dropped.
- Extend `RunFinishedEvent.usage` in `@tanstack/ai` with optional `cost` and `costDetails` fields. The middleware `UsageInfo` (consumed by `onUsage`) and `FinishInfo.usage` (consumed by `onFinish`) carry the same fields, so middleware authors can read cost without casts. The change is additive and backwards-compatible for adapters that don't populate cost.
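
The clone-and-parse hook described above can be sketched as follows. This is an illustrative sketch, not the adapter's real internals: `captureUsage` and `CapturedUsage` are hypothetical names, and the real adapter registers this logic through the SDK's `HTTPClient` hook API rather than calling it directly.

```typescript
// Hypothetical sketch of the out-of-band usage capture: clone the Response,
// parse the clone asynchronously, and hand the original back untouched so the
// SDK's stream consumer is unaffected.
type CapturedUsage = {
  cost?: number;
  costDetails?: Record<string, number | null>;
};

async function captureUsage(
  response: Response,
  sink: CapturedUsage,
): Promise<Response> {
  const clone = response.clone(); // both bodies can now be read independently
  clone
    .json()
    .then((body: any) => {
      if (typeof body?.usage?.cost === "number") {
        sink.cost = body.usage.cost;
        sink.costDetails = body.usage.cost_details ?? undefined;
      }
    })
    .catch(() => {
      // Never break the chat stream: a parse failure just means no cost.
    });
  return response; // original Response is returned unmodified
}
```

Because the clone is parsed in the background and errors are swallowed, cost capture can never delay or fail the caller's stream.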
1 change: 1 addition & 0 deletions README.md
@@ -46,6 +46,7 @@ A powerful, type-safe AI SDK for building AI-powered applications.
- Isomorphic type-safe tools with server/client execution
- **Enhanced integration with TanStack Start** - Share implementations between AI tools and server functions
- **Observability events** - Structured, typed events for text, tools, image, speech, transcription, and video ([docs](./docs/guides/observability.md))
- **Cost tracking** - Per-request cost on `RUN_FINISHED` for providers that report it (currently [OpenRouter](./docs/adapters/openrouter.md#cost-tracking))

### <a href="https://tanstack.com/ai">Read the docs →</a>

26 changes: 25 additions & 1 deletion docs/adapters/openrouter.md
@@ -115,6 +115,31 @@ Set your API key in environment variables:
OPENROUTER_API_KEY=sk-or-...
```

## Cost Tracking

The OpenRouter adapter attaches the authoritative per-request cost to the `RUN_FINISHED` event under `usage.cost`. OpenRouter [reports cost inline in every chat response](https://openrouter.ai/docs/use-cases/usage-accounting), in credits, so cost arrives in the same SSE stream as the model output — there is **no extra HTTP request** and **no added latency**.

We don't compute cost locally from tokens × price because OpenRouter routes the same model id to different upstream providers (primary, fallback, BYOK), each with different pricing, and may include cached-token pricing or BYOK upstream costs. A static price table would silently drift and produce wrong numbers.

```typescript
import { chat } from "@tanstack/ai";
import { openRouterText } from "@tanstack/ai-openrouter";

const stream = chat({
  adapter: openRouterText("openai/gpt-5"),
  messages: [{ role: "user", content: "Hello!" }],
});

for await (const chunk of stream) {
  if (chunk.type === "RUN_FINISHED") {
    console.log("OpenRouter cost:", chunk.usage?.cost);
    console.log("Upstream inference cost:", chunk.usage?.costDetails?.upstreamInferenceCost);
  }
}
```

If a particular response doesn't include cost (rare — for example if a future OpenRouter response shape change moves the field), `RUN_FINISHED` still emits with token usage; only the `cost` field is omitted. Cost tracking will never break a chat stream.
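
Because cost (and often token usage) arrives in a trailing SSE chunk with empty `choices`, the adapter defers its `RUN_FINISHED` emission until the upstream stream fully drains. A minimal sketch of that pattern, with illustrative types and a hypothetical `drainForUsage` helper that is not part of the adapter's API:

```typescript
// Illustrative sketch: OpenAI-compatible streams often deliver usage in a
// final chunk whose `choices` array is empty. Collect usage until the stream
// drains; only then is it safe to emit RUN_FINISHED with complete usage.
type StreamChunk = {
  choices: unknown[];
  usage?: { cost?: number; totalTokens?: number };
};

async function drainForUsage(
  chunks: AsyncIterable<StreamChunk>,
): Promise<StreamChunk["usage"]> {
  let usage: StreamChunk["usage"];
  for await (const chunk of chunks) {
    // Emitting RUN_FINISHED before the trailing usage-only chunk would
    // drop token counts and cost.
    if (chunk.usage) usage = chunk.usage;
  }
  return usage;
}
```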

## Model Routing

OpenRouter can automatically route requests to the best available provider:
@@ -177,4 +202,3 @@ const stream = chat({
```

**Supported models:** all OpenRouter chat models. See [Provider Tools](../tools/provider-tools.md#which-models-support-which-tools).
