Merged
71 changes: 47 additions & 24 deletions CHANGELOG.md

Large diffs are not rendered by default.

14 changes: 12 additions & 2 deletions CLAUDE.md
@@ -69,7 +69,7 @@ Post-ingest hooks run synchronously after each span is written to DB:
- **`tokenjam/core/export/`**: Routing-config snippet generators for `tj optimize --export-config`. Currently `claude_code.py` emits a JSONC fragment under a `tokenjam.routing_recommendations` namespace with honest-framing caveat comments baked in. Writes to `~/.config/tokenjam/exports/`; never touches `~/.claude/settings.json` or other external configs (no `--apply` flag — Claude Code doesn't currently honor TokenJam routing keys, so auto-writing would change nothing and erode trust).
- **`tokenjam/core/backfill.py`**: Parses Claude Code on-disk session JSONL files into `NormalizedSpan`s. Cost is recomputed from `pricing/models.toml` because the on-disk format has no `cost_usd`. The parser tolerates the dated `claude-<family>-<ver>-YYYYMMDD` model-name suffixes Anthropic ships (handled by `core/pricing.py.get_rates()`, which strips the trailing 8-digit date suffix when no exact pricing match exists). Idempotency relies on deterministic span IDs derived from `(session_id, message uuid)` / `(session_id, tool_use id)`.
- **`tokenjam/core/schema_validator.py`**: Validates tool outputs against declared or genson-inferred JSON Schema. Only fires on `gen_ai.tool.call` spans with `gen_ai.tool.output` in attributes. Schema priority: 1) declared file from agent config `output_schema`, 2) inferred schema from `DriftBaseline.output_schema_inferred`. Caches schemas in-memory per agent.
- **`tokenjam/core/models.py`**: All domain dataclasses — `NormalizedSpan`, `SessionRecord`, `Alert`, `DriftBaseline`, filter types, etc. `NormalizedSpan` carries `billing_account` (provider-only: `anthropic` / `openai` / `google` / `bedrock` / `local.ollama`). `SessionRecord` carries `plan_tier` (api / pro / max_5x / max_20x / plus / team / enterprise / local / unknown) plus a derived `pricing_mode` property (`local` / `subscription` / `api` / `unknown`). Spans inherit plan via the session FK — analyzers JOIN through `SessionRecord` when they need plan context.
- **`tokenjam/core/models.py`**: All domain dataclasses — `NormalizedSpan`, `SessionRecord`, `Alert`, `DriftBaseline`, filter types, etc. `NormalizedSpan` carries `billing_account` (provider-only: `anthropic` / `openai` / `google` / `bedrock` / `local.ollama`). `SessionRecord` carries `plan_tier` (api / pro / max_5x / max_20x / plus / team / enterprise / local / unknown) plus a derived `pricing_mode` property (`local` / `subscription` / `api` / `unknown`). Spans inherit plan via the session FK — analyzers JOIN through `SessionRecord` when they need plan context. See [`docs/architecture.md`](docs/architecture.md) → "OTel semconv extensions" for the full derivation rules.
- **`tokenjam/core/config.py`**: `TjConfig` dataclass tree, TOML loading/writing, config file discovery. `ProviderBudget` carries an optional `plan` field (set by `tj onboard`'s plan-tier prompt) that `IngestPipeline._build_or_update_session` reads to populate `SessionRecord.plan_tier` at session creation. `CaptureConfig` has four fine-grained content-capture toggles (`prompts` / `completions` / `tool_inputs` / `tool_outputs`); `strip_captured_content()` in `core/ingest.py` enforces them at the single ingest-pipeline gate.
- **`tokenjam/sdk/agent.py`**: `@watch()` decorator creates session spans only. `record_llm_call()` and `record_tool_call()` create child spans for manual instrumentation. LLM call spans from provider clients require `patch_anthropic()`, `patch_openai()`, etc.
- **`tokenjam/sdk/transport.py`**: `HttpTransport` — buffers up to 1000 spans, retries with exponential backoff (3 attempts, 2s base). Used when `tj serve` runs as a separate process.
@@ -229,4 +229,14 @@ Key runtime dependency: `pytz` is required by DuckDB for `TIMESTAMPTZ` column ha

## Further Reading

- **[docs/architecture.md](docs/architecture.md)** — comprehensive architecture document covering design principles, system overview, data flow, SDK internals, alert system, drift detection, MCP server, Claude Code integration, budget system, and testing architecture.
- **[docs/architecture.md](docs/architecture.md)** — design principles, system overview, data flow, SDK internals, alert system, drift detection, MCP server, Claude Code integration, budget system, testing architecture, and the **OTel semconv extensions** section documenting `tokenjam.billing_account` (span attribute) and `tokenjam.plan_tier` (session-level), the `pricing_mode` derivation rules, and why `plan_tier` lives on `SessionRecord` rather than each span.
- **[docs/installation.md](docs/installation.md)** — base install vs optional extras matrix. Documents `tokenjam[bloat]` (the ~2GB torch + transformers extra used by the Trim analyzer), framework adapter extras (`[langchain]` / `[crewai]` / `[autogen]` / `[litellm]`), and the MCP / dev extras.
- **[docs/configuration.md](docs/configuration.md)** — full TOML config surface plus the "Content capture and privacy" section explaining the four `[capture]` toggles and how they interact with `alerts.include_captured_content`.
- **Optimize product pages** — one per user-facing product, all under `docs/optimize/`:
- [`downsize.md`](docs/optimize/downsize.md) — model-downgrade candidate flagging (internal: `model-downgrade`)
- [`cache.md`](docs/optimize/cache.md) — `cache-efficacy` (current caching ratio) + `cache-recommend` (Anthropic-only breakpoint suggestions)
- [`script.md`](docs/optimize/script.md) — `workflow-restructure` clustering by `(tool_name, arg_shape)` signature
- [`trim.md`](docs/optimize/trim.md) — LLMLingua-2 token-significance classifier (`prompt-bloat`), install + capture requirements, performance numbers
- **Backfill adapters** — `docs/backfill/overview.md` lists the four sources (`claude-code` / `langfuse` / `helicone` / `otlp`) with the partnership-posture framing; per-adapter pages document modes (URL / file), field mapping, idempotency, and v1 limitations.
- **[docs/policy/overview.md](docs/policy/overview.md)** — read-only preview of the unified policy surface (`tj policy list`). Notes that the `add` / `edit` / `apply` subcommands and the underlying `[policy]` config migration land next sprint.
- **Internal specs**: `docs/internal/specs/` is reserved for canonical specs that production code references long-term. Currently empty (sprint specs have been cleaned up after merge); add new ones here when a feature needs a stable, code-referenced source of truth.
46 changes: 46 additions & 0 deletions docs/architecture.md
@@ -377,6 +377,52 @@ The converted spans flow through the standard `IngestPipeline.process()` path

---

## OTel semconv extensions: `billing_account` and `plan_tier`

TokenJam extends the standard OTel GenAI semconv with two attributes for plan-tier-aware cost rendering. Both live in `TjAttributes` in `tokenjam/otel/semconv.py`:

| Attribute | Constant | Lives on | Set by |
|---|---|---|---|
| `tokenjam.billing_account` | `TjAttributes.BILLING_ACCOUNT` | Spans | Every integration (OTel patches, Claude Code JSONL backfill, OTLP HTTP ingest, OTLP logs ingest, Langfuse/Helicone backfill adapters) |
| `tokenjam.plan_tier` | `TjAttributes.PLAN_TIER` | `SessionRecord` (DB column, not a span attribute) | `IngestPipeline._build_or_update_session` reads `ProviderBudget.plan` for the session's `billing_account` |

### `billing_account`

Provider-only identifier. Valid values: `anthropic`, `openai`, `google`, `bedrock`, `local.ollama`. Distinct from `gen_ai.system` (which can include framework-level wrappers like `litellm` or `langchain`) — `billing_account` is whichever provider actually billed the call.

It is **not** a composite. No API-key fingerprint, no plan tier, no account-name suffix. Multi-key disambiguation is deferred until someone asks for it.

### `plan_tier`

The user's declared plan for the relevant provider. Valid values are enumerated in `VALID_PLAN_TIERS`:

```python
{"api", "pro", "max_5x", "max_20x", "plus", "team", "enterprise", "local", "unknown"}
```

`SUBSCRIPTION_PLAN_TIERS` is a subset of `VALID_PLAN_TIERS`: the plans where "spend" is a flat fee and the user isn't paying per token. Renderers branch on this to avoid presenting a per-token dollar "spend" claim against a subscription plan.

### `pricing_mode` derived property

`SessionRecord.pricing_mode` is a derived Python property, **not** a stored DB column. It derives one of four rendering modes from the session's `billing_account` and `plan_tier`, evaluated top-to-bottom (first match wins):

1. `local` if `billing_account == "local.ollama"`
2. `subscription` if `plan_tier in SUBSCRIPTION_PLAN_TIERS`
3. `api` if `plan_tier == "api"`
4. `unknown` if `plan_tier == "unknown"`
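
The rules above can be sketched as a small property. This is a minimal illustration, not the real `SessionRecord`: the `SessionSketch` class is hypothetical, the membership of `SUBSCRIPTION_PLAN_TIERS` is not listed in this document and is guessed here, and falling back to `unknown` for any unmatched value is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the real membership of SUBSCRIPTION_PLAN_TIERS is not given in
# this document; this set is a guess for illustration only.
SUBSCRIPTION_PLAN_TIERS = frozenset(
    {"pro", "max_5x", "max_20x", "plus", "team", "enterprise"}
)


@dataclass
class SessionSketch:
    """Hypothetical stand-in for SessionRecord, reduced to the two inputs the rules read."""

    billing_account: Optional[str]
    plan_tier: str = "unknown"

    @property
    def pricing_mode(self) -> str:
        # Rules are checked top to bottom; the first match wins.
        if self.billing_account == "local.ollama":
            return "local"
        if self.plan_tier in SUBSCRIPTION_PLAN_TIERS:
            return "subscription"
        if self.plan_tier == "api":
            return "api"
        # Covers plan_tier == "unknown". Treating every other unmatched
        # value as unknown is an assumption of this sketch.
        return "unknown"
```

Because the property is computed on access, a change to `plan_tier` is reflected immediately without a migration, which fits the "not a stored DB column" note above.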

Renderers (`tj optimize`, `tj cost`, the web UI cost views) read `pricing_mode` and pick the appropriate framing. `tj optimize` suppresses dollar figures entirely when the entire window is `pricing_mode = unknown`.

### Why a session-level column, not a span attribute

`plan_tier` doesn't change call-to-call within a session. Storing it on each span would duplicate the value across thousands of rows and create skew if a provider plan changes mid-session. Storing on `SessionRecord` keeps it normalized and lets analyzers JOIN through to it.

### Backfilled sessions

`SessionRecord.plan_tier` defaults to `unknown` for backfilled rows (no plan signal in the source data). `tj status` surfaces a one-line note when unknown-tier sessions exist; `tj optimize` refuses to render dollar figures for those sessions until the user resolves them via `tj onboard --reconfigure`.

---

## Budget system

### CLI (`tj budget`)
62 changes: 62 additions & 0 deletions docs/backfill/helicone.md
@@ -0,0 +1,62 @@
# `tj backfill helicone`

Imports [Helicone](https://helicone.ai) request records into the local TokenJam DB. Two input modes — live API or local JSON dump.

TokenJam doesn't replace Helicone. Keep Helicone wherever you have it; point `tj backfill helicone` at the same data so the local cost-optimization analyzers (`tj optimize`) can read it.

## Live API ingestion

```bash
tj backfill helicone \
--source-url https://api.helicone.ai \
--api-key hc_pk_... \
--since 30d
```

Sends a `POST` to Helicone's `/v1/request/query` endpoint with Bearer auth and follows pagination. Self-hosted Helicone instances work the same way: point `--source-url` at the base URL of your deployment.

`--since` accepts the same syntax as `tj cost --since`: `30d`, `24h`, or an ISO-8601 timestamp.
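
To make the three accepted forms concrete, here is a hedged sketch of how such a value could be turned into a cutoff timestamp. `parse_since` is a hypothetical helper, not TokenJam's actual parser, and treating a timezone-less ISO-8601 value as UTC is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Relative windows: an integer followed by "d" (days) or "h" (hours).
_RELATIVE = re.compile(r"^(\d+)([dh])$")


def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
    """Hypothetical helper: turn '30d', '24h', or an ISO-8601 string into a UTC cutoff."""
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE.match(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        delta = timedelta(days=amount) if unit == "d" else timedelta(hours=amount)
        return now - delta
    # Anything else is expected to be ISO-8601. Note that before Python 3.11,
    # datetime.fromisoformat() does not accept a trailing "Z".
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Passing `now` explicitly keeps the relative forms deterministic in tests.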

## File ingestion

```bash
tj backfill helicone --source-file ./helicone-export.json
```

Accepts three input shapes:

1. `{"data": [...]}` — the format returned by the live `/v1/request/query` endpoint.
2. `[...]` — a bare JSON array of records.
3. NDJSON — one JSON record per line.

The file mode is the right choice for testing, offline analysis, or scripted ingestion from a snapshot.
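
One way to accept all three shapes with a single entry point is to try a whole-document parse first and fall back to line-by-line parsing. The sketch below is illustrative only; `load_records` is a hypothetical name and the real adapter may detect shapes differently.

```python
import json
from typing import Any


def load_records(text: str) -> list[dict[str, Any]]:
    """Hypothetical loader for the three input shapes described above."""
    stripped = text.strip()
    if not stripped:
        return []
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Not a single JSON document, so treat the input as NDJSON:
        # one JSON record per non-empty line.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    if isinstance(parsed, dict) and isinstance(parsed.get("data"), list):
        return parsed["data"]  # {"data": [...]} envelope
    if isinstance(parsed, list):
        return parsed  # bare JSON array
    if isinstance(parsed, dict):
        return [parsed]  # a one-line NDJSON file is also a single JSON object
    raise ValueError("unsupported input shape")
```

A multi-line NDJSON file fails the first `json.loads` call with an "extra data" error, which is what routes it to the line-by-line branch.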

## What gets mapped

Each Helicone request record becomes one TokenJam span:

| Helicone field | TokenJam field |
|---|---|
| `request.id` | `span_id` (deterministic hash) |
| `Helicone-Property-Session` (fallback: `request.id`) | `conversation_id` |
| `request.user_id` (fallback: `"helicone"`) | `agent_id` |
| `request.model` | `model` |
| `request.provider` (fallback: derived from model) | `provider` + `billing_account` |
| `request.created_at` | `start_time` |
| `request.created_at + response.delay_ms` | `end_time` |
| `request.prompt_tokens` | `input_tokens` |
| `response.completion_tokens` | `output_tokens` |
| `cost_usd` / `costUSD` | `cost_usd` |
| `properties` | merged into `attrs` |

`billing_account` is derived from `request.provider` when present, or from the model name otherwise. Unknown providers leave it `NULL`; affected sessions will surface as `plan_tier = 'unknown'` in `tj optimize`.
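
The fallbacks and the `end_time` arithmetic in the table can be sketched as below. This assumes the record is a dict with nested `request` and `response` objects, that `Helicone-Property-Session` sits at the top level, and that `created_at` is an ISO-8601 string; none of these layout details are confirmed here, and `map_record` is a hypothetical name. Provider and `billing_account` derivation is left out.

```python
from datetime import datetime, timedelta
from typing import Any


def map_record(record: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical mapping of one Helicone record; the record layout is assumed."""
    request = record.get("request", {})
    response = record.get("response", {})

    start = datetime.fromisoformat(request["created_at"])
    # end_time = created_at + delay_ms (a missing delay is treated as 0 here).
    end = start + timedelta(milliseconds=response.get("delay_ms", 0))

    return {
        # Session property first, request id as the fallback.
        "conversation_id": record.get("Helicone-Property-Session") or request["id"],
        "agent_id": request.get("user_id") or "helicone",
        "model": request.get("model"),
        "start_time": start,
        "end_time": end,
        "input_tokens": request.get("prompt_tokens"),
        "output_tokens": response.get("completion_tokens"),
        # Either spelling of the cost field is accepted.
        "cost_usd": record.get("cost_usd", record.get("costUSD")),
    }
```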

## Idempotency

The TokenJam `span_id` is a deterministic SHA-256 hash of `("helicone", request.id)`. Re-running the same backfill skips rows already present — the output reports `spans_written` vs `spans_skipped`. Safe to schedule nightly.
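
A deterministic ID of this kind can be sketched in a few lines. How the `("helicone", request.id)` pair is serialized before hashing is not specified above, so the `:` separator and the hex output below are assumptions, and `helicone_span_id` is a hypothetical name.

```python
import hashlib


def helicone_span_id(request_id: str) -> str:
    """Hypothetical derivation: SHA-256 over a source prefix plus the request id."""
    # Assumption: the two parts are joined with ":"; the real encoding may differ.
    payload = f"helicone:{request_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Because the ID depends only on the source name and the request ID, the same record always yields the same ID, which is what lets a re-run detect rows that are already present.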

## Limitations in v1

- Helicone's per-request prompt/response bodies are not extracted into `gen_ai.prompt.content` / `gen_ai.completion.content`. Token counts and structural metadata only.
- Multi-tenant Helicone instances aren't filtered by org — the API key's scope determines what's returned.
- If `cost_usd` is missing from a record, TokenJam recomputes cost from `pricing/models.toml` using the model name.
46 changes: 46 additions & 0 deletions docs/backfill/otlp.md
@@ -0,0 +1,46 @@
# `tj backfill otlp`

Generic OTLP-JSON ingestion. The "works with anything" adapter for sources Langfuse and Helicone don't cover: OTel SDKs writing JSON files, observability tools that emit OTLP-shaped exports, OTLP HTTP collectors that publish JSON dumps.

Like the other backfill adapters, this is a partnership move — keep whatever OTel-emitting tool you already use, then point `tj backfill otlp` at a dump to run `tj optimize` against the same data locally.

## File ingestion

```bash
tj backfill otlp --source-file ./traces.json
```

Accepts:

1. A single `{"resourceSpans": [...]}` envelope (the OTLP JSON wire format).
2. NDJSON — one OTLP envelope per line.

This is the recommended mode. Export from your OTel SDK or collector, then ingest.

## URL ingestion

```bash
tj backfill otlp --source-url https://example.com/traces.json --since 7d
```

GET-fetches a JSON dump from a URL. **Not** for live push-style OTLP — for that, configure your collector to send to the running `tj serve` endpoint at `POST /api/v1/spans`, which uses the same OTLP parser as this adapter.

`--since` filters spans by `start_time` and accepts `30d` / `24h` / ISO-8601, matching `tj cost --since`.

## What gets mapped

The adapter uses `tokenjam.otel.otlp_parsing.iter_otlp_spans()`, the same parser that handles live `POST /api/v1/spans` ingest. Resource attributes are merged into per-span attributes (the span value wins on conflict), OTLP timestamps (nanosecond strings) are converted to UTC datetimes, and OTLP `intValue` fields (encoded as strings in OTLP JSON) are coerced to ints.
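
The three conversions described above can be illustrated with a standalone sketch. This is not the code of `iter_otlp_spans()`; the helper names are hypothetical, and only scalar attribute values are handled.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_utc(value: str) -> datetime:
    """Convert an OTLP nanosecond-string timestamp to a UTC datetime."""
    # datetime only has microsecond resolution, so sub-microsecond digits are dropped.
    return _EPOCH + timedelta(microseconds=int(value) // 1000)


def attr_value(any_value: dict[str, Any]) -> Any:
    """Unwrap one OTLP AnyValue; intValue arrives as a string in OTLP JSON."""
    if "intValue" in any_value:
        return int(any_value["intValue"])
    for key in ("stringValue", "boolValue", "doubleValue"):
        if key in any_value:
            return any_value[key]
    return None  # arrays, key-value lists, and bytes are out of scope for this sketch


def flatten(attributes: list[dict[str, Any]]) -> dict[str, Any]:
    """Turn the OTLP [{key, value}] list into a plain dict."""
    return {item["key"]: attr_value(item["value"]) for item in attributes}


def merge_attrs(resource_attrs: list[dict[str, Any]], span_attrs: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge resource attributes into span attributes; span values win on conflict."""
    return {**flatten(resource_attrs), **flatten(span_attrs)}
```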

Standard GenAI semconv attributes (`gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, etc.) populate the corresponding `NormalizedSpan` fields. The `tokenjam.billing_account` extension attribute is honored when present; otherwise `billing_account` is derived from the model name.

## Idempotency

`span_id` is read directly from the OTLP payload (each span carries a unique `(trace_id, span_id)`). The DB's `PRIMARY KEY` on `spans.span_id` causes re-runs to skip already-present rows. Safe to schedule nightly.

The CLI reports four counters: `spans_seen`, `spans_written`, `spans_skipped`, `spans_rejected`. Rejected spans are those that fail sanitization (oversized attributes, malformed timestamps) — the same gate the live ingest path uses.
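
How the four counters relate can be shown with a small in-memory sketch. A `set` stands in for the `PRIMARY KEY` constraint, `is_valid` stands in for the sanitization gate, and checking validity before the duplicate check is an assumption about ordering; `ingest` and `Counters` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class Counters:
    spans_seen: int = 0
    spans_written: int = 0
    spans_skipped: int = 0
    spans_rejected: int = 0


def ingest(
    spans: Iterable[dict[str, Any]],
    existing_ids: set[str],
    is_valid: Callable[[dict[str, Any]], bool],
) -> Counters:
    """Hypothetical ingest loop; existing_ids mimics the primary key on span_id."""
    counters = Counters()
    for span in spans:
        counters.spans_seen += 1
        if not is_valid(span):
            counters.spans_rejected += 1  # failed the sanitization stand-in
        elif span["span_id"] in existing_ids:
            counters.spans_skipped += 1  # already present: the re-run is a no-op
        else:
            existing_ids.add(span["span_id"])
            counters.spans_written += 1
    return counters
```

In this sketch every span lands in exactly one bucket, so `spans_seen` equals the sum of the other three counters.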

## Limitations in v1

- No support for OTLP/protobuf wire format. JSON only. If you need protobuf, convert via the OTel collector first.
- Histogram and metric records in mixed payloads are ignored — only `resourceSpans` is processed.
- Sources that don't emit GenAI semconv attributes will still ingest, but with empty token counts and zero cost (no provider/model to look up in `pricing/models.toml`).
13 changes: 9 additions & 4 deletions docs/backfill/overview.md
@@ -2,30 +2,35 @@

TokenJam's `backfill` command imports historical telemetry from external sources into the local DuckDB, where the standard analyzers (`tj cost`, `tj optimize`, etc.) can read it. Every backfill source maps records onto the same internal `NormalizedSpan` schema, so once imported the data is indistinguishable from natively-captured telemetry.

TokenJam's posture toward upstream tools like Langfuse, Helicone, Phoenix, and LangSmith is partnership, not displacement: keep using whatever you use for live tracing, then point `tj backfill <source>` at it to run the local cost-optimization analyzers (`tj optimize`) against the same data.

## Supported sources

| Source | Command | Status |
|---|---|---|
| Claude Code (on-disk session JSONL) | `tj backfill claude-code` | Stable |
| Langfuse (live API or JSON dump) | `tj backfill langfuse` | Stable |
| Helicone (live API or JSON dump) | `tj backfill helicone` | Stable |
| Raw OTLP JSON (file or HTTP dump) | `tj backfill otlp` | Stable |

Helicone and raw OTLP adapters land in Wave 3 of the current sprint.
Every adapter accepts `--source-url` (live endpoint) or `--source-file` (offline JSON dump), plus optional `--since` for time-windowed ingest. Re-running an ingest is a no-op — see the idempotency note below.

## Idempotency

Every adapter derives deterministic `span_id`s from the source's identifiers (e.g. `(langfuse_trace_id, observation_id)` for Langfuse, `(session_id, message_uuid)` for Claude Code). Re-running an ingest is a no-op — rows that already exist are skipped via the spans table's `PRIMARY KEY` on `span_id`.
Every adapter derives deterministic `span_id`s from the source's identifiers (e.g. `(langfuse_trace_id, observation_id)` for Langfuse, `(helicone_request_id)` for Helicone, `(session_id, message_uuid)` for Claude Code, the OTLP-payload's own `trace_id`+`span_id` for raw OTLP). Re-running an ingest is a no-op — rows that already exist are skipped via the spans table's `PRIMARY KEY` on `span_id`.

This means you can:

- Re-run backfills nightly via cron without duplicating data.
- Combine multiple `--since` windows; overlapping spans collapse on the first import.
- Run `tj backfill langfuse --source-url ...` *and* keep TokenJam's own daemon collecting live; the daemon's spans and backfilled spans share the same DB without conflict (as long as the source IDs differ, which they will).
- Run `tj backfill <source>` *and* keep TokenJam's own daemon collecting live; the daemon's spans and backfilled spans share the same DB without conflict (as long as the source IDs differ, which they will).

## Plan tier on backfilled sessions

Backfilled sessions get `SessionRecord.plan_tier = 'unknown'` because the source data doesn't carry a plan-tier identifier. Run `tj onboard --reconfigure` to set your plan; `tj optimize` will then render dollar figures correctly for those sessions.

## See also

- [`tj backfill claude-code`](claude-code.md) — Claude Code session JSONL ingestion
- [`tj backfill langfuse`](langfuse.md) — Langfuse observation ingestion
- [`tj backfill helicone`](helicone.md) — Helicone request-record ingestion
- [`tj backfill otlp`](otlp.md) — generic OTLP JSON ingestion