Metabuilder-Labs · anilmurty · May 28, 2026
@@ -69,7 +69,7 @@ Post-ingest hooks run synchronously after each span is written to DB:
 - **`tokenjam/core/export/`**: Routing-config snippet generators for `tj optimize --export-config`. Currently `claude_code.py` emits a JSONC fragment under a `tokenjam.routing_recommendations` namespace with honest-framing caveat comments baked in. Writes to `~/.config/tokenjam/exports/`; never touches `~/.claude/settings.json` or other external configs (no `--apply` flag — Claude Code doesn't currently honor TokenJam routing keys, so auto-writing would change nothing and erode trust).
 - **`tokenjam/core/backfill.py`**: Parses Claude Code on-disk session JSONL files into `NormalizedSpan`s. Cost is recomputed from `pricing/models.toml` because the on-disk format has no `cost_usd`. The parser tolerates the dated `claude-<family>-<ver>-YYYYMMDD` model-name suffixes Anthropic ships (handled by `core/pricing.py.get_rates()`, which strips the trailing 8-digit date suffix when no exact pricing match exists). Idempotency relies on deterministic span IDs derived from `(session_id, message uuid)` / `(session_id, tool_use id)`.
 - **`tokenjam/core/schema_validator.py`**: Validates tool outputs against declared or genson-inferred JSON Schema. Only fires on `gen_ai.tool.call` spans with `gen_ai.tool.output` in attributes. Schema priority: 1) declared file from agent config `output_schema`, 2) inferred schema from `DriftBaseline.output_schema_inferred`. Caches schemas in-memory per agent.
-- **`tokenjam/core/models.py`**: All domain dataclasses — `NormalizedSpan`, `SessionRecord`, `Alert`, `DriftBaseline`, filter types, etc. `NormalizedSpan` carries `billing_account` (provider-only: `anthropic` / `openai` / `google` / `bedrock` / `local.ollama`). `SessionRecord` carries `plan_tier` (api / pro / max_5x / max_20x / plus / team / enterprise / local / unknown) plus a derived `pricing_mode` property (`local` / `subscription` / `api` / `unknown`). Spans inherit plan via the session FK — analyzers JOIN through `SessionRecord` when they need plan context.
+- **`tokenjam/core/models.py`**: All domain dataclasses — `NormalizedSpan`, `SessionRecord`, `Alert`, `DriftBaseline`, filter types, etc. `NormalizedSpan` carries `billing_account` (provider-only: `anthropic` / `openai` / `google` / `bedrock` / `local.ollama`). `SessionRecord` carries `plan_tier` (api / pro / max_5x / max_20x / plus / team / enterprise / local / unknown) plus a derived `pricing_mode` property (`local` / `subscription` / `api` / `unknown`). Spans inherit plan via the session FK — analyzers JOIN through `SessionRecord` when they need plan context. See [`docs/architecture.md`](docs/architecture.md) → "OTel semconv extensions" for the full derivation rules.
 - **`tokenjam/core/config.py`**: `TjConfig` dataclass tree, TOML loading/writing, config file discovery. `ProviderBudget` carries an optional `plan` field (set by `tj onboard`'s plan-tier prompt) that `IngestPipeline._build_or_update_session` reads to populate `SessionRecord.plan_tier` at session creation. `CaptureConfig` has four fine-grained content-capture toggles (`prompts` / `completions` / `tool_inputs` / `tool_outputs`); `strip_captured_content()` in `core/ingest.py` enforces them at the single ingest-pipeline gate.
 - **`tokenjam/sdk/agent.py`**: `@watch()` decorator creates session spans only. `record_llm_call()` and `record_tool_call()` create child spans for manual instrumentation. LLM call spans from provider clients require `patch_anthropic()`, `patch_openai()`, etc.
 - **`tokenjam/sdk/transport.py`**: `HttpTransport` — buffers up to 1000 spans, retries with exponential backoff (3 attempts, 2s base). Used when `tj serve` runs as a separate process.
@@ -229,4 +229,14 @@ Key runtime dependency: `pytz` is required by DuckDB for `TIMESTAMPTZ` column ha
 
 ## Further Reading
 
-- **[docs/architecture.md](docs/architecture.md)** — comprehensive architecture document covering design principles, system overview, data flow, SDK internals, alert system, drift detection, MCP server, Claude Code integration, budget system, and testing architecture.
+- **[docs/architecture.md](docs/architecture.md)** — design principles, system overview, data flow, SDK internals, alert system, drift detection, MCP server, Claude Code integration, budget system, testing architecture, and the **OTel semconv extensions** section documenting `tokenjam.billing_account` (span attribute) and `tokenjam.plan_tier` (session-level), the `pricing_mode` derivation rules, and why `plan_tier` lives on `SessionRecord` rather than each span.
+- **[docs/installation.md](docs/installation.md)** — base install vs optional extras matrix. Documents `tokenjam[bloat]` (the ~2GB torch + transformers extra used by the Trim analyzer), framework adapter extras (`[langchain]` / `[crewai]` / `[autogen]` / `[litellm]`), and the MCP / dev extras.
+- **[docs/configuration.md](docs/configuration.md)** — full TOML config surface plus the "Content capture and privacy" section explaining the four `[capture]` toggles and how they interact with `alerts.include_captured_content`.
+- **Optimize product pages** — one per user-facing product, all under `docs/optimize/`:
+  - [`downsize.md`](docs/optimize/downsize.md) — model-downgrade candidate flagging (internal: `model-downgrade`)
+  - [`cache.md`](docs/optimize/cache.md) — `cache-efficacy` (current caching ratio) + `cache-recommend` (Anthropic-only breakpoint suggestions)
+  - [`script.md`](docs/optimize/script.md) — `workflow-restructure` clustering by `(tool_name, arg_shape)` signature
+  - [`trim.md`](docs/optimize/trim.md) — LLMLingua-2 token-significance classifier (`prompt-bloat`), install + capture requirements, performance numbers
+- **Backfill adapters** — `docs/backfill/overview.md` lists the four sources (`claude-code` / `langfuse` / `helicone` / `otlp`) with the partnership-posture framing; per-adapter pages document modes (URL / file), field mapping, idempotency, and v1 limitations.
+- **[docs/policy/overview.md](docs/policy/overview.md)** — read-only preview of the unified policy surface (`tj policy list`). Notes that the `add` / `edit` / `apply` subcommands and the underlying `[policy]` config migration land next sprint.
+- **Internal specs** — `docs/internal/specs/` is reserved for canonical specs that production code references over the long term. Currently empty (sprint specs have been cleaned up after merge); add new ones here when a feature needs a stable, code-referenced source of truth.
@@ -377,6 +377,52 @@ The converted spans flow through the standard `IngestPipeline.process()` path
 
 ---
 
+## OTel semconv extensions: `billing_account` and `plan_tier`
+
+TokenJam extends the standard OTel GenAI semconv with two attributes for plan-tier-aware cost rendering. Both live in `TjAttributes` in `tokenjam/otel/semconv.py`:
+
+| Attribute | Constant | Lives on | Set by |
+|---|---|---|---|
+| `tokenjam.billing_account` | `TjAttributes.BILLING_ACCOUNT` | Spans | Every integration (OTel patches, Claude Code JSONL backfill, OTLP HTTP ingest, OTLP logs ingest, Langfuse/Helicone backfill adapters) |
+| `tokenjam.plan_tier` | `TjAttributes.PLAN_TIER` | `SessionRecord` (DB column, not a span attribute) | `IngestPipeline._build_or_update_session` reads `ProviderBudget.plan` for the session's `billing_account` |
+
+### `billing_account`
+
+Provider-only identifier. Valid values: `anthropic`, `openai`, `google`, `bedrock`, `local.ollama`. Distinct from `gen_ai.system` (which can include framework-level wrappers like `litellm` or `langchain`) — `billing_account` is whichever provider actually billed the call.
+
+It is **not** a composite. No API-key fingerprint, no plan tier, no account-name suffix. Multi-key disambiguation is deferred until someone asks for it.
+
+### `plan_tier`
+
+The user's declared plan for the relevant provider. Valid values are enumerated in `VALID_PLAN_TIERS`:
+
+```python
+{"api", "pro", "max_5x", "max_20x", "plus", "team", "enterprise", "local", "unknown"}
+```
+
+`SUBSCRIPTION_PLAN_TIERS` is a subset: the plans where "spend" is a flat fee and the user isn't paying per token. Renderers branch on this to avoid presenting a per-token dollar "spend" claim against a subscription plan.
+
+### `pricing_mode` derived property
+
+`SessionRecord.pricing_mode` is a derived Python property, **not** a stored DB column. It maps the session's `billing_account` and `plan_tier` to one of four rendering modes, evaluated top-to-bottom (first match wins):
+
+1. `local` if `billing_account == "local.ollama"`
+2. `subscription` if `plan_tier in SUBSCRIPTION_PLAN_TIERS`
+3. `api` if `plan_tier == "api"`
+4. `unknown` if `plan_tier == "unknown"`
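+
+The four rules can be read as a single property. The sketch below is illustrative and is not the actual `tokenjam/core/models.py` code: the membership of `SUBSCRIPTION_PLAN_TIERS`, the trimmed-down `SessionRecordSketch` class, and the final `unknown` fallback for combinations the list does not cover are all assumptions.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+# Assumed membership (every flat-fee plan); the real set is defined by TokenJam.
+SUBSCRIPTION_PLAN_TIERS = frozenset(
+    {"pro", "max_5x", "max_20x", "plus", "team", "enterprise"}
+)
+
+
+@dataclass
+class SessionRecordSketch:
+    """Hypothetical, trimmed-down stand-in for SessionRecord."""
+
+    billing_account: Optional[str] = None
+    plan_tier: str = "unknown"
+
+    @property
+    def pricing_mode(self) -> str:
+        # Rules are checked in order; the first match wins.
+        if self.billing_account == "local.ollama":
+            return "local"
+        if self.plan_tier in SUBSCRIPTION_PLAN_TIERS:
+            return "subscription"
+        if self.plan_tier == "api":
+            return "api"
+        # Covers plan_tier == "unknown". Treating any other uncovered
+        # combination as "unknown" too is an assumption of this sketch.
+        return "unknown"
+```
+
+Because the `local.ollama` check comes first, a local session renders as `local` even if a subscription tier was declared for it.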
+
+Renderers (`tj optimize`, `tj cost`, the web UI cost views) read `pricing_mode` and pick the appropriate framing. `tj optimize` suppresses dollar figures entirely when the entire window is `pricing_mode = unknown`.
+
+### Why a session-level column, not a span attribute
+
+`plan_tier` doesn't change from call to call within a session. Storing it on each span would duplicate the value across thousands of rows and create skew if a provider plan changed mid-session. Storing it once on `SessionRecord` keeps it normalized and lets analyzers JOIN through to it.
+
+### Backfilled sessions
+
+`SessionRecord.plan_tier` defaults to `unknown` for backfilled rows (no plan signal in the source data). `tj status` surfaces a one-line note when unknown-tier sessions exist; `tj optimize` refuses to render dollar figures for those sessions until the user resolves them via `tj onboard --reconfigure`.
+
+---
+
 ## Budget system
 
 ### CLI (`tj budget`)

@@ -0,0 +1,62 @@
+# `tj backfill helicone`
+
+Imports [Helicone](https://helicone.ai) request records into the local TokenJam DB. Two input modes — live API or local JSON dump.
+
+TokenJam doesn't replace Helicone. Keep Helicone wherever you have it; point `tj backfill helicone` at the same data so the local cost-optimization analyzers (`tj optimize`) can read it.
+
+## Live API ingestion
+
+```bash
+tj backfill helicone \
+  --source-url https://api.helicone.ai \
+  --api-key hc_pk_... \
+  --since 30d
+```
+
+Sends `POST /v1/request/query` requests to Helicone with Bearer auth and follows pagination. Self-hosted Helicone instances work the same way: point `--source-url` at the base URL of your deployment.
+
+`--since` accepts the same syntax as `tj cost --since`: `30d`, `24h`, or an ISO-8601 timestamp.
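+
+For readers scripting around the CLI, the accepted `--since` forms can be approximated as below. This is a sketch, not TokenJam's parser: `parse_since` is a hypothetical helper, only the `d` and `h` units shown above are handled, and treating timezone-less ISO-8601 input as UTC is an assumption.
+
+```python
+import re
+from datetime import datetime, timedelta, timezone
+from typing import Optional
+
+# Relative windows such as "30d" or "24h".
+_RELATIVE = re.compile(r"^(\d+)([dh])$")
+
+
+def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
+    """Return the UTC cutoff described by a --since style value."""
+    text = value.strip()
+    now = now or datetime.now(timezone.utc)
+    match = _RELATIVE.match(text)
+    if match:
+        amount, unit = int(match.group(1)), match.group(2)
+        delta = timedelta(days=amount) if unit == "d" else timedelta(hours=amount)
+        return now - delta
+    # Older Python versions reject a trailing "Z", so normalize it first.
+    if text.endswith("Z"):
+        text = text[:-1] + "+00:00"
+    parsed = datetime.fromisoformat(text)
+    if parsed.tzinfo is None:
+        # Assumption: naive timestamps are interpreted as UTC.
+        parsed = parsed.replace(tzinfo=timezone.utc)
+    return parsed
+```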
+
+## File ingestion
+
+```bash
+tj backfill helicone --source-file ./helicone-export.json
+```
+
+Accepts three input shapes:
+
+1. `{"data": [...]}` — the format returned by the live `/v1/request/query` endpoint.
+2. `[...]` — a bare JSON array of records.
+3. NDJSON — one JSON record per line.
+
+The file mode is the right choice for testing, offline analysis, or scripted ingestion from a snapshot.
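+
+One way to normalize the three shapes into a flat list of records is sketched here. `load_records` is a hypothetical helper written for illustration, not the adapter's actual loader; in particular, treating a single JSON object without a `data` list as one NDJSON record is an assumption.
+
+```python
+import json
+from typing import Any, Dict, List
+
+
+def load_records(text: str) -> List[Dict[str, Any]]:
+    """Return request records from a Helicone-style export."""
+    stripped = text.strip()
+    if not stripped:
+        return []
+    try:
+        doc = json.loads(stripped)
+    except json.JSONDecodeError:
+        # Not a single JSON document: parse as NDJSON, one record per line.
+        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
+    if isinstance(doc, dict) and isinstance(doc.get("data"), list):
+        return doc["data"]  # {"data": [...]} envelope
+    if isinstance(doc, list):
+        return doc  # bare JSON array
+    if isinstance(doc, dict):
+        return [doc]  # one-line NDJSON file holding a single record
+    raise ValueError("unsupported export shape")
+```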
+
+## What gets mapped
+
+Each Helicone request record becomes one TokenJam span:
+
+| Helicone field | TokenJam field |
+|---|---|
+| `request.id` | `span_id` (deterministic hash) |
+| `Helicone-Property-Session` (fallback: `request.id`) | `conversation_id` |
+| `request.user_id` (fallback: `"helicone"`) | `agent_id` |
+| `request.model` | `model` |
+| `request.provider` (fallback: derived from model) | `provider` + `billing_account` |
+| `request.created_at` | `start_time` |
+| `request.created_at + response.delay_ms` | `end_time` |
+| `request.prompt_tokens` | `input_tokens` |
+| `response.completion_tokens` | `output_tokens` |
+| `cost_usd` / `costUSD` | `cost_usd` |
+| `properties` | merged into `attrs` |
+
+`billing_account` is derived from `request.provider` when present, or from the model name otherwise. Unknown providers leave it `NULL`; affected sessions will surface as `plan_tier = 'unknown'` in `tj optimize`.
+
+## Idempotency
+
+The TokenJam `span_id` is a deterministic SHA-256 hash of `("helicone", request.id)`. Re-running the same backfill skips rows already present — the output reports `spans_written` vs `spans_skipped`. Safe to schedule nightly.
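+
+The idea behind a deterministic ID can be shown in a few lines. This is only a sketch: `helicone_span_id` is a hypothetical name, and the way the `("helicone", request.id)` pair is serialized before hashing (joined with a unit-separator byte here) is an assumption, so the digests will not necessarily match the ones TokenJam writes.
+
+```python
+import hashlib
+
+
+def helicone_span_id(request_id: str) -> str:
+    """Derive a stable span ID from a Helicone request ID."""
+    # Assumed serialization: source tag and request ID joined by "\x1f".
+    key = "\x1f".join(("helicone", request_id))
+    return hashlib.sha256(key.encode("utf-8")).hexdigest()
+
+
+# The same request always maps to the same ID, so a second run can
+# detect that the row already exists and skip it.
+first_run = helicone_span_id("req_123")
+second_run = helicone_span_id("req_123")
+```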
+
+## Limitations in v1
+
+- Helicone's per-request prompt/response bodies are not extracted into `gen_ai.prompt.content` / `gen_ai.completion.content`. Token counts and structural metadata only.
+- Multi-tenant Helicone instances aren't filtered by org — the API key's scope determines what's returned.
+- If `cost_usd` is missing from a record, TokenJam recomputes cost from `pricing/models.toml` using the model name.
@@ -0,0 +1,46 @@
+# `tj backfill otlp`
+
+Generic OTLP-JSON ingestion. The "works with anything" adapter for sources Langfuse and Helicone don't cover: OTel SDKs writing JSON files, observability tools that emit OTLP-shaped exports, OTLP HTTP collectors that publish JSON dumps.
+
+Like the other backfill adapters, this is a partnership move — keep whatever OTel-emitting tool you already use, then point `tj backfill otlp` at a dump to run `tj optimize` against the same data locally.
+
+## File ingestion
+
+```bash
+tj backfill otlp --source-file ./traces.json
+```
+
+Accepts:
+
+1. A single `{"resourceSpans": [...]}` envelope (the OTLP JSON wire format).
+2. NDJSON — one OTLP envelope per line.
+
+This is the recommended mode. Export from your OTel SDK or collector, then ingest.
+
+## URL ingestion
+
+```bash
+tj backfill otlp --source-url https://example.com/traces.json --since 7d
+```
+
+GET-fetches a JSON dump from a URL. **Not** for live push-style OTLP — for that, configure your collector to send to the running `tj serve` endpoint at `POST /api/v1/spans`, which uses the same OTLP parser as this adapter.
+
+`--since` filters spans by `start_time` and accepts `30d` / `24h` / ISO-8601, matching `tj cost --since`.
+
+## What gets mapped
+
+The adapter uses `tokenjam.otel.otlp_parsing.iter_otlp_spans()` — the same parser that handles live `POST /api/v1/spans` ingest. Resource attributes are merged into per-span attributes (span wins on conflict), OTLP timestamps (nanosecond strings) are converted to UTC datetimes, and OTLP `intValue` fields (strings per the OTLP spec) are coerced to ints.
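+
+The three conversions can be sketched independently of the real parser. The helper names below are hypothetical and do not mirror `iter_otlp_spans()`; only the common scalar OTLP value kinds are handled.
+
+```python
+from datetime import datetime, timezone
+from typing import Any, Dict, List
+
+
+def unwrap_value(value: Dict[str, Any]) -> Any:
+    """Unwrap an OTLP AnyValue; intValue arrives as a string in OTLP JSON."""
+    if "intValue" in value:
+        return int(value["intValue"])
+    for kind in ("stringValue", "boolValue", "doubleValue"):
+        if kind in value:
+            return value[kind]
+    return None
+
+
+def flatten_attributes(attrs: List[Dict[str, Any]]) -> Dict[str, Any]:
+    """Turn OTLP's [{"key": ..., "value": {...}}] list into a plain dict."""
+    return {item["key"]: unwrap_value(item.get("value", {})) for item in attrs}
+
+
+def merge_attributes(resource_attrs, span_attrs) -> Dict[str, Any]:
+    # Span attributes are applied last, so they win on conflict.
+    return {**flatten_attributes(resource_attrs), **flatten_attributes(span_attrs)}
+
+
+def nanos_to_utc(nanos: str) -> datetime:
+    """Convert a nanosecond-string timestamp to a UTC datetime."""
+    seconds, remainder = divmod(int(nanos), 1_000_000_000)
+    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
+    # datetime has microsecond resolution, so sub-microsecond digits are dropped.
+    return base.replace(microsecond=remainder // 1000)
+```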
+
+Standard GenAI semconv attributes (`gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, etc.) populate the corresponding `NormalizedSpan` fields. The `tokenjam.billing_account` extension attribute is honored when present; otherwise `billing_account` is derived from the model name.
+
+## Idempotency
+
+`span_id` is read directly from the OTLP payload (each span carries a unique `(trace_id, span_id)`). The DB's `PRIMARY KEY` on `spans.span_id` causes re-runs to skip already-present rows. Safe to schedule nightly.
+
+The CLI reports four counters: `spans_seen`, `spans_written`, `spans_skipped`, `spans_rejected`. Rejected spans are those that fail sanitization (oversized attributes, malformed timestamps) — the same gate the live ingest path uses.
+
+## Limitations in v1
+
+- No support for OTLP/protobuf wire format. JSON only. If you need protobuf, convert via the OTel collector first.
+- Histogram and metric records in mixed payloads are ignored — only `resourceSpans` is processed.
+- Sources that don't emit GenAI semconv attributes will still ingest, but with empty token counts and zero cost (no provider/model to look up in `pricing/models.toml`).
@@ -2,30 +2,35 @@
 
 TokenJam's `backfill` command imports historical telemetry from external sources into the local DuckDB, where the standard analyzers (`tj cost`, `tj optimize`, etc.) can read it. Every backfill source maps records onto the same internal `NormalizedSpan` schema, so once imported the data is indistinguishable from natively-captured telemetry.
 
+TokenJam's posture toward upstream tools like Langfuse, Helicone, Phoenix, and LangSmith is partnership, not displacement: keep using whatever you use for live tracing, then point `tj backfill <source>` at it to run the local cost-optimization analyzers (`tj optimize`) against the same data.
+
 ## Supported sources
 
 | Source | Command | Status |
 |---|---|---|
 | Claude Code (on-disk session JSONL) | `tj backfill claude-code` | Stable |
 | Langfuse (live API or JSON dump) | `tj backfill langfuse` | Stable |
+| Helicone (live API or JSON dump) | `tj backfill helicone` | Stable |
+| Raw OTLP JSON (file or HTTP dump) | `tj backfill otlp` | Stable |
 
-Helicone and raw OTLP adapters land in Wave 3 of the current sprint.
+Every adapter accepts `--source-url` (live endpoint) or `--source-file` (offline JSON dump), plus optional `--since` for time-windowed ingest. Re-running an ingest is a no-op — see the idempotency note below.
 
 ## Idempotency
 
-Every adapter derives deterministic `span_id`s from the source's identifiers (e.g. `(langfuse_trace_id, observation_id)` for Langfuse, `(session_id, message_uuid)` for Claude Code). Re-running an ingest is a no-op — rows that already exist are skipped via the spans table's `PRIMARY KEY` on `span_id`.
+Every adapter derives deterministic `span_id`s from the source's identifiers (e.g. `(langfuse_trace_id, observation_id)` for Langfuse, `(helicone_request_id)` for Helicone, `(session_id, message_uuid)` for Claude Code, the OTLP-payload's own `trace_id`+`span_id` for raw OTLP). Re-running an ingest is a no-op — rows that already exist are skipped via the spans table's `PRIMARY KEY` on `span_id`.
 
 This means you can:
 
 - Re-run backfills nightly via cron without duplicating data.
 - Combine multiple `--since` windows; overlapping spans collapse on the first import.
-- Run `tj backfill langfuse --source-url ...` *and* keep TokenJam's own daemon collecting live; the daemon's spans and backfilled spans share the same DB without conflict (as long as the source IDs differ, which they will).
+- Run `tj backfill <source>` *and* keep TokenJam's own daemon collecting live; the daemon's spans and backfilled spans share the same DB without conflict (as long as the source IDs differ, which they will).
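+
+The skip-on-conflict behavior can be reproduced in miniature. The sketch below uses `sqlite3` from the Python standard library as a stand-in for DuckDB, with a hypothetical one-column `spans` table and a hypothetical `write_spans` helper; it illustrates the primary-key mechanism only and is not TokenJam's ingest code.
+
+```python
+import sqlite3
+from typing import Iterable, Tuple
+
+
+def write_spans(conn: sqlite3.Connection, span_ids: Iterable[str]) -> Tuple[int, int]:
+    """Insert span IDs, returning (written, skipped) counts."""
+    written = skipped = 0
+    for span_id in span_ids:
+        cur = conn.execute(
+            "INSERT OR IGNORE INTO spans (span_id) VALUES (?)", (span_id,)
+        )
+        # rowcount is 1 for a new row and 0 when the primary key already exists.
+        if cur.rowcount == 1:
+            written += 1
+        else:
+            skipped += 1
+    conn.commit()
+    return written, skipped
+
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE spans (span_id TEXT PRIMARY KEY)")
+first = write_spans(conn, ["a", "b"])        # both rows are new
+second = write_spans(conn, ["a", "b", "c"])  # only "c" is new
+```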
 
 ## Plan tier on backfilled sessions
 
 Backfilled sessions get `SessionRecord.plan_tier = 'unknown'` because the source data doesn't carry a plan-tier identifier. Run `tj onboard --reconfigure` to set your plan; `tj optimize` will then render dollar figures correctly for those sessions.
 
 ## See also
 
-- [`tj backfill claude-code`](claude-code.md) — Claude Code session JSONL ingestion
 - [`tj backfill langfuse`](langfuse.md) — Langfuse observation ingestion
+- [`tj backfill helicone`](helicone.md) — Helicone request-record ingestion
+- [`tj backfill otlp`](otlp.md) — generic OTLP JSON ingestion