Strategy pivot: Layer-9 cost-optimization analyzers + ingest adapters by anilmurty · Pull Request #66 · Metabuilder-Labs/tokenjam

anilmurty · 2026-05-28T15:29:07Z

Summary

Implements the May 26 OSS sprint described in docs/internal/tasks/ — TokenJam's strategic pivot from observability product to Layer-9 cost-optimization. Spans Task 0 (foundation), Waves 1–3 (analyzers, ingest adapters, config export), and Task 4 (tj policy preview).

Scale: 82 files changed, +8,276 / −842 lines. 566 tests passing (up from 447). mypy strict clean. Ruff at pre-sprint baseline (49 errors, all pre-existing).

Note: this PR was originally opened as #65 against task-5-docs-sweep by mistake; closed and re-opened from task-4-policy-list. task-5-docs-sweep is reserved for a parallel docs-only effort.

What's in here

Foundation (Task 0)

Data model: new billing_account column on spans (provider-only); new plan_tier column on sessions + pricing_mode derived property; new ProviderBudget.plan field. Migration 4.
Registry-driven optimize package: tokenjam/core/optimize.py split into a package with pkgutil-based auto-discovery. New analyzers drop a file under analyzers/ with a @register("name") decorator — cmd_optimize's --finding choices auto-derive. --only model|budget flag replaced with --finding.
Plan-tier-aware onboard: tj onboard prompts for plan tier (api/pro/max_5x/max_20x for Anthropic; api/plus/team/enterprise for OpenAI). --reconfigure re-prompts; --plan <tier> skips interaction for scripted onboards. Stops auto-writing [budget.anthropic] usd = 200.
Unknown-tier handling: tj status prints a one-line note; tj optimize suppresses dollar figures with a header note.
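
The registry pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `_REGISTRY`, `discover`, `finding_choices`, and the analyzer signature are hypothetical names, and only the `@register("name")` decorator shape and the use of `pkgutil` come from the description.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import Callable, Dict, List

# Hypothetical analyzer signature: takes span dicts, returns finding dicts.
Analyzer = Callable[[List[dict]], List[dict]]

_REGISTRY: Dict[str, Analyzer] = {}


def register(name: str) -> Callable[[Analyzer], Analyzer]:
    """Decorator that records an analyzer under a unique finding name."""
    def decorator(fn: Analyzer) -> Analyzer:
        if name in _REGISTRY:
            raise ValueError(f"duplicate analyzer name: {name}")
        _REGISTRY[name] = fn
        return fn
    return decorator


def discover(package: ModuleType) -> List[str]:
    """Import every submodule of `package` so its @register calls run."""
    imported = []
    prefix = package.__name__ + "."
    for info in pkgutil.iter_modules(package.__path__, prefix=prefix):
        importlib.import_module(info.name)
        imported.append(info.name)
    return imported


def finding_choices() -> List[str]:
    """Names a CLI could offer as --finding choices (sorted for stability)."""
    return sorted(_REGISTRY)


# Two placeholder analyzers, standing in for files dropped under analyzers/.
@register("cache-efficacy")
def cache_efficacy(spans: List[dict]) -> List[dict]:
    return []


@register("prompt-bloat")
def prompt_bloat(spans: List[dict]) -> List[dict]:
    return []
```

With this shape, adding an analyzer means adding one decorated function in a new module; the CLI choices follow from `finding_choices()` without a hand-maintained list.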

Wave 1 — honest output + first ingest adapter + period comparison

v1.1 honest output in tj optimize: subscription users see "implied API value" framing and token-share savings, never dollar "spend." Local users see token-only. JSON carries plan + pricing_mode + monthly_tokens_freed.
Langfuse ingest adapter (tj backfill langfuse) with live API + JSON-file modes, deterministic span IDs, per-model billing_account derivation.
Period comparison (--compare on tj cost and tj optimize): keywords (previous / last-week / last-month / last-7d / last-30d) or explicit YYYY-MM-DD:YYYY-MM-DD ranges. ▲/▼ deltas with top per-agent/per-model shifts.
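
One way to get deterministic span IDs for backfilled data is to hash the source identifiers, so re-running a backfill yields the same IDs instead of duplicates. The sketch below assumes a SHA-256 digest truncated to 16 hex characters (OpenTelemetry span IDs are 8 bytes); the function name and the choice of input fields are hypothetical and may differ from what the adapter actually does.

```python
import hashlib


def deterministic_span_id(source: str, trace_id: str, observation_id: str) -> str:
    """Derive a stable 16-hex-char span ID from source identifiers."""
    # NUL separators keep ("ab", "c") distinct from ("a", "bc").
    key = "\x00".join((source, trace_id, observation_id)).encode("utf-8")
    # OTel span IDs are 8 bytes, so keep the first 16 hex characters.
    return hashlib.sha256(key).hexdigest()[:16]
```

Calling the function twice with the same inputs returns the same ID, which is what would make repeated ingestion of the same export idempotent.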

Wave 2 — four new analyzers

cache-efficacy — current caching ratio per (provider, model). Anthropic full; OpenAI/Gemini best-effort; others unsupported.
cache-recommend (Anthropic-only v1) — flags stable prompt prefixes for cache_control placement.
workflow-restructure (Script product) — clusters sessions by (tool_name, arg_shape) signature, flags ≥20-instance deterministic patterns as script-replacement candidates.
prompt-bloat (Trim product, optional tokenjam[bloat] extra) — LLMLingua-2 token-significance classifier with lazy import + HTML report via tj report --bloat.
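
As an illustration of the cache-recommend heuristic, the commit message for this PR describes hashing the first 2000 characters of each captured prompt and flagging prefixes shared by at least 3 calls. A minimal sketch under those two stated thresholds follows; the function name, the SHA-256 choice, and the return shape are assumptions.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable

PREFIX_CHARS = 2000   # from the commit message: "hashes first 2000 chars"
MIN_SHARED_CALLS = 3  # from the commit message: "shared by >=3 calls"


def shared_prefixes(prompts: Iterable[str]) -> Dict[str, int]:
    """Map prefix hash -> call count for prefixes seen in enough calls."""
    counts = Counter(
        hashlib.sha256(p[:PREFIX_CHARS].encode("utf-8")).hexdigest()
        for p in prompts
    )
    # Keep only prefixes that recur often enough to be worth caching.
    return {h: n for h, n in counts.items() if n >= MIN_SHARED_CALLS}
```

Prompts that share a long, stable system preamble hash to the same key even when the user-specific tail differs, which is the property a cache_control breakpoint would rely on.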

Wave 3

Helicone ingest adapter (tj backfill helicone).
Raw OTLP ingest adapter (tj backfill otlp) — extracted shared OTLP parsing into tokenjam/otel/otlp_parsing.py, now used by both the live POST /api/v1/spans route and the backfill adapter.
tj optimize --export-config claude-code — writes a JSONC routing snippet to ~/.config/tokenjam/exports/ with mandatory honest-framing caveat comments baked in. No --apply flag by design.
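
The "one parser, two callers" idea can be illustrated with a small OTLP/JSON flattener that both a live route and a file-based backfill could call. The payload layout (resourceSpans, scopeSpans, spans, key/value attributes) follows the OTLP JSON encoding; the function names and the flattened output fields are hypothetical and are not taken from tokenjam/otel/otlp_parsing.py.

```python
from typing import Any, Dict, Iterator, List, Optional


def _attr_value(value: Dict[str, Any]) -> Any:
    """Unwrap an OTLP AnyValue such as {"stringValue": "x"}."""
    if "stringValue" in value:
        return value["stringValue"]
    if "intValue" in value:
        # 64-bit integers are usually encoded as JSON strings in OTLP/JSON.
        return int(value["intValue"])
    if "doubleValue" in value:
        return float(value["doubleValue"])
    if "boolValue" in value:
        return bool(value["boolValue"])
    return value  # arrays, kvlists, and bytes are left as-is in this sketch


def _attrs(items: Optional[List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Turn an OTLP attribute list into a plain dict."""
    return {a["key"]: _attr_value(a.get("value", {})) for a in items or []}


def iter_spans(payload: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat dict per span in an OTLP/JSON trace payload."""
    for rs in payload.get("resourceSpans", []):
        resource_attrs = _attrs(rs.get("resource", {}).get("attributes"))
        for ss in rs.get("scopeSpans", []):
            for span in ss.get("spans", []):
                yield {
                    "trace_id": span.get("traceId"),
                    "span_id": span.get("spanId"),
                    "name": span.get("name"),
                    "start_ns": int(span.get("startTimeUnixNano", 0)),
                    "end_ns": int(span.get("endTimeUnixNano", 0)),
                    "resource": resource_attrs,
                    "attributes": _attrs(span.get("attributes")),
                }
```

Keeping this logic in a single module means a fix to attribute decoding applies to live ingest and backfill at once, which is the motivation stated for Critical Rule #17.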

Task 4 — policy preview

tj policy list — read-only preview consolidating alerts, drift, schema, sensitive-actions, capture, and per-provider budget config under a unified policy view. Each row carries its source TOML section. Full tj policy add | edit | apply lands next sprint.

CLAUDE.md

Updated for everything above: new data-model fields, key-modules entries for ingest_adapters/ + export/ + otlp_parsing.py, expanded CLI commands description, Critical Rules #16 (analyzer self-registration) and #17 (OTLP parsing single home), revised Config section, optional-extras documentation.

Honesty constraints — preserved end-to-end

MODEL_DOWNGRADE_CAVEAT still surfaces in every rendering path.
New analyzers each carry equivalent caveats.
Subscription users never see a dollar figure framed as "spend" they didn't pay. The list-price math surfaces only as "implied API value" with explicit comparison to plan cost.
tj optimize --export-config snippets bake the caveat block into the JSONC output as comments.
No --apply flag on config export — TokenJam doesn't sit in the call path.
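
The framing rule for subscription, local, and unknown-tier users can be sketched as a small rendering function. Everything here is illustrative: the `pricing_mode` values ("api", "subscription", "local", "unknown"), the function name, and the output wording are assumptions, and the dollar amounts in the usage line are made-up numbers rather than real pricing.

```python
from typing import Optional


def savings_line(
    pricing_mode: str,
    monthly_tokens_freed: int,
    implied_api_usd: Optional[float] = None,
    plan_usd: Optional[float] = None,
) -> str:
    """Render one savings line without ever calling unpaid dollars 'spend'."""
    tokens = f"{monthly_tokens_freed:,} tokens/month freed"
    if pricing_mode == "api" and implied_api_usd is not None:
        # Pay-per-token users actually pay list price, so dollars are fair.
        return f"Estimated savings: ${implied_api_usd:,.2f}/month ({tokens})"
    if (
        pricing_mode == "subscription"
        and implied_api_usd is not None
        and plan_usd is not None
    ):
        # List-price math appears only as implied value, next to plan cost.
        return (
            f"Implied API value: ${implied_api_usd:,.2f}/month "
            f"vs. plan cost ${plan_usd:,.2f}/month ({tokens})"
        )
    # Local or unknown-tier sessions: token-only framing, no dollar figures.
    return tokens


# Illustrative numbers only.
print(savings_line("subscription", 1_500_000, 412.5, 200.0))
```

Falling through to the token-only line whenever the mode or the figures are missing keeps the default on the conservative side: a dollar amount appears only when there is enough information to frame it honestly.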

Test plan

Full CI suite: 566 passing (pytest tests/unit tests/synthetic tests/agents tests/integration)
mypy strict: clean (102 source files)
ruff: 49 errors, all pre-existing
Manual: tj onboard --claude-code --plan max_20x writes correct plan_tier
Manual: tj optimize renders subscription framing for a Max-20x session
Manual: tj backfill langfuse --source-file ... ingests + tj optimize analyzes
Manual: tj report --bloat renders an HTML report (requires pip install "tokenjam[bloat]")
Manual: tj policy list surfaces every config section correctly

Out of scope (deferred)

Per the task-queue decision document (docs/internal/tasks/decisions-locked.md):

Replay validation (Confidence Level 2) — Pro-tier, next sprint
License verification infrastructure — own design sprint
Full tj policy add | edit | apply surface + unified [policy] config migration — next sprint
MCP tool surface expansion — depends on validation engine
TypeScript SDK framework patches
TokenJam Cloud MVP

🤖 Generated with Claude Code

Implements Task 0 + Waves 1-3 + Task 4 of the May 26 OSS sprint (see docs/internal/tasks/ for the task queue).

Foundation (Task 0):
- Add billing_account (TEXT, provider-only) to spans table (migration 4)
- Add plan_tier (TEXT) to sessions table + pricing_mode derived property
- Add ProviderBudget.plan field: set by tj onboard prompts, read by IngestPipeline at session creation to populate SessionRecord.plan_tier
- Refactor tokenjam/core/optimize.py into a package with registry-driven analyzers (auto-discovery via pkgutil) + ANALYZER_ORDER
- Replace tj optimize --only model|budget with --finding (registry-driven choices)
- Add tj onboard plan-tier prompts (anthropic: api/pro/max_5x/max_20x; openai: api/plus/team/enterprise) + --reconfigure flag + --plan flag
- Stop auto-writing [budget.anthropic] usd = 200; only write plan field
- Add unknown-plan-tier handling to tj status (one-line note) and tj optimize (suppress dollar figures when sessions are unknown-tier)
- Document existing [capture] config (four fine-grained flags) in docs/configuration.md; Wave-2 analyzers depend on it

Wave 1:
- v1.1 honest output: plan-tier-aware rendering in tj optimize (subscription users see "implied API value" + token-share framing, never dollar "spend"; local users see token-only framing). DowngradeFinding extended with token-share fields. JSON output carries plan + pricing_mode + monthly_tokens_freed.
- Langfuse ingest adapter (tj backfill langfuse): live API or JSON dump, deterministic span IDs, per-model billing_account derivation
- Period comparison (--compare flag on tj cost and tj optimize): spend/token deltas, top per-agent/per-model shifts, ▲/▼ indicators

Wave 2 (four new analyzers):
- cache-efficacy: measures current cache_tokens / input_tokens ratio per (provider, model). Anthropic full, OpenAI/Gemini best-effort, others unsupported. Flags rows >=100K tokens with <30% efficacy.
- cache-recommend (Anthropic-only v1, needs capture.prompts): walks captured prompts, hashes first 2000 chars, flags prefixes shared by >=3 calls as cache_control breakpoint candidates.
- workflow-restructure (Script product): clusters sessions by ordered (tool_name, arg_shape) signature. arg_shape classifies args by type (file_path / command_string / json_object / array / etc.) so structural patterns cluster even when values vary. Conservative thresholds (>=20 instances) for v1.
- prompt-bloat (Trim product, optional tokenjam[bloat] extra): LLMLingua-2 token-significance classifier identifies low-significance regions in captured prompts. Lazy import; self-registers without the extra installed. HTML report via tj report --bloat.

Wave 3:
- Helicone ingest adapter (tj backfill helicone)
- Raw OTLP ingest adapter (tj backfill otlp): extracts shared parsing to tokenjam/otel/otlp_parsing.py (one impl, two callers: backfill + live POST /api/v1/spans)
- tj optimize --export-config claude-code: writes JSONC snippet to ~/.config/tokenjam/exports/ with mandatory honest-framing caveat comments. No --apply flag by design.

Task 4:
- tj policy list: read-only preview of the unified policy surface, consolidating alerts/drift/schema/sensitive_actions/capture/budget config under one view. Each row carries its source TOML section. Full tj policy add|edit|apply lands next sprint with the underlying config migration.

CLAUDE.md updated for the new data model, registry pattern, OTLP parsing single-source rule, optional extras, and new CLI surface. CHANGELOG.md ## Unreleased section documents every addition.

Tests: 566 passing. mypy strict clean (102 source files). Ruff at pre-sprint baseline (49 errors, all pre-existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CLAUDE.md:
- Expand "Further Reading" from a single architecture.md pointer into a structured list covering the new docs landed in the Task-5 consistency sweep: installation.md (optional extras matrix incl. tokenjam[bloat]), configuration.md (content-capture privacy section), the four optimize product pages (downsize / cache / script / trim), backfill/overview.md, policy/overview.md, and docs/internal/specs/v1.1-honest-output.md.
- Cross-reference architecture.md's new "OTel semconv extensions" section from the core/models.py description in Key Modules so readers find the pricing_mode derivation rules from the entry point they're most likely to land on first.

Remove docs/internal/tasks/: the May 26 sprint's per-task specs + decisions-locked.md served their purpose during the sprint and are captured in the merged PR (#66) history. Keeping the v1.1 honest-output spec under docs/internal/specs/ because Wave 1 / the production code still references it as the canonical source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

anilmurty requested a review from anshss May 28, 2026 15:32

anilmurty merged commit c3b06f3 into main May 28, 2026
4 checks passed

anilmurty deleted the task-4-policy-list branch May 28, 2026 15:38

anilmurty mentioned this pull request May 28, 2026

docs: post-sprint sweep — analyzer docs, installation, semconv, release-testing playbooks #67

Merged

4 tasks

anilmurty merged 1 commit into main from task-4-policy-list
