Strategy pivot: Layer-9 cost-optimization analyzers + ingest adapters#66
Merged
Conversation
Implements Task 0 + Waves 1-3 + Task 4 of the May 26 OSS sprint (see docs/internal/tasks/ for the task queue). Foundation (Task 0): - Add billing_account (TEXT, provider-only) to spans table (migration 4) - Add plan_tier (TEXT) to sessions table + pricing_mode derived property - Add ProviderBudget.plan field — set by tj onboard prompts, read by IngestPipeline at session creation to populate SessionRecord.plan_tier - Refactor tokenjam/core/optimize.py into a package with registry-driven analyzers (auto-discovery via pkgutil) + ANALYZER_ORDER - Replace tj optimize --only model|budget with --finding (registry-driven choices) - Add tj onboard plan-tier prompts (anthropic: api/pro/max_5x/max_20x; openai: api/plus/team/enterprise) + --reconfigure flag + --plan flag - Stop auto-writing [budget.anthropic] usd = 200; only write plan field - Add unknown-plan-tier handling to tj status (one-line note) and tj optimize (suppress dollar figures when sessions are unknown-tier) - Document existing [capture] config (four fine-grained flags) in docs/configuration.md — Wave-2 analyzers depend on it Wave 1: - v1.1 honest output: plan-tier-aware rendering in tj optimize (subscription users see "implied API value" + token-share framing, never dollar "spend"; local users see token-only framing). DowngradeFinding extended with token-share fields. JSON output carries plan + pricing_mode + monthly_tokens_freed. - Langfuse ingest adapter (tj backfill langfuse) — live API or JSON dump, deterministic span IDs, per-model billing_account derivation - Period comparison (--compare flag on tj cost and tj optimize) — spend/token deltas, top per-agent/per-model shifts, ▲/▼ indicators Wave 2 — four new analyzers: - cache-efficacy: measures current cache_tokens / input_tokens ratio per (provider, model). Anthropic full, OpenAI/Gemini best-effort, others unsupported. Flags rows >=100K tokens with <30% efficacy. - cache-recommend (Anthropic-only v1, needs capture.prompts): walks captured prompts, hashes first 2000 chars, flags prefixes shared by >=3 calls as cache_control breakpoint candidates. - workflow-restructure (Script product): clusters sessions by ordered (tool_name, arg_shape) signature. arg_shape classifies args by type (file_path / command_string / json_object / array / etc.) so structural patterns cluster even when values vary. Conservative thresholds (>=20 instances) for v1. - prompt-bloat (Trim product, optional tokenjam[bloat] extra): LLMLingua-2 token-significance classifier identifies low-significance regions in captured prompts. Lazy import; self-registers without the extra installed. HTML report via tj report --bloat. Wave 3: - Helicone ingest adapter (tj backfill helicone) - Raw OTLP ingest adapter (tj backfill otlp) — extracts shared parsing to tokenjam/otel/otlp_parsing.py (one impl, two callers: backfill + live POST /api/v1/spans) - tj optimize --export-config claude-code — writes JSONC snippet to ~/.config/tokenjam/exports/ with mandatory honest-framing caveat comments. No --apply flag by design. Task 4: - tj policy list — read-only preview of the unified policy surface, consolidating alerts/drift/schema/sensitive_actions/capture/budget config under one view. Each row carries its source TOML section. Full tj policy add|edit|apply land next sprint with the underlying config migration. CLAUDE.md updated for the new data model, registry pattern, OTLP parsing single-source rule, optional extras, and new CLI surface. CHANGELOG.md ## Unreleased section documents every addition. Tests: 566 passing (102 source files). mypy strict clean. Ruff at pre-sprint baseline (49 errors, all pre-existing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anilmurty
added a commit
that referenced
this pull request
May 28, 2026
CLAUDE.md: - Expand "Further Reading" from a single architecture.md pointer into a structured list covering the new docs landed in the Task-5 consistency sweep: installation.md (optional extras matrix incl. tokenjam[bloat]), configuration.md (content-capture privacy section), the four optimize product pages (downsize / cache / script / trim), backfill/overview.md, policy/overview.md, and docs/internal/specs/v1.1-honest-output.md. - Cross-reference architecture.md's new "OTel semconv extensions" section from the core/models.py description in Key Modules so readers find the pricing_mode derivation rules from the entry point they're most likely to land on first. Remove docs/internal/tasks/ — the May 26 sprint's per-task specs + decisions-locked.md served their purpose during the sprint and are captured in the merged PR (#66) history. Keeping the v1.1 honest-output spec under docs/internal/specs/ because Wave 1 / the production code still references it as the canonical source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merged
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements the May 26 OSS sprint described in
docs/internal/tasks/— TokenJam's strategic pivot from observability product to Layer-9 cost-optimization. Spans Task 0 (foundation), Waves 1–3 (analyzers, ingest adapters, config export), and Task 4 (tj policypreview).Scale: 82 files changed, +8,276 / −842 lines. 566 tests passing (up from 447). mypy strict clean. Ruff at pre-sprint baseline (49 errors, all pre-existing).
What's in here
Foundation (Task 0)
billing_accountcolumn on spans (provider-only); newplan_tiercolumn on sessions +pricing_modederived property; newProviderBudget.planfield. Migration 4.tokenjam/core/optimize.pysplit into a package withpkgutil-based auto-discovery. New analyzers drop a file underanalyzers/with a@register("name")decorator —cmd_optimize's--findingchoices auto-derive.--only model|budgetflag replaced with--finding.tj onboardprompts for plan tier (api/pro/max_5x/max_20x for Anthropic; api/plus/team/enterprise for OpenAI).--reconfigurere-prompts;--plan <tier>skips interaction for scripted onboards. Stops auto-writing[budget.anthropic] usd = 200.tj statusprints a one-line note;tj optimizesuppresses dollar figures with a header note.Wave 1 — honest output + first ingest adapter + period comparison
tj optimize: subscription users see "implied API value" framing and token-share savings, never dollar "spend." Local users see token-only. JSON carriesplan+pricing_mode+monthly_tokens_freed.tj backfill langfuse) with live API + JSON-file modes, deterministic span IDs, per-model billing_account derivation.--compareontj costandtj optimize): keywords (previous/last-week/last-month/last-7d/last-30d) or explicitYYYY-MM-DD:YYYY-MM-DDranges. ▲/▼ deltas with top per-agent/per-model shifts.Wave 2 — four new analyzers
cache-efficacy— current caching ratio per (provider, model). Anthropic full; OpenAI/Gemini best-effort; others unsupported.cache-recommend(Anthropic-only v1) — flags stable prompt prefixes forcache_controlplacement.workflow-restructure(Script product) — clusters sessions by(tool_name, arg_shape)signature, flags≥20-instance deterministic patterns as script-replacement candidates.prompt-bloat(Trim product, optionaltokenjam[bloat]extra) — LLMLingua-2 token-significance classifier with lazy import + HTML report viatj report --bloat.Wave 3
tj backfill helicone).tj backfill otlp) — extracted shared OTLP parsing intotokenjam/otel/otlp_parsing.py, now used by both the livePOST /api/v1/spansroute and the backfill adapter.tj optimize --export-config claude-code— writes a JSONC routing snippet to~/.config/tokenjam/exports/with mandatory honest-framing caveat comments baked in. No--applyflag by design.Task 4 — policy preview
tj policy list— read-only preview consolidating alerts, drift, schema, sensitive-actions, capture, and per-provider budget config under a unifiedpolicyview. Each row carries its source TOML section. Fulltj policy add | edit | applylands next sprint.CLAUDE.md
Updated for everything above: new data-model fields, key-modules entries for
ingest_adapters/+export/+otlp_parsing.py, expanded CLI commands description, Critical Rules #16 (analyzer self-registration) and #17 (OTLP parsing single home), revised Config section, optional-extras documentation.Honesty constraints — preserved end-to-end
MODEL_DOWNGRADE_CAVEATstill surfaces in every rendering path.tj optimize --export-configsnippets bake the caveat block into the JSONC output as comments.--applyflag on config export — TokenJam doesn't sit in the call path.Test plan
pytest tests/unit tests/synthetic tests/agents tests/integration)tj onboard --claude-code --plan max_20xwrites correct plan_tiertj optimizerenders subscription framing for a Max-20x sessiontj backfill langfuse --source-file ...ingests +tj optimizeanalyzestj report --bloatrenders an HTML report (requirespip install "tokenjam[bloat]")tj policy listsurfaces every config section correctlyOut of scope (deferred)
Per the task-queue decision document (
docs/internal/tasks/decisions-locked.md):tj policy add | edit | applysurface + unified[policy]config migration — next sprint🤖 Generated with Claude Code