Skip to content

Strategy pivot: Layer-9 cost-optimization analyzers + ingest adapters#66

Merged
anilmurty merged 1 commit into
mainfrom
task-4-policy-list
May 28, 2026
Merged

Strategy pivot: Layer-9 cost-optimization analyzers + ingest adapters#66
anilmurty merged 1 commit into
mainfrom
task-4-policy-list

Conversation

@anilmurty
Copy link
Copy Markdown
Contributor

Summary

Implements the May 26 OSS sprint described in docs/internal/tasks/ — TokenJam's strategic pivot from observability product to Layer-9 cost-optimization. Spans Task 0 (foundation), Waves 1–3 (analyzers, ingest adapters, config export), and Task 4 (tj policy preview).

Scale: 82 files changed, +8,276 / −842 lines. 566 tests passing (up from 447). mypy strict clean. Ruff at pre-sprint baseline (49 errors, all pre-existing).

Note: this PR was originally opened as #65 against task-5-docs-sweep by mistake; closed and re-opened from task-4-policy-list. task-5-docs-sweep is reserved for a parallel docs-only effort.

What's in here

Foundation (Task 0)

  • Data model: new billing_account column on spans (provider-only); new plan_tier column on sessions + pricing_mode derived property; new ProviderBudget.plan field. Migration 4.
  • Registry-driven optimize package: tokenjam/core/optimize.py split into a package with pkgutil-based auto-discovery. New analyzers drop a file under analyzers/ with a @register("name") decorator — cmd_optimize's --finding choices auto-derive. --only model|budget flag replaced with --finding.
  • Plan-tier-aware onboard: tj onboard prompts for plan tier (api/pro/max_5x/max_20x for Anthropic; api/plus/team/enterprise for OpenAI). --reconfigure re-prompts; --plan <tier> skips interaction for scripted onboards. Stops auto-writing [budget.anthropic] usd = 200.
  • Unknown-tier handling: tj status prints a one-line note; tj optimize suppresses dollar figures with a header note.

Wave 1 — honest output + first ingest adapter + period comparison

  • v1.1 honest output in tj optimize: subscription users see "implied API value" framing and token-share savings, never dollar "spend." Local users see token-only. JSON carries plan + pricing_mode + monthly_tokens_freed.
  • Langfuse ingest adapter (tj backfill langfuse) with live API + JSON-file modes, deterministic span IDs, per-model billing_account derivation.
  • Period comparison (--compare on tj cost and tj optimize): keywords (previous / last-week / last-month / last-7d / last-30d) or explicit YYYY-MM-DD:YYYY-MM-DD ranges. ▲/▼ deltas with top per-agent/per-model shifts.

Wave 2 — four new analyzers

  • cache-efficacy — current caching ratio per (provider, model). Anthropic full; OpenAI/Gemini best-effort; others unsupported.
  • cache-recommend (Anthropic-only v1) — flags stable prompt prefixes for cache_control placement.
  • workflow-restructure (Script product) — clusters sessions by (tool_name, arg_shape) signature, flags ≥20-instance deterministic patterns as script-replacement candidates.
  • prompt-bloat (Trim product, optional tokenjam[bloat] extra) — LLMLingua-2 token-significance classifier with lazy import + HTML report via tj report --bloat.

Wave 3

  • Helicone ingest adapter (tj backfill helicone).
  • Raw OTLP ingest adapter (tj backfill otlp) — extracted shared OTLP parsing into tokenjam/otel/otlp_parsing.py, now used by both the live POST /api/v1/spans route and the backfill adapter.
  • tj optimize --export-config claude-code — writes a JSONC routing snippet to ~/.config/tokenjam/exports/ with mandatory honest-framing caveat comments baked in. No --apply flag by design.

Task 4 — policy preview

  • tj policy list — read-only preview consolidating alerts, drift, schema, sensitive-actions, capture, and per-provider budget config under a unified policy view. Each row carries its source TOML section. Full tj policy add | edit | apply lands next sprint.

CLAUDE.md

Updated for everything above: new data-model fields, key-modules entries for ingest_adapters/ + export/ + otlp_parsing.py, expanded CLI commands description, Critical Rules #16 (analyzer self-registration) and #17 (OTLP parsing single home), revised Config section, optional-extras documentation.

Honesty constraints — preserved end-to-end

  • MODEL_DOWNGRADE_CAVEAT still surfaces in every rendering path.
  • New analyzers each carry equivalent caveats.
  • Subscription users never see a dollar figure framed as "spend" they didn't pay. The list-price math surfaces only as "implied API value" with explicit comparison to plan cost.
  • tj optimize --export-config snippets bake the caveat block into the JSONC output as comments.
  • No --apply flag on config export — TokenJam doesn't sit in the call path.

Test plan

  • Full CI suite: 566 passing (pytest tests/unit tests/synthetic tests/agents tests/integration)
  • mypy strict: clean (102 source files)
  • ruff: 49 errors, all pre-existing
  • Manual: tj onboard --claude-code --plan max_20x writes correct plan_tier
  • Manual: tj optimize renders subscription framing for a Max-20x session
  • Manual: tj backfill langfuse --source-file ... ingests + tj optimize analyzes
  • Manual: tj report --bloat renders an HTML report (requires pip install "tokenjam[bloat]")
  • Manual: tj policy list surfaces every config section correctly

Out of scope (deferred)

Per the task-queue decision document (docs/internal/tasks/decisions-locked.md):

  • Replay validation (Confidence Level 2) — Pro-tier, next sprint
  • License verification infrastructure — own design sprint
  • Full tj policy add | edit | apply surface + unified [policy] config migration — next sprint
  • MCP tool surface expansion — depends on validation engine
  • TypeScript SDK framework patches
  • TokenJam Cloud MVP

🤖 Generated with Claude Code

Implements Task 0 + Waves 1-3 + Task 4 of the May 26 OSS sprint
(see docs/internal/tasks/ for the task queue).

Foundation (Task 0):
- Add billing_account (TEXT, provider-only) to spans table (migration 4)
- Add plan_tier (TEXT) to sessions table + pricing_mode derived property
- Add ProviderBudget.plan field — set by tj onboard prompts, read by
  IngestPipeline at session creation to populate SessionRecord.plan_tier
- Refactor tokenjam/core/optimize.py into a package with registry-driven
  analyzers (auto-discovery via pkgutil) + ANALYZER_ORDER
- Replace tj optimize --only model|budget with --finding (registry-driven choices)
- Add tj onboard plan-tier prompts (anthropic: api/pro/max_5x/max_20x;
  openai: api/plus/team/enterprise) + --reconfigure flag + --plan flag
- Stop auto-writing [budget.anthropic] usd = 200; only write plan field
- Add unknown-plan-tier handling to tj status (one-line note) and
  tj optimize (suppress dollar figures when sessions are unknown-tier)
- Document existing [capture] config (four fine-grained flags) in
  docs/configuration.md — Wave-2 analyzers depend on it

Wave 1:
- v1.1 honest output: plan-tier-aware rendering in tj optimize
  (subscription users see "implied API value" + token-share framing,
   never dollar "spend"; local users see token-only framing).
  DowngradeFinding extended with token-share fields. JSON output
  carries plan + pricing_mode + monthly_tokens_freed.
- Langfuse ingest adapter (tj backfill langfuse) — live API or JSON
  dump, deterministic span IDs, per-model billing_account derivation
- Period comparison (--compare flag on tj cost and tj optimize) —
  spend/token deltas, top per-agent/per-model shifts, ▲/▼ indicators

Wave 2 — four new analyzers:
- cache-efficacy: measures current cache_tokens / input_tokens ratio
  per (provider, model). Anthropic full, OpenAI/Gemini best-effort,
  others unsupported. Flags rows >=100K tokens with <30% efficacy.
- cache-recommend (Anthropic-only v1, needs capture.prompts): walks
  captured prompts, hashes first 2000 chars, flags prefixes shared by
  >=3 calls as cache_control breakpoint candidates.
- workflow-restructure (Script product): clusters sessions by ordered
  (tool_name, arg_shape) signature. arg_shape classifies args by type
  (file_path / command_string / json_object / array / etc.) so
  structural patterns cluster even when values vary. Conservative
  thresholds (>=20 instances) for v1.
- prompt-bloat (Trim product, optional tokenjam[bloat] extra):
  LLMLingua-2 token-significance classifier identifies low-significance
  regions in captured prompts. Lazy import; self-registers without the
  extra installed. HTML report via tj report --bloat.

Wave 3:
- Helicone ingest adapter (tj backfill helicone)
- Raw OTLP ingest adapter (tj backfill otlp) — extracts shared parsing
  to tokenjam/otel/otlp_parsing.py (one impl, two callers: backfill +
  live POST /api/v1/spans)
- tj optimize --export-config claude-code — writes JSONC snippet to
  ~/.config/tokenjam/exports/ with mandatory honest-framing caveat
  comments. No --apply flag by design.

Task 4:
- tj policy list — read-only preview of the unified policy surface,
  consolidating alerts/drift/schema/sensitive_actions/capture/budget
  config under one view. Each row carries its source TOML section.
  Full tj policy add|edit|apply land next sprint with the underlying
  config migration.

CLAUDE.md updated for the new data model, registry pattern, OTLP
parsing single-source rule, optional extras, and new CLI surface.

CHANGELOG.md ## Unreleased section documents every addition.

Tests: 566 passing (102 source files). mypy strict clean.
Ruff at pre-sprint baseline (49 errors, all pre-existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@anilmurty anilmurty requested a review from anshss May 28, 2026 15:32
@anilmurty anilmurty merged commit c3b06f3 into main May 28, 2026
4 checks passed
@anilmurty anilmurty deleted the task-4-policy-list branch May 28, 2026 15:38
anilmurty added a commit that referenced this pull request May 28, 2026
CLAUDE.md:
- Expand "Further Reading" from a single architecture.md pointer into a
  structured list covering the new docs landed in the Task-5 consistency
  sweep: installation.md (optional extras matrix incl. tokenjam[bloat]),
  configuration.md (content-capture privacy section), the four optimize
  product pages (downsize / cache / script / trim), backfill/overview.md,
  policy/overview.md, and docs/internal/specs/v1.1-honest-output.md.
- Cross-reference architecture.md's new "OTel semconv extensions" section
  from the core/models.py description in Key Modules so readers find the
  pricing_mode derivation rules from the entry point they're most likely
  to land on first.

Remove docs/internal/tasks/ — the May 26 sprint's per-task specs +
decisions-locked.md served their purpose during the sprint and are
captured in the merged PR (#66) history. Keeping the v1.1 honest-output
spec under docs/internal/specs/ because Wave 1 / the production code
still references it as the canonical source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant