Metabuilder-Labs · anilmurty · May 29, 2026 · May 29, 2026
@@ -55,7 +55,7 @@ example sessions before changing models.
      Run rate $160.3500/mo — 19% of cycle budget unused.
 ```
 
-Two analyzers reading the same spans you'd otherwise pay LangSmith to host: structural model-downgrade candidate flagging (never claims quality equivalence — surfaces examples to review) and per-provider monthly budget projection. Works with **any** agent already sending TokenJam data, not just Claude Code.
+Two analyzers reading the same spans you'd otherwise pay LangSmith to host: structural downsize candidate flagging (never claims quality equivalence — surfaces examples to review) and per-provider monthly budget projection. Works with **any** agent already sending TokenJam data, not just Claude Code.
 
 Try a tighter budget to see the over-budget renderer:
 

@@ -90,7 +90,7 @@ Set limits via CLI (`tj budget --daily 10`), the REST API (`POST /api/v1/budget`
 
 ## Content capture and privacy
 
-By default, `tj` does not capture prompt content, completion content, or tool inputs/outputs — only token counts, model names, tool names, timestamps, and structural metadata. Enable content capture selectively when you need it (for debugging, prompt-bloat analysis, or evaluation):
+By default, `tj` does not capture prompt content, completion content, or tool inputs/outputs — only token counts, model names, tool names, timestamps, and structural metadata. Enable content capture selectively when you need it (for debugging, trim analysis, or evaluation):
 
 ```toml
 [capture]
@@ -108,9 +108,9 @@ The four flags are independent: capture prompts without completions, or tool inp
 
 **What this means for downstream analyzers.**
 
-- `tj optimize --finding cache-efficacy` reads token-count fields and works without content capture.
-- `tj optimize --finding prompt-bloat` reads prompt text and requires `capture.prompts = true`.
-- `tj optimize --finding cache-recommend` reads prompts and requires `capture.prompts = true`.
+- `tj optimize cache` reads token-count fields and works without content capture.
+- `tj optimize trim` reads prompt text and requires `capture.prompts = true`.
+- `tj optimize cache-recommend` reads prompts and requires `capture.prompts = true`.
 
 The analyzers that need content fail with a clear message ("set `capture.prompts = true` in tj.toml and let the daemon collect a fresh window of data") rather than running on partial data.
 

@@ -27,7 +27,7 @@ TokenJam keeps heavyweight ML dependencies, framework adapters, and the MCP serv
 | Extra | What it pulls in | Why it's optional |
 |---|---|---|
 | `tokenjam[mcp]` | `fastmcp` | Only needed for the Claude Code / Codex MCP server (`tj mcp`). Pulled by `tj onboard --claude-code` automatically when invoked through the documented one-liner. |
-| `tokenjam[bloat]` | `llmlingua>=0.2`, transitively PyTorch + transformers (~2GB) | The Trim analyzer (`tj optimize --finding prompt-bloat`) scores token significance with LLMLingua-2. Most users don't run it; keeping torch out of the base install means `pip install tokenjam` stays small and fast on machines that don't have a GPU/CPU build of torch already. |
+| `tokenjam[bloat]` | `llmlingua>=0.2`, transitively PyTorch + transformers (~2GB) | The Trim analyzer (`tj optimize trim`) scores token significance with LLMLingua-2. Most users don't run it; keeping torch out of the base install means `pip install tokenjam` stays small and fast on machines that don't have a GPU/CPU build of torch already. |
 | `tokenjam[langchain]` | `langchain>=0.2` | Convenience pin for `patch_langchain()`; you can also install langchain yourself. |
 | `tokenjam[crewai]` | `crewai>=0.28` | Convenience pin for `patch_crewai()`. |
 | `tokenjam[autogen]` | `pyautogen>=0.2` | Convenience pin for `patch_autogen()`. |
@@ -44,7 +44,7 @@ pip install "tokenjam[mcp,bloat]"
 
 `tokenjam[bloat]` is the largest extra — LLMLingua-2 transitively pulls in PyTorch and Hugging Face transformers, roughly 2GB on disk. On first run the analyzer downloads a ~110MB BERT-class classifier model under `~/.cache/tokenjam/models/` (override via `TOKENJAM_MODEL_CACHE`); subsequent runs are offline-capable.
 
-If you run `tj optimize --finding prompt-bloat` without the extra installed, the analyzer self-registers and exits with a clear hint pointing at this install command — nothing in the base install crashes from its absence.
+If you run `tj optimize trim` without the extra installed, the analyzer self-registers and exits with a clear hint pointing at this install command — nothing in the base install crashes from its absence.
 
 See [`docs/optimize/trim.md`](optimize/trim.md) for performance numbers, capture requirements, and what the analyzer actually reports.
 

@@ -1,13 +1,13 @@
 # Cache
 
-Product name: **Cache**. Internal/CLI names: `cache-efficacy` and `cache-recommend`. Two related findings under the same product — both surface prompt-caching opportunities; they differ in what they need and what they recommend.
+Product name: **Cache**. Internal/CLI names: `cache` and `cache-recommend`. Two related findings under the same product — both surface prompt-caching opportunities; they differ in what they need and what they recommend.
 
 ```bash
-tj optimize --finding cache-efficacy
-tj optimize --finding cache-recommend
+tj optimize cache
+tj optimize cache-recommend
 ```
 
-## `cache-efficacy` — measure current caching
+## `cache` — measure current caching
 
 Reads aggregate `input_tokens` and `cache_tokens` from spans in the
 window. Computes the share of input bytes served from cache per

@@ -1,9 +1,9 @@
 # Downsize
 
-Product name: **Downsize**. Internal/CLI name: `model-downgrade`.
+Product name: **Downsize**. Internal/CLI name: `downsize`.
 
 ```bash
-tj optimize --finding model-downgrade
+tj optimize downsize
 ```
 
 Flags sessions whose structural shape — short input (< 5K tokens), short output (< 500 tokens), few tool calls (≤ 5) — matches a class of work where a cheaper model in the same provider family is worth reviewing.
@@ -37,7 +37,7 @@ JSON output mirrors the same data with top-level `plan` and `pricing_mode` field
 
 ## Confidence
 
-`structural`. The model-downgrade finding identifies a structural pattern in the captured data; it does not validate that the cheaper model would produce equivalent output. The mandatory caveat is the honest framing of that limitation.
+`structural`. The downsize finding identifies a structural pattern in the captured data; it does not validate that the cheaper model would produce equivalent output. The mandatory caveat is the honest framing of that limitation.
 
 ## See also
 

@@ -1,9 +1,9 @@
 # Script
 
-Product name: **Script**. Internal/CLI name: `workflow-restructure`.
+Product name: **Script**. Internal/CLI name: `script`.
 
 ```bash
-tj optimize --finding workflow-restructure
+tj optimize script
 ```
 
 Flags sessions whose tool-call sequence is structurally identical

@@ -1,9 +1,9 @@
 # Trim
 
-Product name: **Trim**. Internal/CLI name: `prompt-bloat`.
+Product name: **Trim**. Internal/CLI name: `trim`.
 
 ```bash
-tj optimize --finding prompt-bloat
+tj optimize trim
 ```
 
 Scores token-by-token significance in captured prompts using
@@ -26,7 +26,7 @@ pip install "tokenjam[bloat]"
 ```
 
 The base `pip install tokenjam` does NOT pull torch. Trim shows up in
-`tj optimize --finding` choices regardless, but running it without the
+`tj optimize` analyzer choices regardless, but running it without the
 extra prints a clear install hint and exits.
 
 ## Requirements
@@ -56,10 +56,10 @@ marked.
 ## HTML report
 
 ```bash
-tj report --bloat                # all agents, 30d window
-tj report --bloat my-agent       # scope to one agent
-tj report --bloat --since 7d     # custom window
-tj report --bloat --no-open      # write file without opening browser
+tj report --trim                # all agents, 30d window
+tj report --trim my-agent       # scope to one agent
+tj report --trim --since 7d     # custom window
+tj report --trim --no-open      # write file without opening browser
 ```
 
 Output goes to `~/.cache/tokenjam/reports/trim-<timestamp>.html` and

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "tokenjam"
-version = "0.3.0"
+version = "0.3.1"
 description = "TokenJam — local-first OTel-native observability for Autonomous AI agents"
 readme = "README.md"
 requires-python = ">=3.10"

@@ -1,6 +1,6 @@
 {
   "name": "@tokenjam/sdk",
-  "version": "0.3.0",
+  "version": "0.3.1",
   "description": "TypeScript SDK for TokenJam — local-first observability for AI agents",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

@@ -816,7 +816,7 @@ def test_optimize_budget_projection_from_config(runner, db):
         )
         db.insert_span(span)
 
-    result = _invoke(runner, db, cfg, ["optimize", "--finding", "budget-projection"])
+    result = _invoke(runner, db, cfg, ["optimize", "budget-projection"])
     assert result.exit_code == 0
     assert "Budget projection" in result.output
     assert "anthropic" in result.output

@@ -49,11 +49,11 @@ tj doctor             # exit 0 or 1 (warnings ok)
 
 ```bash
 tj optimize                                # all analyzers
-tj optimize --finding model-downgrade      # Downsize
-tj optimize --finding cache-efficacy       # Cache (efficacy)
-tj optimize --finding cache-recommend      # Cache (recommend) — surfaces "enable capture.prompts" if not set
-tj optimize --finding workflow-restructure # Script — likely no candidates on a fresh DB
-tj optimize --finding prompt-bloat         # Trim — should print install hint without [bloat] extra
+tj optimize downsize      # Downsize
+tj optimize cache       # Cache (efficacy)
+tj optimize cache-recommend      # Cache (recommend) — surfaces "enable capture.prompts" if not set
+tj optimize script # Script — likely no candidates on a fresh DB
+tj optimize trim         # Trim — should print install hint without [bloat] extra
 
 # Caveat enforcement on the downgrade finding
 tj optimize --json | python3 -c \
@@ -64,7 +64,7 @@ tj optimize --json | python3 -c \
   "import json,sys;d=json.load(sys.stdin);assert 'plan' in d and 'pricing_mode' in d;print('ok')"
 ```
 
-**Pass criteria:** every `--finding` runs without crashing. Optional analyzers (`cache-recommend`, `prompt-bloat`) surface clear hints when their prereqs aren't met instead of erroring.
+**Pass criteria:** every positional analyzer name runs without crashing. Optional analyzers (`cache-recommend`, `trim`) surface clear hints when their prereqs aren't met instead of erroring.
 
 ## 5. Backfill adapters (smoke against committed fixtures)
 

@@ -106,61 +106,61 @@ tj doctor     # exit 0 (or 1 with warnings); no errors
 
 ## 6. Cost-optimization analyzers (the four products)
 
-### 6a. Downsize (`--finding model-downgrade`)
+### 6a. Downsize
 
 ```bash
-tj optimize --finding model-downgrade
+tj optimize downsize
 # [ ] If candidates found: caveat "Candidate-flagging heuristic, not a quality judgment." is present
 # [ ] If no candidates: clean "No candidates flagged" message — not a crash
-tj optimize --finding model-downgrade --json | python3 -c \
+tj optimize downsize --json | python3 -c \
   "import json,sys;r=json.load(sys.stdin);d=r.get('downgrade');assert d is None or 'Candidate-flagging heuristic' in d['caveat'];print('ok: caveat enforced')"
 ```
 
-### 6b. Cache (`--finding cache-efficacy` + `--finding cache-recommend`)
+### 6b. Cache
 
 ```bash
-# cache-efficacy works without content capture
-tj optimize --finding cache-efficacy
+# cache works without content capture
+tj optimize cache
 # [ ] Per (provider, model) rows. Anthropic shows numerically-accurate ratios.
 # [ ] OpenAI / Gemini rows (if present) carry the best-effort caveat.
 # [ ] Rows for Bedrock / LiteLLM / Cohere (if present) show unsupported, not flagged.
 
 # cache-recommend needs capture.prompts. Without it, the analyzer returns a hint.
-tj optimize --finding cache-recommend
+tj optimize cache-recommend
 # [ ] Without capture.prompts: surfaces the "enable capture.prompts" hint, doesn't crash.
 # [ ] With capture.prompts and ≥3 calls sharing a long prefix: surfaces breakpoint candidates.
 ```
 
 To exercise the content-needed branch, set `capture.prompts = true` in `.tj/config.toml`, re-run an example, then re-run `cache-recommend`.
 
-### 6c. Script (`--finding workflow-restructure`)
+### 6c. Script
 
 ```bash
-tj optimize --finding workflow-restructure
+tj optimize script
 # [ ] If ≥20 sessions match a single (tool_name, arg_shape) signature: cluster surfaces with
 #     "review carefully" caveat. With v1's conservative thresholds, most fresh test DBs see no candidates.
 # [ ] No crash; "no clusters found" message is acceptable.
 ```
 
-### 6d. Trim (`--finding prompt-bloat`)
+### 6d. Trim
 
 Trim requires the optional `tokenjam[bloat]` extra (LLMLingua-2 + torch + transformers, ~2GB).
 
 ```bash
 # Without the extra installed: self-registers, errors gracefully with install hint.
-tj optimize --finding prompt-bloat
+tj optimize trim
 # [ ] Output points the user at: pip install "tokenjam[bloat]"
 
 # Install the extra and re-run — only do this if you actually want to test Trim end-to-end
 # (the 2GB download is real). Skip this on quick passes.
 pip3 install -e ".[dev,mcp,bloat]"
 # Enable capture.prompts in .tj/config.toml, re-run examples to populate content, then:
-tj optimize --finding prompt-bloat
+tj optimize trim
 # [ ] First run downloads the ~110MB BERT model under ~/.cache/tokenjam/models/.
 # [ ] Subsequent runs are offline.
 
 # HTML report renderer
-tj report --bloat --no-open
+tj report --trim --no-open
 # [ ] Writes to ~/.cache/tokenjam/reports/trim-<timestamp>.html
 # [ ] HTML contains a caveat block + per-prompt sections
 ```
@@ -400,8 +400,8 @@ python3 examples/single_provider/anthropic_agent.py
 
 tj status && tj traces && tj cost --since 1h
 tj optimize           # all analyzers
-tj optimize --finding model-downgrade
-tj optimize --finding cache-efficacy
+tj optimize downsize
+tj optimize cache
 tj backfill langfuse --source-file tests/fixtures/langfuse_real_response.json
 tj backfill helicone --source-file tests/fixtures/helicone_real_response.json
 tj backfill otlp --source-file tests/fixtures/otlp_sample.json

@@ -1,4 +1,4 @@
-"""Unit tests for the cache-efficacy analyzer."""
+"""Unit tests for the cache analyzer."""
 from __future__ import annotations
 
 from datetime import datetime, timedelta, timezone
@@ -140,10 +140,10 @@ def test_run_integrates_with_build_report(db, config):
     until = datetime(2026, 5, 30, tzinfo=timezone.utc)
     report = build_report(
         db=db, config=config, since=since, until=until,
-        findings=["cache-efficacy"],
+        findings=["cache"],
     )
-    assert "cache-efficacy" in report.findings
-    finding = report.findings["cache-efficacy"]
+    assert "cache" in report.findings
+    finding = report.findings["cache"]
     assert finding.confidence == "structural"
     assert len(finding.rows) == 1
     assert len(finding.flagged) == 1
@@ -1,5 +1,5 @@
 """
-Unit tests for the prompt-bloat (Trim) analyzer.
+Unit tests for the trim (Trim) analyzer.
 
 The LLMLingua-2 model is mocked across these tests so CI doesn't have to
 download ~110MB and run an actual BERT classifier. The mock returns
@@ -83,8 +83,8 @@ def test_disabled_without_capture_prompts(db):
     since = datetime(2026, 5, 1, tzinfo=timezone.utc)
     until = datetime(2026, 5, 30, tzinfo=timezone.utc)
     report = build_report(db=db, config=config, since=since, until=until,
-                          findings=["prompt-bloat"])
-    finding = report.findings["prompt-bloat"]
+                          findings=["trim"])
+    finding = report.findings["trim"]
     assert isinstance(finding, PromptBloatFinding)
     assert finding.enabled is False
     assert "capture" in finding.hint.lower()
@@ -98,8 +98,8 @@ def test_disabled_when_llmlingua_missing(db, monkeypatch):
     since = datetime(2026, 5, 1, tzinfo=timezone.utc)
     until = datetime(2026, 5, 30, tzinfo=timezone.utc)
     report = build_report(db=db, config=config, since=since, until=until,
-                          findings=["prompt-bloat"])
-    finding = report.findings["prompt-bloat"]
+                          findings=["trim"])
+    finding = report.findings["trim"]
     assert finding.enabled is False
     assert "tokenjam[bloat]" in finding.hint
 
@@ -153,8 +153,8 @@ def test_scores_prompts_and_finds_bloat(db, monkeypatch):
     since = datetime(2026, 5, 1, tzinfo=timezone.utc)
     until = datetime(2026, 5, 30, tzinfo=timezone.utc)
     report = build_report(db=db, config=config, since=since, until=until,
-                          findings=["prompt-bloat"])
-    finding = report.findings["prompt-bloat"]
+                          findings=["trim"])
+    finding = report.findings["trim"]
     assert finding.enabled is True
     assert finding.prompts_scored == 3
     # Each scored prompt produces one BloatPrompt entry, up to the 10 cap.
@@ -173,8 +173,8 @@ def test_skips_short_prompts(db, monkeypatch):
     since = datetime(2026, 5, 1, tzinfo=timezone.utc)
     until = datetime(2026, 5, 30, tzinfo=timezone.utc)
     report = build_report(db=db, config=config, since=since, until=until,
-                          findings=["prompt-bloat"])
-    finding = report.findings["prompt-bloat"]
+                          findings=["trim"])
+    finding = report.findings["trim"]
     assert finding.prompts_scored == 0
     assert finding.prompts_skipped == 5
     assert finding.per_prompt == []