fix(v0.3.0): close all 11 pre-release findings + 2 API follow-ups (closes #68) by anilmurty · Pull Request #69 · Metabuilder-Labs/tokenjam

anilmurty · 2026-05-28T21:02:37Z

Closes #68. Closes all 11 numbered findings from the pre-release walkthrough plus the two API-mode follow-ups we noted while fixing §12. Originally opened against only the two launch-blockers (§12 + §15) — scope expanded after triage so the v0.3.0 release tag goes out with a clean slate.

Commits (oldest → newest)

Commit	Bug	Scope
`92b0811`	§14, §15, §16	Wave-2 renderer dispatch; per-example dollar leak under unknown-tier; Rich-markup escape in hints
`3f146e3`	§12	`/api/v1/optimize` route + `report_from_dict` deserializer; `tj optimize` no longer crashes alongside daemon
`cced338`	§1	Explicit error on bare `tj onboard --reconfigure` (it's a no-op without `--claude-code` / `--codex`)
`cd1f78f`	§4	`tj doctor` downgrades DB-writable from `✗` to `i` when the daemon holds the lock
`9bc8562`	§11	`tj budget` enumerates recently-active agents via `get_traces()` under API-shim mode
`0770966`	§3	`tj drift` surfaces baselines via new `ApiBackend.list_baseline_agents()` + `get_baseline()`
`12f0c36`	§2, §5	SDK fails fast + drops spans on 401 (not silent buffering); `load_config` warns on diverged secrets across project-local and global configs
`e78608d`	#69 §28, §29	New `/api/v1/cost/compare` route restores `--compare` under daemon mode; `plan_tier_mix` now included in optimize API payload
`5d938fe`	§6	`ensure_demo_agent_config` writes to every config file in SEARCH_PATHS (not just the first match); demos now call `warn_if_daemon_running()` with the actionable restart hint
`2c4653e`	§7, §8, §9, §10	Playbook polish: bare onboard doesn't prompt for plan tier (route through integrations); add "stop pre-existing daemon" to prerequisites; plan-tier examples are plan-agnostic; ruff baseline note updated to "clean"

What this enables

tj optimize works in the recommended operating mode. That's the strategy-pivot product. The combination of 92b0811 (Wave-2 renderer) and 3f146e3 (API route) means every --finding runs and renders correctly whether the daemon is up or down. Verified end-to-end on the live system.

tj doctor cleanly reports a healthy system. With cd1f78f, doctor under daemon mode is now all ✓ / i, no false ✗. Exit 0.

tj drift surfaces baselines. §3 was confusing during the walkthrough — drift_detected fired but tj drift reported "no baselines found" because the CLI went through the API path that didn't enumerate. Fixed at 0770966.

No more silent data loss. §2 was the worst footgun in the codebase: SDK got 401s, dropped spans, logged a single warning line. Now: ERROR-level loud message with the truncated secret fingerprint, no retry (won't change), counter for cumulative loss. Plus 12f0c36 adds a config-load-time warning when project-local and global secrets diverge — the upstream cause of most 401s.

tj budget and tj cost --compare work everywhere. Both used to silently degrade under daemon mode. Now full functionality regardless of daemon state.

Demos actually fire alerts. §6's silent-no-fire was caused by the same secret-divergence class of bug plus the daemon's no-hot-reload behavior. Helper now writes to every config; demos warn if daemon is up.

What I changed in the playbook

Step 3 routes plan-tier verification through tj onboard --claude-code (the bare path doesn't prompt — tj onboard is provider-agnostic).
Prerequisites now include "stop any lingering daemon" with the launchctl / systemctl one-liners.
Plan-tier examples switched to max_5x (closer to the typical tester's actual plan), with a format table so the tester picks their own.
Ruff baseline updated from "49 errors" to "clean" — and the practical fallback "if anything reports, confirm it's pre-existing on main".

Test plan

575 tests passing locally (pytest tests/unit tests/synthetic tests/agents tests/integration)
mypy strict clean across 103+ source files
ruff clean (no new warnings introduced by this PR)
End-to-end verification on the user's live system — every Wave-2 analyzer renders, tj cost --compare works under daemon, tj drift shows baselines under daemon, tj doctor exits 0 under daemon
9 new unit tests (5 for the SDK 401 fail-fast + counter behavior, 4 for the config-divergence warning)
Re-run the playbook end-to-end after merge to validate the v0.3.0 tag — that's the next step (issue Pre-release testing findings (v0.3.0 sprint) — bugs and playbook fixes #68 is closed pending merge)

🤖 Generated with Claude Code

Fixes issue #68 §15 (LAUNCH-BLOCKING), §14, and §16. #15 (renderer gap) — Wave-2 analyzers (cache-efficacy, cache-recommend, workflow-restructure, prompt-bloat) attach their results to `OptimizeReport.findings[<name>]` (the generic dict), but the CLI's `_render_report` only read the typed slots `report.downgrade` and `report.budgets`. All four analyzers ran successfully, populated the findings dict, and the renderer silently dropped them — every run fell through to the catch-all "No candidates flagged" message. Fix: dispatch on `report.findings` through a `_FINDING_RENDERERS` table. Each Wave-2 analyzer gets a dedicated renderer that knows how to surface its specific shape (per-(provider,model) table for cache-efficacy, prefix-candidate list for cache-recommend, cluster list for workflow-restructure, top-N bloat-prompts for prompt-bloat). The catch-all "No candidates flagged" condition is updated to fire only when nothing rendered — typed slots empty AND no Wave-2 findings. #14 (per-example cost leak under unknown-tier suppression) — when `pricing_mode != "api"` the top-level downgrade savings line correctly suppresses dollar figures, but the per-example table immediately below was still rendering `format_cost(ex.cost_usd)`. The honesty discipline should apply consistently: drop the cost column from the example rows under unknown/subscription/local pricing modes. #16 (Rich-markup escape in hint strings) — analyzer hint strings reference TOML sections like `[capture] prompts = true`. Rich's console.print() interprets square brackets as style tags and silently strips them, so the rendered hint became "Enable ` prompts = true`". Fix: wrap analyzer hints in `rich.markup.escape()` before printing. Keeps the technically-correct TOML notation in the analyzer source (other consumers — JSON output, the API, docs — get the literal string they expect). Also drops a similarly-bracketed `[bold]...[/bold]` reference inside the workflow_restructure "degraded mode" notice that was visible-but- ugly with the bracket markup leaking through. Verified end-to-end against the user's real DB (claude-code-tokenjam data with 5.8M tokens, 36 sessions): - cache-efficacy renders 8 (provider, model) rows correctly, including the OpenAI best-effort caveat and the Anthropic full-support rows. - cache-recommend renders the disabled hint with `[capture] prompts = true` visible verbatim. - workflow-restructure renders the "examined N / no clusters above threshold + degraded mode" path. - prompt-bloat renders the disabled hint with brackets visible. - model-downgrade per-example rows now drop the dollar column under unknown-tier rendering (the rest of the suppression was already in place). Tests: 566 passing (no regressions). Lint: ruff back at baseline 49 errors (none new). Type: mypy strict clean across 102 source files. Filed bugs that remain on issue #68: #1, #2, #3, #4, #5, #12 — left for follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fixes issue #68 §12 (LAUNCH-BLOCKING). `tj optimize` used to fail outright whenever `tj serve` was running: $ tj optimize Error: Could not open /Users/.../telemetry.duckdb read-only: IO Error: Could not set lock on file "...": Conflicting lock is held in ... The original design assumed DuckDB would allow read-only attaches alongside the daemon's write lock, but DuckDB enforces process-level exclusivity — once one process opens the DB in write mode (the daemon), no other process can attach, even with read_only=True. Since `tj onboard` auto-installs and starts the daemon by default, this regression meant the headline strategy-pivot product (cost optimization) didn't work in the recommended operating mode. Architecture ------------ Add /api/v1/optimize: a GET endpoint that runs build_report() on the server side (where the daemon already has DB access) and returns the JSON-serialised report. Mirrors the CLI's --since / --agent / --finding / --budget / --budget-usd surface. Add report_from_dict() in tokenjam/core/optimize/runner.py: the symmetric deserialiser for report_to_dict(). Reconstructs OptimizeReport including all Wave-2 analyzer findings via a lazy-built constructor dispatch table (loaded on first use to avoid coupling runner.py to every analyzer module's import order). Unknown finding names are dropped silently — keeps the CLI forward-compatible with future analyzers. Add ApiBackend.fetch_optimize_report(): thin httpx wrapper that hits /api/v1/optimize with the right query params. Update cmd_optimize: when getattr(db, "conn", None) is None (i.e. the caller is an ApiBackend), fetch the report from the daemon and run it through report_from_dict. Otherwise (local DB), build the report locally as before. Renderer is unchanged — same OptimizeReport input, same code path, no API-specific rendering branch. Handle --compare gracefully under API mode: compute_cost_diff also uses db.conn directly. Rather than crash, the CLI now surfaces a clear note that --compare needs the daemon stopped (or omitted). Adding a /api/v1/cost/compare endpoint is a #68 §12 follow-up. Side effects ------------ The unknown-plan-tier suppression behaviour is currently lost under the API path: plan_tier_mix isn't on the API payload yet, so _dominant_plan defaults to "api" rendering. This is the safest default (matches historical behaviour) but means subscription users using the daemon get the api-mode renderer until we extend the API payload to carry plan_mix. Noted as a follow-up; not launch-blocking. Verified end-to-end against the user's live system: - Daemon started (tj serve, PID confirmed) - `tj optimize --finding cache-efficacy` rendered 8 (provider, model) rows correctly. Exit 0. - `tj optimize --finding model-downgrade` rendered the structural candidate, full caveat, examples table. Exit 0. - `tj optimize --compare previous` surfaced the graceful "not yet supported via API; tj stop and re-run" note, then continued to render the rest of the report. Exit 0. Tests: 566 passing (no regressions). Lint: ruff back at baseline 49 errors (none new). Type: mypy strict clean across 103 source files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously `tj onboard --reconfigure` (no --claude-code / no --codex) hit the same "Config already exists. Use --force" early-return as a bare onboard with no flags. Users running e.g. `tj onboard --reconfigure --plan max_5x` got a silent no-op without any signal that their plan update didn't land. The bare onboard path writes a generic project-local config from a template; it doesn't manage plan tier, which is per-provider and lives in [budget.<provider>] sections written by --claude-code and --codex. So --reconfigure genuinely has nothing to do in the bare path. Fix: explicit error pointing the user at the right flow, exit 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…on (#68 §4) `tj doctor` previously reported ✗ DuckDB writable any time the daemon held the write lock — which is the expected operating state after `tj onboard` auto-installs the daemon. Users saw a "failure" indicator on a healthy system. The same doctor output already handles this gracefully for one other check (column statistics): `i Spans column statistics: Skipped — CLI is running through the HTTP API fallback`. The DB-writable check now follows the same pattern. Fix: detect DuckDB's lock-conflict error message ("Conflicting lock is held in ..." / "Could not set lock on file ...") and downgrade to info level. Other DB-open errors stay as ✗ — only the expected daemon-holds-lock case is demoted. With this fix and §12 (PR #69), `tj doctor` is fully clean while `tj serve` is running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`tj budget` previously called `db.conn.execute("SELECT DISTINCT agent_id FROM sessions")` to enumerate DB-observed agents. Under API-shim mode (daemon up, no db.conn) that branch was skipped entirely — the table only showed agents declared in tj.toml. Users running `tj budget` while the daemon was running saw a misleadingly small list. Same bug class as the other §3 / §12 cases — CLI commands that read the DB directly silently miss data under API mode. Fix: mirror the pattern from cmd_status — fall back to `db.get_traces(TraceFilters(limit=200))` and pull agent IDs from recent traces. Scope is "recently active agents" (last 200 traces) rather than "every session ever", which is arguably the right scope for a budget display anyway. Verified: - API mode (daemon up): now shows recently-active agents alongside config-declared ones. - Direct-DB mode: unchanged behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…§3) `tj drift` previously did: elif hasattr(db, "conn"): rows = db.conn.execute("SELECT ... FROM drift_baselines") agent_ids = [r[0] for r in rows] else: agent_ids = [] # API mode: silently empty Under daemon-up mode (db is ApiBackend, no .conn), agent_ids was set to [] and the command exited with "No drift baselines found" — even when drift_detected had clearly fired and baselines existed in the DB. Highly confusing during the v0.3.0 pre-release walkthrough. Fix has three parts: 1. ApiBackend.list_baseline_agents(): hits GET /api/v1/drift (no agent_id) which already returns {"agents": [...]} for every agent with a baseline. Extract the agent IDs. 2. ApiBackend.get_baseline(agent_id): hits GET /api/v1/drift?agent_id=X and converts the response back into a DriftBaseline dataclass so the rest of cmd_drift can call db.get_baseline() transparently. 3. cmd_drift: added an `elif hasattr(db, "list_baseline_agents")` branch between the direct-DB and the empty-list fallback. The downstream loop already used db.get_baseline() and db.get_completed_sessions() which now resolve correctly via the ApiBackend method surface. Verified end-to-end with the drift-demo baseline visible under daemon mode for the first time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

§2 — SDK silently dropped spans on 401 ------------------------------------------------------------ When the SDK's HttpTransport got a 401 from tj serve (the classic "my project-local config has secret A but the daemon was started with global-config secret B"), it logged a single-line warning, retried 3 times with exponential backoff (6s of stalling), then left the spans buffered. The buffered spans were dropped at process exit without notification. Users seeing the dropped data on tj status had no obvious signal that anything was wrong. Fix: - 401 fails fast — no retry, no backoff. The auth won't change just because we waited. - Logged at ERROR level (not warning) so it's harder to miss, and includes the truncated SDK-side secret fingerprint to make the mismatch debuggable. - Spans are dropped immediately (cleared from buffer) rather than hopelessly retained. - New counter `dropped_auth_failures` accumulates across calls so tj doctor can surface cumulative loss later. Non-401 4xx/5xx still follow the original retry-and-buffer flow. §5 — diverged secrets between configs detected at load time ------------------------------------------------------------ The footgun that produced §2's 401s: project-local .tj/config.toml has secret A, global ~/.config/tj/config.toml has secret B, SDK walks the search paths and picks A, daemon was launched with B. No warning at any point. Fix: at load_config time, scan the other SEARCH_PATHS entries for the same `[security] ingest_secret` field. When one exists with a different value, emit a stderr warning naming both files. Fires at most once per process (guarded by a module flag) so this doesn't spam test environments or multi-call CLI invocations. Tests: - 5 transport tests covering 401 fail-fast, counter accumulation, non-401 retry-and-buffer, success path, error-log content. - 4 config tests covering diverged-secret warning, single-config silence, matched-secret silence, once-per-process throttling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

follow-ups) Two follow-ups to PR #69 that close the daemon-mode gaps the previous commit explicitly punted on: §28 — /api/v1/cost/compare ------------------------------------------------------------ Restores `tj cost --compare` and `tj optimize --compare` under daemon-up mode. Previously these printed a "not supported" notice and degraded to running optimize without comparison. - New /api/v1/cost/compare route mirroring compute_cost_diff's input surface (since, compare keyword, agent_id, top_n). - ApiBackend.fetch_cost_compare() — thin httpx wrapper. - cmd_cost: routes through the API in shim mode, renders the resulting dict via the new _render_diff_dict helper (same visual structure as _render_diff, dict-of-strings input shape). - cmd_optimize: same wiring — cost_diff_dict gets attached to JSON output and surfaces in the "Window comparison" section. Direct-DB mode is unchanged: still uses compute_cost_diff locally. §29 — plan_tier_mix on optimize API payload ------------------------------------------------------------ The optimize endpoint now returns `plan_tier_mix` (a SELECT plan_tier, COUNT(*) FROM sessions WHERE started_at IN window). cmd_optimize reads it from the API response in shim mode rather than defaulting to an empty dict (which made _dominant_plan return "api" regardless of the user's actual plan). With this fix, subscription / local / unknown-tier framings render correctly under the daemon — verified on a live system where the "All sessions have unknown plan tier; dollar figures suppressed" note now appears under tj serve mode same as direct-DB mode. Verified end-to-end: - tj cost --since 7d --compare previous → full diff with arrows + per-agent shifts. - tj optimize --finding model-downgrade → unknown-tier suppression applied. - tj optimize --compare previous → both analyzer findings AND window comparison. Tests: 575 passing (no regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…mon (#68 §6) The alerts/drift demos always self-registered their per-agent config via ensure_demo_agent_config, but two latent bugs meant alerts often didn't fire during pre-release testing: 1. Helper wrote to `find_config_file()` only — first match in SEARCH_PATHS, project-local before global. Users who'd run `tj onboard --claude-code` had both a project-local and a global config; the daemon was launched with the global one, the helper updated the project-local one. Demo agent silently invisible. 2. Even when the helper wrote to the right file, the running daemon had loaded config at startup and didn't hot-reload. So freshly- added config never reached the AlertEngine until the daemon was restarted. Both surfaced as "No active alerts" on demo agents during the v0.3.0 pre-release walkthrough, with no signal to the user why. Fixes: ensure_demo_agent_config now writes to EVERY tj config file in SEARCH_PATHS that exists. Idempotent merge — existing user customisation is preserved. Falls back to creating .tj/config.toml if nothing exists. New warn_if_daemon_running helper checks for a tj serve on the default port and prints an actionable note pointing the user at `tj stop && tj serve &` (or just `tj stop` for direct-DB mode). Each of the three demos (sensitive_actions, budget_breach, drift) now calls warn_if_daemon_running() right after the config write and before ensure_initialised, so the user sees the heads-up before bootstrap. Verified: - Direct-DB mode (daemon stopped): sensitive-demo alert fires as expected. - Daemon-up mode: clear stderr warning fires before the demo runs, pointing at the restart command. Tests: 575 passing (no regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#68 §7-§10) Four small playbook polish items from the pre-release walkthrough: §7 — Step 3 used to claim bare `tj onboard` prompts for plan tier. It doesn't; plan tier is per-provider and only the integration- specific flows (--claude-code / --codex) prompt for it. Restructured Step 3 to route plan-tier verification through `tj onboard --claude-code` and added a check for the bare --reconfigure error path that landed in §1. §8 — Added "no lingering daemon from a prior install" to Prerequisites with the launchctl / systemctl one-liners to verify. The pre-release walkthrough showed how an old daemon could silently corrupt the rest of the playbook (secret divergence, 401s, ghost data). §9 — Made plan-tier examples plan-agnostic. The walkthrough was using literal "Max 20x plan, $200/mo flat" expected output — testers with actual max_5x / pro / plus plans had to translate. Switched the defaults to max_5x and added a table of expected formats so the tester picks based on their real plan. §10 — Ruff baseline note updated. The "49 errors as of v0.3.x" line was stale (and overly conservative). Latest run on `main` is clean. Note now says "ruff clean (no errors)" with the practical fallback of "if anything reports, confirm it's pre-existing on `main`". Also applied the plan-agnostic substitution in the post-release smoke test (it had the same max_20x assumption). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

anilmurty and others added 2 commits May 28, 2026 12:42

anilmurty changed the title ~~fix(optimize): launch-blocking rendering + daemon-coexistence bugs (#68)~~ fix(optimize): launch-blocking rendering + daemon-coexistence bugs May 28, 2026

anilmurty and others added 8 commits May 28, 2026 15:48

anilmurty mentioned this pull request May 28, 2026

Pre-release testing findings (v0.3.0 sprint) — bugs and playbook fixes #68

Closed

anilmurty changed the title ~~fix(optimize): launch-blocking rendering + daemon-coexistence bugs~~ fix(v0.3.0): close all 11 pre-release findings + 2 API follow-ups (closes #68) May 28, 2026

anilmurty merged commit 33d6b5f into main May 29, 2026
4 checks passed

anilmurty deleted the fix/optimize-renderer-and-daemon-coexistence branch May 29, 2026 00:42

anilmurty mentioned this pull request May 29, 2026

fix(sdk): apply 401 fail-loud to TjHttpExporter (the path bootstrap actually uses) #70

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(v0.3.0): close all 11 pre-release findings + 2 API follow-ups (closes #68)#69

fix(v0.3.0): close all 11 pre-release findings + 2 API follow-ups (closes #68)#69
anilmurty merged 10 commits into
mainfrom
fix/optimize-renderer-and-daemon-coexistence

anilmurty commented May 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anilmurty commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Commits (oldest → newest)

What this enables

What I changed in the playbook

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

anilmurty commented May 28, 2026 •

edited

Loading