Skip to content

fix(v0.3.0): close all 11 pre-release findings + 2 API follow-ups (closes #68)#69

Merged
anilmurty merged 10 commits into
mainfrom
fix/optimize-renderer-and-daemon-coexistence
May 29, 2026
Merged

fix(v0.3.0): close all 11 pre-release findings + 2 API follow-ups (closes #68)#69
anilmurty merged 10 commits into
mainfrom
fix/optimize-renderer-and-daemon-coexistence

Conversation

@anilmurty
Copy link
Copy Markdown
Contributor

@anilmurty anilmurty commented May 28, 2026

Closes #68. Closes all 11 numbered findings from the pre-release walkthrough plus the two API-mode follow-ups we noted while fixing §12. Originally opened against only the two launch-blockers (§12 + §15) — scope expanded after triage so the v0.3.0 release tag goes out with a clean slate.

Commits (oldest → newest)

Commit Bug Scope
92b0811 §14, §15, §16 Wave-2 renderer dispatch; per-example dollar leak under unknown-tier; Rich-markup escape in hints
3f146e3 §12 /api/v1/optimize route + report_from_dict deserializer; tj optimize no longer crashes alongside daemon
cced338 §1 Explicit error on bare tj onboard --reconfigure (it's a no-op without --claude-code / --codex)
cd1f78f §4 tj doctor downgrades DB-writable from to i when the daemon holds the lock
9bc8562 §11 tj budget enumerates recently-active agents via get_traces() under API-shim mode
0770966 §3 tj drift surfaces baselines via new ApiBackend.list_baseline_agents() + get_baseline()
12f0c36 §2, §5 SDK fails fast + drops spans on 401 (not silent buffering); load_config warns on diverged secrets across project-local and global configs
e78608d #69 §28, §29 New /api/v1/cost/compare route restores --compare under daemon mode; plan_tier_mix now included in optimize API payload
5d938fe §6 ensure_demo_agent_config writes to every config file in SEARCH_PATHS (not just the first match); demos now call warn_if_daemon_running() with the actionable restart hint
2c4653e §7, §8, §9, §10 Playbook polish: bare onboard doesn't prompt for plan tier (route through integrations); add "stop pre-existing daemon" to prerequisites; plan-tier examples are plan-agnostic; ruff baseline note updated to "clean"

What this enables

tj optimize works in the recommended operating mode. That's the strategy-pivot product. The combination of 92b0811 (Wave-2 renderer) and 3f146e3 (API route) means every --finding runs and renders correctly whether the daemon is up or down. Verified end-to-end on the live system.

tj doctor cleanly reports a healthy system. With cd1f78f, doctor under daemon mode is now all ✓ / i, no false ✗. Exit 0.

tj drift surfaces baselines. §3 was confusing during the walkthrough — drift_detected fired but tj drift reported "no baselines found" because the CLI went through the API path that didn't enumerate. Fixed at 0770966.

No more silent data loss. §2 was the worst footgun in the codebase: SDK got 401s, dropped spans, logged a single warning line. Now: ERROR-level loud message with the truncated secret fingerprint, no retry (won't change), counter for cumulative loss. Plus 12f0c36 adds a config-load-time warning when project-local and global secrets diverge — the upstream cause of most 401s.

tj budget and tj cost --compare work everywhere. Both used to silently degrade under daemon mode. Now full functionality regardless of daemon state.

Demos actually fire alerts. §6's silent-no-fire was caused by the same secret-divergence class of bug plus the daemon's no-hot-reload behavior. Helper now writes to every config; demos warn if daemon is up.

What I changed in the playbook

  • Step 3 routes plan-tier verification through tj onboard --claude-code (the bare path doesn't prompt — tj onboard is provider-agnostic).
  • Prerequisites now include "stop any lingering daemon" with the launchctl / systemctl one-liners.
  • Plan-tier examples switched to max_5x (closer to the typical tester's actual plan), with a format table so the tester picks their own.
  • Ruff baseline updated from "49 errors" to "clean" — and the practical fallback "if anything reports, confirm it's pre-existing on main".

Test plan

  • 575 tests passing locally (pytest tests/unit tests/synthetic tests/agents tests/integration)
  • mypy strict clean across 103+ source files
  • ruff clean (no new warnings introduced by this PR)
  • End-to-end verification on the user's live system — every Wave-2 analyzer renders, tj cost --compare works under daemon, tj drift shows baselines under daemon, tj doctor exits 0 under daemon
  • 9 new unit tests (5 for the SDK 401 fail-fast + counter behavior, 4 for the config-divergence warning)
  • Re-run the playbook end-to-end after merge to validate the v0.3.0 tag — that's the next step (issue Pre-release testing findings (v0.3.0 sprint) — bugs and playbook fixes #68 is closed pending merge)

🤖 Generated with Claude Code

anilmurty and others added 2 commits May 28, 2026 12:42
Fixes issue #68 §15 (LAUNCH-BLOCKING), §14, and §16.

#15 (renderer gap) — Wave-2 analyzers (cache-efficacy, cache-recommend,
workflow-restructure, prompt-bloat) attach their results to
`OptimizeReport.findings[<name>]` (the generic dict), but the CLI's
`_render_report` only read the typed slots `report.downgrade` and
`report.budgets`. All four analyzers ran successfully, populated the
findings dict, and the renderer silently dropped them — every run
fell through to the catch-all "No candidates flagged" message.

Fix: dispatch on `report.findings` through a `_FINDING_RENDERERS`
table. Each Wave-2 analyzer gets a dedicated renderer that knows how
to surface its specific shape (per-(provider,model) table for
cache-efficacy, prefix-candidate list for cache-recommend, cluster
list for workflow-restructure, top-N bloat-prompts for prompt-bloat).

The catch-all "No candidates flagged" condition is updated to fire
only when nothing rendered — typed slots empty AND no Wave-2 findings.

#14 (per-example cost leak under unknown-tier suppression) — when
`pricing_mode != "api"` the top-level downgrade savings line correctly
suppresses dollar figures, but the per-example table immediately below
was still rendering `format_cost(ex.cost_usd)`. The honesty discipline
should apply consistently: drop the cost column from the example rows
under unknown/subscription/local pricing modes.

#16 (Rich-markup escape in hint strings) — analyzer hint strings
reference TOML sections like `[capture] prompts = true`. Rich's
console.print() interprets square brackets as style tags and silently
strips them, so the rendered hint became "Enable ` prompts = true`".

Fix: wrap analyzer hints in `rich.markup.escape()` before printing.
Keeps the technically-correct TOML notation in the analyzer source
(other consumers — JSON output, the API, docs — get the literal
string they expect).

Also drops a similarly-bracketed `[bold]...[/bold]` reference inside
the workflow_restructure "degraded mode" notice that was visible-but-
ugly with the bracket markup leaking through.

Verified end-to-end against the user's real DB (claude-code-tokenjam
data with 5.8M tokens, 36 sessions):
- cache-efficacy renders 8 (provider, model) rows correctly, including
  the OpenAI best-effort caveat and the Anthropic full-support rows.
- cache-recommend renders the disabled hint with `[capture] prompts =
  true` visible verbatim.
- workflow-restructure renders the "examined N / no clusters above
  threshold + degraded mode" path.
- prompt-bloat renders the disabled hint with brackets visible.
- model-downgrade per-example rows now drop the dollar column under
  unknown-tier rendering (the rest of the suppression was already in
  place).

Tests: 566 passing (no regressions).
Lint: ruff back at baseline 49 errors (none new).
Type: mypy strict clean across 102 source files.

Filed bugs that remain on issue #68: #1, #2, #3, #4, #5, #12 — left
for follow-up PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes issue #68 §12 (LAUNCH-BLOCKING).

`tj optimize` used to fail outright whenever `tj serve` was running:

  $ tj optimize
  Error: Could not open /Users/.../telemetry.duckdb read-only: IO Error:
  Could not set lock on file "...": Conflicting lock is held in ...

The original design assumed DuckDB would allow read-only attaches
alongside the daemon's write lock, but DuckDB enforces process-level
exclusivity — once one process opens the DB in write mode (the daemon),
no other process can attach, even with read_only=True.

Since `tj onboard` auto-installs and starts the daemon by default,
this regression meant the headline strategy-pivot product (cost
optimization) didn't work in the recommended operating mode.

Architecture
------------

Add /api/v1/optimize: a GET endpoint that runs build_report() on the
server side (where the daemon already has DB access) and returns the
JSON-serialised report. Mirrors the CLI's --since / --agent / --finding
/ --budget / --budget-usd surface.

Add report_from_dict() in tokenjam/core/optimize/runner.py: the symmetric
deserialiser for report_to_dict(). Reconstructs OptimizeReport including
all Wave-2 analyzer findings via a lazy-built constructor dispatch table
(loaded on first use to avoid coupling runner.py to every analyzer
module's import order). Unknown finding names are dropped silently —
keeps the CLI forward-compatible with future analyzers.

Add ApiBackend.fetch_optimize_report(): thin httpx wrapper that hits
/api/v1/optimize with the right query params.

Update cmd_optimize: when getattr(db, "conn", None) is None (i.e. the
caller is an ApiBackend), fetch the report from the daemon and run it
through report_from_dict. Otherwise (local DB), build the report
locally as before. Renderer is unchanged — same OptimizeReport input,
same code path, no API-specific rendering branch.

Handle --compare gracefully under API mode: compute_cost_diff also
uses db.conn directly. Rather than crash, the CLI now surfaces a
clear note that --compare needs the daemon stopped (or omitted).
Adding a /api/v1/cost/compare endpoint is a #68 §12 follow-up.

Side effects
------------

The unknown-plan-tier suppression behaviour is currently lost under
the API path: plan_tier_mix isn't on the API payload yet, so
_dominant_plan defaults to "api" rendering. This is the safest default
(matches historical behaviour) but means subscription users using
the daemon get the api-mode renderer until we extend the API payload
to carry plan_mix. Noted as a follow-up; not launch-blocking.

Verified end-to-end against the user's live system:
- Daemon started (tj serve, PID confirmed)
- `tj optimize --finding cache-efficacy` rendered 8 (provider, model)
  rows correctly. Exit 0.
- `tj optimize --finding model-downgrade` rendered the structural
  candidate, full caveat, examples table. Exit 0.
- `tj optimize --compare previous` surfaced the graceful "not yet
  supported via API; tj stop and re-run" note, then continued to
  render the rest of the report. Exit 0.

Tests: 566 passing (no regressions).
Lint: ruff back at baseline 49 errors (none new).
Type: mypy strict clean across 103 source files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@anilmurty anilmurty changed the title fix(optimize): launch-blocking rendering + daemon-coexistence bugs (#68) fix(optimize): launch-blocking rendering + daemon-coexistence bugs May 28, 2026
anilmurty and others added 8 commits May 28, 2026 15:48
Previously `tj onboard --reconfigure` (no --claude-code / no --codex)
hit the same "Config already exists. Use --force" early-return as a
bare onboard with no flags. Users running e.g.
`tj onboard --reconfigure --plan max_5x` got a silent no-op without
any signal that their plan update didn't land.

The bare onboard path writes a generic project-local config from a
template; it doesn't manage plan tier, which is per-provider and lives
in [budget.<provider>] sections written by --claude-code and --codex.
So --reconfigure genuinely has nothing to do in the bare path.

Fix: explicit error pointing the user at the right flow, exit 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on (#68 §4)

`tj doctor` previously reported ✗ DuckDB writable any time the daemon
held the write lock — which is the expected operating state after
`tj onboard` auto-installs the daemon. Users saw a "failure" indicator
on a healthy system.

The same doctor output already handles this gracefully for one other
check (column statistics): `i  Spans column statistics: Skipped — CLI
is running through the HTTP API fallback`. The DB-writable check now
follows the same pattern.

Fix: detect DuckDB's lock-conflict error message ("Conflicting lock is
held in ..." / "Could not set lock on file ...") and downgrade to
info level. Other DB-open errors stay as ✗ — only the expected
daemon-holds-lock case is demoted.

With this fix and §12 (PR #69), `tj doctor` is fully clean while
`tj serve` is running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`tj budget` previously called `db.conn.execute("SELECT DISTINCT agent_id
FROM sessions")` to enumerate DB-observed agents. Under API-shim mode
(daemon up, no db.conn) that branch was skipped entirely — the table
only showed agents declared in tj.toml. Users running `tj budget` while
the daemon was running saw a misleadingly small list.

Same bug class as the other §3 / §12 cases — CLI commands that read
the DB directly silently miss data under API mode.

Fix: mirror the pattern from cmd_status — fall back to
`db.get_traces(TraceFilters(limit=200))` and pull agent IDs from
recent traces. Scope is "recently active agents" (last 200 traces)
rather than "every session ever", which is arguably the right scope
for a budget display anyway.

Verified:
- API mode (daemon up): now shows recently-active agents alongside
  config-declared ones.
- Direct-DB mode: unchanged behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…§3)

`tj drift` previously did:

    elif hasattr(db, "conn"):
        rows = db.conn.execute("SELECT ... FROM drift_baselines")
        agent_ids = [r[0] for r in rows]
    else:
        agent_ids = []   # API mode: silently empty

Under daemon-up mode (db is ApiBackend, no .conn), agent_ids was set
to [] and the command exited with "No drift baselines found" — even
when drift_detected had clearly fired and baselines existed in the DB.
Highly confusing during the v0.3.0 pre-release walkthrough.

Fix has three parts:

1. ApiBackend.list_baseline_agents(): hits GET /api/v1/drift (no
   agent_id) which already returns {"agents": [...]} for every
   agent with a baseline. Extract the agent IDs.

2. ApiBackend.get_baseline(agent_id): hits GET /api/v1/drift?agent_id=X
   and converts the response back into a DriftBaseline dataclass so
   the rest of cmd_drift can call db.get_baseline() transparently.

3. cmd_drift: added an `elif hasattr(db, "list_baseline_agents")`
   branch between the direct-DB and the empty-list fallback.

The downstream loop already used db.get_baseline() and
db.get_completed_sessions() which now resolve correctly via the
ApiBackend method surface.

Verified end-to-end with the drift-demo baseline visible under
daemon mode for the first time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
§2 — SDK silently dropped spans on 401
------------------------------------------------------------

When the SDK's HttpTransport got a 401 from tj serve (the classic
"my project-local config has secret A but the daemon was started with
global-config secret B"), it logged a single-line warning, retried 3
times with exponential backoff (6s of stalling), then left the spans
buffered. The buffered spans were dropped at process exit without
notification. Users seeing the dropped data on tj status had no
obvious signal that anything was wrong.

Fix:
- 401 fails fast — no retry, no backoff. The auth won't change just
  because we waited.
- Logged at ERROR level (not warning) so it's harder to miss, and
  includes the truncated SDK-side secret fingerprint to make the
  mismatch debuggable.
- Spans are dropped immediately (cleared from buffer) rather than
  hopelessly retained.
- New counter `dropped_auth_failures` accumulates across calls so
  tj doctor can surface cumulative loss later.

Non-401 4xx/5xx still follow the original retry-and-buffer flow.

§5 — diverged secrets between configs detected at load time
------------------------------------------------------------

The footgun that produced §2's 401s: project-local .tj/config.toml
has secret A, global ~/.config/tj/config.toml has secret B, SDK
walks the search paths and picks A, daemon was launched with B.
No warning at any point.

Fix: at load_config time, scan the other SEARCH_PATHS entries for
the same `[security] ingest_secret` field. When one exists with a
different value, emit a stderr warning naming both files.

Fires at most once per process (guarded by a module flag) so this
doesn't spam test environments or multi-call CLI invocations.

Tests:
- 5 transport tests covering 401 fail-fast, counter accumulation,
  non-401 retry-and-buffer, success path, error-log content.
- 4 config tests covering diverged-secret warning, single-config
  silence, matched-secret silence, once-per-process throttling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
 follow-ups)

Two follow-ups to PR #69 that close the daemon-mode gaps the previous
commit explicitly punted on:

§28 — /api/v1/cost/compare
------------------------------------------------------------

Restores `tj cost --compare` and `tj optimize --compare` under
daemon-up mode. Previously these printed a "not supported" notice
and degraded to running optimize without comparison.

- New /api/v1/cost/compare route mirroring compute_cost_diff's input
  surface (since, compare keyword, agent_id, top_n).
- ApiBackend.fetch_cost_compare() — thin httpx wrapper.
- cmd_cost: routes through the API in shim mode, renders the
  resulting dict via the new _render_diff_dict helper (same visual
  structure as _render_diff, dict-of-strings input shape).
- cmd_optimize: same wiring — cost_diff_dict gets attached to JSON
  output and surfaces in the "Window comparison" section.

Direct-DB mode is unchanged: still uses compute_cost_diff locally.

§29 — plan_tier_mix on optimize API payload
------------------------------------------------------------

The optimize endpoint now returns `plan_tier_mix` (a SELECT plan_tier,
COUNT(*) FROM sessions WHERE started_at IN window). cmd_optimize
reads it from the API response in shim mode rather than defaulting
to an empty dict (which made _dominant_plan return "api" regardless
of the user's actual plan).

With this fix, subscription / local / unknown-tier framings render
correctly under the daemon — verified on a live system where the
"All sessions have unknown plan tier; dollar figures suppressed"
note now appears under tj serve mode same as direct-DB mode.

Verified end-to-end:
- tj cost --since 7d --compare previous → full diff with arrows + per-agent shifts.
- tj optimize --finding model-downgrade → unknown-tier suppression applied.
- tj optimize --compare previous → both analyzer findings AND window comparison.

Tests: 575 passing (no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mon (#68 §6)

The alerts/drift demos always self-registered their per-agent config
via ensure_demo_agent_config, but two latent bugs meant alerts often
didn't fire during pre-release testing:

1. Helper wrote to `find_config_file()` only — first match in
   SEARCH_PATHS, project-local before global. Users who'd run
   `tj onboard --claude-code` had both a project-local and a global
   config; the daemon was launched with the global one, the helper
   updated the project-local one. Demo agent silently invisible.

2. Even when the helper wrote to the right file, the running daemon
   had loaded config at startup and didn't hot-reload. So freshly-
   added config never reached the AlertEngine until the daemon was
   restarted.

Both surfaced as "No active alerts" on demo agents during the v0.3.0
pre-release walkthrough, with no signal to the user why.

Fixes:

ensure_demo_agent_config now writes to EVERY tj config file in
SEARCH_PATHS that exists. Idempotent merge — existing user customisation
is preserved. Falls back to creating .tj/config.toml if nothing exists.

New warn_if_daemon_running helper checks for a tj serve on the default
port and prints an actionable note pointing the user at `tj stop &&
tj serve &` (or just `tj stop` for direct-DB mode).

Each of the three demos (sensitive_actions, budget_breach, drift) now
calls warn_if_daemon_running() right after the config write and before
ensure_initialised, so the user sees the heads-up before bootstrap.

Verified:
- Direct-DB mode (daemon stopped): sensitive-demo alert fires as expected.
- Daemon-up mode: clear stderr warning fires before the demo runs,
  pointing at the restart command.

Tests: 575 passing (no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#68 §7-§10)

Four small playbook polish items from the pre-release walkthrough:

§7 — Step 3 used to claim bare `tj onboard` prompts for plan tier.
It doesn't; plan tier is per-provider and only the integration-
specific flows (--claude-code / --codex) prompt for it. Restructured
Step 3 to route plan-tier verification through `tj onboard --claude-code`
and added a check for the bare --reconfigure error path that landed
in §1.

§8 — Added "no lingering daemon from a prior install" to Prerequisites
with the launchctl / systemctl one-liners to verify. The pre-release
walkthrough showed how an old daemon could silently corrupt the rest
of the playbook (secret divergence, 401s, ghost data).

§9 — Made plan-tier examples plan-agnostic. The walkthrough was using
literal "Max 20x plan, $200/mo flat" expected output — testers with
actual max_5x / pro / plus plans had to translate. Switched the
defaults to max_5x and added a table of expected formats so the
tester picks based on their real plan.

§10 — Ruff baseline note updated. The "49 errors as of v0.3.x" line
was stale (and overly conservative). Latest run on `main` is clean.
Note now says "ruff clean (no errors)" with the practical fallback
of "if anything reports, confirm it's pre-existing on `main`".

Also applied the plan-agnostic substitution in the post-release smoke
test (it had the same max_20x assumption).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@anilmurty anilmurty changed the title fix(optimize): launch-blocking rendering + daemon-coexistence bugs fix(v0.3.0): close all 11 pre-release findings + 2 API follow-ups (closes #68) May 28, 2026
@anilmurty anilmurty merged commit 33d6b5f into main May 29, 2026
4 checks passed
@anilmurty anilmurty deleted the fix/optimize-renderer-and-daemon-coexistence branch May 29, 2026 00:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pre-release testing findings (v0.3.0 sprint) — bugs and playbook fixes

1 participant