fix(v0.3.0): close all 11 pre-release findings + 2 API follow-ups (closes #68)#69
Merged
Merged
Conversation
Fixes issue #68 §15 (LAUNCH-BLOCKING), §14, and §16. #15 (renderer gap) — Wave-2 analyzers (cache-efficacy, cache-recommend, workflow-restructure, prompt-bloat) attach their results to `OptimizeReport.findings[<name>]` (the generic dict), but the CLI's `_render_report` only read the typed slots `report.downgrade` and `report.budgets`. All four analyzers ran successfully, populated the findings dict, and the renderer silently dropped them — every run fell through to the catch-all "No candidates flagged" message. Fix: dispatch on `report.findings` through a `_FINDING_RENDERERS` table. Each Wave-2 analyzer gets a dedicated renderer that knows how to surface its specific shape (per-(provider,model) table for cache-efficacy, prefix-candidate list for cache-recommend, cluster list for workflow-restructure, top-N bloat-prompts for prompt-bloat). The catch-all "No candidates flagged" condition is updated to fire only when nothing rendered — typed slots empty AND no Wave-2 findings. #14 (per-example cost leak under unknown-tier suppression) — when `pricing_mode != "api"` the top-level downgrade savings line correctly suppresses dollar figures, but the per-example table immediately below was still rendering `format_cost(ex.cost_usd)`. The honesty discipline should apply consistently: drop the cost column from the example rows under unknown/subscription/local pricing modes. #16 (Rich-markup escape in hint strings) — analyzer hint strings reference TOML sections like `[capture] prompts = true`. Rich's console.print() interprets square brackets as style tags and silently strips them, so the rendered hint became "Enable ` prompts = true`". Fix: wrap analyzer hints in `rich.markup.escape()` before printing. Keeps the technically-correct TOML notation in the analyzer source (other consumers — JSON output, the API, docs — get the literal string they expect). Also drops a similarly-bracketed `[bold]...[/bold]` reference inside the workflow_restructure "degraded mode" notice that was visible-but- ugly with the bracket markup leaking through. Verified end-to-end against the user's real DB (claude-code-tokenjam data with 5.8M tokens, 36 sessions): - cache-efficacy renders 8 (provider, model) rows correctly, including the OpenAI best-effort caveat and the Anthropic full-support rows. - cache-recommend renders the disabled hint with `[capture] prompts = true` visible verbatim. - workflow-restructure renders the "examined N / no clusters above threshold + degraded mode" path. - prompt-bloat renders the disabled hint with brackets visible. - model-downgrade per-example rows now drop the dollar column under unknown-tier rendering (the rest of the suppression was already in place). Tests: 566 passing (no regressions). Lint: ruff back at baseline 49 errors (none new). Type: mypy strict clean across 102 source files. Filed bugs that remain on issue #68: #1, #2, #3, #4, #5, #12 — left for follow-up PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes issue #68 §12 (LAUNCH-BLOCKING). `tj optimize` used to fail outright whenever `tj serve` was running: $ tj optimize Error: Could not open /Users/.../telemetry.duckdb read-only: IO Error: Could not set lock on file "...": Conflicting lock is held in ... The original design assumed DuckDB would allow read-only attaches alongside the daemon's write lock, but DuckDB enforces process-level exclusivity — once one process opens the DB in write mode (the daemon), no other process can attach, even with read_only=True. Since `tj onboard` auto-installs and starts the daemon by default, this regression meant the headline strategy-pivot product (cost optimization) didn't work in the recommended operating mode. Architecture ------------ Add /api/v1/optimize: a GET endpoint that runs build_report() on the server side (where the daemon already has DB access) and returns the JSON-serialised report. Mirrors the CLI's --since / --agent / --finding / --budget / --budget-usd surface. Add report_from_dict() in tokenjam/core/optimize/runner.py: the symmetric deserialiser for report_to_dict(). Reconstructs OptimizeReport including all Wave-2 analyzer findings via a lazy-built constructor dispatch table (loaded on first use to avoid coupling runner.py to every analyzer module's import order). Unknown finding names are dropped silently — keeps the CLI forward-compatible with future analyzers. Add ApiBackend.fetch_optimize_report(): thin httpx wrapper that hits /api/v1/optimize with the right query params. Update cmd_optimize: when getattr(db, "conn", None) is None (i.e. the caller is an ApiBackend), fetch the report from the daemon and run it through report_from_dict. Otherwise (local DB), build the report locally as before. Renderer is unchanged — same OptimizeReport input, same code path, no API-specific rendering branch. Handle --compare gracefully under API mode: compute_cost_diff also uses db.conn directly. Rather than crash, the CLI now surfaces a clear note that --compare needs the daemon stopped (or omitted). Adding a /api/v1/cost/compare endpoint is a #68 §12 follow-up. Side effects ------------ The unknown-plan-tier suppression behaviour is currently lost under the API path: plan_tier_mix isn't on the API payload yet, so _dominant_plan defaults to "api" rendering. This is the safest default (matches historical behaviour) but means subscription users using the daemon get the api-mode renderer until we extend the API payload to carry plan_mix. Noted as a follow-up; not launch-blocking. Verified end-to-end against the user's live system: - Daemon started (tj serve, PID confirmed) - `tj optimize --finding cache-efficacy` rendered 8 (provider, model) rows correctly. Exit 0. - `tj optimize --finding model-downgrade` rendered the structural candidate, full caveat, examples table. Exit 0. - `tj optimize --compare previous` surfaced the graceful "not yet supported via API; tj stop and re-run" note, then continued to render the rest of the report. Exit 0. Tests: 566 passing (no regressions). Lint: ruff back at baseline 49 errors (none new). Type: mypy strict clean across 103 source files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously `tj onboard --reconfigure` (no --claude-code / no --codex) hit the same "Config already exists. Use --force" early-return as a bare onboard with no flags. Users running e.g. `tj onboard --reconfigure --plan max_5x` got a silent no-op without any signal that their plan update didn't land. The bare onboard path writes a generic project-local config from a template; it doesn't manage plan tier, which is per-provider and lives in [budget.<provider>] sections written by --claude-code and --codex. So --reconfigure genuinely has nothing to do in the bare path. Fix: explicit error pointing the user at the right flow, exit 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on (#68 §4) `tj doctor` previously reported ✗ DuckDB writable any time the daemon held the write lock — which is the expected operating state after `tj onboard` auto-installs the daemon. Users saw a "failure" indicator on a healthy system. The same doctor output already handles this gracefully for one other check (column statistics): `i Spans column statistics: Skipped — CLI is running through the HTTP API fallback`. The DB-writable check now follows the same pattern. Fix: detect DuckDB's lock-conflict error message ("Conflicting lock is held in ..." / "Could not set lock on file ...") and downgrade to info level. Other DB-open errors stay as ✗ — only the expected daemon-holds-lock case is demoted. With this fix and §12 (PR #69), `tj doctor` is fully clean while `tj serve` is running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`tj budget` previously called `db.conn.execute("SELECT DISTINCT agent_id
FROM sessions")` to enumerate DB-observed agents. Under API-shim mode
(daemon up, no db.conn) that branch was skipped entirely — the table
only showed agents declared in tj.toml. Users running `tj budget` while
the daemon was running saw a misleadingly small list.
Same bug class as the other §3 / §12 cases — CLI commands that read
the DB directly silently miss data under API mode.
Fix: mirror the pattern from cmd_status — fall back to
`db.get_traces(TraceFilters(limit=200))` and pull agent IDs from
recent traces. Scope is "recently active agents" (last 200 traces)
rather than "every session ever", which is arguably the right scope
for a budget display anyway.
Verified:
- API mode (daemon up): now shows recently-active agents alongside
config-declared ones.
- Direct-DB mode: unchanged behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…§3)
`tj drift` previously did:
elif hasattr(db, "conn"):
rows = db.conn.execute("SELECT ... FROM drift_baselines")
agent_ids = [r[0] for r in rows]
else:
agent_ids = [] # API mode: silently empty
Under daemon-up mode (db is ApiBackend, no .conn), agent_ids was set
to [] and the command exited with "No drift baselines found" — even
when drift_detected had clearly fired and baselines existed in the DB.
Highly confusing during the v0.3.0 pre-release walkthrough.
Fix has three parts:
1. ApiBackend.list_baseline_agents(): hits GET /api/v1/drift (no
agent_id) which already returns {"agents": [...]} for every
agent with a baseline. Extract the agent IDs.
2. ApiBackend.get_baseline(agent_id): hits GET /api/v1/drift?agent_id=X
and converts the response back into a DriftBaseline dataclass so
the rest of cmd_drift can call db.get_baseline() transparently.
3. cmd_drift: added an `elif hasattr(db, "list_baseline_agents")`
branch between the direct-DB and the empty-list fallback.
The downstream loop already used db.get_baseline() and
db.get_completed_sessions() which now resolve correctly via the
ApiBackend method surface.
Verified end-to-end with the drift-demo baseline visible under
daemon mode for the first time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
§2 — SDK silently dropped spans on 401 ------------------------------------------------------------ When the SDK's HttpTransport got a 401 from tj serve (the classic "my project-local config has secret A but the daemon was started with global-config secret B"), it logged a single-line warning, retried 3 times with exponential backoff (6s of stalling), then left the spans buffered. The buffered spans were dropped at process exit without notification. Users seeing the dropped data on tj status had no obvious signal that anything was wrong. Fix: - 401 fails fast — no retry, no backoff. The auth won't change just because we waited. - Logged at ERROR level (not warning) so it's harder to miss, and includes the truncated SDK-side secret fingerprint to make the mismatch debuggable. - Spans are dropped immediately (cleared from buffer) rather than hopelessly retained. - New counter `dropped_auth_failures` accumulates across calls so tj doctor can surface cumulative loss later. Non-401 4xx/5xx still follow the original retry-and-buffer flow. §5 — diverged secrets between configs detected at load time ------------------------------------------------------------ The footgun that produced §2's 401s: project-local .tj/config.toml has secret A, global ~/.config/tj/config.toml has secret B, SDK walks the search paths and picks A, daemon was launched with B. No warning at any point. Fix: at load_config time, scan the other SEARCH_PATHS entries for the same `[security] ingest_secret` field. When one exists with a different value, emit a stderr warning naming both files. Fires at most once per process (guarded by a module flag) so this doesn't spam test environments or multi-call CLI invocations. Tests: - 5 transport tests covering 401 fail-fast, counter accumulation, non-401 retry-and-buffer, success path, error-log content. - 4 config tests covering diverged-secret warning, single-config silence, matched-secret silence, once-per-process throttling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
follow-ups) Two follow-ups to PR #69 that close the daemon-mode gaps the previous commit explicitly punted on: §28 — /api/v1/cost/compare ------------------------------------------------------------ Restores `tj cost --compare` and `tj optimize --compare` under daemon-up mode. Previously these printed a "not supported" notice and degraded to running optimize without comparison. - New /api/v1/cost/compare route mirroring compute_cost_diff's input surface (since, compare keyword, agent_id, top_n). - ApiBackend.fetch_cost_compare() — thin httpx wrapper. - cmd_cost: routes through the API in shim mode, renders the resulting dict via the new _render_diff_dict helper (same visual structure as _render_diff, dict-of-strings input shape). - cmd_optimize: same wiring — cost_diff_dict gets attached to JSON output and surfaces in the "Window comparison" section. Direct-DB mode is unchanged: still uses compute_cost_diff locally. §29 — plan_tier_mix on optimize API payload ------------------------------------------------------------ The optimize endpoint now returns `plan_tier_mix` (a SELECT plan_tier, COUNT(*) FROM sessions WHERE started_at IN window). cmd_optimize reads it from the API response in shim mode rather than defaulting to an empty dict (which made _dominant_plan return "api" regardless of the user's actual plan). With this fix, subscription / local / unknown-tier framings render correctly under the daemon — verified on a live system where the "All sessions have unknown plan tier; dollar figures suppressed" note now appears under tj serve mode same as direct-DB mode. Verified end-to-end: - tj cost --since 7d --compare previous → full diff with arrows + per-agent shifts. - tj optimize --finding model-downgrade → unknown-tier suppression applied. - tj optimize --compare previous → both analyzer findings AND window comparison. Tests: 575 passing (no regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mon (#68 §6) The alerts/drift demos always self-registered their per-agent config via ensure_demo_agent_config, but two latent bugs meant alerts often didn't fire during pre-release testing: 1. Helper wrote to `find_config_file()` only — first match in SEARCH_PATHS, project-local before global. Users who'd run `tj onboard --claude-code` had both a project-local and a global config; the daemon was launched with the global one, the helper updated the project-local one. Demo agent silently invisible. 2. Even when the helper wrote to the right file, the running daemon had loaded config at startup and didn't hot-reload. So freshly- added config never reached the AlertEngine until the daemon was restarted. Both surfaced as "No active alerts" on demo agents during the v0.3.0 pre-release walkthrough, with no signal to the user why. Fixes: ensure_demo_agent_config now writes to EVERY tj config file in SEARCH_PATHS that exists. Idempotent merge — existing user customisation is preserved. Falls back to creating .tj/config.toml if nothing exists. New warn_if_daemon_running helper checks for a tj serve on the default port and prints an actionable note pointing the user at `tj stop && tj serve &` (or just `tj stop` for direct-DB mode). Each of the three demos (sensitive_actions, budget_breach, drift) now calls warn_if_daemon_running() right after the config write and before ensure_initialised, so the user sees the heads-up before bootstrap. Verified: - Direct-DB mode (daemon stopped): sensitive-demo alert fires as expected. - Daemon-up mode: clear stderr warning fires before the demo runs, pointing at the restart command. Tests: 575 passing (no regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#68 §7-§10) Four small playbook polish items from the pre-release walkthrough: §7 — Step 3 used to claim bare `tj onboard` prompts for plan tier. It doesn't; plan tier is per-provider and only the integration- specific flows (--claude-code / --codex) prompt for it. Restructured Step 3 to route plan-tier verification through `tj onboard --claude-code` and added a check for the bare --reconfigure error path that landed in §1. §8 — Added "no lingering daemon from a prior install" to Prerequisites with the launchctl / systemctl one-liners to verify. The pre-release walkthrough showed how an old daemon could silently corrupt the rest of the playbook (secret divergence, 401s, ghost data). §9 — Made plan-tier examples plan-agnostic. The walkthrough was using literal "Max 20x plan, $200/mo flat" expected output — testers with actual max_5x / pro / plus plans had to translate. Switched the defaults to max_5x and added a table of expected formats so the tester picks based on their real plan. §10 — Ruff baseline note updated. The "49 errors as of v0.3.x" line was stale (and overly conservative). Latest run on `main` is clean. Note now says "ruff clean (no errors)" with the practical fallback of "if anything reports, confirm it's pre-existing on `main`". Also applied the plan-agnostic substitution in the post-release smoke test (it had the same max_20x assumption). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #68. Closes all 11 numbered findings from the pre-release walkthrough plus the two API-mode follow-ups we noted while fixing §12. Originally opened against only the two launch-blockers (§12 + §15) — scope expanded after triage so the v0.3.0 release tag goes out with a clean slate.
Commits (oldest → newest)
92b08113f146e3/api/v1/optimizeroute +report_from_dictdeserializer;tj optimizeno longer crashes alongside daemoncced338tj onboard --reconfigure(it's a no-op without--claude-code/--codex)cd1f78ftj doctordowngrades DB-writable from✗toiwhen the daemon holds the lock9bc8562tj budgetenumerates recently-active agents viaget_traces()under API-shim mode0770966tj driftsurfaces baselines via newApiBackend.list_baseline_agents()+get_baseline()12f0c36load_configwarns on diverged secrets across project-local and global configse78608d/api/v1/cost/compareroute restores--compareunder daemon mode;plan_tier_mixnow included in optimize API payload5d938feensure_demo_agent_configwrites to every config file in SEARCH_PATHS (not just the first match); demos now callwarn_if_daemon_running()with the actionable restart hint2c4653eWhat this enables
tj optimizeworks in the recommended operating mode. That's the strategy-pivot product. The combination of92b0811(Wave-2 renderer) and3f146e3(API route) means every--findingruns and renders correctly whether the daemon is up or down. Verified end-to-end on the live system.tj doctorcleanly reports a healthy system. Withcd1f78f, doctor under daemon mode is now all ✓ / i, no false ✗. Exit 0.tj driftsurfaces baselines. §3 was confusing during the walkthrough — drift_detected fired buttj driftreported "no baselines found" because the CLI went through the API path that didn't enumerate. Fixed at0770966.No more silent data loss. §2 was the worst footgun in the codebase: SDK got 401s, dropped spans, logged a single warning line. Now: ERROR-level loud message with the truncated secret fingerprint, no retry (won't change), counter for cumulative loss. Plus
12f0c36adds a config-load-time warning when project-local and global secrets diverge — the upstream cause of most 401s.tj budgetandtj cost --comparework everywhere. Both used to silently degrade under daemon mode. Now full functionality regardless of daemon state.Demos actually fire alerts. §6's silent-no-fire was caused by the same secret-divergence class of bug plus the daemon's no-hot-reload behavior. Helper now writes to every config; demos warn if daemon is up.
What I changed in the playbook
tj onboard --claude-code(the bare path doesn't prompt —tj onboardis provider-agnostic).max_5x(closer to the typical tester's actual plan), with a format table so the tester picks their own.main".Test plan
pytest tests/unit tests/synthetic tests/agents tests/integration)tj cost --compareworks under daemon,tj driftshows baselines under daemon,tj doctorexits 0 under daemon🤖 Generated with Claude Code