Running list of findings surfaced while walking through tests/manual-pre-release-testing.md against the v0.3.0 sprint work on task-5-docs-sweep. Filed proactively so we can keep updating as the playbook run continues.
Code bugs
1. tj onboard --reconfigure bare-path early-returns
The --reconfigure flag is honored inside _onboard_claude_code and _onboard_codex, but the top-level cmd_onboard early-returns with "Config already exists. Use --force to overwrite." when called without --claude-code / --codex, regardless of --reconfigure.
Fix: Make the bare-onboard path honor --reconfigure (bypass the existing-config check) or explicitly document that --reconfigure only applies to integration-specific flows.
Where: tokenjam/cli/cmd_onboard.py ~ line 39–43
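A minimal sketch of the first option, assuming the guard in cmd_onboard boils down to "does a config file already exist". The helper name `should_abort_existing_config` and its parameters are hypothetical and do not come from cmd_onboard.py.

```python
def should_abort_existing_config(config_exists: bool, force: bool, reconfigure: bool) -> bool:
    """Return True when bare `tj onboard` should stop with 'Config already exists'.

    Hypothetical helper: assumes the early return depends only on these three inputs.
    """
    # Either flag means the user knowingly wants to touch an existing config,
    # so only abort when a config exists and neither flag was passed.
    return config_exists and not (force or reconfigure)
```

With this shape, `--reconfigure` on the bare path behaves the same way it already does inside the integration-specific flows.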
2. SDK silently drops spans on HTTP 401
When the SDK's HttpTransport gets a 401 from a running tj serve (e.g., secret mismatch), it logs a single line ("tj serve returned 401 on span export") and drops the spans. The message is easy to miss, there is no fallback to a direct DuckDB write, and tj doctor exposes no dropped-span counter.
Repro: Have .tj/config.toml and ~/.config/tj/config.toml with different ingest_secret values, then run any example script while tj serve is up. Watch the spans never appear in tj status.
Fix options:
- Fall back to direct DB write when HTTP push fails (preferable for local-first ergonomics)
- Raise visibility: count dropped spans, surface in tj doctor
- At minimum, change the warning to ERROR level and include the secret-fingerprint mismatch hint
Where: tokenjam/sdk/transport.py
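The "raise visibility" and "ERROR level" options could be combined roughly as below. This is a sketch, not the current HttpTransport code: `ExportStats`, `record_export_result`, and the logger name are all hypothetical, and it assumes the transport knows the HTTP status code and batch size after each export attempt.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("tokenjam.sdk.transport")  # hypothetical logger name


@dataclass
class ExportStats:
    """Running totals that a `tj doctor` check could read (hypothetical)."""
    dropped_spans: int = 0
    last_error: Optional[str] = None


def record_export_result(stats: ExportStats, status_code: int, span_count: int) -> bool:
    """Return True if the batch was accepted; otherwise count the drop and log at ERROR."""
    if 200 <= status_code < 300:
        return True
    stats.dropped_spans += span_count
    if status_code == 401:
        # Point the user at the most likely cause described in this issue.
        stats.last_error = "401 from tj serve: ingest_secret mismatch is the likely cause"
    else:
        stats.last_error = f"HTTP {status_code} from tj serve"
    logger.error("Dropped %d span(s): %s", span_count, stats.last_error)
    return False
```

Keeping the counter in a small dataclass means the same object can feed both the log line and a doctor check without the transport knowing about either consumer.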
3. tj drift doesn't surface baselines that demonstrably exist (HTTP API fallback)
After running examples/alerts_and_drift/drift_demo.py (12 baseline + 1 anomalous session) and observing drift_detected fire successfully, tj drift reports "No drift baselines found." The detector clearly built and used a baseline (otherwise the alert wouldn't fire), but the CLI doesn't surface it.
tj doctor correctly identifies other agents as "Collecting baseline: sensitive-demo (0/10), budget-demo (0/10)" but doesn't mention drift-demo — suggesting its baseline IS built but tj drift can't read it.
Likely cause: CLI is using the HTTP API fallback (because tj serve holds the DB lock), and /api/v1/drift either doesn't return baseline records correctly or cmd_drift doesn't call it with the right shape.
Where: tokenjam/cli/cmd_drift.py + tokenjam/api/routes/drift.py
4. tj doctor reports DuckDB writable: ✗ as a failure when daemon holds the lock
Doctor's "DuckDB writable" check attempts a direct DB connection and reports ✗ Could not set lock on file ... Conflicting lock is held in ... PID <daemon> as a failure. But this is the expected, healthy state: the daemon is the rightful lock holder.
Doctor already handles this gracefully for another check: i Spans column statistics: Skipped — CLI is running through the HTTP API fallback. The writable check should follow the same pattern.
Fix: Detect the daemon. When it's the lock holder, downgrade to i informational ("DB lock held by daemon — this is the expected operating state").
Where: tokenjam/cli/cmd_doctor.py
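One possible shape for the downgrade, assuming doctor can obtain the daemon's PID from somewhere (a PID file, launchd, or the HTTP API) and that the lock error keeps the "(PID N)" format seen in this issue. `classify_writable_check` and its return values are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the holder PID in DuckDB's lock-conflict message as quoted in this issue.
_LOCK_PID = re.compile(r"Conflicting lock is held in .*?\(PID (\d+)\)", re.DOTALL)


def classify_writable_check(error_message: Optional[str],
                            daemon_pid: Optional[int]) -> Tuple[str, str]:
    """Map the result of the direct-connection attempt to (level, text).

    level is one of "ok", "info", "fail" (hypothetical doctor levels).
    """
    if error_message is None:
        return ("ok", "DuckDB writable")
    match = _LOCK_PID.search(error_message)
    if match and daemon_pid is not None and int(match.group(1)) == daemon_pid:
        # The lock holder is our own daemon: expected state, not a failure.
        return ("info", "DB lock held by daemon (expected operating state)")
    # Any other error, or a lock held by an unknown process, is still a failure.
    return ("fail", error_message)
```

A lock held by some other process still fails the check, which keeps the genuinely broken case visible.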
UX / config-design issues
5. Project-local vs global config secret divergence is a footgun
When .tj/config.toml exists in cwd, the SDK picks up its ingest_secret. The daemon (started by launchd) reads ~/.config/tj/config.toml. These can drift silently — there's no warning when they differ, and the manifestation is the dropped-spans-on-401 issue above (bug #2). Took several minutes of diagnostic work to trace.
Fix options:
- Detect divergence at SDK startup; emit a clear warning
- Prefer the global config when the daemon is running (or vice versa, but consistently)
- Document the precedence rule explicitly in CLAUDE.md and the README
- Consider removing .tj/config.toml from git tracking entirely; a config file in a code repo is mostly a misfeature
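The first option (detect divergence at SDK startup) might look like the sketch below. It assumes both ingest_secret values have already been loaded from their config files. The function names are hypothetical, and the fingerprint is just a truncated SHA-256 so the warning never prints the secrets themselves.

```python
import hashlib
from typing import Optional


def secret_fingerprint(secret: str) -> str:
    """Short, non-reversible identifier that is safe to show in logs."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:8]


def divergence_warning(local_secret: Optional[str],
                       global_secret: Optional[str]) -> Optional[str]:
    """Return a warning string when both configs define different secrets, else None."""
    if local_secret is None or global_secret is None:
        return None  # only one config defines a secret, so nothing to compare
    if local_secret == global_secret:
        return None
    return (
        "ingest_secret differs between .tj/config.toml "
        f"(fingerprint {secret_fingerprint(local_secret)}) and ~/.config/tj/config.toml "
        f"(fingerprint {secret_fingerprint(global_secret)}); "
        "tj serve will reject spans with 401"
    )
```

The same fingerprints could be reused in the 401 log line from bug #2 so the two messages point at each other.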
6. Alert/drift demos don't fire alerts without matching tj.toml config
examples/alerts_and_drift/sensitive_actions_demo.py and budget_breach_demo.py write spans for agent IDs sensitive-demo and budget-demo, but if the user's tj.toml doesn't declare those agents with the relevant config (sensitive_actions list, budget thresholds), no alerts fire — even though the demos' output says they should.
Currently the demos contain a # tj.toml needs this config comment block, but no enforcement. New users following the playbook see "No active alerts" and don't realize they missed a prerequisite.
Fix options:
- Demos self-register the required agent config on first run (best UX)
- tj doctor checks for "demo agent present without matching config" and warns
- Playbook explicitly notes the prerequisite
Playbook (tests/manual-pre-release-testing.md) fixes
7. Step 3 implies bare tj onboard prompts for plan tier — it doesn't
The plan-tier prompt is only in _onboard_claude_code and _onboard_codex. Plain tj onboard doesn't prompt. Either the playbook needs to route plan-tier verification through --claude-code / --codex, or the bare path should also prompt (then ask which provider).
8. Playbook doesn't tell user to stop pre-existing daemon
Steps 1-3 assume a clean shell, but a daemon from a prior install can already be running. The playbook needs an explicit tj stop (or "verify no daemon already running") near the top.
9. max_20x literal in playbook examples should be plan-agnostic
The expected-output checks use Max 20x plan, $200/mo flat — tester should be able to use whichever plan they actually have on the test machine. Reframe checks around format ("Max 5x" / "Max 20x" / etc.) rather than the specific dollar denominator.
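A format-based check could be expressed as a pattern rather than a literal. The sketch below assumes every plan line follows the shape of the one quoted above ("Max 20x plan, $200/mo flat"); the regex and the name `PLAN_LINE` are illustrative and not taken from the playbook.

```python
import re

# Accept any multiplier and any whole-dollar or dollars-and-cents amount.
PLAN_LINE = re.compile(r"^Max \d+x plan, \$\d+(?:\.\d{2})?/mo flat$")


def looks_like_plan_line(line: str) -> bool:
    """True when the line matches the plan-summary format, whatever the tier."""
    return PLAN_LINE.match(line.strip()) is not None
```

The tester then verifies that the line is well formed and matches their own plan, rather than comparing against one hard-coded tier.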
10. Ruff baseline note in playbook says "49 errors"; actual run is clean
Local run shows All checks passed! Either the baseline got cleaned up between writing the playbook and now, or the playbook was just wrong. Update to "ruff clean" or "match main baseline."
Open questions / look-into-later
11. tj budget only lists configured agents — not seen-but-unconfigured ones
After step 5, tj budget shows only defaults, budget-demo, and sensitive-demo. The pre-existing claude-code-* agents (which have real spend) aren't listed. Possibly intentional (only show agents with explicit config), but worth confirming.
Status
Playbook progress: step 5 complete; about to start step 6 (cost-optimization analyzers).
Update this issue as we surface more, then turn checked-off boxes into PR commits when we're ready to fix.
🤖 Filed by Claude Code during the v0.3.0 pre-release walkthrough.
More findings from step 6
12. tj optimize fails entirely when tj serve holds the DB lock — CRITICAL
Symptom:
$ tj optimize --finding model-downgrade
Error: Could not open /Users/anilmurty/.tj/telemetry.duckdb read-only: IO Error:
Could not set lock on file "...": Conflicting lock is held in
/Library/Frameworks/Python.framework/Versions/3.10/Resources/Python.app/Contents/MacOS/Python
(PID 89879) by user anilmurty.
Why it matters: The strategy pivot positions cost optimization as the headline product. tj optimize failing whenever the daemon is up (which is the recommended operating mode after onboard auto-installs the daemon) is a launch-blocking regression.
Why CLAUDE.md is aspirational:
tj optimize (cmd_optimize.py) ... Opens the live DB read-only so it works alongside a running tj serve.
cmd_optimize.py does call duckdb.connect(db_path, read_only=True). But DuckDB enforces process-level exclusivity — when one process has the DB open in write mode (the daemon), no other process can attach, even read-only. The docs are explicit about this: https://duckdb.org/docs/stable/connect/concurrency
Fix options (ordered by impact):
- Route tj optimize's queries through the HTTP API. Large change: needs new /api/v1/optimize endpoints that mirror the analyzer surface. Highest ROI: makes optimize work in the recommended operating mode by default.
- Auto-detect the daemon and suggest tj stop in the error. Quick UX fix: at least the user knows what to do. Doesn't actually solve the problem.
- Make the daemon yield the write lock for short read transactions. Probably impractical given DuckDB's locking model.
Recommendation: the first option (route through the HTTP API). The analyzer logic already operates on conn.execute(SQL), so it should port cleanly to an API endpoint that the CLI calls when db.conn is None (i.e., when the API shim is active).
Where: tokenjam/cli/cmd_optimize.py, plus new tokenjam/api/routes/optimize.py.
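The dispatch the recommendation describes can be sketched as follows. `run_optimize`, `run_local`, and `run_remote` are hypothetical names; the only thing taken from the issue is the idea that `db.conn is None` signals the API shim is active.

```python
from typing import Any, Callable, Dict, Optional


def run_optimize(conn: Optional[Any],
                 finding: str,
                 run_local: Callable[[Any, str], Dict[str, Any]],
                 run_remote: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the execution path for an analyzer.

    conn is the direct DuckDB connection, or None when the CLI is talking to
    the daemon through the HTTP API fallback.
    """
    if conn is None:
        # Daemon holds the DB lock: ask it to run the analyzer for us.
        return run_remote(finding)
    # No daemon in the way: run the analyzer SQL directly.
    return run_local(conn, finding)
```

Passing the two code paths in as callables keeps the decision testable without a database or a running server.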
13. Same bug class likely affects other db.conn-direct callers
tj cost worked above because it uses the StorageBackend protocol via the ApiBackend shim. tj optimize works directly against db.conn. Need to audit other CLI commands that take the db.conn shortcut: probably tj cost --compare (the compute_cost_diff path), and any future analyzers.
14. Per-example cost_usd leaks through unknown-tier dollar suppression
When pricing_mode == "unknown", the top-level downgrade savings line correctly says "savings figures suppressed — plan tier unknown". But the per-example table immediately below still shows the original cost:
Examples:
  82c68dd9.. 0 tool calls — $8.0590 (claude-opus-4-7)
                            ^^^^^^^ leaks through
If we're suppressing dollar figures because we don't yet know whether they're real "spend" or implied-API-value, the per-example cost has the same honesty problem.
Fix: In cmd_optimize.py _render_downgrade(), when pricing_mode in {"unknown", "subscription", "local"}, suppress the cost_usd column in examples (or replace with tokens).
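A sketch of the "replace with tokens" variant. The row layout imitates the example output above, but `format_example_row`, the `total_tokens` field, and the `"api"` mode name are assumptions; only the three suppressed modes come from this issue.

```python
SUPPRESSED_MODES = {"unknown", "subscription", "local"}


def format_example_row(session_id: str, tool_calls: int, cost_usd: float,
                       total_tokens: int, model: str, pricing_mode: str) -> str:
    """Render one per-example line, hiding dollars when the pricing mode is suppressed."""
    if pricing_mode in SUPPRESSED_MODES:
        # Tokens are honest in every mode; dollars are not.
        value = f"{total_tokens:,} tokens"
    else:
        value = f"${cost_usd:.4f}"
    return f"{session_id[:8]}.. {tool_calls} tool calls, {value} ({model})"
```

Showing tokens instead of dropping the column keeps the examples comparable to each other while staying consistent with the suppressed summary line.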
15. LAUNCH-BLOCKING: Wave 2 analyzers produce no CLI output — renderer ignores report.findings
Symptom: Every Wave-2 analyzer (cache-efficacy, cache-recommend, workflow-restructure, prompt-bloat) prints the same generic catch-all message:
$ tj optimize --finding cache-efficacy
No candidates flagged in this window. Either spend is small or all sessions already use a cost-effective model.
That message is the catch-all from _render_report() when both report.downgrade is None and report.budgets is empty. It is not specific to the analyzer that ran.
Root cause: Wave-2 analyzers attach their findings to report.findings[<name>] (the generic dict on OptimizeReport). But cmd_optimize.py::_render_report() only reads the typed slots (report.downgrade, report.budgets) — it never iterates report.findings. So Wave-2 analyzers run successfully, write data, and the renderer drops it on the floor.
Confirmed via JSON path:
$ tj optimize --finding cache-efficacy --json | python3 -m json.tool | head -30
{
  "window": {...},
  "downgrade": null,
  "budgets": [],
  "findings": {
    "cache-efficacy": {
      "rows": [
        {"provider": "anthropic", "model": "claude-opus-4-7", "input_tokens": 25498, "cache_tokens": 1195766597, "efficacy": 1.0, "support": "full", "flagged": false},
        ...6 rows total...
      ]
    }
  }
}
JSON output is correct. CLI text output is not.
Impact: All four Wave-2 analyzers (the bulk of the strategy-pivot sprint) are effectively invisible to anyone running tj optimize interactively. Unit tests pass because they test the analyzer functions directly; the CLI text-rendering path was never wired to read report.findings.
Fix: Extend _render_report() in tokenjam/cli/cmd_optimize.py to iterate report.findings and dispatch to a per-finding renderer. Roughly:
# After rendering downgrade + budgets, before the catch-all:
for name, finding in report.findings.items():
    renderer = _FINDING_RENDERERS.get(name)
    if renderer is not None:
        renderer(finding, pricing_mode=pricing_mode)
        console.print()
Plus per-finding render functions:
- _render_cache_efficacy(): per-(provider, model) table
- _render_cache_recommend(): disabled hint OR breakpoint candidates list
- _render_workflow_restructure(): clusters or "no clusters" message
- _render_prompt_bloat(): disabled hint OR per-prompt summary table
Update the catch-all condition to also check report.findings so it only fires when truly empty.
Where: tokenjam/cli/cmd_optimize.py
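To make the catch-all change concrete, here is a self-contained sketch of the dispatch plus the tightened empty check. It returns strings instead of printing so it can run without the real console; the renderer body, the registry contents, and the fallback line for unregistered findings are assumptions rather than the existing cmd_optimize.py code. Rendering of downgrade and budgets is omitted.

```python
from typing import Any, Callable, Dict, List, Optional

Renderer = Callable[[Dict[str, Any]], str]


def render_cache_efficacy(finding: Dict[str, Any]) -> str:
    """Hypothetical text renderer: one line per (provider, model) row."""
    rows = finding.get("rows", [])
    lines = [f"{r['provider']}/{r['model']}: efficacy {r['efficacy']:.2f}" for r in rows]
    return "\n".join(lines) if lines else "cache-efficacy: no rows in this window"


FINDING_RENDERERS: Dict[str, Renderer] = {"cache-efficacy": render_cache_efficacy}


def render_report(downgrade: Optional[Any], budgets: List[Any],
                  findings: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the text sections for the generic findings plus the catch-all."""
    sections: List[str] = []
    for name, finding in findings.items():
        renderer = FINDING_RENDERERS.get(name)
        if renderer is not None:
            sections.append(renderer(finding))
        else:
            # Never drop a finding silently, even without a dedicated renderer.
            sections.append(f"{name}: no text renderer registered (use --json)")
    # Catch-all fires only when every slot, including findings, is empty.
    if downgrade is None and not budgets and not findings:
        sections.append("No candidates flagged in this window.")
    return sections
```

The fallback branch is a design choice: a new analyzer without a renderer shows up as a visible stub rather than repeating the silent-drop behavior described above.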
Pausing playbook run here
Bugs #12 (optimize unusable while daemon up) + #15 (Wave-2 analyzers invisible in CLI) make the rest of the playbook unrunnable in any meaningful way. Resuming after fixes land.