Pre-release testing findings (v0.3.0 sprint) — bugs and playbook fixes

Running list of findings surfaced while walking through `tests/manual-pre-release-testing.md` against the v0.3.0 sprint work on `task-5-docs-sweep`. Filed proactively so we can keep updating as the playbook run continues.

## Code bugs

### 1. `tj onboard --reconfigure` bare-path early-returns
The `--reconfigure` flag is honored inside `_onboard_claude_code` and `_onboard_codex`, but the top-level `cmd_onboard` early-returns with "Config already exists. Use --force to overwrite." when called without `--claude-code` / `--codex`, regardless of `--reconfigure`.

**Fix:** Make the bare-onboard path honor `--reconfigure` (bypass the existing-config check) or explicitly document that `--reconfigure` only applies to integration-specific flows.

**Where:** `tokenjam/cli/cmd_onboard.py`, around lines 39–43
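
A minimal sketch of the first fix option, assuming the existing-config check can be isolated into a small predicate. `OnboardArgs` and `should_abort_existing_config` are hypothetical names for illustration, not the actual `cmd_onboard` code:

```python
from dataclasses import dataclass


@dataclass
class OnboardArgs:
    # Hypothetical stand-in for the parsed CLI flags.
    force: bool = False
    reconfigure: bool = False


def should_abort_existing_config(config_exists: bool, args: OnboardArgs) -> bool:
    """Return True when bare `tj onboard` should stop with the
    "Config already exists" message."""
    if not config_exists:
        return False
    # Either flag is treated as explicit consent to touch the existing config.
    return not (args.force or args.reconfigure)
```

With this shape, `--reconfigure` bypasses the early return on the bare path the same way `--force` does; what each flag then does to the file is left to the caller.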

---

### 2. SDK silently drops spans on HTTP 401
When the SDK's `HttpTransport` gets a 401 from a running `tj serve` (e.g., on a secret mismatch), it logs a single line, `tj serve returned 401 on span export`, and drops the spans. The message is easy to miss, there is no fallback to a direct DuckDB write, and no dropped-span counter is exposed via `tj doctor`.

**Repro:** Have `.tj/config.toml` and `~/.config/tj/config.toml` with different `ingest_secret` values, then run any example script while `tj serve` is up. The spans never appear in `tj status`.

**Fix options:**
- Fall back to direct DB write when HTTP push fails (preferable for local-first ergonomics)
- Raise visibility: count dropped spans, surface in `tj doctor`
- At minimum, change the warning to ERROR level and include the secret-fingerprint mismatch hint

**Where:** `tokenjam/sdk/transport.py`
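
The fix options above could combine roughly as follows. This is a sketch under assumptions, not the real `HttpTransport`: `export_spans`, `ExportStats`, the logger name, and the two injected callables are all hypothetical. Note that a direct write may itself fail while the daemon holds the DuckDB lock (see #12), so the dropped counter still matters:

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("tj.sdk.transport")  # hypothetical logger name


class ExportStats:
    """Counters that a health check such as `tj doctor` could read (hypothetical)."""

    def __init__(self) -> None:
        self.fell_back = 0
        self.dropped = 0


def export_spans(
    spans: Sequence[dict],
    push_http: Callable[[Sequence[dict]], int],  # returns the HTTP status code
    write_direct: Optional[Callable[[Sequence[dict]], None]],
    stats: ExportStats,
) -> str:
    """Try HTTP first, then an optional direct write; count whatever is lost."""
    status = push_http(spans)
    if 200 <= status < 300:
        return "http"
    logger.error(
        "tj serve returned %d on span export; check that ingest_secret "
        "matches the daemon's config",
        status,
    )
    if write_direct is not None:
        try:
            write_direct(spans)
        except Exception as exc:  # e.g. the DB lock is held by the daemon
            logger.error("direct write fallback failed: %s", exc)
        else:
            stats.fell_back += len(spans)
            return "direct"
    stats.dropped += len(spans)
    return "dropped"
```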

---

### 3. `tj drift` doesn't surface baselines that demonstrably exist (HTTP API fallback)
After running `examples/alerts_and_drift/drift_demo.py` (12 baseline + 1 anomalous session) and observing `drift_detected` fire successfully, `tj drift` reports "No drift baselines found." The detector clearly built and used a baseline (otherwise the alert wouldn't fire), but the CLI doesn't surface it.

`tj doctor` correctly identifies other agents as "Collecting baseline: sensitive-demo (0/10), budget-demo (0/10)" but doesn't mention `drift-demo` — suggesting its baseline IS built but `tj drift` can't read it.

Likely cause: CLI is using the HTTP API fallback (because `tj serve` holds the DB lock), and `/api/v1/drift` either doesn't return baseline records correctly or `cmd_drift` doesn't call it with the right shape.

**Where:** `tokenjam/cli/cmd_drift.py` + `tokenjam/api/routes/drift.py`

---

### 4. `tj doctor` reports `DuckDB writable: ✗` as a failure when daemon holds the lock
Doctor's "DuckDB writable" check attempts a direct DB connection and reports `✗ Could not set lock on file ... Conflicting lock is held in ... PID <daemon>` as a failure. But this is the expected, healthy state: the daemon is the rightful lock holder.

Doctor already handles this gracefully for another check: `i Spans column statistics: Skipped — CLI is running through the HTTP API fallback`. The writable check should follow the same pattern.

**Fix:** Detect the daemon. When it's the lock holder, downgrade to `i` informational ("DB lock held by daemon — this is the expected operating state").

**Where:** `tokenjam/cli/cmd_doctor.py`
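
One way the downgrade could look, assuming doctor already knows the daemon's PID (for example from a pid file) and has the error text from the failed connection attempt. `classify_writable_check` is a hypothetical helper, and the PID is parsed from the message format quoted above:

```python
import re
from typing import Optional, Tuple


def classify_writable_check(
    open_error: Optional[str], daemon_pid: Optional[int]
) -> Tuple[str, str]:
    """Map the result of the direct-connect attempt to a (marker, message) pair."""
    if open_error is None:
        return ("✓", "DuckDB writable")
    match = re.search(r"PID (\d+)", open_error)
    holder = int(match.group(1)) if match else None
    if "Conflicting lock" in open_error and holder is not None and holder == daemon_pid:
        # The lock holder is our own daemon: informational, not a failure.
        return ("i", "DB lock held by daemon (PID %d): this is the expected operating state" % holder)
    # Any other error, or a lock held by an unknown process, stays a failure.
    return ("✗", open_error)
```

Comparing the parsed PID as an integer avoids a prefix match (PID 8987 vs. 89879) that a plain substring check would allow.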

---

## UX / config-design issues

### 5. Project-local vs global config secret divergence is a footgun
When `.tj/config.toml` exists in `cwd`, the SDK picks up its `ingest_secret`. The daemon (started by launchd) reads `~/.config/tj/config.toml`. These can drift silently: there's no warning when they differ, and the symptom is the dropped-spans-on-401 issue above (bug #2). It took several minutes of diagnostic work to trace.

**Fix options:**
- Detect divergence at SDK startup; emit a clear warning
- Prefer the global config when the daemon is running (or vice versa, but consistently)
- Document the precedence rule explicitly in CLAUDE.md and the README
- Consider removing `.tj/config.toml` from git tracking entirely — a config file in a code repo is mostly a misfeature
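
A sketch of the first option (detect divergence at SDK startup), assuming both secrets have already been read from their config files. The helper names are hypothetical; the warning prints short SHA-256 fingerprints rather than the secrets themselves:

```python
import hashlib
from typing import Optional


def secret_fingerprint(secret: str) -> str:
    """Short, non-reversible identifier that is safe to put in a log line."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:8]


def divergence_warning(
    local_secret: Optional[str], global_secret: Optional[str]
) -> Optional[str]:
    """Return a warning string when both configs set ingest_secret and they differ."""
    if local_secret is None or global_secret is None:
        return None
    if local_secret == global_secret:
        return None
    return (
        "ingest_secret differs between .tj/config.toml "
        f"(fingerprint {secret_fingerprint(local_secret)}) and "
        f"~/.config/tj/config.toml (fingerprint {secret_fingerprint(global_secret)}); "
        "spans sent to a running tj serve may be rejected with 401"
    )
```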

---

### 6. Alert/drift demos don't fire alerts without matching `tj.toml` config
`examples/alerts_and_drift/sensitive_actions_demo.py` and `budget_breach_demo.py` write spans for agent IDs `sensitive-demo` and `budget-demo`, but if the user's `tj.toml` doesn't declare those agents with the relevant config (sensitive_actions list, budget thresholds), no alerts fire — even though the demos' output says they should.

Currently the demos contain a `# tj.toml needs this config` comment block, but nothing enforces it. New users following the playbook see "No active alerts" and don't realize they missed a prerequisite.

**Fix options:**
- Demos self-register the required agent config on first run (best UX)
- `tj doctor` checks for "demo agent present without matching config" and warns
- Playbook explicitly notes the prerequisite

---

## Playbook (`tests/manual-pre-release-testing.md`) fixes

### 7. Step 3 implies bare `tj onboard` prompts for plan tier — it doesn't
The plan-tier prompt is only in `_onboard_claude_code` and `_onboard_codex`. Plain `tj onboard` doesn't prompt. Either the playbook needs to route plan-tier verification through `--claude-code` / `--codex`, or the bare path should also prompt (then ask which provider).

### 8. Playbook doesn't tell user to stop pre-existing daemon
Steps 1–3 assume a clean shell, but a daemon from a prior install can already be running. The playbook needs an explicit `tj stop` (or "verify no daemon already running") near the top.

### 9. `max_20x` literal in playbook examples should be plan-agnostic
The expected-output checks use `Max 20x plan, $200/mo flat`, but the tester should be able to use whichever plan they actually have on the test machine. Reframe the checks around the format ("Max 5x" / "Max 20x" / etc.) rather than the specific dollar amount.
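
If the playbook check were ever scripted, a format-based match might look like the sketch below. The pattern is an assumption derived only from the one expected-output line quoted above; it covers "Max Nx" tiers and would need extending for any other plan names:

```python
import re

# Matches the shape of the plan line without pinning the multiplier or the price.
PLAN_LINE = re.compile(r"Max \d+x plan, \$\d+(?:\.\d+)?/mo flat")


def plan_line_ok(output: str) -> bool:
    """True when the output contains a plan line in the expected format."""
    return PLAN_LINE.search(output) is not None
```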

### 10. Ruff baseline note in playbook says "49 errors"; actual run is clean
Local run shows `All checks passed!` Either the baseline got cleaned up between writing the playbook and now, or the playbook was just wrong. Update to "ruff clean" or "match `main` baseline."

---

## Open questions / look-into-later

### 11. `tj budget` only lists configured agents — not seen-but-unconfigured ones
After step 5, `tj budget` shows only `defaults`, `budget-demo`, and `sensitive-demo`. The pre-existing `claude-code-*` agents (which have real spend) aren't listed. Possibly intentional (only show agents with explicit config), but worth confirming.

---

## Status

Playbook progress: step 5 complete; about to start step 6 (cost-optimization analyzers).

Update this issue as we surface more, then turn checked-off boxes into PR commits when we're ready to fix.

🤖 Filed by Claude Code during the v0.3.0 pre-release walkthrough.


---

## More findings from step 6

### 12. `tj optimize` fails entirely when `tj serve` holds the DB lock — *CRITICAL*

**Symptom:**
```
$ tj optimize --finding model-downgrade
Error: Could not open /Users/anilmurty/.tj/telemetry.duckdb read-only: IO Error: 
Could not set lock on file "...": Conflicting lock is held in 
/Library/Frameworks/Python.framework/Versions/3.10/Resources/Python.app/Contents/MacOS/Python 
(PID 89879) by user anilmurty.
```

**Why it matters:** The strategy pivot positions cost optimization as the headline product. `tj optimize` failing whenever the daemon is up (which is the *recommended* operating mode after onboard auto-installs the daemon) is a launch-blocking regression.

**Why CLAUDE.md is aspirational:**
> `tj optimize` (`cmd_optimize.py`) ... Opens the live DB read-only so it works alongside a running `tj serve`.

`cmd_optimize.py` does call `duckdb.connect(db_path, read_only=True)`. But DuckDB enforces process-level exclusivity — when one process has the DB open in write mode (the daemon), no other process can attach, even read-only. The docs are explicit about this: https://duckdb.org/docs/stable/connect/concurrency

**Fix options (ordered by impact):**

1. **Route `tj optimize`'s queries through the HTTP API.** Larger change: needs new `/api/v1/optimize` endpoints that mirror the analyzer surface. Highest ROI: makes optimize work in the recommended operating mode by default.
2. **Auto-detect the daemon, suggest `tj stop` in the error.** Quick UX fix — at least the user knows what to do. Doesn't actually solve the problem.
3. **Make the daemon yield the write lock for short read transactions.** Probably impractical given DuckDB's locking model.

**Recommendation:** option 1. The analyzer logic already operates on `conn.execute(SQL)` — it'd port cleanly to an API endpoint that the CLI calls when `db.conn is None` (i.e., when the API shim is active).

**Where:** `tokenjam/cli/cmd_optimize.py`, plus new `tokenjam/api/routes/optimize.py`.
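
A rough sketch of the dispatch that option 1 implies, assuming the CLI's `db` object exposes `conn` (which is `None` under the API shim, as noted above). `run_analyzer`, `local_analyzers`, `api_get`, and the per-analyzer URL suffix are hypothetical; only the `/api/v1/optimize` prefix comes from this issue:

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


def run_analyzer(
    name: str,
    db: Any,
    local_analyzers: Dict[str, Callable[[Any], dict]],
    api_get: Callable[[str], dict],
) -> dict:
    """Run an analyzer locally when we own a connection, else ask the daemon."""
    if db.conn is None:
        # API shim is active: the daemon holds the DuckDB lock, so let it run the query.
        return api_get(f"/api/v1/optimize/{name}")
    return local_analyzers[name](db.conn)


# Usage with stand-ins: a shim-backed db routes to the (fake) API client.
shim_db = SimpleNamespace(conn=None)
result = run_analyzer("cache-efficacy", shim_db, {}, lambda path: {"path": path})
```

Keeping the branch in one helper would also give #13 a single place to audit for other `db.conn`-direct callers.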

### 13. Same bug class likely affects other `db.conn`-direct callers
`tj cost` worked earlier in the run because it uses the StorageBackend protocol via the ApiBackend shim. `tj optimize` works directly against `db.conn`. Need to audit other CLI commands that take the `db.conn` shortcut: probably `tj cost --compare` (the `compute_cost_diff` path), and any future analyzers.



### 14. Per-example `cost_usd` leaks through unknown-tier dollar suppression

When `pricing_mode == "unknown"`, the top-level downgrade savings line correctly says "savings figures suppressed — plan tier unknown". But the per-example table immediately below still shows the original cost:

```
Examples:
  82c68dd9..  0 tool calls   —   $8.0590  (claude-opus-4-7)
                              ^^^^^^^^^^^ leaks through
```

If we're suppressing dollar figures because we don't yet know whether they're real "spend" or implied-API-value, the per-example cost has the same honesty problem.

**Fix:** In `cmd_optimize.py` `_render_downgrade()`, when `pricing_mode in {"unknown", "subscription", "local"}`, suppress the `cost_usd` column in examples (or replace with tokens).
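
The "replace with tokens" variant could be as small as the sketch below. `format_example_cost` and its `total_tokens` argument are hypothetical; the set of suppressed modes is the one named in the fix above:

```python
SUPPRESSED_MODES = {"unknown", "subscription", "local"}


def format_example_cost(cost_usd: float, total_tokens: int, pricing_mode: str) -> str:
    """Render the per-example cost cell, hiding dollars when they may not be real spend."""
    if pricing_mode in SUPPRESSED_MODES:
        return f"{total_tokens:,} tokens"
    return f"${cost_usd:.4f}"
```

Routing every per-example cell through one formatter keeps the suppression rule in a single place, so the summary line and the examples table cannot disagree.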



### 15. **LAUNCH-BLOCKING:** Wave 2 analyzers produce no CLI output — renderer ignores `report.findings`

**Symptom:** Every Wave-2 analyzer (`cache-efficacy`, `cache-recommend`, `workflow-restructure`, `prompt-bloat`) prints the same generic catch-all message:

```
$ tj optimize --finding cache-efficacy
No candidates flagged in this window. Either spend is small or all sessions already use a cost-effective model.
```

That message is the *catch-all from `_render_report()` when both `report.downgrade is None` and `report.budgets` is empty*. It is **not** specific to the analyzer that ran.

**Root cause:** Wave-2 analyzers attach their findings to `report.findings[<name>]` (the generic dict on `OptimizeReport`). But `cmd_optimize.py::_render_report()` only reads the typed slots (`report.downgrade`, `report.budgets`) — it never iterates `report.findings`. So Wave-2 analyzers run successfully, write data, and the renderer drops it on the floor.

**Confirmed via JSON path:**

```bash
$ tj optimize --finding cache-efficacy --json | python3 -m json.tool | head -30
{
    "window": {...},
    "downgrade": null,
    "budgets": [],
    "findings": {
        "cache-efficacy": {
            "rows": [
                {"provider": "anthropic", "model": "claude-opus-4-7", "input_tokens": 25498, "cache_tokens": 1195766597, "efficacy": 1.0, "support": "full", "flagged": false},
                ...6 rows total...
            ]
        }
    }
}
```

JSON output is correct. CLI text output is not.

**Impact:** All four Wave-2 analyzers (the bulk of the strategy-pivot sprint) are effectively invisible to anyone running `tj optimize` interactively. Unit tests pass because they test the analyzer functions directly; the CLI text-rendering path was never wired to read `report.findings`.

**Fix:** Extend `_render_report()` in `tokenjam/cli/cmd_optimize.py` to iterate `report.findings` and dispatch to a per-finding renderer. Roughly:

```python
# After rendering downgrade + budgets, before the catch-all:
for name, finding in report.findings.items():
    renderer = _FINDING_RENDERERS.get(name)
    if renderer is not None:
        renderer(finding, pricing_mode=pricing_mode)
        console.print()
```

Plus per-finding render functions:
- `_render_cache_efficacy()` — per-(provider, model) table
- `_render_cache_recommend()` — disabled hint OR breakpoint candidates list
- `_render_workflow_restructure()` — clusters or "no clusters" message
- `_render_prompt_bloat()` — disabled hint OR per-prompt summary table

Update the catch-all condition to also check `report.findings` so it only fires when truly empty.

**Where:** `tokenjam/cli/cmd_optimize.py`
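
For illustration, one per-finding renderer and the tightened catch-all condition might look like this. It is a sketch only: the function names are hypothetical, the row fields are the ones visible in the JSON output above, and it returns lines instead of printing so it does not depend on the real `console` object:

```python
from typing import Any, Dict, List, Optional


def render_cache_efficacy(finding: Dict[str, Any]) -> List[str]:
    """Turn a cache-efficacy finding into one text line per (provider, model) row."""
    lines = []
    for row in finding.get("rows", []):
        flag = "FLAGGED" if row["flagged"] else "ok"
        lines.append(
            f'{row["provider"]}/{row["model"]}  efficacy={row["efficacy"]:.2f}  {flag}'
        )
    return lines or ["cache-efficacy: no rows in this window"]


def should_print_catch_all(
    downgrade: Optional[Any], budgets: List[Any], findings: Dict[str, Any]
) -> bool:
    """The generic "No candidates flagged" message applies only when nothing ran."""
    return downgrade is None and not budgets and not findings
```

An analyzer that ran and found nothing then gets its own message from its renderer instead of the generic downgrade-oriented one.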

---

## Pausing playbook run here

Bugs #12 (optimize unusable while daemon up) + #15 (Wave-2 analyzers invisible in CLI) make the rest of the playbook unrunnable in any meaningful way. Resuming after fixes land.


