docs: post-sprint sweep — analyzer docs, installation, semconv, release-testing playbooks by anilmurty · Pull Request #67 · Metabuilder-Labs/tokenjam

anilmurty · 2026-05-28T17:59:53Z

Cross-cutting docs pass after the May-26 strategy-pivot sprint (PR #66) landed. Covers everything the new code surfaces but the original PR didn't document yet — plus the housekeeping that fell out of the sprint queue.

Commits

4 commits, no code changes. All four are docs-only:

Commit	What
`0d2895d`	Task-5 consistency sweep across `docs/`
`2b3037c`	CLAUDE.md "Further Reading" expansion + remove sprint task queue
`0ae522d`	Drop sprint-internal v1.1 spec; keep `docs/internal/specs/` for future canonical specs
`c4d2fea`	Rewrite manual release-testing playbooks for v0.3.x

What landed

New docs

docs/installation.md — base install vs optional-extras matrix. Documents tokenjam[bloat] (~2GB LLMLingua/torch extra used by the Trim analyzer) with the rationale for keeping it out of base; lists all framework adapter extras plus [mcp] / [dev]. New first-run guide.
docs/optimize/downsize.md — the one optimize product that didn't have its own page. Documents the model-downgrade analyzer with the candidate-pair table, plan-tier-aware rendering branches, and the mandatory caveat.
docs/backfill/helicone.md and docs/backfill/otlp.md — match the langfuse.md structure (modes / field mapping / idempotency / limitations).
docs/architecture.md → "OTel semconv extensions" section — full derivation rules for tokenjam.billing_account (span attribute) + tokenjam.plan_tier (session-level), pricing_mode mapping, and why plan_tier lives on SessionRecord rather than each span.

Updated docs

docs/backfill/overview.md — lists all four sources (claude-code / langfuse / helicone / otlp) with partnership-posture framing.
docs/optimize/cache.md / script.md / trim.md — consistent treatment: heading leads with the product name, CLI flag example up front, "See also" cross-links to the other three product pages.
CHANGELOG.md — reorganised ## Unreleased as a coherent release narrative (headline → product groupings → Changed / Internal). Facts unchanged.
CLAUDE.md
- Expanded "Further Reading" from a single architecture.md pointer into a structured list pointing at every new docs page.
- Cross-reference from the core/models.py Key-Modules entry to the new "OTel semconv extensions" section in architecture.md.
- Drop the bullet referencing docs/internal/specs/v1.1-honest-output.md (file removed; see below).

Sprint-internal cleanup

Removed docs/internal/tasks/ (14 files — the sprint's per-task specs + decisions-locked.md). Sprint is merged via Strategy pivot: Layer-9 cost-optimization analyzers + ingest adapters #66; the per-task specs were ephemeral by design and risk drift if kept.
Removed docs/internal/specs/v1.1-honest-output.md (the spec is now captured in the merged code + docs/optimize/downsize.md + CHANGELOG). Kept docs/internal/specs/ (as .gitkeep) so future features can drop canonical specs there when warranted.

Manual release-testing playbooks rewritten

tests/manual-pre-release-testing.md and tests/manual-new-release-tests.md were missing coverage of every feature added in the sprint. Both rewritten:

Pre-release — restructured into 14 numbered sections. New blocks for:
- Cost-optimization analyzers (one block per product — Downsize / Cache / Script / Trim — covering prereqs, prereq-missing fallback paths, caveat enforcement)
- Plan-tier-aware rendering (verify subscription headers don't contain "spend" against a dollar figure; --reconfigure; unknown-tier suppression)
- Backfill adapters (against committed fixtures so the smoke is hermetic — no live API keys needed for langfuse / helicone / otlp; verifies idempotency on re-run)
- Period comparison (--compare keywords + explicit ranges)
- Config export (--export-config claude-code verifies caveat comments + absence of --apply)
- Policy list
- Trimmed: web-UI section (was 5 numbered subsections with sidebar/theme/CSS bullets — now one spot-check block, since those are one-time UI work) and redundant tj uninstall && rm -rf && onboard cycles between steps.
Post-release smoke — tightened into a true smoke check. Every analyzer is exercised (not deep behavior — that lived in pre-release). All three backfill adapters smoke against fixtures. One canonical "pass criteria" table at the bottom mapped to step numbers.

Honest gap noted at handoff: the playbooks reliably exercise Downsize + Cache-efficacy with positive findings, but Cache-recommend / Script / Trim mostly hit their negative / fallback paths because the example data doesn't satisfy their prereqs (capture.prompts off; <20 identical sessions; bloat extra not installed). A synthetic optimize demo to fill that gap will land in a separate PR.

Honesty audit

Clean. cmd_optimize.py scanned — no quality-equivalence claims, "spend" only appears in the api pricing-mode branch, mandatory caveat preserved across all render modes.

Test plan

No code changes — automated test suite unaffected
All doc cross-links resolve (verified by inspection)
git rm cleanups don't break anything: docs/internal/specs/ survives via .gitkeep; no references to removed files left in CLAUDE.md or docs/
Manual: run through manual-pre-release-testing.md end-to-end on a fresh checkout of main after merge

🤖 Generated with Claude Code

…rative Cross-cutting docs pass after Waves 1-3 + Task 4 work landed in the "Strategy pivot" commit. No code changes. Backfill: - Update docs/backfill/overview.md to list all four sources (claude-code, langfuse, helicone, otlp), add partnership-posture framing, drop stale "Wave 3 lands later" line. - Add docs/backfill/helicone.md and docs/backfill/otlp.md matching the langfuse.md structure (modes, field mapping, idempotency, limitations). Installation: - Add docs/installation.md documenting the tokenjam[bloat] optional extra (~2GB torch + transformers) plus the full optional-extras matrix. Architecture: - Add "OTel semconv extensions: billing_account and plan_tier" section with the pricing_mode derivation rules and rationale for the session- level (not span-level) storage of plan_tier. Optimize product docs (four-product naming consistency): - Tighten cache.md / trim.md / script.md headings to lead with product name (Cache / Trim / Script) + CLI flag examples. - Add docs/optimize/downsize.md to round out the set (was previously only documented inline in the analyzer). - Add "See also" cross-links to all four product pages. CHANGELOG narrative: - Reorganise ## Unreleased as a coherent release story (headline, then product groupings: analyzers, plan-tier rendering, backfill, other surfaces, docs; then Changed / Internal). Facts unchanged. Honesty audit: clean. cmd_optimize.py scanned — no quality-equivalence claims; "spend" only appears in the api pricing-mode branch; mandatory caveat preserved across all render modes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CLAUDE.md: - Expand "Further Reading" from a single architecture.md pointer into a structured list covering the new docs landed in the Task-5 consistency sweep: installation.md (optional extras matrix incl. tokenjam[bloat]), configuration.md (content-capture privacy section), the four optimize product pages (downsize / cache / script / trim), backfill/overview.md, policy/overview.md, and docs/internal/specs/v1.1-honest-output.md. - Cross-reference architecture.md's new "OTel semconv extensions" section from the core/models.py description in Key Modules so readers find the pricing_mode derivation rules from the entry point they're most likely to land on first. Remove docs/internal/tasks/ — the May 26 sprint's per-task specs + decisions-locked.md served their purpose during the sprint and are captured in the merged PR (#66) history. Keeping the v1.1 honest-output spec under docs/internal/specs/ because Wave 1 / the production code still references it as the canonical source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…future use The v1.1 honest-output spec served its purpose during the May 26 sprint (Wave 1 / cmd_optimize.py implementation). The behavior is now captured in the merged code, in docs/optimize/downsize.md, and in CHANGELOG.md. Keeping the spec file would just create drift risk. Preserve the docs/internal/specs/ folder (via .gitkeep) so future features can drop a canonical spec there when one is warranted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The May-26 sprint added six new analyzer-level surfaces (Downsize / Cache×2 / Script / Trim), three new backfill adapters (Langfuse / Helicone / OTLP), a period-comparison flag, config export, plan-tier-aware rendering, and the policy preview. None of that was covered in the existing playbooks. manual-pre-release-testing.md: - Restructured into numbered sections (Install / Auto tests / Onboard / Populate / CLI smoke / Analyzers / Plan-tier rendering / Backfill / Compare / Export / Policy / Server / UI / Cleanup). - New "Cost-optimization analyzers" section with per-analyzer smoke blocks. Each covers prereqs, prereq-missing fallback (cache-recommend hint when capture.prompts off; Trim install hint when extra not installed), and caveat verification. - New "Plan-tier-aware rendering" section exercising the v1.1 honest- output reframing — subscription headers must never contain "spend" against a dollar figure. - New "Backfill adapters" section uses the committed test fixtures so the smoke is hermetic (no live API keys needed for langfuse/helicone/ otlp). Verifies idempotency on re-run. - New "Period comparison" section verifying ▲/▼ diff rendering and the --compare last-month / explicit-range branches. - New "Config export" section verifying caveat comments are baked into the JSONC snippet and confirming the absence of --apply. - New "Policy list" section. - Trimmed the web-UI section from 5 numbered checks with bullet-level theme/sidebar/CSS verification down to a single spot-check block. Those are one-time UI checks, not per-release regression items. - Trimmed redundant tj uninstall && rm -rf && onboard cycles between steps; kept one canonical clean-slate at the top. manual-new-release-tests.md: - Tightened into a true post-release smoke check. Verifies every analyzer runs (not deep behavior — that lived in pre-release), checks the caveat in JSON output, runs all three backfill adapters against the committed fixtures, and confirms the policy preview renders. - Cut the duplicated "what to look for" table back to one summary at the bottom that maps cleanly to step numbers. - Kept Claude Code / Codex integration smoke (these are user-facing flows that broke historically when they regressed silently). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

anilmurty and others added 4 commits May 28, 2026 10:58

anilmurty changed the title ~~docs: Task 5 consistency sweep — backfill, installation, semconv, nar…~~ docs: post-sprint sweep — analyzer docs, installation, semconv, release-testing playbooks May 28, 2026

anilmurty merged commit 7f5929b into main May 28, 2026
4 checks passed

anilmurty deleted the task-5-docs-sweep branch May 28, 2026 19:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: post-sprint sweep — analyzer docs, installation, semconv, release-testing playbooks#67

docs: post-sprint sweep — analyzer docs, installation, semconv, release-testing playbooks#67
anilmurty merged 4 commits into
mainfrom
task-5-docs-sweep

anilmurty commented May 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anilmurty commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Commits

What landed

New docs

Updated docs

Sprint-internal cleanup

Manual release-testing playbooks rewritten

Honesty audit

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

anilmurty commented May 28, 2026 •

edited

Loading