docs: post-sprint sweep — analyzer docs, installation, semconv, release-testing playbooks #67
Merged
…rative

Cross-cutting docs pass after Waves 1-3 + Task 4 work landed in the "Strategy pivot" commit. No code changes.

Backfill:
- Update docs/backfill/overview.md to list all four sources (claude-code, langfuse, helicone, otlp), add partnership-posture framing, drop stale "Wave 3 lands later" line.
- Add docs/backfill/helicone.md and docs/backfill/otlp.md matching the langfuse.md structure (modes, field mapping, idempotency, limitations).

Installation:
- Add docs/installation.md documenting the tokenjam[bloat] optional extra (~2GB torch + transformers) plus the full optional-extras matrix.

Architecture:
- Add "OTel semconv extensions: billing_account and plan_tier" section with the pricing_mode derivation rules and rationale for the session-level (not span-level) storage of plan_tier.

Optimize product docs (four-product naming consistency):
- Tighten cache.md / trim.md / script.md headings to lead with product name (Cache / Trim / Script) + CLI flag examples.
- Add docs/optimize/downsize.md to round out the set (was previously only documented inline in the analyzer).
- Add "See also" cross-links to all four product pages.

CHANGELOG narrative:
- Reorganise ## Unreleased as a coherent release story (headline, then product groupings: analyzers, plan-tier rendering, backfill, other surfaces, docs; then Changed / Internal). Facts unchanged.

Honesty audit: clean. cmd_optimize.py scanned — no quality-equivalence claims; "spend" only appears in the api pricing-mode branch; mandatory caveat preserved across all render modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md:
- Expand "Further Reading" from a single architecture.md pointer into a structured list covering the new docs landed in the Task-5 consistency sweep: installation.md (optional extras matrix incl. tokenjam[bloat]), configuration.md (content-capture privacy section), the four optimize product pages (downsize / cache / script / trim), backfill/overview.md, policy/overview.md, and docs/internal/specs/v1.1-honest-output.md.
- Cross-reference architecture.md's new "OTel semconv extensions" section from the core/models.py description in Key Modules so readers find the pricing_mode derivation rules from the entry point they're most likely to land on first.

Remove docs/internal/tasks/ — the May 26 sprint's per-task specs + decisions-locked.md served their purpose during the sprint and are captured in the merged PR (#66) history. Keeping the v1.1 honest-output spec under docs/internal/specs/ because Wave 1 / the production code still references it as the canonical source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…future use

The v1.1 honest-output spec served its purpose during the May 26 sprint (Wave 1 / cmd_optimize.py implementation). The behavior is now captured in the merged code, in docs/optimize/downsize.md, and in CHANGELOG.md. Keeping the spec file would just create drift risk.

Preserve the docs/internal/specs/ folder (via .gitkeep) so future features can drop a canonical spec there when one is warranted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The May-26 sprint added six new analyzer-level surfaces (Downsize / Cache×2 / Script / Trim), three new backfill adapters (Langfuse / Helicone / OTLP), a period-comparison flag, config export, plan-tier-aware rendering, and the policy preview. None of that was covered in the existing playbooks.

manual-pre-release-testing.md:
- Restructured into numbered sections (Install / Auto tests / Onboard / Populate / CLI smoke / Analyzers / Plan-tier rendering / Backfill / Compare / Export / Policy / Server / UI / Cleanup).
- New "Cost-optimization analyzers" section with per-analyzer smoke blocks. Each covers prereqs, prereq-missing fallback (cache-recommend hint when capture.prompts off; Trim install hint when extra not installed), and caveat verification.
- New "Plan-tier-aware rendering" section exercising the v1.1 honest-output reframing — subscription headers must never contain "spend" against a dollar figure.
- New "Backfill adapters" section uses the committed test fixtures so the smoke is hermetic (no live API keys needed for langfuse/helicone/otlp). Verifies idempotency on re-run.
- New "Period comparison" section verifying ▲/▼ diff rendering and the --compare last-month / explicit-range branches.
- New "Config export" section verifying caveat comments are baked into the JSONC snippet and confirming the absence of --apply.
- New "Policy list" section.
- Trimmed the web-UI section from 5 numbered checks with bullet-level theme/sidebar/CSS verification down to a single spot-check block. Those are one-time UI checks, not per-release regression items.
- Trimmed redundant tj uninstall && rm -rf && onboard cycles between steps; kept one canonical clean-slate at the top.

manual-new-release-tests.md:
- Tightened into a true post-release smoke check. Verifies every analyzer runs (not deep behavior — that lived in pre-release), checks the caveat in JSON output, runs all three backfill adapters against the committed fixtures, and confirms the policy preview renders.
- Cut the duplicated "what to look for" table back to one summary at the bottom that maps cleanly to step numbers.
- Kept Claude Code / Codex integration smoke (these are user-facing flows that broke historically when they regressed silently).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
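
The plan-tier rendering rule from the commit message above (subscription headers must never pair "spend" with a dollar figure) could be checked mechanically during a smoke run. The following is only a minimal sketch under stated assumptions: the function name `violates_spend_rule`, the dollar-figure pattern, and the `"subscription"` mode label are hypothetical and not taken from `cmd_optimize.py`. The only detail drawn from the source is that `api` is the pricing mode where "spend" wording is allowed.

```python
import re

# Hypothetical pattern for a dollar figure such as "$12" or "$1,234.50".
DOLLAR_FIGURE = re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?")


def violates_spend_rule(header: str, pricing_mode: str) -> bool:
    """Return True if a non-API header pairs the word "spend" with a dollar figure.

    Assumption: "api" is the only pricing mode in which "spend" wording is allowed.
    """
    if pricing_mode == "api":
        return False
    mentions_spend = re.search(r"\bspend\b", header, re.IGNORECASE) is not None
    has_dollars = DOLLAR_FIGURE.search(header) is not None
    return mentions_spend and has_dollars
```

A playbook step could pipe each rendered header through such a check and fail on the first `True`. The headers in any such test would be illustrative strings, not real tool output.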
Cross-cutting docs pass after the May-26 strategy-pivot sprint (PR #66) landed. Covers everything the new code surfaces but the original PR didn't document yet — plus the housekeeping that fell out of the sprint queue.
Commits
4 commits, no code changes. All four are docs-only:
- `0d2895d` … `docs/` …
- `2b3037c` …
- `0ae522d` … `docs/internal/specs/` for future canonical specs
- `c4d2fea` …

What landed
New docs
- `docs/installation.md` — base install vs optional-extras matrix. Documents `tokenjam[bloat]` (~2GB LLMLingua/torch extra used by the Trim analyzer) with the rationale for keeping it out of base; lists all framework adapter extras plus `[mcp]` / `[dev]`. New first-run guide.
- `docs/optimize/downsize.md` — the one optimize product that didn't have its own page. Documents the model-downgrade analyzer with the candidate-pair table, plan-tier-aware rendering branches, and the mandatory caveat.
- `docs/backfill/helicone.md` and `docs/backfill/otlp.md` — match the `langfuse.md` structure (modes / field mapping / idempotency / limitations).
- `docs/architecture.md` → "OTel semconv extensions" section — full derivation rules for `tokenjam.billing_account` (span attribute) + `tokenjam.plan_tier` (session-level), `pricing_mode` mapping, and why `plan_tier` lives on `SessionRecord` rather than each span.

Updated docs
- `docs/backfill/overview.md` — lists all four sources (claude-code / langfuse / helicone / otlp) with partnership-posture framing.
- `docs/optimize/cache.md` / `script.md` / `trim.md` — consistent treatment: heading leads with the product name, CLI flag example up front, "See also" cross-links to the other three product pages.
- `CHANGELOG.md` — reorganised `## Unreleased` as a coherent release narrative (headline → product groupings → Changed / Internal). Facts unchanged.
- `CLAUDE.md` … `core/models.py` Key-Modules entry to the new "OTel semconv extensions" section in architecture.md. … `docs/internal/specs/v1.1-honest-output.md` (file removed; see below).

Sprint-internal cleanup
- Removed `docs/internal/tasks/` (14 files — the sprint's per-task specs + `decisions-locked.md`). Sprint is merged via Strategy pivot: Layer-9 cost-optimization analyzers + ingest adapters #66; the per-task specs were ephemeral by design and risk drift if kept.
- Removed `docs/internal/specs/v1.1-honest-output.md` (the spec is now captured in the merged code + `docs/optimize/downsize.md` + CHANGELOG). Kept `docs/internal/specs/` (as `.gitkeep`) so future features can drop canonical specs there when warranted.

Manual release-testing playbooks rewritten
`tests/manual-pre-release-testing.md` and `tests/manual-new-release-tests.md` were missing coverage of every feature added in the sprint. Both rewritten:
- … `--reconfigure`; unknown-tier suppression)
- … `--compare` keywords + explicit ranges)
- … `--export-config claude-code` verifies caveat comments + absence of `--apply`)
- … `tj uninstall && rm -rf && onboard` cycles between steps.

Honest gap noted at handoff: the playbooks reliably exercise Downsize + Cache-efficacy with positive findings, but Cache-recommend / Script / Trim mostly hit their negative / fallback paths because the example data doesn't satisfy their prereqs (capture.prompts off; <20 identical sessions; bloat extra not installed). A synthetic optimize demo to fill that gap will land in a separate PR.
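
The gap described above comes down to three prerequisites. As a rough sketch, one could predict which analyzers will only exercise their fallback paths for a given dataset. Everything named here is hypothetical (the `DemoData` type, its fields, the function, and the analyzer labels). Only the three conditions and the threshold of 20 come from the note above, and pairing each condition with an analyzer assumes they are listed in the same order as the analyzers.

```python
from dataclasses import dataclass


@dataclass
class DemoData:
    capture_prompts: bool        # hypothetical mirror of the capture.prompts setting
    max_identical_sessions: int  # size of the largest group of identical sessions
    bloat_extra_installed: bool  # whether the tokenjam[bloat] extra is installed


def fallback_only_analyzers(data: DemoData) -> list[str]:
    """List analyzers expected to hit only their negative / fallback paths."""
    blocked = []
    if not data.capture_prompts:
        blocked.append("cache-recommend")
    if data.max_identical_sessions < 20:
        blocked.append("script")
    if not data.bloat_extra_installed:
        blocked.append("trim")
    return blocked
```

A synthetic demo dataset would aim to make this list empty, so that each analyzer produces at least one positive finding.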
Honesty audit
Clean.
`cmd_optimize.py` scanned — no quality-equivalence claims, "spend" only appears in the `api` pricing-mode branch, mandatory caveat preserved across all render modes.

Test plan
- `git rm` cleanups don't break anything: `docs/internal/specs/` survives via `.gitkeep`; no references to removed files left in CLAUDE.md or `docs/`
- … `manual-pre-release-testing.md` end-to-end on a fresh checkout of `main` after merge

🤖 Generated with Claude Code
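
The "no references to removed files" item in the test plan can be spot-checked with a short script. This is a sketch rather than part of the PR: the function name is hypothetical, the two removed paths are copied from the cleanup list in the description, and scanning every `*.md` file is an assumption about where stale links would live.

```python
from pathlib import Path

# Paths removed in this PR, as listed in the description.
REMOVED_PATHS = (
    "docs/internal/tasks/",
    "docs/internal/specs/v1.1-honest-output.md",
)


def find_dangling_references(repo_root: Path) -> list[tuple[str, int, str]]:
    """Return (file, line number, removed path) for each Markdown line
    that still mentions a removed path."""
    hits = []
    for md_file in sorted(repo_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for removed in REMOVED_PATHS:
                if removed in line:
                    hits.append((str(md_file.relative_to(repo_root)), lineno, removed))
    return hits
```

Calling `find_dangling_references(Path("."))` from the repository root returns an empty list when nothing points at the removed files. Intentional historical mentions, for example in a changelog, would need to be reviewed by hand.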