
test: add estimate_cost test coverage (6 tests) #291

Open
TerminalGravity wants to merge 7 commits into main from test/estimate-cost-coverage-v2


@TerminalGravity
Collaborator

Adds tests for the estimate_cost tool covering:

  • Token usage reporting for simple sessions
  • Correction detection from vague/unclear prompts
  • Preflight tool call counting and cost tracking
  • Graceful error handling (missing files, no sessions)
  • Pricing model selection (sonnet/opus/haiku)

All 6 tests passing. No code changes, just test coverage.
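The PR does not show how estimate_cost chooses a price, so the following is only a minimal TypeScript sketch of what "pricing model selection" might look like. The names `selectPricing` and `estimateCostUsd`, the substring match on the model id, the fallback to sonnet, and the per-million-token rates are all illustrative assumptions, not values read from the tool.

```typescript
// Hypothetical sketch: estimate_cost's internals are not shown in this PR.
// Rates are placeholder USD prices per million tokens, for illustration only.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

const PRICING: Record<"sonnet" | "opus" | "haiku", Pricing> = {
  sonnet: { inputPerMTok: 3, outputPerMTok: 15 },
  opus: { inputPerMTok: 15, outputPerMTok: 75 },
  haiku: { inputPerMTok: 0.8, outputPerMTok: 4 },
};

// Pick a tier by looking for the family name inside the model id.
// Unknown models fall back to sonnet (an assumption, not documented behavior).
function selectPricing(model: string): Pricing {
  const id = model.toLowerCase();
  if (id.includes("opus")) return PRICING.opus;
  if (id.includes("haiku")) return PRICING.haiku;
  return PRICING.sonnet;
}

// cost = (input tokens / 1e6) * input rate + (output tokens / 1e6) * output rate
function estimateCostUsd(usage: TokenUsage, model: string): number {
  const p = selectPricing(model);
  return (
    (usage.inputTokens / 1_000_000) * p.inputPerMTok +
    (usage.outputTokens / 1_000_000) * p.outputPerMTok
  );
}
```

Tests for the real tool would pin whichever rates it actually ships; the point of the sketch is only that the model name decides the rate table before any token arithmetic happens.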

Adds a ready-to-use CLAUDE.md template that makes Claude Code
automatically run preflight_check on prompts. Users can copy it
into their project to get preflight working without manual tool calls.

Referenced from the Quick Start section of the README and from examples/README.

- CLI now responds to --help/-h with usage info, profiles, and links
- CLI now responds to --version/-v with the package version
- Previously, any flag just launched the interactive wizard
- Fixed README badge from Node 18+ to Node 20+ (to match the engines field)
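A minimal sketch of the flag dispatch, assuming the CLI inspects only the arguments after the script name. `parseCliArgs`, `describeAction`, and the help text are hypothetical; the real CLI also prints profiles and links, which are omitted here.

```typescript
// Hypothetical sketch: names and help text are illustrative, not the
// package's actual CLI source.
type CliAction = "help" | "version" | "wizard";

function parseCliArgs(argv: readonly string[]): CliAction {
  // Help wins when both flags are present, a common CLI convention.
  if (argv.includes("--help") || argv.includes("-h")) return "help";
  if (argv.includes("--version") || argv.includes("-v")) return "version";
  // No recognized flag: fall through to the interactive wizard.
  return "wizard";
}

function describeAction(action: CliAction, version: string): string {
  if (action === "help") {
    return "Usage: <cli> [--help|-h] [--version|-v]\nRun with no flags to start the setup wizard.";
  }
  if (action === "version") return version;
  return "starting interactive wizard";
}
```

In a Node entry point this would be called as `describeAction(parseCliArgs(process.argv.slice(2)), version)`, with `version` taken from package.json. Unrecognized flags still reach the wizard in this sketch, which is an assumption about the new behavior.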

Adds a new 'export_timeline' MCP tool that generates markdown reports
from timeline data with:
- Summary stats table (events by type, correction rate, commits/prompt)
- ASCII activity chart grouped by day
- Recent commits log
- Correction insights
- Error summary
- Configurable period (day/week/month) with offset
- Optional save to ~/.preflight/reports/
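The ASCII activity chart grouped by day can be illustrated with a short sketch. The `TimelineEvent` shape, the UTC day bucketing, and the `#` bar scaling are assumptions for the example; export_timeline's actual report format may differ.

```typescript
// Hypothetical sketch: the event shape and bar rendering are assumptions,
// not export_timeline's actual format.
interface TimelineEvent {
  timestamp: string; // ISO 8601, e.g. "2024-05-01T09:30:00Z"
  type: string; // e.g. "prompt", "commit", "correction"
}

// Count events per UTC day (YYYY-MM-DD), returned in ascending date order.
function countByDay(events: readonly TimelineEvent[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const day = new Date(e.timestamp).toISOString().slice(0, 10);
    counts.set(day, (counts.get(day) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort(([a], [b]) => a.localeCompare(b));
}

// One line per day; bars are scaled so the busiest day gets `width` '#' chars,
// and every day with events gets at least one.
function renderActivityChart(events: readonly TimelineEvent[], width = 20): string {
  const rows = countByDay(events);
  const max = Math.max(1, ...rows.map(([, n]) => n));
  return rows
    .map(([day, n]) => {
      const bar = "#".repeat(Math.max(1, Math.round((n / max) * width)));
      return `${day} ${bar} ${n}`;
    })
    .join("\n");
}
```

Scaling to the busiest day keeps the chart a fixed width regardless of the selected period (day/week/month).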

Includes tests (2 passing).

Closes #5

Adds 10 tests covering:
- Trivial prompt pass-through
- force_level=skip bypass
- Ambiguous prompt detection (vague pronouns, short prompts, vague verbs)
- Multi-step execution plan generation
- Git state inclusion in clarification
- Triage confidence/reasons display
- Pattern match triage boosting
- Risk level assignment in execution plans
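The commit does not say which tool these tests target or how ambiguity is scored. As a hedged illustration of the three ambiguity signals listed above, the sketch below flags short prompts, vague pronouns, and vague verbs; `ambiguityReasons`, both word lists, and the five-word threshold are assumptions.

```typescript
// Hypothetical sketch of the ambiguity signals named in the test list.
// Word lists and the length threshold are assumptions, not the tool's rules.
const VAGUE_PRONOUNS = new Set(["it", "this", "that", "these", "those", "them"]);
const VAGUE_VERBS = new Set(["fix", "improve", "update", "clean", "handle", "change"]);
const MIN_WORDS = 5;

// Returns the reasons a prompt looks ambiguous (empty array = looks specific).
function ambiguityReasons(prompt: string): string[] {
  const words = prompt.toLowerCase().match(/[a-z']+/g) ?? [];
  const reasons: string[] = [];
  if (words.length < MIN_WORDS) reasons.push("short prompt");
  if (words.some((w) => VAGUE_PRONOUNS.has(w))) reasons.push("vague pronoun");
  if (words.some((w) => VAGUE_VERBS.has(w))) reasons.push("vague verb");
  return reasons;
}
```

Returning reasons rather than a boolean mirrors the "triage confidence/reasons display" item: the caller can show why a prompt was flagged.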

Brings the test count from 45 to 55.

Several tools were passing shell syntax (pipes, redirects, || chains) and
non-git commands (cat, find, pnpm, gh) to run(), which uses execFileSync
without a shell. This caused silent failures:

- Shell redirections (2>/dev/null) passed as literal git args
- Pipe chains (| grep, | tail) passed as literal git args
- 'git' prefix doubled (run() already prepends 'git')
- Non-git commands (find, gh, pnpm) routed through execFileSync('git', ...)

Fix:
- Add shell() helper using execSync for commands needing shell features
- Harden run() string parsing to strip leading 'git' and shell syntax
- Migrate 10 call sites across 8 tools to use shell() or proper arrays
- Add 13 tests covering run() cleanup and shell() behavior
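The run() hardening can be sketched as a pure function that turns a command string into an execFileSync argument array. `toGitArgs` is a hypothetical name, and the whitespace-only tokenization (no quote handling) is a simplification; commands that genuinely need pipes or redirects would go through the new shell() helper instead, which this sketch leaves out.

```typescript
// Hypothetical sketch of the run() string cleanup; not the PR's actual code.
// Tokenizes on whitespace only, so quoted arguments are not supported.
const SHELL_STOP_TOKENS = new Set(["|", "||", "&&", ";"]);
// A bare redirection operator whose target is the next token, e.g. "2>" in "2> /dev/null".
const REDIRECT_OP_ONLY = /^\d*(>>|>|<)$/;
// A redirection with its target attached, e.g. "2>/dev/null", ">out.txt", "2>&1".
const REDIRECT_ATTACHED = /^\d*[<>]/;

function toGitArgs(command: string): string[] {
  const tokens = command.trim().split(/\s+/).filter((t) => t.length > 0);
  // run() already prepends "git", so drop a leading "git" to avoid "git git ...".
  if (tokens[0] === "git") tokens.shift();
  const args: string[] = [];
  let skipNext = false;
  for (const token of tokens) {
    if (skipNext) {
      skipNext = false; // this token was the target of a bare redirection
      continue;
    }
    if (SHELL_STOP_TOKENS.has(token)) break; // the rest belongs to another command
    if (REDIRECT_OP_ONLY.test(token)) {
      skipNext = true;
      continue;
    }
    if (REDIRECT_ATTACHED.test(token)) continue;
    args.push(token);
  }
  return args;
}
```

In this sketch, cutting at the first pipe or chain operator drops the downstream command entirely, so a call site that depended on `| tail` or `|| echo` would still need to move to shell() or handle the output in TypeScript.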

Affected tools: audit-workspace, clarify-intent, enrich-agent-task,
sequence-tasks, session-handoff, session-health, verify-completion

…ng bug

- Extract pure scorePrompt() into src/lib/prompt-scoring.ts for testability
- Add comprehensive test suite (20 tests) covering all 4 dimensions + grading + edge cases
- Fix scope scoring: prompt length alone no longer earns the maximum score (capped at 20 instead of 25); full marks now require explicit bounding keywords
- The tool now imports from the shared lib instead of inlining the logic
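The scope fix can be illustrated with a sketch of a single scoring dimension. Only the numbers 20 and 25 come from the commit message; `scoreScope`, the keyword list, the word-count thresholds, and the lower tiers are hypothetical.

```typescript
// Hypothetical sketch of one scorePrompt() dimension; not the code in
// src/lib/prompt-scoring.ts. Only 20 and 25 come from the commit message.
const SCOPE_MAX = 25;
const LENGTH_ONLY_CAP = 20;
const BOUNDING_KEYWORDS = new Set(["only", "just", "exclusively", "limit", "without", "don't", "except"]);

function scoreScope(prompt: string): number {
  // Whole-word matching, so "commonly" does not count as "only".
  const words = prompt.toLowerCase().match(/[a-z']+/g) ?? [];
  const bounded = words.some((w) => BOUNDING_KEYWORDS.has(w));
  // Length sets the base score; a long prompt alone tops out at LENGTH_ONLY_CAP.
  const base = words.length >= 30 ? LENGTH_ONLY_CAP : words.length >= 10 ? 15 : 5;
  // An explicit bounding keyword adds 5, so only long *and* bounded prompts reach 25.
  return bounded ? Math.min(SCOPE_MAX, base + 5) : base;
}
```

Keeping the function pure (string in, number out) is what makes the extraction testable without a running MCP server.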

- Token usage reporting for simple sessions
- Correction detection from vague prompts
- Preflight tool call counting
- Graceful handling of missing/no session files
- Pricing model selection

6 tests, all passing.