refactor: extract scorePrompt into lib + 20 tests + scope scoring fix #290

Open
TerminalGravity wants to merge 6 commits into main from extract-prompt-scoring-lib

@TerminalGravity
Collaborator

What

  • Extracted scorePrompt() pure function from prompt-score.ts into src/lib/prompt-scoring.ts
  • Added 20 unit tests covering all scoring dimensions, grading, feedback, and edge cases
  • Fixed a scope scoring bug: prompts longer than 100 characters were getting the maximum scope score (25) even without explicit bounding keywords. They now get 20; full marks require a bounding word such as 'only', 'just', or 'single'.
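For reviewers, the scope rule described above can be sketched roughly as follows. This is an illustration, not the code in src/lib/prompt-scoring.ts: the function name `scoreScope`, the exact keyword list, the baseline of 10 for short prompts, and the idea that a bounding keyword alone is enough for full marks are all assumptions. Only the 25 and 20 values, the 100-character threshold, and the three example keywords come from this description.

```typescript
// Hypothetical sketch of the scope dimension only.
// Assumed: the keyword list is limited to the three examples named in the PR,
// and a bounding keyword by itself earns the maximum score.
const BOUNDING_KEYWORDS = /\b(only|just|single)\b/i;

function scoreScope(prompt: string): number {
  const MAX_SCOPE = 25;
  if (BOUNDING_KEYWORDS.test(prompt)) {
    return MAX_SCOPE; // explicitly bounded prompt
  }
  if (prompt.length > 100) {
    return 20; // long but unbounded: no longer full marks after this fix
  }
  return 10; // assumed baseline for short, unbounded prompts
}
```

Under these assumptions, a 150-character prompt with no bounding word scores 20, while "only change the header" scores 25.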

Why

The scoring logic was inlined in the tool registration, which made it impossible to unit test in isolation. It is now a pure function covered by the 20 new tests.

Test results

88 tests passing (20 new + 68 existing)

Adds a ready-to-use CLAUDE.md template that makes Claude Code
automatically run preflight_check on prompts. Users can copy it
into their project to get preflight working without manual tool calls.

Referenced from Quick Start in README and examples/README.

- CLI now responds to --help/-h with usage info, profiles, and links
- CLI now responds to --version/-v with package version
- Previously, any flag just launched the interactive wizard
- Fixed README badge from Node 18+ to Node 20+ (matches engines field)
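The flag handling described in this commit could look something like the sketch below. The function name `resolveCliAction` and the `CliAction` type are hypothetical, and falling back to the wizard for unrecognized flags is an assumption; the commit only states that --help/-h and --version/-v are now handled before the wizard starts.

```typescript
// Hypothetical sketch: decide what the CLI should do before launching the wizard.
type CliAction = "help" | "version" | "wizard";

function resolveCliAction(argv: string[]): CliAction {
  if (argv.includes("--help") || argv.includes("-h")) {
    return "help"; // print usage info, profiles, and links
  }
  if (argv.includes("--version") || argv.includes("-v")) {
    return "version"; // print the package version
  }
  return "wizard"; // assumed default: anything else starts the interactive wizard
}
```

A caller would typically pass `process.argv.slice(2)` and branch on the result.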
Adds a new 'export_timeline' MCP tool that generates markdown reports
from timeline data with:
- Summary stats table (events by type, correction rate, commits/prompt)
- ASCII activity chart grouped by day
- Recent commits log
- Correction insights
- Error summary
- Configurable period (day/week/month) with offset
- Optional save to ~/.preflight/reports/

Includes tests (2 passing).
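As a rough illustration of the "ASCII activity chart grouped by day" item, a grouping-and-rendering step might look like this. The `TimelineEvent` shape, the function name `renderActivityChart`, the `#` bar character, and the row layout are all assumptions; the real timeline schema and report format are not shown in this PR. The sketch also assumes ISO 8601 timestamps, so the day is simply the first ten characters.

```typescript
// Hypothetical event shape; the actual timeline data model may differ.
interface TimelineEvent {
  timestamp: string; // assumed ISO 8601, e.g. "2024-01-01T09:30:00Z"
  type: string;
}

function renderActivityChart(events: TimelineEvent[], maxWidth = 30): string {
  // Count events per calendar day (YYYY-MM-DD prefix of the timestamp).
  const counts = new Map<string, number>();
  for (const event of events) {
    const day = event.timestamp.slice(0, 10);
    counts.set(day, (counts.get(day) ?? 0) + 1);
  }

  // Scale every bar relative to the busiest day.
  const peak = Math.max(0, ...Array.from(counts.values()));

  return Array.from(counts.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([day, n]) => {
      const width = Math.max(1, Math.round((n / peak) * maxWidth));
      return `${day} ${"#".repeat(width)} ${n}`;
    })
    .join("\n");
}
```

Each row shows the day, a bar scaled to the busiest day, and the raw count; an empty event list yields an empty string.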

Closes #5

Adds 10 tests covering:
- Trivial prompt pass-through
- force_level=skip bypass
- Ambiguous prompt detection (vague pronouns, short prompts, vague verbs)
- Multi-step execution plan generation
- Git state inclusion in clarification
- Triage confidence/reasons display
- Pattern match triage boosting
- Risk level assignment in execution plans

Brings test count from 45 to 55.

Several tools were passing shell syntax (pipes, redirects, || chains) and
non-git commands (cat, find, pnpm, gh) to run(), which uses execFileSync
without a shell. This caused silent failures:

- Shell redirections (2>/dev/null) passed as literal git args
- Pipe chains (| grep, | tail) passed as literal git args
- 'git' prefix doubled (run already prepends 'git')
- Non-git commands (find, gh, pnpm) routed through git execFileSync

Fix:
- Add shell() helper using execSync for commands needing shell features
- Harden run() string parsing to strip leading 'git' and shell syntax
- Migrate 10 call sites across 8 tools to use shell() or proper arrays
- Add 13 tests covering run() cleanup and shell() behavior
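The split between the two helpers can be sketched as below. This is an approximation under stated assumptions, not the code in this PR: the helper name `cleanGitArgs` is hypothetical, the tokenizer splits on whitespace only (so quoted arguments containing spaces are not handled), and the real run() may handle errors differently. What the sketch relies on is documented Node.js behavior: `execFileSync` does not spawn a shell by default, so pipes and redirects arrive as literal arguments, whereas `execSync` runs the command string through a shell.

```typescript
import { execFileSync, execSync } from "node:child_process";

// Hypothetical cleanup step: drop a leading "git", stop at the first pipe or
// chain operator, and discard redirections such as 2>/dev/null.
const CHAIN_OPERATORS = new Set(["|", "||", "&&", ";"]);

function cleanGitArgs(command: string): string[] {
  const tokens = command.trim().split(/\s+/).filter(Boolean);
  if (tokens[0] === "git") tokens.shift(); // run() already prepends "git"
  const args: string[] = [];
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (CHAIN_OPERATORS.has(token)) break; // everything after is another command
    if (/^\d*(>{1,2}|<)$/.test(token)) {
      i++; // bare operator like "2>": skip its target as well
      continue;
    }
    if (/^\d*(>{1,2}|<)/.test(token)) continue; // fused form like "2>/dev/null"
    args.push(token);
  }
  return args;
}

// Git-only commands: no shell involved, arguments passed as an array.
function run(args: string | string[]): string {
  const argv = typeof args === "string" ? cleanGitArgs(args) : args;
  return execFileSync("git", argv, { encoding: "utf8" }).trim();
}

// Commands that need pipes, redirects, or non-git binaries go through a shell.
function shell(command: string): string {
  return execSync(command, { encoding: "utf8" }).trim();
}
```

With this split, a call such as `run("git log --oneline -5 2>/dev/null | tail -3")` would reach git as `["log", "--oneline", "-5"]`, while a call site that really needs the pipe would use `shell()` instead.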

Affected tools: audit-workspace, clarify-intent, enrich-agent-task,
sequence-tasks, session-handoff, session-health, verify-completion

…ng bug

- Extract pure scorePrompt() into src/lib/prompt-scoring.ts for testability
- Add comprehensive test suite (20 tests) covering all 4 dimensions + grading + edge cases
- Fix scope scoring: long prompts alone no longer get max score (25→20), require explicit bounding keywords for full marks
- Tool now imports from shared lib instead of inlining logic