Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions ROADMAP.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,18 @@ Anything else goes to the **Research Parking Lot** (below) with a named revisit

Each row's status should be one of: **Actionable now** / **Waiting on trigger** / **Tracked elsewhere** / **DONE** / **KILLED**.

## Research Parking Lot

Speculative ideas that didn't qualify for entry under the Demand-Signal-First gate. Each entry MUST declare a **named revisit trigger** and an **expiry date** (30 days for implementation ideas, 60 days for research/watch items). If the trigger has not fired by expiry, the row is **deleted, not carried forward** — the parking lot is not a graveyard, it's a finite waiting room.

Add entries when a candidate is mid-evidence (one signal exists but doesn't yet clear the gate's "second external user" or "dated platform deadline" thresholds). Promote to a numbered ROADMAP row only when the trigger fires.

| Idea | Why parked | Revisit trigger | Expiry | Source |
|---|---|---|---|---|
| _(empty — populate as triage demands)_ | | | | |

> **Maintenance rule:** during quarterly ROADMAP triage, prune any row past expiry. Logged removals go in the commit message (`docs(roadmap): prune parking lot — <N> expired entries`) so deletions are traceable.

## Next Up (v1.37.0 queue, priority-ordered 2026-04-23)

0. **#212 — Local-Max E2E shepherd (zero-API)** — ✅ **ENGINEERING COMPLETE; PROVE-IT GATE DEFERRED 2026-05-05** ("just defer prove it who cares" — maintainer pragmatic call). Sim/eval/orchestration all on Max subscription per #228 (v1.59.0) + #230 + #231 phases 1-4 (v1.50.0–v1.55.0). Weekly+monthly workflow porting deleted ~$25-55/week of API cron burn. Docs reference `tests/e2e/local-shepherd.sh --compare-baseline` as the default path (`CLAUDE_CODE_SDLC_WIZARD.md` lines 42, 102, 1325, 2570). **Codex cross-model review** runs against ChatGPT/Codex subscription auth (`codex login`, "Logged in using ChatGPT" 2026-05-05) — also $0 marginal. Net: **$0/release** (was $20+/release in API burn). **Prove-It Gate parked** — would require ~$20 one-time API spend to run paired N=15 local-Max vs CI-API trials for 95% CI overlap proof. Not worth the spend; engineering signal works fine in practice. Reopen if local scoring drifts vs known-good baselines. Original scope archived below for history. <details><summary>Original scope (archived)</summary>**TOP PRIORITY** after 2026-04-23 live-fire proved the pain. Today's shepherding blew through the Anthropic API credit cap mid-release (PR #222 e2e-quick-check failed with "Credit balance is too low"), blocking v1.36.1 until admin-merge. Current CI spends ~$0.62 (Tier 1) + $0.82 (Tier 2) per PR on `anthropics/claude-code-action@v1`. Across today's 12 PRs that's **$20+ in API burn** just for simulations. **Architecture** (tri-split billing): (a) **Simulation** — `claude --print` via local Bash tool runs against **Max subscription** ($0 after monthly cap, tests wizard behavior ON Claude which is the whole point); (b) **Cross-model review of PR diff** — `codex exec -c model_reasoning_effort=xhigh` via local Bash tool uses **OpenAI API** (separate bill, ~$3-5/PR — REVISED 2026-05-05: $0 if Codex authed via ChatGPT subscription); (c) **Orchestration** (watching CI, pulling logs, posting PR comments, merging) — me running in local terminal on **Max quota** ($0). **Scope:** (1) replace `claude-code-action@v1` invocation in `run-simulation.sh` with `claude --print "$prompt" --output-format json`; (2) confirm execution-output JSON shape matches (SDK format); (3) keep CI path as fallback for external contributors without a Max sub; (4) new `./tests/e2e/local-shepherd.sh <PR>` orchestrator runs Tier 1+2 locally, posts summary; (5) update SDLC skill + CLAUDE_CODE_SDLC_WIZARD.md docs to reference local shepherd as the default path. **Prove-It Gate** (CRITICAL — unchanged): demonstrate statistical score parity (local-Max vs CI-API) on **at least 3 PRs** before trusting local signal. Both paths hit same model (Opus 4.7), same prompt, same scoring — expected differences are only stochastic variance (±1-2 pts typical), so prove via overlapping 95% CIs, not byte-equality. If parity fails, investigate whether Max routing differs from raw-API before declaring. User call-out 2026-04-23: "why are we using API credits for CI instead of us being CI shepherd?"</details>
Expand Down
1 change: 1 addition & 0 deletions cli/init.js
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ const FILES = [
{ src: 'hooks/instructions-loaded-check.sh', dest: '.claude/hooks/instructions-loaded-check.sh', executable: true, base: REPO_ROOT },
{ src: 'hooks/model-effort-check.sh', dest: '.claude/hooks/model-effort-check.sh', executable: true, base: REPO_ROOT },
{ src: 'hooks/precompact-seam-check.sh', dest: '.claude/hooks/precompact-seam-check.sh', executable: true, base: REPO_ROOT },
{ src: 'hooks/goal-confidence-check.sh', dest: '.claude/hooks/goal-confidence-check.sh', executable: true, base: REPO_ROOT },
// #254 Bug 1: shared helper sourced by all hooks above. Must ship — without
// it, hooks emit "_find-sdlc-root.sh: No such file or directory" + the
// SDLC root walk-up logic is silently dead.
Expand Down
4 changes: 4 additions & 0 deletions cli/templates/settings.json
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@
{
"type": "command",
"command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/sdlc-prompt-check.sh"
},
{
"type": "command",
"command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/goal-confidence-check.sh"
}
]
}
Expand Down
82 changes: 82 additions & 0 deletions hooks/goal-confidence-check.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
#!/bin/bash
# /goal SDLC discipline gate (ROADMAP #360).
#
# UserPromptSubmit hook. When the user invokes `/goal <condition>`, scan the
# prior assistant text for a HIGH 95% confidence statement AND check the
# condition for a DLC binding (`/sdlc`, `/gdlc`, `/ldlc`, etc.). Emit loud
# warnings on either gap so the SDLC-discipline gap surfaces BEFORE the goal
# evaluator starts rubber-stamping flailing as progress.
#
# PR #355 (v1.77.0) added the guidance text in skills/sdlc/SKILL.md. This
# hook is the enforcement layer per #360. Non-blocking soft nudge — same
# pattern as model-effort-check.sh.

HOOK_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$HOOK_DIR/_find-sdlc-root.sh"

# Plugin + project copies both register this hook → plugin yields (#236).
dedupe_plugin_or_project "${BASH_SOURCE[0]}" || exit 0

# Must be invoked via hook (stdin JSON), not interactively.
[ -t 0 ] && exit 0
STDIN_JSON=$(cat)
[ -z "$STDIN_JSON" ] && exit 0
command -v jq > /dev/null 2>&1 || exit 0

PROMPT=$(printf '%s' "$STDIN_JSON" | jq -r '.prompt // empty' 2>/dev/null) || exit 0
[ -z "$PROMPT" ] && exit 0

# Only act on `/goal <condition>` invocations. `/goal` alone (status) and
# `/goal clear` (reset) must stay silent — those aren't discipline events.
case "$PROMPT" in
"/goal") exit 0 ;;
"/goal "[[:space:]]*) exit 0 ;;
"/goal clear") exit 0 ;;
"/goal clear "*) exit 0 ;;
"/goal "*) ;;
*) exit 0 ;;
esac

CONDITION="${PROMPT#/goal }"

# === Confidence gate ===
# Read transcript_path, slurp JSONL, find last assistant message with text
# content, scan for HIGH-95% confidence pattern. Silent on missing/unreadable
# transcript — better to under-warn than crash a soft nudge.
TRANSCRIPT_PATH=$(printf '%s' "$STDIN_JSON" | jq -r '.transcript_path // empty' 2>/dev/null) || TRANSCRIPT_PATH=""
HAS_CONFIDENCE=0
if [ -n "$TRANSCRIPT_PATH" ] && [ -r "$TRANSCRIPT_PATH" ]; then
LAST_ASSISTANT_TEXT=$(jq -rs '
[.[] | select(.type == "assistant")]
| last
| if . == null then ""
else (.message.content // []) | map(select(.type == "text") | .text) | join("\n")
end
' "$TRANSCRIPT_PATH" 2>/dev/null) || LAST_ASSISTANT_TEXT=""
if printf '%s' "$LAST_ASSISTANT_TEXT" | grep -qiE 'HIGH \(?95%|confidence:?[[:space:]]+HIGH|HIGH[[:space:]]+95%|95%[[:space:]]+confidence'; then
HAS_CONFIDENCE=1
fi
fi

if [ "$HAS_CONFIDENCE" -eq 0 ]; then
echo ""
echo "!! /GOAL CONFIDENCE GATE: prior turn did not state HIGH 95% confidence !!"
echo " The /goal Haiku evaluator rubber-stamps flailing as progress below 95%."
echo " Recommendation: /goal clear → do research → state HIGH 95% confidence → re-issue /goal."
echo " (Source: skills/sdlc/SKILL.md Long-Running Goals section, ROADMAP #360.)"
echo ""
fi

# === DLC binding gate ===
# Condition MUST name a DLC (`/sdlc`, `/gdlc`, `/ldlc`, etc.) so the
# evaluator anchors on "doing it right" not just "doing it".
if ! printf '%s' "$CONDITION" | grep -qiE '/[a-z]+dlc'; then
echo ""
echo "!! /GOAL DLC-BINDING GATE: condition does not name a DLC (/sdlc, /gdlc, /ldlc) !!"
echo " The evaluator needs a discipline anchor to judge 'doing it right' vs flailing."
echo " Recommendation: rewrite condition to include 'following /sdlc' (or /gdlc/etc)."
echo " (Source: skills/sdlc/SKILL.md Long-Running Goals section, ROADMAP #360.)"
echo ""
fi

exit 0
4 changes: 4 additions & 0 deletions hooks/hooks.json
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/sdlc-prompt-check.sh"
},
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/goal-confidence-check.sh"
}
]
}
Expand Down
2 changes: 2 additions & 0 deletions skills/sdlc/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,8 @@ PROTOCOL is universal across domains; only `review_instructions` and `verificati

**Convergence:** 2 rounds sweet spot, 3 max. After 3 NOT CERTIFIED → escalate.

**Enforcement:** `hooks/goal-confidence-check.sh` warns when `/goal` skips the 95%-confidence or DLC-binding gates (#360).

**Multi-reviewer:** respond to each reviewer independently (no shared anchoring). **Non-code domains:** add `"audience"`/`"stakes"` keys to handoff.

### Release Review Focus
Expand Down
93 changes: 93 additions & 0 deletions tests/test-hooks.sh
Original file line number Diff line number Diff line change
Expand Up @@ -3383,6 +3383,99 @@ test_hook_emits_cc_nudge_when_pending
test_hook_silent_when_no_pending_cc_updates
test_hook_silent_without_weekly_update_workflow

# ---- /goal confidence + DLC-binding gate (ROADMAP #360) ----
# UserPromptSubmit hook fires on `/goal <condition>` invocations, reads
# transcript_path, scans the LAST assistant text message for a HIGH-95%
# confidence statement, and also checks the condition for DLC binding
# (`/sdlc`, `/gdlc`, `/ldlc`, etc). PR #355 (v1.77.0) added the *guidance*
# in skills/sdlc/SKILL.md but nothing enforces it at runtime — this hook
# is the enforcement layer per #360. Non-blocking soft nudge pattern
# (mirrors model-effort-check.sh).

test_goal_confidence_check_fires_when_no_confidence() {
local tmpdir
tmpdir=$(mktemp -d)
echo '<!-- SDLC Wizard Version: 1.77.0 -->' > "$tmpdir/SDLC.md"
touch "$tmpdir/TESTING.md"
cat > "$tmpdir/transcript.jsonl" <<'JSONL'
{"type":"user","message":{"role":"user","content":"hey"}}
{"type":"assistant","message":{"role":"assistant","content":[{"type":"text","text":"Working on this now."}]}}
JSONL
local payload='{"prompt":"/goal \"ship X, following /sdlc, stop after 10 turns\"","transcript_path":"'"$tmpdir/transcript.jsonl"'"}'
local output
output=$(printf '%s' "$payload" | (cd "$tmpdir" && CLAUDE_PROJECT_DIR="$tmpdir" SDLC_WIZARD_CACHE_DIR="$tmpdir/cache" "$HOOKS_DIR/goal-confidence-check.sh"))
rm -rf "$tmpdir"
if echo "$output" | grep -qiE 'GATE|WARNING' && echo "$output" | grep -qi 'confidence'; then
pass "#360: /goal without prior HIGH 95% confidence triggers warning"
else
fail "#360: confidence-gate warning missing (got: $output)"
fi
}

test_goal_confidence_check_silent_when_confidence_present() {
local tmpdir
tmpdir=$(mktemp -d)
echo '<!-- SDLC Wizard Version: 1.77.0 -->' > "$tmpdir/SDLC.md"
touch "$tmpdir/TESTING.md"
cat > "$tmpdir/transcript.jsonl" <<'JSONL'
{"type":"user","message":{"role":"user","content":"recon done?"}}
{"type":"assistant","message":{"role":"assistant","content":[{"type":"text","text":"Confidence: HIGH (95%) — verified pattern matches existing reference."}]}}
JSONL
local payload='{"prompt":"/goal \"ship X, following /sdlc, stop after 10 turns\"","transcript_path":"'"$tmpdir/transcript.jsonl"'"}'
local output
output=$(printf '%s' "$payload" | (cd "$tmpdir" && CLAUDE_PROJECT_DIR="$tmpdir" SDLC_WIZARD_CACHE_DIR="$tmpdir/cache" "$HOOKS_DIR/goal-confidence-check.sh"))
rm -rf "$tmpdir"
if echo "$output" | grep -qiE 'GATE|WARNING'; then
fail "#360: should be silent when prior turn has HIGH 95% confidence (got: $output)"
else
pass "#360: silent when prior turn has HIGH 95% confidence statement"
fi
}

test_goal_confidence_check_dlc_binding_warning() {
local tmpdir
tmpdir=$(mktemp -d)
echo '<!-- SDLC Wizard Version: 1.77.0 -->' > "$tmpdir/SDLC.md"
touch "$tmpdir/TESTING.md"
cat > "$tmpdir/transcript.jsonl" <<'JSONL'
{"type":"assistant","message":{"role":"assistant","content":[{"type":"text","text":"Confidence: HIGH (95%)."}]}}
JSONL
# Goal condition WITHOUT any DLC binding (/sdlc, /gdlc, /ldlc)
local payload='{"prompt":"/goal \"ship X, stop after 10 turns\"","transcript_path":"'"$tmpdir/transcript.jsonl"'"}'
local output
output=$(printf '%s' "$payload" | (cd "$tmpdir" && CLAUDE_PROJECT_DIR="$tmpdir" SDLC_WIZARD_CACHE_DIR="$tmpdir/cache" "$HOOKS_DIR/goal-confidence-check.sh"))
rm -rf "$tmpdir"
if echo "$output" | grep -qiE 'DLC.*binding|name.*DLC|/sdlc|/gdlc'; then
pass "#360: /goal without DLC binding triggers DLC-binding warning"
else
fail "#360: DLC-binding warning missing (got: $output)"
fi
}

test_goal_confidence_check_silent_on_status_and_clear() {
local tmpdir
tmpdir=$(mktemp -d)
echo '<!-- SDLC Wizard Version: 1.77.0 -->' > "$tmpdir/SDLC.md"
touch "$tmpdir/TESTING.md"
cat > "$tmpdir/transcript.jsonl" <<'JSONL'
{"type":"assistant","message":{"role":"assistant","content":[{"type":"text","text":"some text"}]}}
JSONL
local out1 out2
out1=$(printf '%s' '{"prompt":"/goal","transcript_path":"'"$tmpdir/transcript.jsonl"'"}' | (cd "$tmpdir" && CLAUDE_PROJECT_DIR="$tmpdir" SDLC_WIZARD_CACHE_DIR="$tmpdir/cache" "$HOOKS_DIR/goal-confidence-check.sh"))
out2=$(printf '%s' '{"prompt":"/goal clear","transcript_path":"'"$tmpdir/transcript.jsonl"'"}' | (cd "$tmpdir" && CLAUDE_PROJECT_DIR="$tmpdir" SDLC_WIZARD_CACHE_DIR="$tmpdir/cache" "$HOOKS_DIR/goal-confidence-check.sh"))
rm -rf "$tmpdir"
if [ -z "$out1" ] && [ -z "$out2" ]; then
pass "#360: silent on bare /goal (status) and /goal clear (reset)"
else
fail "#360: should be silent on status/clear (out1='$out1', out2='$out2')"
fi
}

test_goal_confidence_check_fires_when_no_confidence
test_goal_confidence_check_silent_when_confidence_present
test_goal_confidence_check_dlc_binding_warning
test_goal_confidence_check_silent_on_status_and_clear

echo ""
echo "=== Results ==="
echo "Passed: $PASSED"
Expand Down