Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions skills/sdlc/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@ Use plan mode for: multi-file changes, new features, LOW confidence, bugs needin

## Long-Running Goals (`/goal`)

Native `/goal <condition>` (**requires v2.1.143+**). Haiku evaluator re-checks transcript per turn. **Pre-flight:** trusted workspace; `disableAllHooks`/`allowManagedHooksOnly` both off (it's a Stop hook). **Condition = SDLC contract:** measurable end state + check + constraints + hard turn/time bound (no native cap); e.g. `/goal "npm test=0, stop after 20 turns"`. **Anti-pattern:** evaluator **cannot call tools**judges transcript only; don't use for off-transcript work. `--resume` resets counters.
Native `/goal <condition>` (**v2.1.143+**). Haiku evaluator re-checks transcript per turn. **Confidence gate — NEVER invoke below HIGH 95%**; below that the evaluator rubber-stamps flailing as progress. **DLC binding — condition MUST name the DLC** (`/sdlc`, `/gdlc`, `/ldlc`, etc.) so the evaluator anchors on "doing it right." **Pre-flight:** trusted workspace; `disableAllHooks`/`allowManagedHooksOnly` both off. **Condition = contract:** end state + check + constraints + hard turn/time bound (no native cap); e.g. `/goal "tests pass + clean tree following /sdlc, stop after 20 turns"`. **Anti-pattern:** evaluator can't call tools — transcript-only. `--resume` resets counters.

## Recommended Model

Expand Down Expand Up @@ -141,7 +141,7 @@ PROTOCOL is universal across domains; only `review_instructions` and `verificati

**Convergence:** 2 rounds sweet spot, 3 max (research: 14 repos + 7 papers). After 3 still NOT CERTIFIED → escalate.

**Multi-reviewer / non-code domains:** when running multiple reviewers in parallel (e.g. Claude review + Codex + human), respond per-reviewer (different blind spots, no shared anchoring). For non-code domains (research, persuasion, medical), keep the same handoff format and add `"audience"` + `"stakes"` keys.
**Multi-reviewer:** parallel reviewers respond per-reviewer (different blind spots, no shared anchoring). **Non-code domains:** same handoff + add `"audience"`/`"stakes"` keys.

### Release Review Focus

Expand Down
9 changes: 7 additions & 2 deletions tests/test-doc-consistency.sh
Original file line number Diff line number Diff line change
Expand Up @@ -817,10 +817,15 @@ test_sdlc_skill_has_goal_wrapper() {
grep -qiE 'trusted|untrusted' "$SKILL" || missing+=" trusted-workspace-preflight"
grep -qF 'disableAllHooks' "$SKILL" || missing+=" disableAllHooks-preflight"
grep -qiE 'turn(/| )?time bound|hard.*bound|stop after' "$SKILL" || missing+=" hard-bound-guidance"
grep -qiE 'evaluator.*not call tools|evaluator can.*not.*tool|cannot run tools|cannot call tools' "$SKILL" || missing+=" anti-pattern-off-transcript"
grep -qiE 'cannot call tools|can.t call tools|transcript[ -]only' "$SKILL" || missing+=" anti-pattern-off-transcript"
grep -qiE 'resume.*reset|--resume.*counters|counters.*reset' "$SKILL" || missing+=" resume-caveat"
# PR-D additions: 95% confidence gate + DLC binding in condition.
# Without these, /goal becomes "20 turns of flailing" — the evaluator
# has no anchor for correctness, only for completion.
grep -qiE 'confidence gate|HIGH 95%|below.*95%|95%.*confidence' "$SKILL" || missing+=" 95-percent-confidence-gate"
grep -qiE 'DLC binding|name the (active )?DLC|condition MUST name' "$SKILL" || missing+=" DLC-binding-rule"
if [ -z "$missing" ]; then
pass "skills/sdlc/SKILL.md /goal wrapper has all required quality elements (#347)"
pass "skills/sdlc/SKILL.md /goal wrapper has all required quality elements (#347 + PR-D)"
else
fail "skills/sdlc/SKILL.md /goal wrapper missing:$missing"
fi
Expand Down