feat(sdlc): /goal SDLC-discipline gates — 95% confidence floor + DLC binding (PR-D)#355
Merged
Merged
Conversation
…DLC binding (PR-D) Native /goal is now table-stakes across CC (v2.1.139+), Codex CLI, and likely others. Without SDLC discipline baked into the goal CONDITION itself, the Haiku evaluator rubber-stamps 'did the agent flail for 20 turns' instead of 'is the goal met correctly.' Two new load-bearing gates in skills/sdlc/SKILL.md ## Long-Running Goals section: 1. **Confidence gate — NEVER invoke below HIGH 95%.** Mirrors existing Confidence Check (plan first if below). Below 95% the evaluator has no anchor for 'is this correct'; only for 'did the agent stop.' 2. **DLC binding — condition MUST name the active DLC** (/sdlc for code, /gdlc for games, /ldlc for legal, etc.). Anchors the evaluator on 'doing it right,' not just 'doing it.' Example: /goal 'tests pass + clean tree following /sdlc, stop after 20 turns'. Quality test extended (tests/test-doc-consistency.sh::test_sdlc_skill_has_goal_wrapper): adds keyword greps for '95% confidence' / 'HIGH 95%' / 'confidence gate' AND 'DLC binding' / 'name the DLC' / 'condition MUST name'. Cross-cutting compensating trims to stay under #236 5K-token cap (skill at 4989/5000): - /goal section condensed example wording ('tests pass + clean tree' vs 'npm test=0 AND git clean') - Cross-Model Review's Multi-reviewer paragraph collapsed into one line 40/40 doc-consistency green, 10/10 audit green.
7ac9990 to
f001405
Compare
BaseInfinity
added a commit
that referenced
this pull request
May 26, 2026
…) (#361) `npx agentic-sdlc-wizard init` (no @latest pin) silently serves whatever version is cached on disk, sometimes months old. The /update Step 1.5 pattern already catches this — but only at /update time, after the user has already installed stale templates. This moves the same check to init time so the gap surfaces immediately: - cli/init.js: new maybeEmitStaleCliNudge() called before planOperations. Reuses SDLC_WIZARD_CACHE_DIR/latest-version (24h TTL, #239 poison check), falls back to `npm view` with 5s timeout, silent on offline/error. - README.md + CLAUDE_CODE_SDLC_WIZARD.md: install command recommends `npx -y agentic-sdlc-wizard@latest init` so first-time users skip the cache trap by default. - tests/test-cli.sh: 3 quality tests covering nudge-fires-when-stale, silent-when-current, no-reverse-nudge-when-poisoned. Bundled: ROADMAP #347 + #350 status flip from "Actionable now" to DONE with PR citations (#351/#355/#354). Both shipped this week — paperwork was stale. Verified locally: test-cli.sh 91/91, test-doc-consistency 40/40, test-docs-usability 29/29, test-update-skill-cli-version 8/8, test-hooks 156/156, test-setup-path 83/83, test-self-update 153/153.
5 tasks
BaseInfinity
added a commit
that referenced
this pull request
May 26, 2026
(#363) PR #361 shipped 3 #358 tests where Test C was silent-when-cache-poisoned rather than the goal-named silent-when-offline. The /goal Haiku evaluator counted tests by file presence and approved completion. Caught only at post-merge self-review, fixed in PR #362 (added the real offline test via fake-npm PATH override). Promoted to SDLC.md ## Lessons Learned → ### Testing so future sessions inherit the rule: when a /goal condition enumerates test cases by name, self-review must verify each named test exists by reading assertions, not by counting matching `test_` functions. This is the per-test-fidelity layer beneath PR #355's HIGH-95%-confidence and DLC-binding gates — those work at the macro level, but enumerated- condition fidelity is the author's responsibility. Verified locally: test-doc-consistency 40/40, test-postmortem-lessons 7/7, test-memory-audit-protocol 12/12.
This was referenced May 28, 2026
BaseInfinity
added a commit
that referenced
this pull request
Jun 1, 2026
PR #355 (v1.77.0) added skill-text guidance that `/goal` must state HIGH 95% confidence and bind to a DLC (`/sdlc`, `/gdlc`, `/ldlc`) before firing. The guidance worked at the meta level but had no runtime enforcement — exactly the failure mode the discipline gate documented (text-only guidance doesn't survive the failure mode it's trying to prevent). This adds `hooks/goal-confidence-check.sh` as a UserPromptSubmit hook that: 1. Matches `/goal <condition>` prompts (silent on `/goal` status + `/goal clear`). 2. Reads `transcript_path` from hook input (verified available on UserPromptSubmit per Anthropic hook docs at code.claude.com/docs/en/hooks), walks the JSONL to find the last assistant text message, and scans for HIGH-95% confidence patterns (`HIGH (95%`, `Confidence: HIGH`, `HIGH 95%`, etc.). 3. Greps the goal condition for a DLC binding (`/[a-z]+dlc`). 4. Emits LOUD warnings on either gap — non-blocking soft nudge (exit 0), same pattern as `model-effort-check.sh`. Registered in both channels: `cli/templates/settings.json` (npm/CLI) and `hooks/hooks.json` (plugin via ${CLAUDE_PLUGIN_ROOT}). Dedupes via `_find-sdlc-root.sh` helper so dual installs don't double-fire. Bundled paperwork: - `ROADMAP.md` adds the missing `## Research Parking Lot` section that the Demand-Signal-First entry gate references but was never created. Includes the maintenance rule (prune expired rows during quarterly triage). - `skills/sdlc/SKILL.md` adds a one-line enforcement cross-reference in the Long-Running Goals section. Trimmed adjacent prose to stay under the 5K token cap (#236). #359 triage (companion close): Of the 10 API changelog entries since 2026-04-16: 8 are clearly no-op for the wizard (MCP tunnels, SEC web search data, cache diagnostics, fast mode, AWS hosting, multiagent sessions, rate limits API, managed-agent memory — all orthogonal to SDLC enforcement). 2 are deprecations — verified zero usage in this repo: `context-1m-2025-08-07` (Sonnet 4.5/4 1M beta retired 2026-04-30; we use the separate `opus[1m]` alias) and `claude-3-haiku- 20240307` (Haiku 3 retired 2026-04-20; we never used it). #359 will be closed with the triage comment after merge. Verified locally: test-hooks 160/160 (4 new #360 tests + 156 existing), test-audit-session-load 10/10, test-cli 91/91, test-plugin all green, test-doc-consistency 41/41, test-self-update 153/153, test-setup-path 83/83.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Native `/goal` is now table-stakes across CC (v2.1.139+), Codex CLI, and likely others. Without SDLC discipline baked into the goal CONDITION itself, the Haiku evaluator rubber-stamps "did the agent flail for 20 turns" instead of "is the goal met correctly."
This PR extends the /goal wrapper in `/sdlc` skill with two new load-bearing gates.
The two new rules
1. Confidence gate — NEVER invoke `/goal` below HIGH 95%
Mirrors the existing Confidence Check rule. Below 95%, plan first, then `/goal`. The evaluator has no anchor for "is this correct"; only for "did the agent stop." Invoking at MEDIUM/LOW lets the evaluator certify wishful thinking.
2. DLC binding — condition MUST name the active DLC
`/sdlc` for code, `/gdlc` for games, `/ldlc` for legal, etc. Without the DLC name in the condition string, the evaluator anchors on completion only ("did the work end") instead of correctness ("did it follow the contract").
Example:
The "following /sdlc" clause makes the evaluator judge SDLC compliance per turn, not just task completion.
Why now
Native `/goal` shipped in CC v2.1.139 (~mid-May 2026), missed by wizard's autoupdate pipeline (the gap #350 fixes). Now that it's universal across agentic tools, anchoring it in SDLC discipline is the right wizard play — turns a "let it cook" primitive into a "let it cook properly" primitive.
What changed
Test plan