Skip to content

feat(sdlc): /goal SDLC-discipline gates — 95% confidence floor + DLC binding (PR-D)#355

Merged
BaseInfinity merged 1 commit into
mainfrom
feat/goal-sdlc-discipline
May 25, 2026
Merged

feat(sdlc): /goal SDLC-discipline gates — 95% confidence floor + DLC binding (PR-D)#355
BaseInfinity merged 1 commit into
mainfrom
feat/goal-sdlc-discipline

Conversation

@BaseInfinity
Copy link
Copy Markdown
Owner

Summary

Native `/goal` is now table-stakes across CC (v2.1.139+), Codex CLI, and likely others. Without SDLC discipline baked into the goal CONDITION itself, the Haiku evaluator rubber-stamps "did the agent flail for 20 turns" instead of "is the goal met correctly."

This PR extends the /goal wrapper in `/sdlc` skill with two new load-bearing gates.

The two new rules

1. Confidence gate — NEVER invoke `/goal` below HIGH 95%

Mirrors the existing Confidence Check rule. Below 95%, plan first, then `/goal`. The evaluator has no anchor for "is this correct"; only for "did the agent stop." Invoking at MEDIUM/LOW lets the evaluator certify wishful thinking.

2. DLC binding — condition MUST name the active DLC

`/sdlc` for code, `/gdlc` for games, `/ldlc` for legal, etc. Without the DLC name in the condition string, the evaluator anchors on completion only ("did the work end") instead of correctness ("did it follow the contract").

Example:

/goal "tests pass + clean tree following /sdlc, stop after 20 turns"

The "following /sdlc" clause makes the evaluator judge SDLC compliance per turn, not just task completion.

Why now

Native `/goal` shipped in CC v2.1.139 (~mid-May 2026), missed by wizard's autoupdate pipeline (the gap #350 fixes). Now that it's universal across agentic tools, anchoring it in SDLC discipline is the right wizard play — turns a "let it cook" primitive into a "let it cook properly" primitive.

What changed

  • `skills/sdlc/SKILL.md` — two new rules in the `## Long-Running Goals (/goal)` section
  • `tests/test-doc-consistency.sh::test_sdlc_skill_has_goal_wrapper` — extended grep to enforce both new keywords (`95%`/`confidence gate` + `DLC binding`/`name the DLC`)
  • Compensating trims to stay under roadmap(#217): loud WARNING below xhigh — max preferred, xhigh floor #236 5K-token cap: condensed /goal example, collapsed Multi-reviewer paragraph

Test plan

  • 40/40 `test-doc-consistency.sh` (new test_sdlc_skill_has_goal_wrapper checks 9 keywords now: section-header, version-floor, trusted-workspace, disableAllHooks, hard-bound, anti-pattern, resume-caveat, 95%-confidence-gate, DLC-binding)
  • 10/10 `test-audit-session-load.sh` (skill at 4989/5000 tokens — under cap)
  • CI `validate` green

…DLC binding (PR-D)

Native /goal is now table-stakes across CC (v2.1.139+), Codex CLI, and
likely others. Without SDLC discipline baked into the goal CONDITION
itself, the Haiku evaluator rubber-stamps 'did the agent flail for 20
turns' instead of 'is the goal met correctly.'

Two new load-bearing gates in skills/sdlc/SKILL.md ## Long-Running
Goals section:

1. **Confidence gate — NEVER invoke below HIGH 95%.** Mirrors existing
   Confidence Check (plan first if below). Below 95% the evaluator
   has no anchor for 'is this correct'; only for 'did the agent stop.'

2. **DLC binding — condition MUST name the active DLC** (/sdlc for
   code, /gdlc for games, /ldlc for legal, etc.). Anchors the
   evaluator on 'doing it right,' not just 'doing it.' Example:
   /goal 'tests pass + clean tree following /sdlc, stop after 20 turns'.

Quality test extended (tests/test-doc-consistency.sh::test_sdlc_skill_has_goal_wrapper):
adds keyword greps for '95% confidence' / 'HIGH 95%' / 'confidence
gate' AND 'DLC binding' / 'name the DLC' / 'condition MUST name'.

Cross-cutting compensating trims to stay under #236 5K-token cap
(skill at 4989/5000):
- /goal section condensed example wording ('tests pass + clean tree'
  vs 'npm test=0 AND git clean')
- Cross-Model Review's Multi-reviewer paragraph collapsed into one line

40/40 doc-consistency green, 10/10 audit green.
@BaseInfinity BaseInfinity force-pushed the feat/goal-sdlc-discipline branch from 7ac9990 to f001405 Compare May 25, 2026 00:28
@BaseInfinity BaseInfinity enabled auto-merge (squash) May 25, 2026 00:28
@BaseInfinity BaseInfinity merged commit 2e65da7 into main May 25, 2026
4 checks passed
BaseInfinity added a commit that referenced this pull request May 26, 2026
…) (#361)

`npx agentic-sdlc-wizard init` (no @latest pin) silently serves whatever
version is cached on disk, sometimes months old. The /update Step 1.5
pattern already catches this — but only at /update time, after the user
has already installed stale templates.

This moves the same check to init time so the gap surfaces immediately:

- cli/init.js: new maybeEmitStaleCliNudge() called before planOperations.
  Reuses SDLC_WIZARD_CACHE_DIR/latest-version (24h TTL, #239 poison check),
  falls back to `npm view` with 5s timeout, silent on offline/error.
- README.md + CLAUDE_CODE_SDLC_WIZARD.md: install command recommends
  `npx -y agentic-sdlc-wizard@latest init` so first-time users skip the
  cache trap by default.
- tests/test-cli.sh: 3 quality tests covering nudge-fires-when-stale,
  silent-when-current, no-reverse-nudge-when-poisoned.

Bundled: ROADMAP #347 + #350 status flip from "Actionable now" to DONE
with PR citations (#351/#355/#354). Both shipped this week — paperwork
was stale.

Verified locally: test-cli.sh 91/91, test-doc-consistency 40/40,
test-docs-usability 29/29, test-update-skill-cli-version 8/8,
test-hooks 156/156, test-setup-path 83/83, test-self-update 153/153.
BaseInfinity added a commit that referenced this pull request May 26, 2026
 (#363)

PR #361 shipped 3 #358 tests where Test C was silent-when-cache-poisoned
rather than the goal-named silent-when-offline. The /goal Haiku evaluator
counted tests by file presence and approved completion. Caught only at
post-merge self-review, fixed in PR #362 (added the real offline test via
fake-npm PATH override).

Promoted to SDLC.md ## Lessons Learned → ### Testing so future sessions
inherit the rule: when a /goal condition enumerates test cases by name,
self-review must verify each named test exists by reading assertions, not
by counting matching `test_` functions.

This is the per-test-fidelity layer beneath PR #355's HIGH-95%-confidence
and DLC-binding gates — those work at the macro level, but enumerated-
condition fidelity is the author's responsibility.

Verified locally: test-doc-consistency 40/40, test-postmortem-lessons 7/7,
test-memory-audit-protocol 12/12.
BaseInfinity added a commit that referenced this pull request Jun 1, 2026
PR #355 (v1.77.0) added skill-text guidance that `/goal` must state HIGH 95%
confidence and bind to a DLC (`/sdlc`, `/gdlc`, `/ldlc`) before firing. The
guidance worked at the meta level but had no runtime enforcement — exactly
the failure mode the discipline gate documented (text-only guidance doesn't
survive the failure mode it's trying to prevent).

This adds `hooks/goal-confidence-check.sh` as a UserPromptSubmit hook that:

1. Matches `/goal <condition>` prompts (silent on `/goal` status + `/goal clear`).
2. Reads `transcript_path` from hook input (verified available on UserPromptSubmit
   per Anthropic hook docs at code.claude.com/docs/en/hooks), walks the JSONL
   to find the last assistant text message, and scans for HIGH-95% confidence
   patterns (`HIGH (95%`, `Confidence: HIGH`, `HIGH 95%`, etc.).
3. Greps the goal condition for a DLC binding (`/[a-z]+dlc`).
4. Emits LOUD warnings on either gap — non-blocking soft nudge (exit 0),
   same pattern as `model-effort-check.sh`.

Registered in both channels: `cli/templates/settings.json` (npm/CLI) and
`hooks/hooks.json` (plugin via ${CLAUDE_PLUGIN_ROOT}). Dedupes via
`_find-sdlc-root.sh` helper so dual installs don't double-fire.

Bundled paperwork:

- `ROADMAP.md` adds the missing `## Research Parking Lot` section that the
  Demand-Signal-First entry gate references but was never created. Includes
  the maintenance rule (prune expired rows during quarterly triage).
- `skills/sdlc/SKILL.md` adds a one-line enforcement cross-reference in the
  Long-Running Goals section. Trimmed adjacent prose to stay under the 5K
  token cap (#236).

#359 triage (companion close):

Of the 10 API changelog entries since 2026-04-16: 8 are clearly no-op for
the wizard (MCP tunnels, SEC web search data, cache diagnostics, fast mode,
AWS hosting, multiagent sessions, rate limits API, managed-agent memory —
all orthogonal to SDLC enforcement). 2 are deprecations — verified zero
usage in this repo: `context-1m-2025-08-07` (Sonnet 4.5/4 1M beta retired
2026-04-30; we use the separate `opus[1m]` alias) and `claude-3-haiku-
20240307` (Haiku 3 retired 2026-04-20; we never used it). #359 will be
closed with the triage comment after merge.

Verified locally: test-hooks 160/160 (4 new #360 tests + 156 existing),
test-audit-session-load 10/10, test-cli 91/91, test-plugin all green,
test-doc-consistency 41/41, test-self-update 153/153, test-setup-path 83/83.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant