Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CLAUDE_CODE_SDLC_WIZARD.md
Original file line number Diff line number Diff line change
Expand Up @@ -2366,6 +2366,8 @@ PLANNING → DOCS → TDD RED → TDD GREEN → Tests Pass → Self-Review

> **Always append `< /dev/null`** to `codex exec` calls run from background, hooks, CI, or any non-interactive parent. Without it, codex blocks on stdin reads even when the prompt is given as an argument — the process sits at S/0% CPU indefinitely with a 0-byte `-o` output file (the file is only written on completion, so a hang gives zero visibility). Validated on codex-cli 0.130.0 / macOS 14, 2026-05-15. For live progress, use `scripts/codex-review-with-progress.sh` instead.

> **Always launch codex via `run_in_background: true` on the Bash tool.** The Bash tool clamps `timeout` to 600000 ms (10 min) regardless of the value passed, and force-kills the foreground process at that wall. Multi-artifact bundle reviews (release reviews per the checklist below, multi-finding rechecks, etc.) routinely run 6–30 minutes — they need background mode to complete. The wrapper `scripts/codex-review.sh` already has a 30-min stall watchdog (`STALL_SECONDS=1800`) as the real timeout control. A foreground call killed mid-review plus the Stop-hook re-invocation loop can burn 60+ minutes of session compute on what should be a single 7-minute run (issue #364, 2026-05-27 incident). The general rule: **any long-running wrapper invoked through the CC Bash tool — codex, slow builds, long test suites — should use `run_in_background: true` unconditionally and let the wrapper's own stall watchdog be the timeout authority.**

3. If CERTIFIED → proceed to CI. If NOT CERTIFIED → go to Round 2.

### Round 2+: Dialogue Loop
Expand Down Expand Up @@ -3802,6 +3804,8 @@ codex exec \

> **Always append `< /dev/null`** to `codex exec` calls run from background, hooks, CI, or any non-interactive parent. Without it, codex blocks on stdin reads even when the prompt is given as an argument — the process sits at S/0% CPU indefinitely with a 0-byte `-o` output file. Validated on codex-cli 0.130.0 / macOS 14, 2026-05-15. For live progress visibility, use `scripts/codex-review-with-progress.sh` instead.

> **Always launch via `run_in_background: true` on the Bash tool.** Same reason as the parallel callout in the Cross-Model Review Loop section above — Bash tool clamps `timeout` to 600000 ms (10 min) regardless of the value passed and force-kills foreground at the wall. Multi-artifact bundle reviews need background mode to complete. See issue #364 (2026-05-27 incident: 70 min session compute on a 7-min review).

4. If CERTIFIED → done. If NOT CERTIFIED → enter the dialogue loop.

**The Dialogue Loop (Round 2+):**
Expand Down
8 changes: 4 additions & 4 deletions skills/sdlc/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -135,13 +135,13 @@ The loop goes back to PLANNING, not TDD RED. Run `/code-review`; issues at confi
PROTOCOL is universal across domains; only `review_instructions` and `verification_checklist` change.

1. **Preflight** (`.reviews/preflight-{review_id}.md`) — what you already checked: `/code-review` passed, tests passing, manual verifications, known limits. Reduces reviewer findings to 0-1/round.
2. **Mission-first handoff** (`.reviews/handoff.json`) — required JSON keys: `"review_id"`, `"status": "PENDING_REVIEW"`, `"round": 1`, `"mission"` / `"success"` / `"failure"` (2-3 sentences each — without them you get "looks good"), `"files_changed"`, `"verification_checklist"` — the **verification checklist** is specific items with file:line refs (NOT a generic "review this"), `"review_instructions"`, `"preflight_path"`. Optional `"pr_number":` opts into the PreCompact self-heal (#209): if PR is MERGED, `/compact` treats handoff as implicit CERTIFIED.
3. **Run reviewer:** `codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access -o .reviews/latest-review.md "<prompt>" < /dev/null`. Always `xhigh`. **Always append `< /dev/null`** — codex blocks on stdin reads from non-interactive parents (background, hooks, CI, CC Bash tool); without the redirect it hangs at S/0% CPU indefinitely. CC sandbox blocks Codex's Rust binary (`SCDynamicStore`) — use `dangerouslyDisableSandbox: true` on Bash; Codex has its own sandbox. xhigh runs take 1-5 min; for a heartbeat use `scripts/codex-review-with-progress.sh` (already redirects child stdin).
2. **Mission-first handoff** (`.reviews/handoff.json`) — required keys: `"review_id"`, `"status": "PENDING_REVIEW"`, `"round": 1`, `"mission"`/`"success"`/`"failure"` (without them you get "looks good"), `"files_changed"`, `"verification_checklist"` (verification checklist with file:line refs NOT generic), `"review_instructions"`, `"preflight_path"`. Optional `"pr_number":` opts into PreCompact self-heal (#209: PR MERGEDimplicit CERTIFIED).
3. **Run reviewer:** `codex exec -c 'model_reasoning_effort="xhigh"' -s danger-full-access -o .reviews/latest-review.md "<prompt>" < /dev/null`. Always `xhigh`. Bash tool requires `run_in_background: true` + `dangerouslyDisableSandbox: true`; append `< /dev/null` always. **Why:** `< /dev/null` prevents codex stdin-hang at S/0% CPU; `run_in_background: true` avoids the Bash 10-min (`600000` ms) `timeout` cap that force-kills foreground codex (multi-artifact bundles take 5–30 min). xhigh 1–30 min; wrapper's `STALL_SECONDS=1800` is the real control. Heartbeat: `scripts/codex-review-with-progress.sh`. Foreground burned 70 min on a 7-min review (#364).
4. **Dialogue loop:** per-finding response (`{"finding": "1", "action": "FIXED|DISPUTED|ACCEPTED", "summary": "..."}` in `.reviews/response.json`). Bump round, set status `PENDING_RECHECK`, add `fixes_applied` (numbered, file:line). Recheck prompt: "TARGETED RECHECK. FIXED → verify certify condition. DISPUTED → ACCEPT if sound, REJECT with reasoning. ACCEPTED → verify applied. No new findings unless P0."

**Convergence:** 2 rounds sweet spot, 3 max (research: 14 repos + 7 papers). After 3 still NOT CERTIFIED → escalate.
**Convergence:** 2 rounds sweet spot, 3 max. After 3 NOT CERTIFIED → escalate.

**Multi-reviewer:** parallel reviewers → respond per-reviewer (different blind spots, no shared anchoring). **Non-code domains:** same handoff + add `"audience"`/`"stakes"` keys.
**Multi-reviewer:** respond to each reviewer independently (no shared anchoring). **Non-code domains:** add `"audience"`/`"stakes"` keys to handoff.

### Release Review Focus

Expand Down
25 changes: 25 additions & 0 deletions tests/test-doc-consistency.sh
Original file line number Diff line number Diff line change
Expand Up @@ -831,11 +831,36 @@ test_sdlc_skill_has_goal_wrapper() {
fi
}

# Cross-model review guidance must teach background-mode (issue #364, 2026-05-27).
# Bash tool clamps `timeout` to 600000 ms; foreground codex gets force-killed at
# the wall. Without this guidance, sessions burn 60+ minutes on 7-minute reviews
# (incident: 70 min + 9 Stop-hook re-invocations before harness safeguard fired).
test_codex_review_guidance_teaches_background_mode() {
local SKILL="$REPO_ROOT/skills/sdlc/SKILL.md"
local WIZARD="$REPO_ROOT/CLAUDE_CODE_SDLC_WIZARD.md"
local missing=""
if [ ! -f "$SKILL" ]; then fail "skills/sdlc/SKILL.md not found"; return; fi
if [ ! -f "$WIZARD" ]; then fail "CLAUDE_CODE_SDLC_WIZARD.md not found"; return; fi
# SKILL.md must teach background-mode + cite the 10-min cap reason
grep -qF 'run_in_background: true' "$SKILL" || missing+=" SKILL.md:run_in_background"
grep -qiE '600000|10 min|10-min' "$SKILL" || missing+=" SKILL.md:10-min-cap-reason"
# Wizard doc must carry the same guidance (verified in at least one of the
# two Cross-Model Review sections — both edits land in the same release).
grep -qF 'run_in_background: true' "$WIZARD" || missing+=" WIZARD.md:run_in_background"
grep -qiE '600000|10 min|10-min' "$WIZARD" || missing+=" WIZARD.md:10-min-cap-reason"
if [ -z "$missing" ]; then
pass "Cross-model review guidance teaches background-mode + cites 10-min cap (#364)"
else
fail "Cross-model review background-mode guidance missing:$missing"
fi
}

test_setup_skill_mentions_insights_with_caveat
test_wizard_doc_lists_insights_with_caveat
test_sdlc_skill_has_precedence_preamble
test_codex_exec_blocks_redirect_stdin
test_sdlc_skill_has_goal_wrapper
test_codex_review_guidance_teaches_background_mode

# ────────────────────────────────────────────
# Summary
Expand Down