Skip to content

fix(sdlc): teach cross-model review to use Bash background-mode (closes #364)#366

Merged
BaseInfinity merged 2 commits into
mainfrom
fix/364-codex-review-background-mode
May 31, 2026
Merged

fix(sdlc): teach cross-model review to use Bash background-mode (closes #364)#366
BaseInfinity merged 2 commits into
mainfrom
fix/364-codex-review-background-mode

Conversation

@BaseInfinity
Copy link
Copy Markdown
Owner

Summary

The SDLC skill's Cross-Model Review section told Claude that "xhigh runs take 1-5 min" and didn't say to use run_in_background: true on the Bash tool. The Bash tool clamps timeout to 600000 ms (10 min) and silently force-kills foreground codex at the wall — even if Claude passes timeout: 2400000 under the reasonable assumption the harness honors it.

Real session (2026-05-27, fixbot-audit repo): foreground codex on a multi-artifact bundle got killed at the 10-min wall; the Stop hook re-invoked Claude 9 times before the harness CLAUDE_CODE_STOP_HOOK_BLOCK_CAP safeguard fired. Total burn: ~70 min of compute on what would have been a single 7-min review. The same prompt fired again with run_in_background: true completed in ~7 min.

Changes

File What
skills/sdlc/SKILL.md Run-reviewer line now mandates run_in_background: true + dangerouslyDisableSandbox: true; explains why (< /dev/null prevents stdin-hang at S/0% CPU; background avoids 10-min timeout cap). Bundle time corrected from "1–5 min" to "1–30 min" (single ≤5; bundle 5–30). Three sibling paragraphs compressed to keep SKILL.md under the 5K-token cap (#236: 4999 → 5099 after my add → 4950 after trim).
CLAUDE_CODE_SDLC_WIZARD.md Two sibling callouts (one per Cross-Model Review section) parallel to the existing < /dev/null callouts. Includes the general rule: "any long-running wrapper invoked through the CC Bash tool — codex, slow builds, long test suites — should use run_in_background: true unconditionally and let the wrapper's own stall watchdog be the timeout authority."
tests/test-doc-consistency.sh New test_codex_review_guidance_teaches_background_mode quality test asserts both SKILL.md and wizard doc carry run_in_background: true AND cite the 10-min cap reason (600000/10 min/10-min). Mutation: removing either string from either file flips test to FAIL.

Closes

Test plan

Why this matters

The wizard is the recommendation engine for downstream consumer repos. Every consumer that runs /sdlc cross-model review on a multi-artifact bundle was at risk of the same 70-min burn pattern. Closing the guidance gap upstream is the lowest-cost fix.

#364)

The SDLC skill's Cross-Model Review section told Claude that "xhigh runs take
1-5 min" and didn't say to use `run_in_background: true` on the Bash tool.
The Bash tool clamps `timeout` to 600000 ms (10 min) and silently force-kills
foreground codex at the wall — even if Claude passes `timeout: 2400000` under
the reasonable assumption that the harness honors it.

Real session (2026-05-27): a foreground codex on a multi-artifact bundle
got killed at the 10-min wall, the Stop hook re-invoked Claude 9 times
before the harness `CLAUDE_CODE_STOP_HOOK_BLOCK_CAP` safeguard fired. Total
burn: ~70 min of compute on what would have been a single 7-min review.

Changes:
- `skills/sdlc/SKILL.md` — flip the run-reviewer line to mandate
  `run_in_background: true` + `dangerouslyDisableSandbox: true` and explain
  why (`< /dev/null` prevents stdin-hang, background avoids the 10-min cap).
  Also compressed three sibling paragraphs to stay under the 5K-token cap
  (audit was 4999 → 5099 after add; now 4950, 50-token buffer).
- `CLAUDE_CODE_SDLC_WIZARD.md` — two sibling callouts (one per Cross-Model
  Review section), with the general rule: "any long-running wrapper invoked
  through the CC Bash tool — codex, slow builds, long test suites — should
  use `run_in_background: true` unconditionally and let the wrapper's own
  stall watchdog be the timeout authority."
- `tests/test-doc-consistency.sh` — new quality test asserts both SKILL.md
  and wizard doc carry the `run_in_background: true` guidance AND cite the
  10-min cap reason. Mutation: removing either string from either file
  flips the test to FAIL.

Verified locally: test-doc-consistency 41/41, test-docs-usability 29/29,
test-audit-session-load 10/10 (SKILL.md back under 5K cap), test-hooks 156/156.
PR #366 trims to lines 138 + 144 saved tokens but dropped three strings that
test-self-update.sh greps for:

- "verification checklist" (space) — I had collapsed to JSON key form
  `"verification_checklist"` only. Test pattern is case-insensitive with
  the literal space form.
- "respond to each" / "each.*review" — I had compressed to "respond
  per-reviewer". The test pattern uses BRE where `?` is literal, so
  per.?review only matches per?review / per-review literally as
  per + any-char + ?. The "respond to each" alternative is the safer match.
- "non-code domains" — I had trimmed to "Non-code:". The pattern is
  non-code.*domain so the suffix is load-bearing.

Net char change: +40 chars (4960 tokens, 40-token buffer under the 5K cap).

Verified locally: test-self-update 153/153, test-audit-session-load 10/10,
test-doc-consistency 41/41, test-hooks 156/156.
@BaseInfinity BaseInfinity merged commit 85e9163 into main May 31, 2026
4 checks passed
@BaseInfinity BaseInfinity deleted the fix/364-codex-review-background-mode branch May 31, 2026 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] SDLC codex-review guidance lets foreground Bash kill xhigh runs at 10-min cap

1 participant