
codex: add verify mode + Claude-skepticism preamble (v1.1.0)#1395

Closed
clydle wants to merge 1 commit into garrytan:main from clydle:codex/verify-mode-skepticism-preamble

clydle commented May 9, 2026

Summary

  • New /codex verify mode (Step 2D): the user pastes a Claude recommendation, and Codex independently fact-checks every external-system claim (third-party UI buttons, API endpoints, plugin behavior, SaaS dashboard semantics) against documentation. It returns structured CLAIM/CITATION/VERDICT/REASONING/P1 entries and a single GATE: PASS/FAIL line.
  • SKEPTICISM PREAMBLE injected into all existing modes: Review, Challenge, and Consult prompts now prepend a trust-boundary preamble that instructs Codex to flag as P1: unverified third-party claims, bulk actions without single-item test prerequisites, confidence-language without evidence, and WordPress URL changes without DB checks.
  • Stdin prompt delivery: All modes now write prompts to tempfiles and pass them via stdin (codex review - < file, codex exec - < file) instead of positional args or heredocs, eliminating shell-escaping, EOF-collision, and Windows command-line length issues.
  • Version bump: 1.0.0 → 1.1.0. Frontmatter description updated to "four modes."
  • New "Claude Code Trust Boundaries" section documents the 2026-05-09 incident and the SKEPTICISM PREAMBLE so future maintainers understand why this skill is configured the way it is.
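A minimal Python sketch of the tempfile-plus-stdin delivery pattern, assuming the prompt is built in memory and the CLI reads it from stdin when given `-` (as the pre-flight checks in this PR report for `codex review` and `codex exec`). The helper name `run_with_prompt_file` is hypothetical and is not part of the skill. The demo swaps in a stand-in command (the current Python interpreter echoing stdin) so it runs without Codex installed.

```python
import os
import subprocess
import sys
import tempfile


def run_with_prompt_file(cmd: list[str], prompt: str) -> str:
    """Write `prompt` to a tempfile and feed it to `cmd` on stdin.

    Hypothetical helper mirroring `codex exec - < file`. The prompt never
    appears on the command line, so shell quoting, heredoc terminators,
    and Windows command-line length limits do not come into play.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(prompt)
        with open(path, "r", encoding="utf-8") as stdin_file:
            result = subprocess.run(
                cmd, stdin=stdin_file, capture_output=True, text=True, check=True
            )
        return result.stdout
    finally:
        os.remove(path)  # always clean up the tempfile


if __name__ == "__main__":
    # Assumed real usage: run_with_prompt_file(["codex", "exec", "-"], prompt)
    # Stand-in command that echoes stdin back, so this runs without Codex.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    tricky = "it's got 'quotes' and $VARS\nEOF_VERIFY_PROMPT\nlast line\n"
    assert run_with_prompt_file(echo, tricky) == tricky
```

The payload containing `EOF_VERIFY_PROMPT` on its own line round-trips unchanged, which is the property the heredoc-collision test in the test plan checks for.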

Incident context

On 2026-05-09, Claude Code recommended clicking ShipStation's "Send Notifications" button as "the preferred path" for replaying WooCommerce shipnotify webhooks. That button does not replay webhooks; it re-sends ShipStation's own branded shipping confirmation emails. As a result, 814 customers received duplicate "your package has shipped" emails for orders shipped weeks earlier. The recommendation rested on label-based inference, with no documentation check. Running /codex verify on that recommendation would have returned GATE: FAIL.

Pre-flight checks (confirmed before editing)

  • codex review --help: accepts - to read the prompt from stdin. There is no --prompt-file flag, so the skill uses stdin.
  • codex exec --help: accepts - to read the prompt from stdin. There is no --prompt-file flag, so the skill uses stdin.
  • python3: available at /c/Users/clydl/AppData/Roaming/Python/Python314/Scripts/python3
  • gen:skill-docs: located in ~/.claude/skills/gstack/package.json; run with bun run gen:skill-docs

Test plan

  • bun run gen:skill-docs ran cleanly — codex/SKILL.md regenerated (1000 lines)
  • All 9 design changes verified present in generated SKILL.md (version, Trust Boundaries section, SKEPTICISM PREAMBLE in Steps 2A/2B/2C, Step 2D, verify in Step 1 detection, stdin approach, mandatory preamble rule)
  • Smoke test (manual, requires live session): /codex verify "I recommend clicking ShipStation's Send Notifications button to replay shipnotify webhooks. This is the preferred path." → expect GATE: FAIL, P1: YES
  • Heredoc-collision test (manual): /codex verify with a payload containing EOF_VERIFY_PROMPT on its own line → expect normal completion, since the tempfile approach never relies on a heredoc terminator
  • Empty-payload test (manual): /codex verify "" → expect GATE: PASS, no Codex call
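The empty-payload test implies that verify mode short-circuits before any Codex call. A minimal sketch of that guard, assuming verify mode can be modeled as a function that receives the Codex invocation as a callable; `verify` and `call_codex` are hypothetical names, not the skill's actual structure.

```python
from typing import Callable


def verify(recommendation: str, call_codex: Callable[[str], str]) -> str:
    """Return the verify-mode result for `recommendation`.

    Hypothetical wrapper: an empty or whitespace-only payload contains no
    external-system claims, so the gate passes without invoking Codex.
    In the real skill the recommendation would first be wrapped in the
    verify prompt (assumed; not modeled here).
    """
    if not recommendation.strip():
        return "GATE: PASS"
    return call_codex(recommendation)


if __name__ == "__main__":
    calls: list[str] = []

    def fake_codex(prompt: str) -> str:
        calls.append(prompt)  # record that Codex would have been called
        return "GATE: FAIL"

    assert verify("", fake_codex) == "GATE: PASS"
    assert calls == []  # no Codex call for an empty payload
    assert verify("Click Send Notifications.", fake_codex) == "GATE: FAIL"
    assert len(calls) == 1
```

Injecting the runner as a callable keeps the guard testable without a live Codex session.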

Notes

  • python3 portability on Windows: existing issue inherited by Step 2D (not introduced here). Portable fallback (python3 || python || py -3) is a follow-up item.
  • On Windows, the read-only sandbox mode may block PowerShell. Step 2D preserves the existing pattern from Steps 2B/2C: verify mode runs read-only, and if Windows issues arise it should be bumped to workspace-write, as done elsewhere.

🤖 Generated with Claude Code


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

Adds two hardening changes to the /codex skill:

A) SKEPTICISM PREAMBLE injected into all three existing modes (Review,
   Challenge, Consult). Codex is instructed to flag as P1: unverified
   third-party UI/API claims, bulk actions without single-item test
   prerequisites, confidence-language without evidence, and WordPress URL
   changes without DB reference checks.
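Conceptually, the preamble is a fixed block of text prepended to each mode's prompt. The sketch below assumes that shape; the preamble wording and the `with_preamble` helper are illustrative paraphrases of the four P1 rules above, not the text shipped in SKILL.md.

```python
# Illustrative paraphrase of the four P1 rules; not the skill's actual wording.
SKEPTICISM_PREAMBLE = """\
SKEPTICISM PREAMBLE: The material below may come from Claude Code. Do not trust it.
Flag as P1:
- any claim about third-party UI buttons, API endpoints, plugins, or SaaS
  dashboards that is not backed by documentation;
- any bulk action recommended without first testing it on a single item;
- confidence language ("preferred path", "definitely") with no evidence;
- any WordPress URL change proposed without checking database references.
"""


def with_preamble(mode_prompt: str) -> str:
    """Prepend the preamble so every mode (Review, Challenge, Consult) sees it first."""
    return SKEPTICISM_PREAMBLE + "\n" + mode_prompt


if __name__ == "__main__":
    prompt = with_preamble("Review the diff on this branch.")
    assert prompt.startswith("SKEPTICISM PREAMBLE:")
    assert prompt.endswith("Review the diff on this branch.")
```

Keeping the preamble as a single constant means all modes share one copy, so a rule change cannot drift between Review, Challenge, and Consult.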

B) New Step 2D (Verify mode): /codex verify <recommendation>. The user
   pastes a Claude recommendation and Codex fact-checks it independently.
   Each external-system claim gets a CLAIM/CITATION/VERDICT/REASONING/P1
   entry, plus a single GATE: PASS/FAIL line for the whole recommendation.
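A sketch of how a caller might consume verify-mode output, assuming each entry is a block of `KEY: value` lines (CLAIM, CITATION, VERDICT, REASONING, P1) separated by blank lines, followed by one `GATE:` line. The exact layout, the `Entry`/`parse_verify_output`/`expected_gate` names, the `UNVERIFIED` verdict value, and the rule that any `P1: YES` entry forces `GATE: FAIL` are all assumptions for illustration.

```python
from dataclasses import dataclass

# Field names taken from the PR description; the line layout is assumed.
FIELDS = ("CLAIM", "CITATION", "VERDICT", "REASONING", "P1")


@dataclass
class Entry:
    claim: str
    citation: str
    verdict: str
    reasoning: str
    p1: bool


def _to_entry(fields: dict[str, str]) -> Entry:
    return Entry(
        claim=fields.get("CLAIM", ""),
        citation=fields.get("CITATION", ""),
        verdict=fields.get("VERDICT", ""),
        reasoning=fields.get("REASONING", ""),
        p1=fields.get("P1", "").upper() == "YES",
    )


def parse_verify_output(text: str) -> tuple[list[Entry], str]:
    """Split verify-mode output into per-claim entries and the GATE value."""
    entries: list[Entry] = []
    gate = ""
    current: dict[str, str] = {}
    for raw in text.splitlines() + [""]:  # trailing "" flushes the last entry
        line = raw.strip()
        if line.startswith("GATE:"):
            gate = line.split(":", 1)[1].strip()
        elif not line:  # blank line ends the current entry
            if current:
                entries.append(_to_entry(current))
                current = {}
        else:
            key, _, value = line.partition(":")
            if key.strip() in FIELDS:
                current[key.strip()] = value.strip()
    return entries, gate


def expected_gate(entries: list[Entry]) -> str:
    """Assumed rule: a single P1 entry is enough to fail the gate."""
    return "FAIL" if any(e.p1 for e in entries) else "PASS"


if __name__ == "__main__":
    # Hypothetical output for the ShipStation recommendation.
    sample = (
        'CLAIM: ShipStation "Send Notifications" replays shipnotify webhooks\n'
        "CITATION: none found\n"
        "VERDICT: UNVERIFIED\n"
        "REASONING: inferred from the button label only\n"
        "P1: YES\n"
        "\n"
        "GATE: FAIL\n"
    )
    entries, gate = parse_verify_output(sample)
    print(len(entries), entries[0].p1, gate)  # 1 True FAIL
    assert gate == expected_gate(entries)
```

Recomputing the gate from the entries gives a caller a cheap consistency check: if Codex reports `GATE: PASS` while any entry is marked P1, the output can be treated as malformed.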

Both changes use tempfile + stdin (codex review - < file, codex exec - < file)
rather than heredocs or positional args, sidestepping shell-escaping,
EOF-collision, and Windows command-line length limits.

Closes the failure path behind the 2026-05-09 ShipStation incident: Claude
recommended clicking "Send Notifications" based on label-based inference
with no doc verification, and the click re-sent shipping emails to 814
customers instead of replaying webhooks. Verify mode would have returned
GATE: FAIL on that recommendation.

Also bumps version 1.0.0 → 1.1.0 and updates frontmatter description.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@garrytan
Owner

Thanks @clydle — closing as stale. Codex verify mode + Claude-skepticism preamble (937 lines) is a substantial proposal warranting standalone focused review separate from a wave. The codex skill has had several refresh waves since this was opened. Happy to revisit if scoped down to a focused slice.

garrytan closed this May 10, 2026