codex: add verify mode + Claude-skepticism preamble (v1.1.0) #1395
Closed
clydle wants to merge 1 commit into garrytan:main from
Adds two hardening changes to the `/codex` skill:

A) Skepticism preamble, injected into all three existing modes (Review, Challenge, Consult). Codex is instructed to flag as P1: unverified third-party UI/API claims, bulk actions without a single-item test as a prerequisite, confident language without evidence, and WordPress URL changes without database reference checks.

B) New Step 2D (Verify mode): `/codex verify <recommendation>` takes a pasted Claude recommendation and gets independent fact-checking from Codex. Each external-system claim gets CLAIM/CITATION/VERDICT/REASONING/P1 entries, followed by a single `GATE: PASS/FAIL` line.

Both changes pass prompts through a tempfile on stdin (`codex review - < file`, `codex exec - < file`) rather than heredocs or positional args, sidestepping shell escaping, EOF-delimiter collisions, and Windows command-line length limits.

Closes the failure path from the 2026-05-09 ShipStation incident: Claude recommended clicking "Send Notifications" (inferred from the button label, with no doc verification), which re-sent shipping emails to 814 customers instead of replaying webhooks. Verify mode would have returned GATE: FAIL on that recommendation.

Also bumps the version 1.0.0 → 1.1.0 and updates the frontmatter description.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
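A minimal sketch of the two mechanics described above, assuming bash and a `codex exec -` that reads its prompt from stdin (as the PR's pre-flight checks report). `run_with_prompt_file` and `verify_gate` are hypothetical names, not code from the skill. The PR does not show the full verify output, so the parser only relies on there being one line of the form `GATE: PASS` or `GATE: FAIL`.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers illustrating the tempfile + stdin pattern
# and the single "GATE: PASS|FAIL" line described in this PR.

# Write the prompt verbatim to a temp file and feed it to a command on stdin.
# The shell never parses the prompt text, so quotes, $vars, backticks, and a
# line reading EOF_VERIFY_PROMPT all pass through unchanged.
run_with_prompt_file() {
  local prompt="$1"; shift            # $1: prompt text; remaining args: the command
  local prompt_file status=0
  prompt_file="$(mktemp)"
  printf '%s\n' "$prompt" > "$prompt_file"
  "$@" < "$prompt_file" || status=$?  # remember the command's exit status
  rm -f "$prompt_file"                # clean up on success and on failure
  return "$status"
}

# Read verify-mode output on stdin and print PASS or FAIL.
# Exit status: 0 for PASS, 1 for FAIL, 2 if the GATE line is missing or repeated.
verify_gate() {
  awk '
    /^GATE: (PASS|FAIL) *$/ { n++; verdict = $2 }
    END {
      if (n != 1) {
        printf "expected exactly one GATE line, found %d\n", n > "/dev/stderr"
        exit 2
      }
      print verdict
      exit (verdict == "PASS" ? 0 : 1)
    }'
}

# Usage (assumes the codex CLI is on PATH):
#   run_with_prompt_file "$PROMPT" codex exec - | verify_gate
```

Passing the command as arguments keeps the helper testable without Codex (for example with `cat` standing in for `codex exec -`), and because the payload only ever travels as file contents, a line equal to a heredoc delimiter is just data.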
Owner
Thanks @clydle, closing as stale. Codex verify mode + Claude-skepticism preamble (937 lines) is a substantial proposal that warrants its own focused review, separate from a wave. The codex skill has had several refresh waves since this was opened. Happy to revisit if scoped down to a focused slice.
Summary
- `/codex verify` mode (Step 2D): the user pastes a Claude recommendation, and Codex independently fact-checks every external-system claim (third-party UI buttons, API endpoints, plugin behavior, SaaS dashboard semantics) against documentation. Returns structured CLAIM/CITATION/VERDICT/REASONING/P1 entries and a single `GATE: PASS/FAIL` line.
- Tempfile + stdin (`codex review - < file`, `codex exec - < file`) instead of positional args or heredocs, eliminating shell-escaping, EOF-collision, and Windows command-line length issues.

Incident context
On 2026-05-09, Claude Code recommended clicking ShipStation's "Send Notifications" button as "the preferred path" for replaying WooCommerce shipnotify webhooks. That button does not replay webhooks; it re-sends ShipStation's own branded shipping-confirmation emails. As a result, 814 customers received duplicate "your package has shipped" emails for orders shipped weeks earlier. Root cause: inference from the button label, with no documentation check.
`/codex verify` with that recommendation would have returned GATE: FAIL.

Pre-flight checks (confirmed before editing)
- `codex review --help`: accepts `-` to read from stdin. No `--prompt-file` flag, so stdin is used.
- `codex exec --help`: accepts `-` to read from stdin. No `--prompt-file` flag, so stdin is used.
- `python3`: available at `/c/Users/clydl/AppData/Roaming/Python/Python314/Scripts/python3`
- `gen:skill-docs`: defined in `~/.claude/skills/gstack/package.json` (run with `bun run gen:skill-docs`)

Test plan
- `bun run gen:skill-docs` ran cleanly; `codex/SKILL.md` regenerated (1000 lines)
- `/codex verify "I recommend clicking ShipStation's Send Notifications button to replay shipnotify webhooks. This is the preferred path."` → expect GATE: FAIL, P1: YES
- `/codex verify` with a payload containing `EOF_VERIFY_PROMPT` on its own line → expect normal completion (the tempfile approach is immune)
- `/codex verify ""` → expect GATE: PASS, no Codex call

Notes
- `python3` portability on Windows: an existing issue inherited by Step 2D (not introduced here). A portable fallback (`python3 || python || py -3`) is a follow-up item.
- `read-only` sandbox mode may block PowerShell. The existing pattern from Steps 2B/2C is preserved in Step 2D: Verify mode uses `read-only`; if Windows issues arise, bump to `workspace-write` as done elsewhere.

🤖 Generated with Claude Code
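The `python3 || python || py -3` fallback listed as a follow-up in the notes could look roughly like the sketch below. `find_python3` is a hypothetical helper name, and the approach (actually running a version check rather than only testing `command -v`) is one possible design, not something this PR implements.

```shell
#!/usr/bin/env bash
# Sketch of the "python3 || python || py -3" fallback noted as a follow-up.
# find_python3 is a hypothetical helper, not part of the skill.
# Prints the first candidate that actually runs as Python 3, else fails.
find_python3() {
  local candidate
  for candidate in "python3" "python" "py -3"; do
    # $candidate is unquoted on purpose so "py -3" splits into command + flag.
    # Running a version check also skips an entry on PATH that exists but
    # fails to start, or a "python" that is Python 2.
    if $candidate -c 'import sys; sys.exit(sys.version_info[0] != 3)' >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "find_python3: no Python 3 interpreter found" >&2
  return 1
}

# Usage ($PY left unquoted so "py -3" expands to two words):
#   PY="$(find_python3)" || exit 1
#   $PY -c 'print("ok")'
```

Returning the command string instead of a resolved path keeps the Windows `py` launcher usable, at the cost of relying on word splitting when the result is invoked.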