EveryInc · tmchow · May 12, 2026
diff --git a/plugins/compound-engineering/skills/ce-brainstorm/SKILL.md b/plugins/compound-engineering/skills/ce-brainstorm/SKILL.md
@@ -137,33 +137,7 @@ This is agent-internal analysis, not a user-facing checklist. Read the opening,
 - Are we duplicating something that already covers this?
 - Is there a clearly better framing with near-zero extra cost?
 
-**Standard — scan for these gaps:**
-
-- **Evidence gap.** The opening asserts want or need, but doesn't point to anything the would-be user has already done — time spent, money paid, workarounds built — that would make the want observable. When present, ask for the most concrete thing someone has already done about this.
-
-- **Specificity gap.** The opening describes the beneficiary at a level of abstraction where the agent couldn't design without silently inventing who they are and what changes for them. When present, ask the user to name a specific person or narrow segment, and what changes for that person when this ships.
-
-- **Counterfactual gap.** The opening doesn't make visible what users do today when this problem arises, nor what changes if nothing ships. When present, ask what the current workaround is, even if it's messy — and what it costs them.
-
-- **Attachment gap.** The opening treats a particular solution shape as the thing being built, rather than the value that shape is supposed to deliver, and hasn't been examined against smaller forms that might deliver the same value. When present, ask what the smallest version that still delivers real value would look like.
-
-Plus these synthesis questions — not gap lenses, product-judgment the agent weighs in its own reasoning:
-- Is there a nearby framing that creates more user value without more carrying cost? If so, what complexity does it add?
-- Given the current project state, user goal, and constraints, what is the single highest-leverage move right now: the request as framed, a reframing, one adjacent addition, a simplification, or doing nothing?
-
-Favor moves that compound value, reduce future carrying cost, or make the product meaningfully more useful or compelling. Use the result to sharpen the conversation, not to bulldoze the user's intent.
-
-**Deep** — Standard lenses and synthesis questions plus:
-- Is this a local patch, or does it move the broader system toward where it wants to be?
-
-**Deep — product** — Deep plus:
-
-- **Durability gap.** The opening's value proposition rests on a current state of the world that may shift in predictable ways within the horizon the user cares about. When present, ask how the idea fares under the most plausible near-term shifts — and push past rising-tide answers every competitor could make.
-
-- What adjacent product could we accidentally build instead, and why is that the wrong one?
-- What would have to be true in the world for this to fail?
-
-These questions force an explicit product thesis and feed the Scope Boundaries subsections ("Deferred for later" and "Outside this product's identity") and Dependencies / Assumptions in the requirements document.
+**Standard, Deep, and Deep — product:** scan the opening against gap lenses by tier — Standard uses four (evidence, specificity, counterfactual, attachment); Deep adds a system-level reframe; Deep — product adds durability. Read `references/rigor-probes.md` for the gap definitions, synthesis questions, and what each gap feeds in the requirements document.
 
 #### 1.3 Collaborative Dialogue
 
@@ -172,7 +146,7 @@ Follow the Interaction Rules above. Use the platform's blocking question tool wh
 **Guidelines:**
 - Ask what the user is already thinking before offering your own ideas. This surfaces hidden context and prevents fixation on AI-generated framings.
 - Start broad (problem, users, value) then narrow (constraints, exclusions, edge cases)
-- **Rigor probes fire before Phase 2 and are prose, not menus.** Narrowing is legitimate, but Phase 1 cannot end with un-probed rigor gaps. Each scope-appropriate gap from Phase 1.2 fires as a **separate** direct prose probe — one probe satisfies one gap, not multiple. Standard brainstorms scan four gap lenses (evidence, specificity, counterfactual, attachment); Deep-product adds durability (five total), but only the gaps actually present in the opening must be probed. Surface those probes progressively across the conversation — interleaving with narrowing moves is fine, as long as every scope-appropriate gap that was found in Phase 1.2 has been probed in prose before Phase 2. Rigor probes map to Interaction Rule 5(b): a 4-option menu signals which kinds of evidence count and lets the user pick rather than produce. Prose forces them to produce real observation or surface their uncertainty. Examples (one per gap): *evidence — "What's the most concrete thing someone's already done about this — paid, built a workaround, quit a tool over it?"* / *specificity — "Can you name a team you've actually watched hit this, or are you reasoning?"* / *counterfactual — "What do teams do today when this breaks — who reconciles?"* / *attachment — "Before we move to shapes or approaches — what's the smallest version that would still prove the bet right, and what's excluded?"* — **attachment is the final rigor probe before Phase 2 when the attachment gap is present. Fire it regardless of whether a specific shape has emerged through narrowing; its job is to pressure-test the user's implicit framing of the product before Phase 2 inherits it** / *durability — "Under the most plausible near-term shifts, how does this bet hold?"* If the answer reveals genuine uncertainty, record it as an explicit assumption in the requirements document rather than skipping the probe.
+- **Rigor probes fire before Phase 2 and are prose, not menus.** Narrowing is legitimate, but Phase 1 cannot end with un-probed rigor gaps. Each scope-appropriate gap from Phase 1.2 fires as a **separate** direct prose probe — one probe satisfies one gap, not multiple. Only the gaps actually present in the opening must be probed. Surface those probes progressively across the conversation — interleaving with narrowing moves is fine, as long as every scope-appropriate gap that was found in Phase 1.2 has been probed in prose before Phase 2. Rigor probes map to Interaction Rule 5(b): a menu signals which kinds of evidence count and lets the user pick rather than produce; prose forces them to produce real observation or surface their uncertainty. **Attachment is the final rigor probe before Phase 2 when the attachment gap is present.** Fire it regardless of whether a specific shape has emerged through narrowing; its job is to pressure-test the user's implicit framing of the product before Phase 2 inherits it. See `references/rigor-probes.md` for example probe wordings (one per gap).
 - Clarify the problem frame, validate assumptions, and ask about success criteria
 - Make requirements concrete enough that planning will not need to invent behavior
 - Surface dependencies or prerequisites only when they materially affect scope
@@ -219,9 +193,9 @@ Surface a synthesis to the user before Phase 3 writes the requirements doc — t
 
 Fires for **all tiers** including Lightweight. Skip Phase 2.5 entirely on the Phase 0.1b non-software (universal-brainstorming) route.
 
-**Headless mode** (LFG / `disable-model-invocation`): the synthesis is composed but not confirmed. Inferred bets route to a `## Assumptions` section in the doc (so downstream review can scrutinize them as un-validated), not into Key Decisions. See `references/synthesis-summary.md` Headless mode for the full routing.
+**Headless mode** (no synchronous user — e.g., invoked from an automated workflow like LFG, or any non-interactive context such as a Claude Code skill with `disable-model-invocation: true`): the synthesis is composed but not confirmed. Inferred bets route to a `## Assumptions` section in the doc (so downstream review can scrutinize them as un-validated), not into Key Decisions. See `references/synthesis-summary.md` Headless mode for the full routing.
 
-**Announce-mode (Phase 0.2 fast path)**: on the "requirements already clear" fast path, Phase 2.5 fires in announce-mode — emit the synthesis (Stated / Inferred / Out) for visibility, then **end the turn**. Do NOT call the Write tool in the same turn as the synthesis emission. On the user's next message: if it's an acknowledgment, follow-up, or any non-correcting input, proceed to Phase 3 doc-write; if it indicates a correction (push-back on an Inferred bullet, scope adjustment), revise the synthesis and emit again. Lighter than full Phase 2.5 (no `AskUserQuestion` menu, no formal confirm option) but still gives the user a real interruption window before the doc lands. ce-brainstorm sits early in the workflow; a wrong-doc has downstream consequence (feeds ce-plan, then implementation), so the turn boundary is justified even on the fast path.
+**Announce-mode (Phase 0.2 fast path)**: on the "requirements already clear" fast path, Phase 2.5 fires in announce-mode — emit the synthesis (Stated / Inferred / Out) for visibility, then **end the turn**. Do NOT call the Write tool in the same turn as the synthesis emission. On the user's next message: if it's an acknowledgment, follow-up, or any non-correcting input, proceed to Phase 3 doc-write; if it indicates a correction (push-back on an Inferred bullet, scope adjustment), revise the synthesis and emit again. Lighter than full Phase 2.5 (no blocking-tool menu, no formal confirm option) but still gives the user a real interruption window before the doc lands. ce-brainstorm sits early in the workflow; a wrong-doc has downstream consequence (feeds ce-plan, then implementation), so the turn boundary is justified even on the fast path.
 
 ### Phase 3: Capture the Requirements
 

diff --git a/plugins/compound-engineering/skills/ce-brainstorm/references/handoff.md b/plugins/compound-engineering/skills/ce-brainstorm/references/handoff.md
@@ -8,7 +8,7 @@ This content is loaded when Phase 4 begins — after the requirements document i
 
 The Phase 4 menu's visible option count varies by state: no requirements doc hides the review and Proof options, unresolved `Resolve Before Planning` hides `Plan implementation` and `Build it now`, a failing direct-to-work gate hides `Build it now`. Count the visible options for the current state and choose the rendering mode accordingly:
 
-- **4 or fewer visible:** use the platform's blocking question tool (`AskUserQuestion` in Claude Code — call `ToolSearch` with `select:AskUserQuestion` first if its schema isn't loaded; `request_user_input` in Codex; `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension)). This is the default.
+- **4 or fewer visible:** use the platform's blocking question tool per Interaction Rule 4 in SKILL.md. This is the default.
 - **5 or more visible:** render as a numbered list in chat. This is the narrow option-overflow fallback; trimming would hide legitimate choices (plan, review, Proof, build, refine, pause are all distinct destinations). Include a hint that free-form input is accepted ("Pick a number or describe what you want.") so the numbered list retains the blocking tool's open-endedness.
 
 Never silently skip the question.

diff --git a/plugins/compound-engineering/skills/ce-brainstorm/references/rigor-probes.md b/plugins/compound-engineering/skills/ce-brainstorm/references/rigor-probes.md
@@ -0,0 +1,53 @@
+# Rigor Probes
+
+Loaded when Phase 1.2 (Product Pressure Test) needs Standard / Deep / Deep-product gap definitions, or when Phase 1.3 needs example probe wordings. The behavioral contract — probes fire before Phase 2, one probe per gap, prose not menus, attachment last — lives in `SKILL.md`. This file holds the **content** of the gap lenses and the **wording** of the probes.
+
+The Lightweight set lives entirely in SKILL.md (three short questions); skip this file for Lightweight scope.
+
+---
+
+## Phase 1.2 — Gap definitions by tier
+
+### Standard — scan for these gaps
+
+- **Evidence gap.** The opening asserts want or need, but doesn't point to anything the would-be user has already done — time spent, money paid, workarounds built — that would make the want observable. When present, ask for the most concrete thing someone has already done about this.
+
+- **Specificity gap.** The opening describes the beneficiary at a level of abstraction where the agent couldn't design without silently inventing who they are and what changes for them. When present, ask the user to name a specific person or narrow segment, and what changes for that person when this ships.
+
+- **Counterfactual gap.** The opening doesn't make visible what users do today when this problem arises, nor what changes if nothing ships. When present, ask what the current workaround is, even if it's messy — and what it costs them.
+
+- **Attachment gap.** The opening treats a particular solution shape as the thing being built, rather than the value that shape is supposed to deliver, and hasn't been examined against smaller forms that might deliver the same value. When present, ask what the smallest version that still delivers real value would look like.
+
+Plus these synthesis questions — not gap lenses, product-judgment the agent weighs in its own reasoning:
+
+- Is there a nearby framing that creates more user value without more carrying cost? If so, what complexity does it add?
+- Given the current project state, user goal, and constraints, what is the single highest-leverage move right now: the request as framed, a reframing, one adjacent addition, a simplification, or doing nothing?
+
+Favor moves that compound value, reduce future carrying cost, or make the product meaningfully more useful or compelling. Use the result to sharpen the conversation, not to bulldoze the user's intent.
+
+### Deep — Standard plus
+
+- Is this a local patch, or does it move the broader system toward where it wants to be?
+
+### Deep — product — Deep plus
+
+- **Durability gap.** The opening's value proposition rests on a current state of the world that may shift in predictable ways within the horizon the user cares about. When present, ask how the idea fares under the most plausible near-term shifts — and push past rising-tide answers every competitor could make.
+
+- What adjacent product could we accidentally build instead, and why is that the wrong one?
+- What would have to be true in the world for this to fail?
+
+These questions force an explicit product thesis and feed the Scope Boundaries subsections ("Deferred for later" and "Outside this product's identity") and Dependencies / Assumptions in the requirements document.
+
+---
+
+## Phase 1.3 — Example probe wordings
+
+One example per gap. Adapt phrasing to dialogue context; the wording shape matters more than the literal text.
+
+- *evidence* — "What's the most concrete thing someone's already done about this — paid, built a workaround, quit a tool over it?"
+- *specificity* — "Can you name a team you've actually watched hit this, or are you reasoning?"
+- *counterfactual* — "What do teams do today when this breaks — who reconciles?"
+- *attachment* — "Before we move to shapes or approaches — what's the smallest version that would still prove the bet right, and what's excluded?"
+- *durability* — "Under the most plausible near-term shifts, how does this bet hold?"
+
+If an answer reveals genuine uncertainty, record it as an explicit assumption in the requirements document rather than skipping the probe.
diff --git a/plugins/compound-engineering/skills/ce-brainstorm/references/synthesis-summary.md b/plugins/compound-engineering/skills/ce-brainstorm/references/synthesis-summary.md
@@ -58,7 +58,7 @@ Based on our dialogue and approach selection, here's the scope I'm proposing for
 Does this match your intent? Tell me what to add, remove, redirect, or that I got wrong — or just confirm to proceed. (You can rebut even if my synthesis accurately reflects what you said earlier — you may have changed your mind, surfaced new context, or want to correct an unstated assumption.)
 ```
 
-Use prose for the user response (no `AskUserQuestion` menu). The justification is Interaction Rule 5(a) in SKILL.md — option sets bias the answer by signaling which dimensions matter.
+Use prose for the user response (no blocking-tool menu). The justification is Interaction Rule 5(a) in SKILL.md — option sets bias the answer by signaling which dimensions matter.
 
 ---
 
@@ -101,7 +101,7 @@ The test: a scanner reading a bullet should affirm or reject it on product / sco
 
 Track which Stated/Inferred/Out items the user touched per round. The soft-cut blocking question fires **only when the same item is revised twice** (or a third-round revision targets an item already revised in round two). New-item revisions across rounds proceed without limit — revising different aspects of a wrong synthesis (e.g., user pushed back on Stated, then on Inferred) is exactly what the mechanism should support.
 
-When the soft-cut fires, use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi) with two options:
+When the soft-cut fires, use the platform's blocking question tool per Interaction Rule 4 in SKILL.md, with two options:
 
 - `Proceed with the current revised synthesis`
 - `Stop and redirect — discuss further before writing the doc`
@@ -112,7 +112,7 @@ Fall back to a numbered list in chat only when no blocking tool exists or the ca
 
 ## Headless mode
 
-When the skill is invoked from an automated workflow such as LFG or any `disable-model-invocation` context, the skill runs in non-interactive mode (no synchronous user). This does NOT mean unaudited — the artifact is read by downstream skills (ce-doc-review, ce-plan) and human reviewers (PR review). Audit shifts from chat history to the artifact itself.
+When the skill runs in a non-interactive context — no synchronous user, e.g., invoked from an automated workflow like LFG or from a skill with model-self-invocation disabled (`disable-model-invocation: true` in Claude Code) — Phase 2.5 still fires but without a confirmation step. This does NOT mean unaudited — the artifact is read by downstream skills (ce-doc-review, ce-plan) and human reviewers (PR review). Audit shifts from chat history to the artifact itself.
 
 Behavior:
 

diff --git a/...compound-engineering/skills/ce-brainstorm/references/universal-brainstorming.md b/...compound-engineering/skills/ce-brainstorm/references/universal-brainstorming.md
@@ -53,7 +53,7 @@ When the conversation has enough material to narrow — reflect back what you've
 
 **Always synthesize a summary in the chat.** Before offering any next steps, reflect back what emerged: key decisions, the direction chosen, open threads, and any assumptions made. This is the primary output of the brainstorm — the user should be able to read the summary and know what they landed on.
 
-**Then offer next steps** using the platform's blocking question tool: `AskUserQuestion` in Claude Code (call `ToolSearch` with `select:AskUserQuestion` first if its schema isn't loaded), `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension). Fall back to numbered options in chat only when no blocking tool exists in the harness or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip the question.
+**Then offer next steps** using the platform's blocking question tool per Interaction Rule 4 in SKILL.md. Never silently skip the question.
 
 **Question:** "Brainstorm wrapped. What would you like to do next?"