
fix(ce-plan): compress synthesis confirmation to prose + call-outs #819

Merged: tmchow merged 2 commits into main from tmchow/ce-plan-confirmation-noise on May 11, 2026

Conversation

@tmchow (Collaborator) commented May 11, 2026

Summary

The synthesis gate before research/plan-write now surfaces only the decisions worth weighing in on: a 1-3 line prose summary plus 0-3 "Call outs" instead of the full Stated/Inferred/Out audit. When the prose fully captures the scope with no forks to flag, the gate skips entirely and the agent announces it's auto-proceeding.

Before, the confirmation regularly produced 15+ bullets for a moderate plan, even when granularity rules were followed. The volume made approval feel like rubber-stamping rather than a real checkpoint.


What changed

The synthesis is now a two-stage shape:

  1. Internal three-bucket draft (Stated / Inferred / Out of scope): the agent's comprehensive thinking surface. Dissolves into plan body sections as before (Requirements, Key Technical Decisions or ## Assumptions in headless, Scope Boundaries).
  2. Chat-time presentation: prose summary plus 0-3 "Call outs," derived from the internal draft via a keep test.

Each candidate call-out must pass an affirmability test (can the user evaluate this without reading code?) and fall into one of four categories: real fork, non-obvious behavioral choice, non-obvious exclusion, or cheap-now-expensive-later correction. Mechanical bets and implementation-flow specifics are cut before reaching chat.
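
The keep test can be pictured as a simple filter over the internal draft. The sketch below is illustrative only: the skill is prose instructions for an agent, not code, and `Candidate`, `Category`, `keep`, and `call_outs` are hypothetical names that do not exist in the repository.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Category(Enum):
    """The four keep categories named in the PR description."""
    REAL_FORK = "real fork"
    NON_OBVIOUS_BEHAVIOR = "non-obvious behavioral choice"
    NON_OBVIOUS_EXCLUSION = "non-obvious exclusion"
    CHEAP_NOW_EXPENSIVE_LATER = "cheap-now-expensive-later correction"


@dataclass(frozen=True)
class Candidate:
    text: str
    # Affirmability: can the user evaluate this without reading code?
    affirmable: bool
    # None models mechanical bets and implementation-flow specifics (assumption).
    category: Optional[Category]


def keep(candidate: Candidate) -> bool:
    """A candidate survives only if it is affirmable AND fits a category."""
    return candidate.affirmable and candidate.category is not None


def call_outs(draft: List[Candidate]) -> List[Candidate]:
    """Derive the chat-time call-outs from the internal draft."""
    return [c for c in draft if keep(c)]
```

In this model, an item such as "return HTTP 409 on conflict" is cut for failing affirmability, while "sync vs. async export" survives as a real fork.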


Design decisions

The cap is tiered, with a hard ceiling of 6 that triggers a re-cut rather than a higher cap. Lightweight tops out at 3, Standard at 4, Deep at 6. Above 6, the synthesis is misshapen: usually 2-3 of the call-outs are sub-decisions of one larger fork, so the rule directs the agent to collapse them to a higher level of abstraction rather than raise the cap.
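
As a rough model of the tiered cap, the check might look like the following. The tier values and the ceiling come from the description above; the function name, the string results, and the handling of a count that exceeds its tier cap while staying at or under 6 are assumptions, since the description only specifies what happens above the ceiling.

```python
# Tier caps and ceiling as stated in the PR description.
TIER_CAPS = {"lightweight": 3, "standard": 4, "deep": 6}
HARD_CEILING = 6


def check_call_out_count(depth: str, count: int) -> str:
    """Classify a call-out count for a plan of the given depth.

    Returns "recut" above the hard ceiling (collapse to a higher
    abstraction, never raise the cap), "over_tier_cap" when the count
    exceeds the tier's cap but not the ceiling (hypothetical outcome,
    not specified in the source), and "ok" otherwise.
    """
    if count > HARD_CEILING:
        return "recut"
    if count > TIER_CAPS[depth]:
        return "over_tier_cap"
    return "ok"
```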

Conditional skip is allowed but never silent. When zero call-outs survive, the agent emits a mandatory "Planning: [prose]. No open decisions for you to weigh in on, proceeding to [next phase]" announcement. The "why" must be visible. A default-to-keep rule on borderline call-outs keeps the failure mode bounded.
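
A minimal sketch of the skip-or-gate decision, assuming the announcement template quoted above. `present` and `auto_proceed_announcement` are hypothetical helpers, and the "Call outs:" bullet rendering in the blocking branch is an assumed layout, not the skill's actual output format.

```python
from typing import Dict, List, Union


def auto_proceed_announcement(prose: str, next_phase: str) -> str:
    """Fill the mandatory announcement template from the PR description."""
    summary = prose.strip().rstrip(".")  # avoid a doubled period
    return (
        f"Planning: {summary}. No open decisions for you to weigh in on, "
        f"proceeding to {next_phase}"
    )


def present(prose: str, call_outs: List[str], next_phase: str) -> Dict[str, Union[bool, str]]:
    """Skip the gate only when zero call-outs survive, and never silently."""
    if not call_outs:
        return {
            "blocking": False,
            "message": auto_proceed_announcement(prose, next_phase),
        }
    # Assumed rendering: prose summary followed by a bulleted call-out list.
    lines = [prose, "Call outs:"] + [f"- {c}" for c in call_outs]
    return {"blocking": True, "message": "\n".join(lines)}
```

Either branch produces a visible message, so the user always sees the prose summary even when no confirmation is requested.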

The affirmability test surfaces in SKILL.md, not only in the reference. The Phase 0.7 and 5.1.5 stubs carry the test inline so it loads on every invocation, even when the reference loads shallow. References can be skipped; SKILL.md is always loaded.

Soft-cut tracks call-outs by decision dimension, not surface wording. Stage 2 re-derivation after a revision can rephrase or merge call-outs, so identity needs to be the underlying fork. When a re-cut collapses multiple call-outs into one, the combined call-out inherits the "touched" status of any constituent.
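
One way to picture dimension-based identity is shown below. This is a sketch under assumptions: `CallOut`, `merge`, and `carry_over` are hypothetical names, and "touched" is taken to mean that the user has already revised that decision.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class CallOut:
    # Identity is the set of underlying forks, not the wording.
    dimensions: FrozenSet[str]
    wording: str
    touched: bool = False


def merge(parts: List[CallOut], wording: str) -> CallOut:
    """Collapse several call-outs into one; touched if any constituent was."""
    dims = frozenset().union(*(p.dimensions for p in parts))
    return CallOut(dims, wording, any(p.touched for p in parts))


def carry_over(previous: Iterable[CallOut], rederived: Iterable[CallOut]) -> List[CallOut]:
    """Re-attach touched status after re-derivation, matching by dimension."""
    touched_dims = set()
    for p in previous:
        if p.touched:
            touched_dims |= p.dimensions
    return [
        CallOut(c.dimensions, c.wording, c.touched or bool(c.dimensions & touched_dims))
        for c in rederived
    ]
```

Because matching ignores `wording`, a rephrased call-out about the same fork keeps its status, and a merged call-out is touched whenever one of its parts was.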

Headless mode is unchanged in routing. The internal draft still dissolves into Requirements / Assumptions / Scope Boundaries. Stage 2 is moot when there's no synchronous user.


Test plan

bun test and bun run release:validate pass.

Behavioral changes to skill prose are not exercised by automated tests. Verify by running /ce-plan on:

  • A trivial prompt (expect auto-proceed announcement, no gate)
  • A moderate prompt (expect prose plus 1-3 call-outs)
  • A complex prompt that previously produced the volume problem (expect compression to 2-3 call-outs, the rest dissolved into the plan body)


The synthesis gate before research/plan-write was producing too much
volume for users to weigh in on — full Stated/Inferred/Out buckets with
15+ bullets even when granularity rules were followed. Restructure into
a two-stage shape:

- Stage 1 (internal): the three-bucket draft the agent uses to think
  comprehensively; routes into plan body sections as before
- Stage 2 (chat-time): 1-3 line prose summary plus 0-3 "Call outs" —
  only the forks where another reasonable agent might choose differently

Add a keep test (affirmability + four categories: real fork, non-obvious
behavioral choice, non-obvious exclusion, cheap-now-expensive-later) that
gates each call-out. Cap is tiered by plan depth with a hard 6 ceiling —
above that, re-cut at higher abstraction rather than raising the cap.
When zero call-outs survive, skip the blocking question and emit a
mandatory auto-proceed announcement so the agent never proceeds silently.

Tighten granularity rules with an explicit anti-pattern list for call-outs
(file paths, flags, JSON shapes, HTTP codes, implementation flow) and
surface the affirmability test in the SKILL.md phase stubs so it loads
reliably. Soft-cut now tracks call-outs by decision dimension rather than
surface wording so re-cuts don't reset revision counts.

Headless mode unchanged in routing — internal draft still dissolves into
Requirements / Assumptions / Scope Boundaries.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63a175bc06


Comment thread plugins/compound-engineering/skills/ce-plan/references/synthesis-summary.md Outdated
- Align top-level call-out count statements with the tiered cap table.
  The two-stage shape paragraph, stage-2 structure list, and
  synthesis-as-plan-pitch anti-pattern now all defer to the table under
  "How many call-outs are right?" so the agent receives a single
  deterministic limit (Lightweight: 0-3, Standard: 1-4, Deep: 2-6).
@tmchow tmchow merged commit 60c1c93 into main May 11, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request May 11, 2026