Skip to content

PIR protocol: plan-review + code-review gates for issue-driven work #691

@amrmelsayed

Description

@amrmelsayed

Issue 3 — PIR protocol: plan-review + code-review gates for issue-driven work

Problem

Codev's existing protocols cover two extremes for GitHub-issue–driven work:

  • BUGFIX / AIR — autonomous by design. Builder spawns, works alone, surfaces a PR. No human gates between investigation and PR. Review happens ON the PR.
  • SPIR / ASPIR — heavyweight, spec + plan file ceremony. Produces codev/specs/<N>-*.md and codev/plans/<N>-*.md files, with multi-agent review passes. Appropriate for new features from scratch, architectural changes, or work that deserves formal design artifacts.

There is no middle ground for work where the engineer's assessment is that either the approach or the implementation — or both — genuinely needs human checking before a PR exists. The need is driven by the nature of the change, not its size. PIR fills this gap.

When the existing options fail

Autonomous protocols fail when the reviewer must see the approach before the builder commits to it:

  • Root cause is ambiguous; multiple valid fixes exist and the reviewer wants to pick
  • Change touches unfamiliar or high-blast-radius code (shared utilities, auth, migrations, public APIs)
  • Design-sensitive work where picking the wrong abstraction early is expensive to undo at PR time

Autonomous protocols mean this review only happens after the builder has already implemented a full fix. Redirecting at PR time means throwing away work.

PR-time review fails when the code cannot be adequately verified by reading a diff. The classic case is mobile apps: a 10-line UI change affects Android, iOS, and potentially web. The PR diff shows you the lines changed, but it doesn't show you:

  • How the change looks on each platform
  • How it interacts with platform-specific behavior (keyboard, permissions, push notifications, device orientation)
  • Whether it preserves existing behavior across device sizes / OS versions
  • Cross-platform consistency

The same applies to:

  • UI/UX changes in any app — visual correctness, interaction flow, accessibility behavior require running the app, not just reading code
  • Hardware-adjacent changes — camera, microphone, sensors, Bluetooth, location
  • Integration with external services that don't mock cleanly (OAuth flows, payment SDKs, analytics)
  • User-journey changes that require exercising the full flow (multi-step forms, checkout, onboarding)
  • Performance-sensitive changes that need profiling on the running app
  • Accessibility changes (screen readers, keyboard navigation)

For any of these, discovering a regression at PR time is already too late — the reviewer either merges and hopes, or rejects after the builder has wasted effort.

What's needed

A protocol that lets the engineer, based on their own judgment about the work's nature, opt into:

  1. Plan review before implementation starts — reviewer sees the proposed approach, can redirect cheaply before code is written
  2. Code review AND testing before PR creation — reviewer can run the builder's branch locally on their own devices / browsers / hardware to verify behavior

Either alone is useful. Together they cover the full spectrum of cases where reading the PR diff at merge time is insufficient.

This protocol should NOT produce standalone spec/plan markdown files (those belong to SPIR). It should use GitHub issue comments as its artifacts so they remain part of the issue's discussion thread.

Proposed solution — new pir protocol

A new protocol pir (Plan / Implement / Review) that slots between AIR and SPIR in the existing ladder:

Protocol Phases Gates Use case
AIR implement → review review (PR) Autonomous — review happens on the PR
BUGFIX investigate → fix → pr review (PR) Autonomous — review happens on the PR
PIR plan → implement → review plan-approval + code-review + PR Engineer's judgment: work needs approach review before coding, and/or needs testing before PR (e.g., mobile / cross-platform / UI changes)
ASPIR specify → plan → implement → review review (PR) only New features, autonomous from spec
SPIR specify → plan → implement → review spec-approval + plan-approval + review New features, full supervision with file artifacts

PIR is structurally SPIR minus the specify phase: the GitHub issue is the implicit spec. The builder reads the issue, investigates the codebase, posts a structured plan as a GitHub issue comment, waits for approval, implements, posts a structured code-review summary as a second comment, waits for approval again, then creates the PR.

PIR is chosen by the engineer based on the nature of the work, not its size. A 10-line mobile UI tweak might warrant PIR. A 500-line internal refactor might not.

Phases

plan → implement → review
  • plan: builder reads the issue, investigates the codebase, writes a structured plan, posts it as a comment on the GitHub issue, waits at the plan-approval gate. Builder stays interactive (live Claude session in a PTY) — reviewer can type feedback directly, send via afx send, or comment on the issue; builder revises and re-posts plan.
  • implement: after plan approval, builder writes code, runs tests locally in the worktree, posts a structured code-review summary as a second comment on the issue, waits at the code-review gate. Same interactive feedback model.
  • review: after code-review approval, builder pushes branch, creates PR with Fixes #<N>, runs CMAP (3-way Gemini + Codex + Claude), notifies architect. Architect reviews PR, instructs merge. Builder merges with gh pr merge <M> --merge.

Gates

Two human-approved gates, plus the implicit PR merge gate inherited from BUGFIX pattern:

  1. plan-approval on the plan phase — reused from SPIR's existing gate machinery. Porch treats gate names as opaque strings; reuse is safe (gates are keyed by (project_id, gate_name)).
  2. code-review on the implement phase — new gate name.

Artifacts (posted as GitHub issue comments, NOT files)

Plan comment (at plan-approval):

## Proposed Plan — Issue #<N>
**Understanding**: <what the issue asks for; for a bug, the root cause>
**Proposed change**: <what the builder will do>
**Files to change**: <paths>
**Risks / alternatives considered**: <trade-offs; considered-and-rejected alternatives>
**Test plan**: <how to verify — for the reviewer to use at the code-review gate>

Code-review comment (at code-review):

## Code Review — Issue #<N>
**Summary**: <2–3 sentence overview of what was implemented>
**Files changed**: <from `git diff --stat main`, with +/- counts>
**Commits**: <from `git log main..HEAD --oneline`>
**Test results**: <what was run; pass/fail>
**Things to look at**: <tricky spots the reviewer should focus on>

**To review**: use the "View Diff" context menu in VSCode's Codev sidebar, or `git -C .builders/pir-<id> diff main`.
**To run locally**: use the "Run Dev Server" context menu, or `afx dev <builder-id>`.

No codev/plans/<N>-*.md or codev/reviews/<N>-*.md files. The GitHub issue is the source of truth; the issue comments are the durable artifacts.

Builder banners at each gate

The builder is a long-running interactive Claude session in a PTY pane managed by Tower. At each gate, the builder's prompt instructs it to stay alive with a banner:

At plan-approval:

═══════════════════════════════════════════════════════════
✋ Plan posted to issue #<N>. Awaiting review.

Review paths:
  • Terminal: porch approve <id> plan-approval --a-human-explicitly-approved-this
              (no explicit reject — use `afx send <id> "feedback"` to request changes)
  • VSCode:   Codev sidebar → Needs Attention → "#<N>" → Approve Gate (Cmd+K G)
              or Send Message (Cmd+K D) to request changes
  • Read plan: gh issue view <N> --comments

Or type feedback here / on the issue — I'll revise and re-post.
═══════════════════════════════════════════════════════════

At code-review:

═══════════════════════════════════════════════════════════
✋ Implementation complete. Awaiting code review.

Summary posted to issue #<N>.

Review paths (no manual cd needed):
  • View diff:       git -C .builders/pir-<id> diff main    |  VSCode: "View Diff"
  • Run the app:     afx dev <id>                            |  VSCode: "Run Dev Server"
  • Read summary:    gh issue view <N> --comments
  • Approve:         porch approve <id> code-review --a-human-explicitly-approved-this
                     |  VSCode: "Approve Gate" (Cmd+K G)
  • Request changes: afx send <id> "<feedback>"             |  VSCode: "Send Message" (Cmd+K D)
                     (builder revises; gate stays pending until approved)

Or type feedback here / on the issue — I'll iterate.
═══════════════════════════════════════════════════════════

Rejection mechanism (no formal reject command needed)

Porch has no reject command — it only has approve. Rejection in SPIR works implicitly: the human provides feedback (via file edits, afx send, or issue comments), the builder iterates its artifact, the human approves when satisfied. PIR uses the same pattern:

  • Reviewer wants changes: runs afx send <builder-id> "<feedback>" OR posts a comment on the GitHub issue OR types directly in the builder's PTY pane
  • Builder reads the feedback on its next iteration (via the prompt instructing it to re-check for messages and issue comments), revises its artifact (plan comment or code-review summary), re-posts the comment
  • Gate stays pending until the reviewer runs porch approve <id> <gate>

This matches how SPIR's spec-approval and plan-approval gates already work. No new porch command needed.

Serial review model (one dev env at a time)

The code-review gate assumes runnable worktrees (Issue 1) so the reviewer can test the change locally via afx dev <builder-id> or VSCode's "Run Dev Server" (Issue 2). Per the design rationale in Issue 1, only one dev env runs at a time — the worktree reuses main's ports and URLs. This preserves CORS, OAuth callbacks, cookie scoping, CSP, and webhooks, which would otherwise break if the worktree's dev ran on rewritten ports.

Parallel review (multiple dev envs side-by-side) requires deploy previews (Vercel/Netlify/Render) — out of Codev scope.

Porch integration — concrete details

PIR uses porch's existing gate machinery. This section itemizes what works as-is, what the SPIR spec phase must verify before coding, and what porch-side changes (if any) may be required.

What works without porch engine changes

  • Gate declaration in protocol.json"gate": "<name>" syntax on a phase, already supported. Copy SPIR's pattern at codev/protocols/spir/protocol.json.
  • Gate approval commandporch approve <id> <gate> --a-human-explicitly-approved-this. Unchanged; treats gate name as an opaque string.
  • Gate-pending notification to architect — porch already notifies on gate reach (per codev/protocols/spir/protocol.md:49). PIR inherits this automatically.
  • Gate rollbackporch rollback <id> <phase> exists as an escape hatch if a gate needs to be undone entirely (rare).
  • Protocol loadingloadProtocol('pir') at packages/codev/src/commands/porch/protocol.ts:22 auto-picks up the new codev/protocols/pir/ directory via directory-based discovery.

Open questions the spec phase MUST investigate and resolve

Each of these needs a concrete answer (with code verification) before the plan phase can design the prompts:

Q1. Is code-review as a new gate name accepted without any whitelist / schema change?

  • Action for spec phase: grep -r "plan-approval\|spec-approval" packages/codev/src/commands/porch/ and check whether gate names are referenced beyond protocol.json files (enums, validation schemas, type unions, docs that would need updating)
  • If there's a schema whitelist, code-review must be added to it
  • If purely data-driven, no change needed

Q2. How does an interactive builder detect that a gate was approved out-of-band?

  • Scenario: builder is sitting at plan-approval, displaying its banner, responsive to user input. User (or architect) runs porch approve <id> plan-approval from a different terminal. How does the builder's live Claude session notice?
  • Candidate mechanisms (spec phase must evaluate):
    • Polling: prompt instructs Claude to run porch status every N seconds — wasteful, but simple
    • User-prompted: user types "proceed" in the pane after approving; builder runs porch status to confirm, then advances
    • Signal-based: does porch approve write to a file the builder's prompt can watch? Check porch state storage at codev/projects/<id>/status.yaml
    • Existing pattern reuse: check how SPIR builders detect approval — they may already exit and rely on the restart loop, in which case PIR just inherits that (and the "interactive hold" is an illusion — Claude exits after posting, restarts, sees advanced state, moves on)
  • This is the single most important integration question. The entire banner/stay-alive design depends on the answer. Resolve before drafting prompts.

Q3. Does PHASE_COMPLETE with a pending gate correctly mark phase-complete-but-blocked in porch state?

  • Scenario: builder emits <signal>PHASE_COMPLETE</signal> after posting its plan comment. Does porch (a) advance to the next phase immediately, (b) mark phase complete but gate pending, or (c) error because the gate hasn't been approved?
  • Action: test against existing SPIR behavior — signal PHASE_COMPLETE on specify phase and observe whether porch advances without the spec-approval gate being run first
  • If PHASE_COMPLETE requires gate approval before being emitted, the prompt must not emit until after porch approve runs — meaning the builder does need to detect approval somehow (see Q2)
  • If PHASE_COMPLETE can be emitted while gate is pending (phase "complete but waiting"), the simpler design works

Q4. What does porch status output when a gate is pending?

  • Exact string/JSON format matters for:
    • Dashboard rendering (existing — verify it handles code-review gate state)
    • VSCode Needs Attention view (existing — same)
    • Builder's own introspection if it needs to check gate state
  • Action: run porch init pir <test-id> and porch status <test-id> after sim-emitting PHASE_COMPLETE; verify output format

Q5. Feedback-loop iteration and porch state

  • When the reviewer requests changes via afx send or issue comment, the builder revises its artifact and re-posts. Does the builder emit PHASE_COMPLETE again on each revision? Does porch distinguish "first attempt" from "revised attempt"?
  • Action: check if PHASE_COMPLETE is idempotent or if porch tracks emission count
  • If porch doesn't care, the prompt just tells the builder to re-post freely
  • If porch tracks, may need to adjust prompt to emit PHASE_COMPLETE only once per artifact cycle

Q6. Resumption behavior after builder crash

  • If the builder's Claude session crashes mid-feedback-loop, the Tower restart loop re-launches Claude with the same prompt file. The new Claude instance calls porch status, sees phase: plan, gate: plan-approval, pending. It must know to:
    • Re-read the GitHub issue (including comments posted since it last checked)
    • Re-read afx send messages that arrived while it was down
    • Check its own last-posted comment to decide whether to re-post or continue iterating
  • None of this is porch's responsibility directly — it's prompt design — but it interacts with porch's state ("phase X, gate pending" must be unambiguous on resumption)
  • Action for spec phase: ensure the plan.md and implement.md prompts include explicit resumption logic

Potential porch-side changes (only if spec-phase investigation reveals them)

List the changes that could be needed, for transparency. The spec phase will determine which (if any) are actually required.

Potential change Trigger condition Location
Register code-review in a gate-name allowlist If Q1 reveals a whitelist exists TBD via grep
Add a porch wait-for-approval <id> <gate> command If Q2 concludes polling is the only option and we want porch to own the poll loop packages/codev/src/commands/porch/ — new command module
Document PHASE_COMPLETE + gate semantics If Q3 behavior isn't obvious from existing docs codev/resources/protocol-format.md or equivalent
Expose gate state in a structured porch status --json format If Q4 reveals dashboard/VSCode need parsable output packages/codev/src/commands/porch/status.ts

If none of these are needed, PIR remains a pure data-only addition (protocol.json + prompts + skeleton mirror + spawn routing + docs). If some are needed, they get scoped into this issue (not deferred) — they're prerequisites for PIR to work end-to-end.

Spec-phase deliverable

Before the plan phase begins, the SPIR spec phase for this issue must answer Q1–Q6 with code-verified answers and document them in the spec. The plan phase designs the prompts and tests around those answers.

Files to create

Protocol directory — codev/protocols/pir/

codev/protocols/pir/
├── protocol.json          # phases + plan-approval on plan + code-review on implement
├── protocol.md            # workflow doc, comparison to BUGFIX/AIR/SPIR
├── builder-prompt.md      # top-level builder prompt (mirror of bugfix's, adjusted for PIR phases)
├── prompts/
│   ├── plan.md            # read issue, investigate, post plan comment, await feedback in interactive loop
│   ├── implement.md       # write code, test locally, post code-review summary, await review in interactive loop
│   └── review.md          # create PR, run CMAP, notify architect, merge on instruction
└── consult-types/
    ├── impl-review.md     # CMAP consult type for implementation review
    └── pr-review.md       # CMAP consult type for PR review

Skeleton mirror — codev-skeleton/protocols/pir/

Byte-identical mirror of the above. packages/codev/skeleton/protocols/pir/ is auto-regenerated by the copy-skeleton build script on next build.

Reference implementations: codev/protocols/bugfix/ for the autonomous cousin (investigate → fix → pr phase structure), codev/protocols/spir/ for the gate-at-plan pattern (see protocol.json for gate field syntax).

Files to modify

  • packages/codev/src/agent-farm/cli.ts — update --protocol option help text to include pir

  • packages/codev/src/agent-farm/commands/spawn.ts — route --protocol pir through the bugfix spawn path. One line in getSpawnMode:

    if (options.protocol === 'bugfix' || options.protocol === 'pir') return 'bugfix';

    PIR's spawn is issue-driven with a worktree — identical infrastructure to bugfix. Protocol-specific behavior lives entirely in the prompts, not in the spawn code.

  • CLAUDE.md — add pir to the Available Protocols list and a "Use PIR for" entry in the Protocol Selection Guide:

    ### Use PIR for (engineer-judged — based on the nature of the work, not its size):
    
    Pick PIR when ONE or BOTH of the following apply to a GitHub-issue-driven change:
    
    **1. The approach needs review before coding starts**:
    - Root cause is ambiguous; multiple valid fixes exist
    - Area is unfamiliar or high-blast-radius
    - Design-sensitive (affects conventions, patterns, architecture)
    - Cheaper to redirect at plan time than at PR time
    
    **2. The implementation needs to be TESTED before a PR is created**, because the PR diff alone is insufficient:
    - Mobile app changes (needs device testing on Android, iOS, possibly web)
    - UI/UX changes (visual inspection, interaction flow, accessibility)
    - Hardware-adjacent behavior (sensors, camera, permissions, notifications)
    - Integration with external services that don't mock cleanly
    - User journeys needing full-flow exercise
    - Performance-sensitive changes needing profiling on the running app
    
    **PIR uses GitHub Issues as source of truth.** Three phases: Plan (gated by plan-approval) → Implement (gated by code-review) → Review (PR). Artifacts are GitHub issue comments, not files. Lighter than SPIR (no spec/plan files) but more supervised than autonomous BUGFIX/AIR.
  • AGENTS.md — mirror the CLAUDE.md update (AGENTS.md standard compatibility)

  • packages/codev/src/__tests__/skeleton.test.ts — add pir to the expected protocol list if the test enumerates protocols

  • packages/codev/src/commands/porch/__tests__/protocol.test.ts — add a test asserting loadProtocol('pir') returns a protocol with a plan phase carrying plan-approval gate and an implement phase carrying code-review gate

Acceptance criteria

  1. Protocol loads: porch init pir 42 --worktree .builders/pir-42-test initializes a project with the PIR phases + gates. Porch state correctly shows plan phase with pending plan-approval gate.
  2. Spawn routing: afx spawn <N> --protocol pir spawns an issue-driven builder in .builders/pir-<N>-<slug>/. No error; uses the same spawn path as bugfix.
  3. Plan phase produces artifact: builder reads the issue, investigates, posts a structured plan comment to the GitHub issue via gh issue comment. Comment contains all five sections (Understanding, Proposed change, Files to change, Risks, Test plan).
  4. Plan gate pauses builder: after posting, builder emits PHASE_COMPLETE; porch marks plan-approval as pending. Builder's Claude session stays interactive, displays the banner. Dashboard + VSCode Needs Attention view show the builder as blocked at plan-approval.
  5. Plan feedback loop: reviewer runs afx send <builder-id> "<feedback>" OR posts an issue comment → builder reads, revises plan, re-posts updated comment. Gate stays pending.
  6. Plan approval advances: reviewer runs porch approve <id> plan-approval --a-human-explicitly-approved-this (or VSCode "Approve Gate") → builder transitions to implement phase on next poll.
  7. Implement phase produces code + code-review artifact: builder writes code, runs tests locally in the worktree, commits. Posts a structured code-review comment with Summary, Files changed, Commits, Test results, Things to look at. Emits PHASE_COMPLETE.
  8. Code-review gate pauses builder: porch marks code-review as pending. Builder stays interactive with the second banner. Dashboard + sidebar reflect the state.
  9. Code-review feedback loop: same mechanism as plan — afx send, issue comments, or direct terminal input. Builder revises and re-posts updated summary.
  10. Code-review approval advances: reviewer runs porch approve <id> code-review → builder transitions to review phase.
  11. Review phase creates PR: builder pushes branch, runs gh pr create with Fixes #<N>, runs CMAP 3-way review, updates PR description, notifies architect via afx send architect "PR #<M> ready (PIR #<N>)".
  12. Merge flow: architect approves PR → instructs builder to merge → builder runs gh pr merge <M> --merge (non-squash). Issue auto-closes via Fixes #<N>.
  13. Runnable worktree (requires Issue 1 merged): at the code-review gate, reviewer runs afx dev <builder-id> and the worktree's dev server starts on main's ports. Reviewer tests in browser.
  14. VSCode flow (requires Issue 2 merged): reviewer right-clicks the builder at code-review in the Codev sidebar → View Diff (opens native diff editor) → Run Dev Server (spawns Tower dev PTY, auto-swaps from previous builder's dev if any) → Approve Gate (Cmd+K G) advances.
  15. Skeleton mirror: pnpm build regenerates packages/codev/skeleton/protocols/pir/ from codev-skeleton/protocols/pir/.
  16. Tests: protocol load test passes; skeleton enumeration test includes pir.

Non-goals (explicitly out of scope)

  • Changes to BUGFIX, AIR, SPIR, ASPIR behavior — PIR is a pure sibling protocol
  • Porch engine changes — reusing existing gate mechanism; plan-approval gate name is already defined in SPIR, code-review is a new string but no engine code changes are needed
  • A codev/reviews/<N>-pir.md file — PIR doesn't produce a retrospective review file; the issue comments + PR body are the artifacts
  • A dedicated porch reject command — rejection is via the feedback-iterate pattern, matching SPIR
  • Claude Code plan mode (--permission-mode plan) — not used; PIR's approach is prompt-driven, not runtime-enforced
  • Harness abstraction changes — PIR's mechanism is protocol-level (prompts + porch state), works for any harness

Dependencies

  • Issue 1 merged — PIR's code-review gate is meaningless without runnable worktrees. Without Issue 1, the reviewer can't test the builder's changes; the gate degenerates to a diff-read, which is no better than what existing protocols offer at PR time.
  • Issue 2 merged — Without it, the VSCode review flow requires terminal trips (still workable via git -C + afx dev, but the sidebar-native UX that motivates PIR's existence isn't delivered).

PIR can technically ship without Issues 1+2, but the end-to-end review UX it promises requires them. Recommend shipping Issues 1 and 2 before this.

Benefits delivered

  • A new lightweight middleweight protocol for gated work on GitHub issues — fills the gap between autonomous (BUGFIX/AIR) and ceremonial (SPIR/ASPIR)
  • Two human gates without the overhead of markdown artifacts
  • Leverages the infrastructure from Issues 1 and 2 for testable + reviewable worktrees
  • Opt-in per spawn — existing protocols unchanged

Reference implementations / existing patterns

  • BUGFIX protocol structure: codev/protocols/bugfix/protocol.json, codev/protocols/bugfix/prompts/*.md, codev/protocols/bugfix/builder-prompt.md
  • SPIR's gate-at-plan pattern: codev/protocols/spir/protocol.json — shows "gate": "plan-approval" on the plan phase
  • Porch gate mechanism: packages/codev/src/commands/porch/protocol.ts (loadProtocol), packages/codev/src/commands/porch/state.ts (gate state)
  • Porch's notify-architect-on-gate behavior: documented at codev/protocols/spir/protocol.md:49"When a gate is reached, porch notifies the architect and blocks until approved" — PIR inherits this automatically
  • Spawn routing: getSpawnMode in packages/codev/src/agent-farm/commands/spawn.ts
  • Skeleton sync: copy-skeleton script in packages/codev/package.json

Risks and mitigations

  • Risk: PIR's builder-stays-interactive-at-gate pattern differs from how BUGFIX builders behave (BUGFIX exits cleanly between phases). Mitigation: the approach and implement prompts explicitly instruct Claude to stay alive after posting the artifact and signaling PHASE_COMPLETE. Tower's existing restart loop is a safety net — even if Claude exits, it'll restart and re-poll porch status.
  • Risk: The phrase "plan" in PIR may confuse users familiar with SPIR's codev/plans/<N>.md file convention. Mitigation: PIR's protocol.md explicitly documents that its plan is a GitHub issue comment, not a file; gate name (plan-approval) is reused but scoped by project id.
  • Risk: The code-review gate assumes the user will actually test the change via a runnable worktree. If they don't (or Issue 1 isn't configured with worktree.devCommand), the gate is just a diff-read. Mitigation: documented explicitly; users who don't want this step should use BUGFIX or AIR.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions