Skip to content

Enhancement: ce-deep-review-beta — verified cross-model deep review of high-stakes plans #878

@jaybna

Description

@jaybna

Enhancement: ce-deep-review-beta — verified cross-model deep review of high-stakes plans

Submitting this the way your contributions policy asks for — as an issue + an illustrative reference PR (#858), not a merge request. No need to approve the fork CI or merge anything on my account; flagging it in case it's useful for you (or your Claude/Codex review) to pick up, re-implement, or ignore.

What it is

A turnkey skill that runs a high-stakes plan through the Claude ce-doc-review panel, then — after one consent gate — fans the plan across non-Claude reviewer CLIs for decorrelated findings, verdict-tags each cross-model finding against the plan with a deterministic quote-grep backstop, and writes a reconciled verified <plan>.deep-review.md sidecar.

Pipeline:

  • Phase 0 — detect available arms (codex + agy; offline detection, no API calls, no secret leakage).
  • Phase 1 — Claude ce-doc-review panel (no egress); fail-stops if the panel didn't complete.
  • Phase 2 — single consent gate: gitleaks content preview + per-vendor opt-in whose option labels carry the egress verb (load-bearing — see below).
  • Phase 3 — dispatch only the consented arms across the same six lenses the panel uses, parallel across models; a deselected vendor is never sent the plan.
  • Phase 3.5 — deterministic quote-grep backstop assigns each finding CONFIRMED / NOT-FOUND-IN-DOC / NEEDS-HUMAN. It's authoritative and model-blind (the verdict never sees the producing model), so a model verifier can't inherit the confabulation it's meant to catch. CONFIRMED certifies the quoted evidence exists — not that the finding is correct.
  • Phase 4 — reconcile into the verified sidecar (data-loss-safe rotation; decision-changing union of panel + CONFIRMED cross-model findings).

Why it might interest you

  • It's a productized version of the "fan a plan across models and keep only the decorrelated, grounded findings" pattern, with the verifier-contamination failure mode designed out (deterministic backstop, not an LLM judge).
  • One thing the build surfaced that may be generally useful regardless of this skill: Claude Code's auto-mode permission classifier is consent-scope-keyed, not path-keyed. A cross-model egress dispatch is blocked even with allowed-tools set, unless the in-conversation consent is legible to the classifier — which is why the consent-gate option labels carry the egress verb + vendor (Send the plan to codex (OpenAI)) rather than a bare model name. Decision record is in the branch under docs/solutions/skill-design/.

Reference implementation

  • Branch / PR: feat(ce-deep-review): verified cross-model deep-review skill (RU1–RU6) + the eval harness behind it #858 (also includes the cross-model evaluation harness the skill was built on — a decision tool that measures whether cross-model review is worth shipping before building it; the harness's own finding on this question was inconclusive/underpowered, so it's offered as methodology, not a build recommendation).
  • It's a beta skill (-beta suffix + disable-model-invocation: true) — opt-in, never auto-fires.
  • Green locally (bun test, release:validate in sync); the automated Codex review on the PR is fully resolved.

Happy to split this into smaller issues, convert any part to a plain writeup, or close it if an issue isn't the surface you want for this. Thanks for making the plugin — it's what this was built on.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions