feat: add test-both override for empirical deadlock resolution by claytona500 · Pull Request #1 · peteromallet/megaplan

claytona500 · 2026-03-21T18:42:12Z

Summary

When the critique loop hits ESCALATE, the current options are add-note, force-proceed, or abort — all of which punt the decision to the human without evidence. This PR adds a test-both override that breaks deadlocks empirically:

Invokes a judge agent to evaluate the current plan (approach A) against an alternative approach (approach B) that addresses the unresolved flags
The judge renders a structured verdict: approach_a, approach_b, or synthesis
The verdict determines the next step: approach_a wins → gate, approach_b/synthesis → integrate with the judge's recommendations
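
The verdict-to-next-step mapping described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Verdict` enum and the helper `next_step_for_verdict` are hypothetical names, and only the three verdict values and the `gate` / `integrate` step names come from the description.

```python
from enum import Enum


class Verdict(str, Enum):
    """The three verdict values named in the PR description."""
    APPROACH_A = "approach_a"
    APPROACH_B = "approach_b"
    SYNTHESIS = "synthesis"


def next_step_for_verdict(verdict: str) -> str:
    """Hypothetical helper: map a judge verdict to the next workflow step."""
    parsed = Verdict(verdict)  # raises ValueError for an unknown verdict
    if parsed is Verdict.APPROACH_A:
        # The current plan wins, so it moves on to the gate step.
        return "gate"
    # approach_b and synthesis both go through integrate,
    # carrying the judge's recommendations.
    return "integrate"
```

Parsing the verdict through an enum first means an unexpected value fails loudly rather than silently falling through to `integrate`.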

Motivation

Inspired by adversarial convergence patterns where competing approaches are tested empirically rather than debated endlessly. When the same critique concerns recur across iterations and neither force-proceed nor add-note resolves the impasse, test-both gives the orchestrator an evidence-based path forward.

Changes

schemas.py — New test-both.json schema with structured approach comparison and verdict enum
prompts.py — Judge prompt that evaluates both approaches against unresolved flags
workers.py — Mock worker, schema filename mapping, session key for test-both step
_core.py — Default agent routing (claude) for test-both
cli.py — _override_test_both handler with full state machine integration; updated infer_next_steps and argparse choices
instructions.md — Documentation for the new override option
tests/test_test_both.py — 15 new tests covering all verdict paths, state transitions, schema, and mock
tests/test_megaplan.py, tests/test_schemas.py — Updated existing parametrized tests
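
To make the "verdict enum" idea concrete, here is a rough sketch of what a structured judge output and a check against it might look like. The field names (`verdict`, `recommendations`), the constant `TEST_BOTH_SCHEMA_SKETCH`, and the function `check_judge_output` are all assumptions made for illustration; the actual contents of test-both.json are not shown in this PR description. Only the three verdict values are taken from the text.

```python
VERDICTS = ("approach_a", "approach_b", "synthesis")

# Hypothetical schema shape; only the verdict enum values come from the PR text.
TEST_BOTH_SCHEMA_SKETCH = {
    "type": "object",
    "properties": {
        "verdict": {"type": "string", "enum": list(VERDICTS)},
        "recommendations": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["verdict"],
}


def check_judge_output(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload fits the sketch."""
    problems = []
    verdict = payload.get("verdict")
    if verdict not in VERDICTS:
        problems.append(f"verdict must be one of {VERDICTS}, got {verdict!r}")
    recs = payload.get("recommendations", [])
    if not isinstance(recs, list) or not all(isinstance(r, str) for r in recs):
        problems.append("recommendations must be a list of strings")
    return problems
```

The check is hand-rolled to keep the sketch dependency-free; a real project would more likely validate against the schema with a JSON Schema library.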

Usage

megaplan override test-both --plan <name> --reason "critique loop stagnated"

Test plan

All 15 new tests pass
All 289 tests pass (274 original + 15 new)
test-both only available from EVALUATED state with ESCALATE/ABORT recommendation
All three verdict paths (approach_a, approach_b, synthesis) produce correct state transitions
History entry, override metadata, and artifacts written correctly
Manual test with real agents on a stagnated plan

🤖 Generated with Claude Code

When the critique loop stagnates (ESCALATE), the only options today are add-note, force-proceed, or abort, all of which punt the decision to the human without evidence. This adds a test-both override that invokes a judge agent to evaluate the current plan against an alternative approach, then renders a verdict (approach_a, approach_b, or synthesis) based on empirical assessment.

Changes:
- New test-both.json schema for structured judge output
- Judge prompt in prompts.py that evaluates both approaches against unresolved flags
- _override_test_both handler in cli.py with full state machine integration
- Mock worker for test-both in workers.py
- Default agent routing (claude) in _core.py
- Updated infer_next_steps to surface test-both for ESCALATE/ABORT
- Documentation in instructions.md
- 15 new tests covering all verdict paths, state transitions, and schema

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claytona500 wants to merge 1 commit into peteromallet:main from claytona500:feat/test-both-override
