feat: add test-both override for empirical deadlock resolution #1
Open
claytona500 wants to merge 1 commit into peteromallet:main from
When the critique loop stagnates (ESCALATE), the only options today are add-note, force-proceed, or abort, all of which punt the decision to the human without evidence. This adds a test-both override that invokes a judge agent to evaluate the current plan against an alternative approach, then renders a verdict (approach_a, approach_b, or synthesis) based on empirical assessment.

Changes:
- New test-both.json schema for structured judge output
- Judge prompt in prompts.py that evaluates both approaches against unresolved flags
- _override_test_both handler in cli.py with full state machine integration
- Mock worker for test-both in workers.py
- Default agent routing (claude) in _core.py
- Updated infer_next_steps to surface test-both for ESCALATE/ABORT
- Documentation in instructions.md
- 15 new tests covering all verdict paths, state transitions, and schema

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
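As a rough illustration of what "structured judge output" could look like on the consuming side, here is a minimal sketch of a verdict check. Only the three verdict values come from this PR; the `rationale` field, the `REQUIRED_FIELDS` tuple, and the `validate_judge_output` helper are hypothetical names chosen for the example, and the real `test-both.json` layout may differ.

```python
from typing import Any

# Verdict values named in the PR description.
VERDICTS = ("approach_a", "approach_b", "synthesis")

# Hypothetical field names: the actual test-both.json layout is not shown here.
REQUIRED_FIELDS = ("verdict", "rationale")


def validate_judge_output(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in payload
    ]
    verdict = payload.get("verdict")
    # Only flag the verdict value when one was actually supplied.
    if verdict is not None and verdict not in VERDICTS:
        problems.append(f"unknown verdict: {verdict!r}")
    return problems
```

Returning a list of problems rather than raising on the first one is just one possible design; it lets a caller report every issue in the judge's output at once.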
Summary
When the critique loop hits ESCALATE, the current options are `add-note`, `force-proceed`, or `abort`, all of which punt the decision to the human without evidence. This PR adds a `test-both` override that breaks deadlocks empirically: a judge agent evaluates the current plan against an alternative approach and renders a verdict of `approach_a`, `approach_b`, or `synthesis`.
Motivation
Inspired by adversarial convergence patterns where competing approaches are tested empirically rather than debated endlessly. When the same critique concerns recur across iterations and neither `force-proceed` nor `add-note` resolves the impasse, `test-both` gives the orchestrator an evidence-based path forward.
Changes
- `schemas.py`: New `test-both.json` schema with structured approach comparison and verdict enum
- `prompts.py`: Judge prompt that evaluates both approaches against unresolved flags
- `workers.py`: Mock worker, schema filename mapping, session key for `test-both` step
- `_core.py`: Default agent routing (claude) for `test-both`
- `cli.py`: `_override_test_both` handler with full state machine integration; updated `infer_next_steps` and argparse choices
- `instructions.md`: Documentation for the new override option
- `tests/test_test_both.py`: 15 new tests covering all verdict paths, state transitions, schema, and mock
- `tests/test_megaplan.py`, `tests/test_schemas.py`: Updated existing parametrized tests
Usage
Test plan
- `test-both` only available from EVALUATED state with ESCALATE/ABORT recommendation
🤖 Generated with Claude Code
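For reviewers, the availability rule in the test plan can be summarised as a small predicate. This is a sketch only: `can_test_both` is a hypothetical helper, not the code in `cli.py`, and it assumes the state and recommendation are plain strings.

```python
# Recommendations for which the PR says test-both is surfaced.
TEST_BOTH_RECOMMENDATIONS = {"ESCALATE", "ABORT"}


def can_test_both(state: str, recommendation: str) -> bool:
    """Hypothetical gate: test-both is offered only from EVALUATED with ESCALATE/ABORT."""
    return state == "EVALUATED" and recommendation in TEST_BOTH_RECOMMENDATIONS


if __name__ == "__main__":
    print(can_test_both("EVALUATED", "ESCALATE"))
    print(can_test_both("EVALUATED", "ABORT"))
```

Any other state or recommendation passed to this helper yields `False`, which mirrors the intent that the override is not offered outside a stalled evaluation.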