fix(levers): fail loud when constraint check rejects every lever#766
Open
neoneye wants to merge 3 commits into
IdentifyPotentialLevers silently returned an empty lever list when the per-lever ConstraintChecker rejected all generated levers. PotentialLeversTask then "succeeded" with an empty potential_levers.json, and the failure surfaced two stages later as the misleading "No input levers to deduplicate" in TriageLeversTask (and the equivalent in FocusOnVitalFewLevers). Resume re-read the cached empty file, so it never recovered.
This is triggered by a self-contradictory prompt: a negative constraint that bans the plan's core subject (observed: a plan explicitly about "AI agents" that listed "AI" among its banned words, so all 29 generated levers were rejected by "Do not use AI").
Add raise_if_no_levers_survived(), which fails at the source and names the dominant constraint(s) responsible, plus a TODO to detect such contradictions earlier (constraint extraction / redline gate).
Includes unit tests covering the happy path, the all-rejected case, dominant-constraint ranking, and the no-violation-data fallback.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Captures the failure mode in which a plan about "AI agents" bans "AI", and the goal of detecting such contradictions up front (constraint extraction / redline gate), complementing the fail-loud guard from this PR.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
In production, the pipeline failed partway through (≈20 of 125 files) with a misleading error two stages downstream: "No input levers to deduplicate", raised in TriageLeversTask.
Root cause is one stage earlier. IdentifyPotentialLevers.execute() generated levers fine, but the per-lever ConstraintChecker rejected every lever, so it silently wrote an empty potential_levers.json ([]). PotentialLeversTask "succeeded", and the empty list only blew up downstream in TriageLeversTask. Resume re-read the cached empty file, so it never recovered.
The trigger is a self-contradictory prompt: a negative constraint that bans the plan's own core subject. In the observed run the plan was explicitly about "AI agents" yet listed AI among its banned words, so all 29 generated levers were rejected by "Do not use AI".
Not a regression, and unrelated to recent dependency bumps.
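To make the trigger concrete, here is a minimal sketch of how a banned word that also names the plan's subject could be spotted. It is an illustration only, not the detection this PR implements (the PR leaves early detection as a TODO); the function name find_self_banned_words and its inputs are hypothetical.

```python
import re


def find_self_banned_words(plan_subject: str, banned_words: list[str]) -> list[str]:
    """Return the banned words that also occur as whole words in the plan subject.

    Hypothetical helper: the real pipeline's constraint format is not shown
    in this PR, so plain strings are assumed here.
    """
    hits = []
    for word in banned_words:
        # Whole-word, case-insensitive match; re.escape guards against regex metacharacters.
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, plan_subject, flags=re.IGNORECASE):
            hits.append(word)
    return hits


if __name__ == "__main__":
    # Mirrors the observed run: a plan about "AI agents" that bans "AI".
    print(find_self_banned_words("A marketplace for AI agents", ["AI", "blockchain"]))
```

A plain word match like this would only catch the literal case seen in the failing run; a paraphrased contradiction would presumably need the constraint extraction step mentioned in the TODO.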
Fix
Add raise_if_no_levers_survived(levers_cleaned, all_constraint_checks). When constraint checking removes every lever, the stage now fails loud at the source with an actionable reason that names the dominant constraint(s) (in the observed run, "Do not use AI").
Counts distinct levers per constraint (a lever may list the same constraint twice) so the "rejected N lever(s)" figure can't exceed the number generated.
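The guard's behaviour can be sketched roughly as follows. Only the function name and its two parameters come from this PR; the shape of all_constraint_checks (one dict per generated lever with a violated_constraints list), the top_n parameter, the exception type, and the message wording are assumptions made for illustration and may differ from the shipped code.

```python
from collections import Counter


def raise_if_no_levers_survived(levers_cleaned, all_constraint_checks, top_n=3):
    """Raise ValueError when constraint checking removed every lever.

    Assumed input shape (hypothetical): all_constraint_checks holds one dict
    per generated lever, e.g. {"violated_constraints": ["Do not use AI"]}.
    """
    if levers_cleaned:
        return  # at least one lever survived, nothing to report

    checks = list(all_constraint_checks)
    generated = len(checks)  # assumes exactly one check entry per generated lever

    rejected_by = Counter()
    for check in checks:
        # set() deduplicates, so a lever listing one constraint twice counts once
        for constraint in set(check.get("violated_constraints", [])):
            rejected_by[constraint] += 1

    if not rejected_by:
        # Fallback: every lever was dropped but no violation data was recorded
        raise ValueError(
            f"All {generated} generated lever(s) were rejected, "
            "but no constraint violation data is available."
        )

    dominant = ", ".join(
        f'"{constraint}" rejected {count} lever(s)'
        for constraint, count in rejected_by.most_common(top_n)
    )
    raise ValueError(
        f"Constraint checking rejected all {generated} generated lever(s). "
        f"Dominant constraint(s): {dominant}."
    )
```

Counting over a set per lever is what keeps each per-constraint figure at or below the number of levers generated.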
Leaves a TODO to detect such contradictions earlier (constraint extraction / redline gate), before tokens are spent generating levers guaranteed to be rejected.
Tests
New worker_plan_internal/lever/tests/test_identify_potential_levers.py covering: happy path (levers survive → no raise), all-rejected (names dominant constraint, reports generated count), dominant-constraint ranking, and the no-violation-data fallback.
Verified the shipped guard against the actual production constraint_checks data from the failing run (correctly names "Do not use AI", 29/29). Full pytest runs in CI/Docker (deps not available locally).
🤖 Generated with Claude Code