fix(levers): fail loud when constraint check rejects every lever #766

Open
neoneye wants to merge 3 commits into main from fix/levers-fail-loud-on-contradictory-constraints
neoneye commented May 29, 2026

Problem

In production, the pipeline failed partway through (≈20 of 125 files) with a misleading error two stages downstream:

TriageLeversTask -> ValueError: No input levers to deduplicate.
FocusOnVitalFewLeversTask -> ValueError: No valid enriched levers were provided.

Root cause is one stage earlier. IdentifyPotentialLevers.execute() generated levers fine, but the per-lever ConstraintChecker rejected every lever, so it silently wrote an empty potential_levers.json ([]). PotentialLeversTask "succeeded", and the empty list only blew up downstream in TriageLeversTask. Resume re-read the cached empty file, so it never recovered.

The trigger is a self-contradictory prompt: a negative constraint that bans the plan's own core subject. In the observed run the plan was explicitly about "AI agents" yet listed "AI" among its banned words, so all 29 generated levers were rejected by "Do not use AI".
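To make the trigger concrete, here is a minimal sketch of how such a contradiction could be spotted from the prompt alone. It is not code from this PR; the function name `find_self_contradictions` and its inputs (the plan text and a list of banned words) are hypothetical, and a naive whole-word match is assumed to be good enough for illustration.

```python
import re


def find_self_contradictions(plan_text, banned_words):
    """Return the banned words that the plan text itself uses.

    Hypothetical helper: a banned word that appears as a whole word in
    the plan's own description is a likely self-contradiction.
    """
    hits = []
    for word in banned_words:
        # Whole-word, case-insensitive match, so "AI" does not match "maintain".
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, plan_text, flags=re.IGNORECASE):
            hits.append(word)
    return hits


# Example input modelled on the failure described above (wording is illustrative).
conflicts = find_self_contradictions(
    "Build a support desk staffed by AI agents",
    ["AI", "blockchain"],
)
```

A real check would likely need more than string matching (synonyms, paraphrased constraints), so treat this only as an illustration of the failure mode.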

Not a regression, and unrelated to recent dependency bumps.

Fix

  • Add raise_if_no_levers_survived(levers_cleaned, all_constraint_checks). When constraint checking removes every lever, it now fails loud at the source with an actionable reason that names the dominant constraint(s), e.g.:

    All 29 generated levers were rejected by the constraint check, leaving none for the downstream tasks. Most rejections came from: "Do not use AI" (rejected 29 lever(s))… This usually means a negative constraint contradicts the plan's core subject (e.g. banning "AI" in a plan that is explicitly about AI agents). Revise the plan's banned words / negative constraints so they do not exclude the plan's main topic.

  • Counts distinct levers per constraint (a lever may list the same constraint twice) so the "rejected N lever(s)" figure can't exceed the number generated.

  • Leaves a TODO to detect such contradictions earlier (constraint extraction / redline gate), before tokens are spent generating levers guaranteed to be rejected.
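The guard described in the bullets above could look roughly like the sketch below. The data shapes are assumptions, not taken from the diff: each entry of `all_constraint_checks` is assumed to be a dict with a hypothetical `lever_id` and a `violated` list of constraint strings, and the message is shortened relative to the one quoted above.

```python
from collections import defaultdict


def raise_if_no_levers_survived(levers_cleaned, all_constraint_checks):
    """Raise ValueError when constraint checking removed every lever.

    Assumed shapes (hypothetical, for illustration only):
      levers_cleaned: list of levers that passed the check
      all_constraint_checks: one dict per generated lever, e.g.
        {"lever_id": "L1", "violated": ["Do not use AI"]}
    """
    if levers_cleaned:
        return  # happy path: at least one lever survived

    # Map each constraint to the set of distinct levers it rejected, so a
    # lever that lists the same constraint twice is counted only once.
    rejected_by = defaultdict(set)
    for check in all_constraint_checks:
        for constraint in check.get("violated", []):
            rejected_by[constraint].add(check["lever_id"])

    generated = len(all_constraint_checks)
    message = (
        f"All {generated} generated levers were rejected by the constraint "
        "check, leaving none for the downstream tasks."
    )
    if rejected_by:
        # Rank constraints by how many distinct levers each one rejected.
        ranked = sorted(rejected_by.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        top = ", ".join(
            f'"{name}" (rejected {len(ids)} lever(s))' for name, ids in ranked[:3]
        )
        message += f" Most rejections came from: {top}."
    raise ValueError(message)
```

Using a set per constraint is what keeps the "rejected N lever(s)" figure from exceeding the number of generated levers. When no violation data is available, the sketch falls back to the generic message without naming a constraint.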

Tests

New worker_plan_internal/lever/tests/test_identify_potential_levers.py covering: happy path (levers survive → no raise), all-rejected (names dominant constraint, reports generated count), dominant-constraint ranking, and the no-violation-data fallback.

Verified the shipped guard against the actual production constraint_checks data from the failing run (correctly names "Do not use AI", 29/29). Full pytest runs in CI/Docker (deps not available locally).

🤖 Generated with Claude Code

neoneye and others added 3 commits May 29, 2026 02:40
IdentifyPotentialLevers silently returned an empty lever list when the per-lever ConstraintChecker rejected all generated levers. PotentialLeversTask then "succeeded" with an empty potential_levers.json, and the failure surfaced two stages later as the misleading "No input levers to deduplicate" in TriageLeversTask (and the equivalent in FocusOnVitalFewLeversTask). Resume re-read the cached empty file, so it never recovered.

This is triggered by a self-contradictory prompt: a negative constraint that bans the plan's core subject (observed: a plan explicitly about "AI agents" that listed "AI" among its banned words, so all 29 generated levers were rejected by "Do not use AI").

Add raise_if_no_levers_survived(), which fails at the source and names the dominant constraint(s) responsible, plus a TODO to detect such contradictions earlier (constraint extraction / redline gate). Includes unit tests covering the happy path, the all-rejected case, dominant-constraint ranking, and the no-violation-data fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Captures the "AI agents" plan that bans "AI" failure mode and the goal of detecting such contradictions up front (constraint extraction / redline gate), complementing the fail-loud guard from this PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>