Skip to content

hotfix: plan-time retry on hallucinated skill names#41

Merged
AVADSA25 merged 1 commit intomainfrom
fix/plan-time-retry
May 4, 2026
Merged

hotfix: plan-time retry on hallucinated skill names#41
AVADSA25 merged 1 commit intomainfrom
fix/plan-time-retry

Conversation

@AVADSA25
Copy link
Copy Markdown
Owner

@AVADSA25 AVADSA25 commented May 4, 2026

Reproducer (real, today)

User dropped:

"Read all markdown files in ~/codec-repo/docs/ and create an index.md that lists each file with its first heading and a one-line description"

Result:

Plan failed: plan invalid: plan references unknown skills: ['file_read']

The user never even reached approve/reject. Plan validation rejected the draft because Qwen invented `file_read` (the actual skill is `file_ops`).

Why PR #35 didn't catch it

PR #35 added a single-shot correction-nudge retry inside `codec_agent_runner._execute_checkpoint` — that handles hallucinations at execution time (skill / write_path / read_path / domain). But validation lives earlier, in `codec_agent_plan.draft_plan` → `validate_plan_skills`. The plan is rejected before it's even saved, so the runner never gets a chance to retry.

Fix (mirrors PR #35 one layer up)

  1. After `validate_plan_skills` returns `ok=False`, build a corrective prompt with:
    • The missing skill names
    • The FULL allowed registry list (closed-world choice)
    • The three most common confusions: `file_read`→`file_ops`, `fetch_url`→`web_fetch`, `read_file`→`file_ops`
  2. Re-call Qwen ONCE with the appended correction.
  3. Re-validate. Success → use the corrected plan. Failure → raise with both attempts in the message so the user sees consistent confusion (vs a one-off transient miss).
  4. If the retry call itself flakes, surface the original validation error.

Also strengthen `_PLAN_SYSTEM_PROMPT` with the same three confusion hints so the FIRST draft is more likely to succeed (cuts the retry rate).

Tests (3 new)

  • `test_draft_plan_retries_on_hallucinated_skill_then_succeeds` — reproduces the exact user case (file_read → file_ops)
  • `test_draft_plan_retry_also_fails_raises_with_both_attempts` — both attempts miss, message contains both
  • `test_draft_plan_retry_qwen_unavailable_surfaces_original_error` — retry call raises ConnectionError, original validation error surfaces

All 3 existing `draft_plan` tests still pass — backward-compat preserved. `test_draft_plan_rejects_unknown_skill` now exercises BOTH attempts (fake returns same bad plan twice) and still raises with the missing skill in the message.

Tally: 35/35 file tests pass + 7 pre-existing pynput env failures (unchanged on baseline, unrelated to this PR).

Test plan after merge

  • Drop the markdown-index project again — expected: plan succeeds with `file_ops`, agent runs, index.md written
  • Drop the forex briefing again — expected: same fix path also helps if Qwen ever drafts it with hallucinated names

🤖 Generated with Claude Code

User repro 2026-05-04 09:58:
  Project: "Read all markdown files in ~/codec-repo/docs/ and create
           an index.md that lists each file with its first heading and
           a one-line description"
  Result:  Plan failed: plan invalid: plan references unknown skills:
           ['file_read']

Same hallucination CLASS as PR #35 but at a different LAYER.
PR #35 fixed retries during execution (codec_agent_runner). This is
failing earlier — at plan validation, before the plan is even saved.
The user never even got to the approve-or-reject step.

Root cause:
Qwen drafts plans naming skills that don't exist. `file_read` and
`fetch_url` are the two we've seen. The actual file-reading skill is
`file_ops` (which reads, writes, appends, lists). The actual URL fetch
is `web_fetch`. The user-visible result was the same as PR #35 —
project mode dies before doing anything useful.

Fix (mirrors PR #35's pattern at planning layer):
1. After validate_plan_skills returns ok=False, instead of raising,
   build a corrective prompt listing the missing skills, the FULL
   allowed registry, and the three most common confusions
   (file_read→file_ops, fetch_url→web_fetch, read_file→file_ops).
2. Re-call _qwen_chat ONCE with the appended correction.
3. Re-validate the second draft. If valid, use it. If not, raise
   with BOTH attempts in the message so the user sees the model is
   consistently confused (vs a one-off transient miss).
4. If the retry call itself fails (Qwen flakes between attempts),
   raise with the ORIGINAL validation error — more diagnostic than
   "qwen flaked on retry".

Also:
- Strengthen _PLAN_SYSTEM_PROMPT with the same three confusion hints
  so the FIRST draft is more likely to succeed (cuts the retry rate).

Tests (3 new in tests/test_agent_plan.py — all pass):
- test_draft_plan_retries_on_hallucinated_skill_then_succeeds
  Reproduces the exact user case: file_read on attempt 1, file_ops
  on attempt 2, plan succeeds.
- test_draft_plan_retry_also_fails_raises_with_both_attempts
  Both attempts hallucinate (file_read, then read_file): error
  message contains both for diagnostic value.
- test_draft_plan_retry_qwen_unavailable_surfaces_original_error
  Retry call raises ConnectionError: original validation error
  surfaces with "retry failed" appended.

All 3 existing draft_plan tests still pass — backward-compat preserved.
The existing test_draft_plan_rejects_unknown_skill now exercises BOTH
attempts (fake_qwen_chat returns same bad plan each time) and still
raises with the missing skill in the message.

Total: 35/35 file pass + 7 pre-existing pynput env failures (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AVADSA25 AVADSA25 merged commit 028550d into main May 4, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants