
chore: tighten uniform_failure suggestions after prefix-routing fix #67

Merged: jramos merged 1 commit into main from chore/uniform-failure-suggestions-post-prefix-fix on May 23, 2026
@jramos commented May 23, 2026

Summary

Loop-close on PR #66. The saturation pre-flight's uniform_failure panel was telling users "validator appears too weak — try a stronger --closed-loop-agent-model" when the actual cause was usually the silent prefix-routing bug PR #66 just fixed. Now that the routing bug is fixed, the residual uniform_failure cases are more likely to be misconfiguration than capability, and the suggestion ordering should reflect that.

Changes

Suggestion list reordered:

  • Lead: state the observation neutrally (Baseline scored 0 on every behavioral task — GEPA has nothing to optimize for)
  • Then: first-line diagnostic pointing at the "Stripped LiteLLM provider prefix" run.log line (the exact signal that PR #66, "fix: strip LiteLLM provider prefix before hermes -m", added)
  • Then: the capability/suite suggestions, framed as fallback
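The reordering above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `UniformFailurePanel` and `build_uniform_failure_panel` are hypothetical names, and the wording of the second and third suggestions is assumed. Only the lead observation, the `Stripped LiteLLM provider prefix` marker, the `--closed-loop-agent-model` flag, and the new panel title come from this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniformFailurePanel:
    """Hypothetical container for the panel title and its ordered suggestions."""

    title: str
    suggestions: tuple[str, ...]


def build_uniform_failure_panel() -> UniformFailurePanel:
    """Return the panel with suggestions in the new order (sketch only)."""
    return UniformFailurePanel(
        # New title: states the observation instead of diagnosing the validator.
        title="Uniform failure — closed-loop scored zero on every task",
        suggestions=(
            # 1. Lead: neutral observation (text taken from the PR description).
            "Baseline scored 0 on every behavioral task — GEPA has nothing to optimize for",
            # 2. First-line diagnostic (wording assumed): confirm routing in run.log.
            "First check the validator actually ran: look for the "
            "'Stripped LiteLLM provider prefix' line in run.log",
            # 3. Fallback (wording assumed): capability/suite suggestions.
            "If routing looks correct, try a stronger --closed-loop-agent-model "
            "or a different task suite",
        ),
    )
```

Keeping the suggestions in an ordered tuple makes the ordering itself something a test can assert on, which is the property this change is about.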

Panel title: "Uniform failure — validator too weak" → "Uniform failure — closed-loop scored zero on every task". Observation, not diagnosis.

Test plan

  • uv run pytest tests/core/test_saturation_check.py -q — expect 33 passed.
  • uv run pytest -q — expect 1090 passed (no regressions).

Historically, the "validator appears too weak" suggestion was actively
misleading: hermes -m treated LiteLLM provider prefixes as openrouter
routing, which broke auth and returned 0-turn sessions that the framework
counted as task failures. Users (and reviewers) followed the suggestion
and bumped model strength when the actual fix was routing.

Now that the routing bug is fixed (#66), the residual uniform_failure
cases are more likely to be misconfiguration than capability. Lead the
suggestion list with "first check the validator actually ran" and point
users at the run.log line that confirms routing.
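A user could run the "first check the validator actually ran" step by hand with something like the sketch below. It assumes only that run.log contains the literal text `Stripped LiteLLM provider prefix` when the prefix was stripped; the helper names and the log path are hypothetical. The marker presumably appears only when the model name carried a provider prefix, so a missing line is a prompt to look closer, not proof of a routing problem.

```python
from pathlib import Path

# Marker text taken from the PR description; the rest of the log line is unknown.
ROUTING_MARKER = "Stripped LiteLLM provider prefix"


def routing_confirmed(log_text: str) -> bool:
    """Return True if any line of the log text contains the routing marker."""
    return any(ROUTING_MARKER in line for line in log_text.splitlines())


def check_run_log(path: Path) -> bool:
    """Return True if the log file exists and contains the routing marker."""
    if not path.is_file():
        return False
    # errors="replace" keeps the check from failing on stray non-UTF-8 bytes.
    return routing_confirmed(path.read_text(encoding="utf-8", errors="replace"))
```

For example, `check_run_log(Path("run.log"))` returns `False` both when the file is missing and when the marker is absent, so either case sends the user to inspect routing before changing models.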

Panel title softened from "validator too weak" to "closed-loop scored
zero on every task" — observation, not diagnosis.
@jramos merged commit e460124 into main May 23, 2026
4 checks passed
@jramos deleted the chore/uniform-failure-suggestions-post-prefix-fix branch May 23, 2026 01:47