Skip to content

fix: strip LiteLLM provider prefix before hermes -m#66

Merged
jramos merged 1 commit into
mainfrom
fix/hermes-runner-strip-provider-prefix
May 23, 2026
Merged

fix: strip LiteLLM provider prefix before hermes -m#66
jramos merged 1 commit into
mainfrom
fix/hermes-runner-strip-provider-prefix

Conversation

@jramos
Copy link
Copy Markdown
Owner

@jramos jramos commented May 23, 2026

Summary

Closed-loop validation has been silently scoring 0/N for any user who passed a LiteLLM-formatted model string (e.g. openai/gpt-4o-mini) to --closed-loop-agent-model. The fix is one helper + a normalization step in HermesAgentRunner.__init__.

The bug

hermes -m <provider>/<model> interprets the prefix as openrouter-style routing — it silently switches the subprocess base_url to openrouter.ai. An OpenAI key in the user's hermes config isn't valid for openrouter, so the agent loop dies with no turn. The session JSON contains only the user message, and the framework counts the task as failed.

Repro on this machine:

$ hermes -m openai/gpt-5.4-mini -z "ping"
# (no stdout)
$ cat ~/.hermes/sessions/session_<latest>.json | jq '.messages|length, .model, .base_url'
1
"openai/gpt-5.4-mini"
"https://openrouter.ai/api/v1"  ← wrong

vs the bare-model version which works:

$ hermes -m gpt-5.4-mini -z "ping"
pong
$ jq '.messages|length, .model, .base_url' …
2
"gpt-5.4-mini"
"https://api.openai.com/v1"     ← correct

Impact

The framework's saturation pre-flight reported this as uniform_failure ("validator too weak — try a stronger model"), which is the wrong diagnosis — every model tested (nano, mini, gpt-5.4-mini, gpt-5.4) scored 0/7 because of the prefix bug, not because of capability. After the fix, the same probe with gpt-5.4-mini scores 7/7 and gpt-5-mini scores 6/7. The validator was working all along, just routed wrong.

This also means all prior evolve_tool / evolve_skill runs that passed --closed-loop-agent-model openai/… have been getting a contaminated closed-loop signal.

The fix

HermesAgentRunner.__init__ strips a known LiteLLM provider prefix (openai/, anthropic/, azure/, gemini/, cohere/, bedrock/, mistral/) from the model string and logs the transformation. Unknown prefixes pass through unchanged, so openrouter-style routing through an unrecognized vendor still works.

Test plan

  • uv run pytest -q — expect 1090 passed (+8 new tests across TestStripLitellmProviderPrefix and the new integration test in TestHermesAgentRunnerSubprocess).
  • Manual probe: uv run python -m evolution.tools.evolve_tool --tool write_file --manifest …/hermes-agent/tools/ --closed-loop-during-evolution evolution/validation/suites/write_file.jsonl --closed-loop-hermes-repo …/hermes-agent --closed-loop-agent-model openai/gpt-5.4-mini --iterations 1 — expect saturation panel to show Closed-loop (behavioral): 1.000 over 7 tasks and a Stripped LiteLLM provider prefix … log line.

Scope notes

  • Did not touch the saturation pre-flight's uniform_failure suggestions; those will be less misleading now that the underlying bug is fixed. A separate pass could tighten the wording (e.g., "check that --closed-loop-agent-model is reachable from your hermes config").
  • Did not add an explicit --closed-loop-agent-model-routing flag for openrouter users. If anyone hits issues with the strip, that becomes a follow-up.

Closed-loop validation has been silently scoring 0/N for any user who
passed a LiteLLM-formatted model string (e.g. `openai/gpt-4o-mini`) to
`--closed-loop-agent-model`. The hermes `-m` flag interprets
`<provider>/<model>` as openrouter-style routing, which switches the
subprocess base_url to openrouter.ai. An OpenAI key in the user's hermes
config isn't valid for openrouter, so the agent loop dies with no turn
and the framework reports it as `uniform_failure` ("validator too weak"),
hiding the real cause.

Strip known LiteLLM provider prefixes in HermesAgentRunner.__init__ so
users get the behavior they expect from the model string they use
everywhere else in the framework. Unknown prefixes pass through, so
openrouter-style routing through an unrecognized vendor still works.

Verified end-to-end: same probe that previously reported 0/7 (with the
bug) now reports 7/7 with `gpt-5.4-mini` and 6/7 with `gpt-5-mini` — the
validator was working all along, just routed wrong.
@jramos jramos merged commit 5d8ed2e into main May 23, 2026
4 checks passed
@jramos jramos deleted the fix/hermes-runner-strip-provider-prefix branch May 23, 2026 01:25
jramos added a commit that referenced this pull request May 23, 2026
)

The "validator appears too weak" suggestion was actively misleading
historically: hermes -m treated LiteLLM provider prefixes as openrouter
routing, breaking auth and returning 0-turn sessions that the framework
counted as task failures. Users (and reviewers) followed the suggestion
to bump model strength when the actual fix was routing.

Now that the routing bug is fixed (#66), the residual uniform_failure
cases are more likely to be misconfiguration than capability. Lead the
suggestion list with "first check the validator actually ran" and point
users at the run.log line that confirms routing.

Panel title softened from "validator too weak" to "closed-loop scored
zero on every task" — observation, not diagnosis.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant