fix: strip LiteLLM provider prefix before hermes -m by jramos · Pull Request #66 · jramos/agent-self-evolution

jramos · 2026-05-23T01:12:59Z

Summary

Closed-loop validation has been silently scoring 0/N for any user who passed a LiteLLM-formatted model string (e.g. openai/gpt-4o-mini) to --closed-loop-agent-model. The fix is one helper + a normalization step in HermesAgentRunner.__init__.

The bug

hermes -m <provider>/<model> interprets the prefix as openrouter-style routing — it silently switches the subprocess base_url to openrouter.ai. An OpenAI key in the user's hermes config isn't valid for openrouter, so the agent loop dies with no turn. The session JSON contains only the user message, and the framework counts the task as failed.

Repro on this machine:

$ hermes -m openai/gpt-5.4-mini -z "ping"
# (no stdout)
$ cat ~/.hermes/sessions/session_<latest>.json | jq '.messages|length, .model, .base_url'
1
"openai/gpt-5.4-mini"
"https://openrouter.ai/api/v1"  ← wrong

vs the bare-model version which works:

$ hermes -m gpt-5.4-mini -z "ping"
pong
$ jq '.messages|length, .model, .base_url' …
2
"gpt-5.4-mini"
"https://api.openai.com/v1"     ← correct

Impact

The framework's saturation pre-flight reported this as uniform_failure ("validator too weak — try a stronger model"), which is the wrong diagnosis — every model tested (nano, mini, gpt-5.4-mini, gpt-5.4) scored 0/7 because of the prefix bug, not because of capability. After the fix, the same probe with gpt-5.4-mini scores 7/7 and gpt-5-mini scores 6/7. The validator was working all along, just routed wrong.

This also means all prior evolve_tool / evolve_skill runs that passed --closed-loop-agent-model openai/… have been getting a contaminated closed-loop signal.

The fix

HermesAgentRunner.__init__ strips a known LiteLLM provider prefix (openai/, anthropic/, azure/, gemini/, cohere/, bedrock/, mistral/) from the model string and logs the transformation. Unknown prefixes pass through unchanged, so openrouter-style routing through an unrecognized vendor still works.

Test plan

uv run pytest -q — expect 1090 passed (+8 new tests across TestStripLitellmProviderPrefix and the new integration test in TestHermesAgentRunnerSubprocess).
Manual probe: uv run python -m evolution.tools.evolve_tool --tool write_file --manifest …/hermes-agent/tools/ --closed-loop-during-evolution evolution/validation/suites/write_file.jsonl --closed-loop-hermes-repo …/hermes-agent --closed-loop-agent-model openai/gpt-5.4-mini --iterations 1 — expect saturation panel to show Closed-loop (behavioral): 1.000 over 7 tasks and a Stripped LiteLLM provider prefix … log line.

Scope notes

Did not touch the saturation pre-flight's uniform_failure suggestions; those will be less misleading now that the underlying bug is fixed. A separate pass could tighten the wording (e.g., "check that --closed-loop-agent-model is reachable from your hermes config").
Did not add an explicit --closed-loop-agent-model-routing flag for openrouter users. If anyone hits issues with the strip, that becomes a follow-up.

Closed-loop validation has been silently scoring 0/N for any user who passed a LiteLLM-formatted model string (e.g. `openai/gpt-4o-mini`) to `--closed-loop-agent-model`. The hermes `-m` flag interprets `<provider>/<model>` as openrouter-style routing, which switches the subprocess base_url to openrouter.ai. An OpenAI key in the user's hermes config isn't valid for openrouter, so the agent loop dies with no turn and the framework reports it as `uniform_failure` ("validator too weak"), hiding the real cause. Strip known LiteLLM provider prefixes in HermesAgentRunner.__init__ so users get the behavior they expect from the model string they use everywhere else in the framework. Unknown prefixes pass through, so openrouter-style routing through an unrecognized vendor still works. Verified end-to-end: same probe that previously reported 0/7 (with the bug) now reports 7/7 with `gpt-5.4-mini` and 6/7 with `gpt-5-mini` — the validator was working all along, just routed wrong.

) The "validator appears too weak" suggestion was actively misleading historically: hermes -m treated LiteLLM provider prefixes as openrouter routing, breaking auth and returning 0-turn sessions that the framework counted as task failures. Users (and reviewers) followed the suggestion to bump model strength when the actual fix was routing. Now that the routing bug is fixed (#66), the residual uniform_failure cases are more likely to be misconfiguration than capability. Lead the suggestion list with "first check the validator actually ran" and point users at the run.log line that confirms routing. Panel title softened from "validator too weak" to "closed-loop scored zero on every task" — observation, not diagnosis.

jramos merged commit 5d8ed2e into main May 23, 2026
4 checks passed

jramos deleted the fix/hermes-runner-strip-provider-prefix branch May 23, 2026 01:25

jramos mentioned this pull request May 23, 2026

chore: tighten uniform_failure suggestions after prefix-routing fix #67

Merged

2 tasks

jramos mentioned this pull request May 23, 2026

test: weakened write_file fixture + ambiguous-task suite for Path E retro-validation #68

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: strip LiteLLM provider prefix before hermes -m#66

fix: strip LiteLLM provider prefix before hermes -m#66
jramos merged 1 commit into
mainfrom
fix/hermes-runner-strip-provider-prefix

jramos commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jramos commented May 23, 2026

Summary

The bug

Impact

The fix

Test plan

Scope notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant