feat: Nous Portal OAuth refresh + agent_key minting#60
Merged
The output/ directory holds per-run logs, evolved artifacts, and gate decisions: all generated, none committable. The previous .gitignore pattern output/**/*.md excluded only markdown, so .log and .json files showed up as untracked in every git status, a steady source of noise across the recent provider work.
A user with provider: nous in ~/.hermes/config.yaml currently can't
actually use the framework — the resolver reads the OAuth access_token
from the credential pool and hands that to LiteLLM as the inference
Bearer, but Nous's inference endpoint requires the short-lived
agent_key (a separate credential minted via POST /api/oauth/agent-key).
This commit provides the LM subclass that handles the two-stage
credential model.
Mirrors hermes_cli/auth.py:3061-3193 (resolve_nous_runtime_credentials):
* Refresh OAuth access_token in-memory when within 120s of expiry
via POST {portal}/api/oauth/token (standard refresh_token grant
with client_id="hermes-cli")
* Mint a fresh agent_key when missing or within 120s of expiry via
POST {portal}/api/oauth/agent-key (Bearer access_token, ask for
1800s min TTL)
* Refresh-first-then-mint sequencing so a stale access_token doesn't
cause mint failures
* Mint 401 → refresh OAuth once and retry mint (Hermes pattern)
* Inference 401 → force re-mint and retry once (mid-run recovery)
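The sequencing in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in nous_lm.py: the `Creds` container, `is_expiring`, `ensure_credentials`, and the `refresh` / `mint` callables are hypothetical names, and only the 120s skew and the refresh-before-mint ordering come from the commit message.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

SKEW_SECONDS = 120  # act when within 120s of expiry, as described above


@dataclass
class Creds:
    # Hypothetical container; field names are illustrative only.
    access_token: str
    access_expires_at: float
    agent_key: Optional[str] = None
    agent_key_expires_at: Optional[float] = None


def is_expiring(expires_at: Optional[float], now: float) -> bool:
    """True when the expiry is unknown or inside the skew window."""
    return expires_at is None or now >= expires_at - SKEW_SECONDS


def ensure_credentials(
    creds: Creds,
    refresh: Callable[[], Tuple[str, float]],
    mint: Callable[[str], Tuple[str, float]],
    now: Optional[float] = None,
) -> str:
    """Refresh the OAuth token first, then mint with the fresh token."""
    now = time.time() if now is None else now
    # Step 1: refresh first, so the mint never runs on a stale access_token.
    if is_expiring(creds.access_expires_at, now):
        creds.access_token, creds.access_expires_at = refresh()
    # Step 2: mint when the agent_key is missing or close to expiry.
    if creds.agent_key is None or is_expiring(creds.agent_key_expires_at, now):
        creds.agent_key, creds.agent_key_expires_at = mint(creds.access_token)
    return creds.agent_key
```

Passing `refresh` and `mint` as callables keeps the ordering testable without any HTTP; the real flow would issue the two POSTs named above inside them.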
Cross-instance state sharing via _SharedNousState keyed by initial
refresh_token. The four LM roles (optimizer, reflection, eval, judge)
share the lock + state, so a four-thread evolution doesn't trigger
four parallel mints (which would race the portal's single-use
refresh-token rotation and produce refresh_token_reused errors on
three of them).
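A rough sketch of the sharing idea, under the assumption that a module-level registry maps the initial refresh_token to one state object. `SharedState`, `shared_state_for`, and `get_agent_key` are hypothetical names, not the actual `_SharedNousState` API.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SharedState:
    # One lock and one cached key per initial refresh_token (illustrative).
    lock: threading.Lock = field(default_factory=threading.Lock)
    agent_key: Optional[str] = None


_REGISTRY: Dict[str, SharedState] = {}
_REGISTRY_LOCK = threading.Lock()


def shared_state_for(initial_refresh_token: str) -> SharedState:
    """Return the single state object for this refresh_token, creating it once."""
    with _REGISTRY_LOCK:
        state = _REGISTRY.get(initial_refresh_token)
        if state is None:
            state = SharedState()
            _REGISTRY[initial_refresh_token] = state
        return state


def get_agent_key(state: SharedState, mint: Callable[[], str]) -> str:
    """Mint under the shared lock so concurrent callers reuse one key."""
    with state.lock:
        if state.agent_key is None:
            state.agent_key = mint()
        return state.agent_key
```

With four LM instances built from the same seed token, all four resolve to the same `SharedState`, so only the first thread through the lock mints and the other three read the cached key.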
In-memory only — no auth.json writeback. Long evolutions (>30 min on
a fresh agent_key) refresh + re-mint in-process; the on-disk store
stays at whatever `hermes model` last wrote. Avoids write-conflict
surface with concurrent Hermes sessions that may also be refreshing.
Error classification mirrors Hermes's own (auth.py:2595-2624):
invalid_grant/invalid_token + HTTP 401/403 from OAuth endpoint surface
HermesProviderError with `hermes model` recovery hint;
refresh_token_reused gets the special "another client consumed it"
message; mint failures translate similarly.
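One way to read the classification above, as a sketch only: `ProviderAuthError` stands in for HermesProviderError, `classify_oauth_failure` is a hypothetical helper, the message wording is invented, and treating the error code and the 401/403 status as independent triggers is an assumption about how the rule combines them.

```python
from typing import Optional

RELOGIN_HINT = "run `hermes model` to re-authenticate"


class ProviderAuthError(Exception):
    """Stand-in for HermesProviderError; the real class is not shown here."""


def classify_oauth_failure(status: int, error_code: Optional[str]) -> Optional[ProviderAuthError]:
    """Map an OAuth endpoint failure to a user-facing error, or None if unrelated."""
    # The reuse case is checked first because it needs its own message.
    if error_code == "refresh_token_reused":
        return ProviderAuthError(
            "Refresh token was already used; another client consumed it. " + RELOGIN_HINT
        )
    # Assumption: either the error code or the status alone triggers the hint.
    if error_code in {"invalid_grant", "invalid_token"} or status in (401, 403):
        return ProviderAuthError(
            f"OAuth endpoint rejected the credentials (HTTP {status}); {RELOGIN_HINT}"
        )
    return None
```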
oauth_helpers.parse_iso_or_epoch handles both Nous's ISO 8601
expires_at and Codex's Unix epoch float — kept in a small standalone
module so the next OAuth provider has somewhere obvious to extend
without bloating either provider's LM file.
Adds a new branch in resolve_default_lm: when canonical == "nous" AND the auth.json pool entry has a refresh_token (the signal for the OAuth-managed flow that `hermes model` writes), build a ResolvedLM whose lm_factory constructs a NousLM that handles the two-stage refresh + agent_key mint internally.

Pool entries WITHOUT refresh_token (env-var-style NOUS_API_KEY users) fall through to the existing OpenAI-wire direct-pass-through path unchanged. Note that the direct-pass-through path probably also doesn't work for Nous (the pool's access_token field holds the OAuth token, not the agent_key the inference endpoint needs), but that's a pre-existing condition orthogonal to this PR. We don't try to "upgrade" those users silently.

Missing pool entry → HermesProviderError pointing at `hermes model` recovery rather than silent fall-through to the broken direct path.

instantiate_lm + the existing _probe_via_factory in auth_check (added in PR #58) both already dispatch on lm_factory presence, so Nous flows through them unchanged. The Nous recovery hint is already in _HERMES_AUTH_COMMAND_BY_PROVIDER from the original auth-check work.
docs/model_resolution.md:
* New "Nous Portal OAuth + agent_key" section paralleling the Codex
section. Documents the two-stage credential model, the in-memory
refresh + mint flow, the HERMES_PORTAL_BASE_URL override knob, the
in-memory-only posture, and the env-var-fallthrough behavior for
pool entries without a refresh_token.
* Future-work list trimmed: removes the now-shipped non-Codex OAuth
bullet; rewords what remains so it's clear which providers are
intentionally out of scope (Qwen / Spotify / Gemini).
tests/manual/nous_smoke.py (NEW):
Runnable mock-server smoke that validates the Nous wire flow without
needing a real Nous Portal account. Spins up a stdlib http.server
pretending to be portal.nousresearch.com, drives a real NousLM (and
through it, a real LiteLLM call) against it, asserts on five
scenarios:
1. Initial mint: agent-key POST carries the seed access_token as
Bearer; min_ttl_seconds=1800 in body.
2. OAuth-expiring → refresh-then-mint: confirms call ordering and
that the mint POST uses the REFRESHED access_token (proves the
sequencing isn't backwards).
3. Inference uses the minted agent_key: the inference POST's
Authorization header is the MINTED key, not the OAuth token —
this is the headline bug the whole PR fixes.
4. Mid-run inference 401: forward's exception handler force-re-mints
and retries; the recorded HTTP exchange shows mint→infer(401)→
re-mint→infer(200) with two distinct minted keys.
5. OAuth refresh invalid_grant → HermesProviderError with the
`hermes model` recovery hint.
Smoke uses cache=False + num_retries=0 to expose the underlying
network behavior — DSPy cache would otherwise leak prior responses
across scenarios and LiteLLM's internal retry-on-401 would mask our
own re-mint logic. Comments explain both choices.
Not part of CI (heavyweight; spins up a server). Documented in the
Nous setup section as the recommended way to gain confidence in the
Nous flow when a real Portal account isn't available.
Review pass found a handful of silent-failure paths and one factually
wrong docstring; all addressed here.
oauth_helpers.parse_iso_or_epoch:
Reject bool, inf, nan, negative, and naive-datetime inputs that would
silently produce wrong epoch values:
* inf/nan: every skew check evaluates as "now >= inf" → False, so
the token would be treated as eternally fresh and never refreshed
* naive ISO: datetime.timestamp() interprets in the host's local TZ,
silently corrupting the skew window by hours on non-UTC hosts
* bool: subclass of int, would coerce True → 1.0 epoch seconds
Module docstring also rewritten to describe actual current consumer
(NousLM only — the previous "shared by Codex" claim was aspirational;
Codex parses expires_at inline as a raw float and doesn't import this).
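The rejection rules can be illustrated with a small parser. This is a sketch under stated assumptions, not the body of oauth_helpers.parse_iso_or_epoch: the name `parse_expiry`, the exception types, and the handling of a trailing "Z" are all choices made here for illustration.

```python
import math
from datetime import datetime
from typing import Union


def parse_expiry(value: Union[int, float, str]) -> float:
    """Return epoch seconds from an epoch number or a tz-aware ISO 8601 string."""
    # bool is a subclass of int, so it must be rejected before the numeric branch.
    if isinstance(value, bool):
        raise ValueError("bool is not a valid expiry")
    if isinstance(value, (int, float)):
        epoch = float(value)
        # inf/nan would make every "now >= expiry" check False forever.
        if not math.isfinite(epoch) or epoch < 0:
            raise ValueError(f"expiry must be finite and non-negative: {value!r}")
        return epoch
    if isinstance(value, str):
        text = value.strip()
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        parsed = datetime.fromisoformat(text)
        # A naive datetime would be interpreted in the host's local timezone.
        if parsed.tzinfo is None:
            raise ValueError(f"naive datetime is ambiguous: {value!r}")
        return parsed.timestamp()
    raise TypeError(f"unsupported expiry type: {type(value).__name__}")
```

For example, `parse_expiry("2024-01-01T00:00:00Z")` yields `1704067200.0`, while `parse_expiry("2024-01-01T00:00:00")` raises because no offset is given.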
nous_lm._refresh_oauth and _absorb_mint_response:
expires_in fields now reject bool explicitly — without the guard,
expires_in: True is accepted as 1 second (isinstance(True, int) is
True in Python), triggering perpetual re-mint storms. Also: when both
expires_at AND expires_in are absent or unusable, the code now logs a
warning before falling through to the conservative TTL floor, so a
portal protocol change that drops both fields is at least visible in
the run log instead of silently caching a key for 30 minutes.
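A minimal sketch of the guard and the logged fall-through. `expiry_from_response`, `_usable_number`, and the logger name are hypothetical, and the 1800s floor is an assumption inferred from the "30 minutes" mentioned above. For brevity the sketch accepts only numeric expires_at values; ISO strings are left out.

```python
import logging
import math
from typing import Any, Mapping

log = logging.getLogger("nous_sketch")  # hypothetical logger name

TTL_FLOOR_SECONDS = 1800.0  # assumed floor (30 minutes)


def _usable_number(value: Any) -> bool:
    """Accept positive finite ints/floats; reject bool even though it is an int."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value) and value > 0


def expiry_from_response(payload: Mapping[str, Any], now: float) -> float:
    """Pick an absolute expiry, warning when neither field is usable."""
    expires_at = payload.get("expires_at")
    if _usable_number(expires_at):
        return float(expires_at)
    expires_in = payload.get("expires_in")
    if _usable_number(expires_in):
        return now + float(expires_in)
    # Make the fall-through visible in the run log instead of silent.
    log.warning("response has no usable expires_at/expires_in; using TTL floor")
    return now + TTL_FLOOR_SECONDS
```

With this guard, `{"expires_in": True}` lands on the floor instead of being read as a one-second TTL.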
nous_lm.NousLM.__init__:
HERMES_PORTAL_BASE_URL and NOUS_INFERENCE_BASE_URL env vars are now
read at instance time, not module-import time. Previously the docs
advertised both as "overridable" but if anything imported nous_lm
before the operator set the var, the override was silently ignored.
The manual smoke harness already worked around this; this fix makes
the documented behavior actually true.
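The difference between import-time and instance-time reads can be shown with a small stand-in class. `PortalSettings` and the default URL value are assumptions for illustration; only the HERMES_PORTAL_BASE_URL variable name comes from the text above.

```python
import os
from typing import Optional

# Assumed default; the real constant in nous_lm.py is not shown here.
DEFAULT_PORTAL_BASE_URL = "https://portal.nousresearch.com"


class PortalSettings:
    """Reads the override in __init__, so env changes made after import still apply."""

    def __init__(self, portal_base_url: Optional[str] = None) -> None:
        resolved = (
            portal_base_url
            or os.environ.get("HERMES_PORTAL_BASE_URL")
            or DEFAULT_PORTAL_BASE_URL
        )
        # Strip a trailing slash so path joins stay predictable.
        self.portal_base_url = resolved.rstrip("/")
```

Had the `os.environ.get` call lived at module level, its result would be frozen at first import and a later override would be ignored, which is the failure described above.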
nous_lm.NousLM.forward / aforward:
When the post-401 retry ALSO returns 401, wrap with HermesProviderError
that names the recovery action ("OAuth grant may have been revoked;
run hermes model"). Previously a bare litellm.AuthenticationError
propagated with no signal that recovery had been attempted.
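The retry-once-then-wrap behavior can be sketched without LiteLLM. `AuthError` stands in for litellm.AuthenticationError and `ProviderError` for HermesProviderError; `call_with_remint` and its callables are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Stand-in for litellm.AuthenticationError (HTTP 401)."""


class ProviderError(Exception):
    """Stand-in for HermesProviderError."""


def call_with_remint(infer: Callable[[], T], force_remint: Callable[[], None]) -> T:
    """Call inference; on a 401, re-mint once and retry; wrap a second 401."""
    try:
        return infer()
    except AuthError:
        # First 401: the agent_key may have expired mid-run.
        force_remint()
    try:
        return infer()
    except AuthError as exc:
        # Second 401: recovery was attempted, so say so and name the action.
        raise ProviderError(
            "inference returned 401 again after re-minting the agent_key; "
            "OAuth grant may have been revoked; run hermes model"
        ) from exc
```

Chaining with `from exc` keeps the original authentication error available in the traceback while the outer message records that a retry already happened.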
nous_lm._SharedNousState.__post_init__:
Reject construction with agent_key set but agent_key_expires_at None
(or vice versa). The runtime path defensively treats this as "always
re-mint" — surfacing the construction-time mistake loudly is cheaper
than letting it cause silent re-mint storms in production.
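A construction-time guard of this kind might look like the following. `SharedKeyState` is a hypothetical stand-in for `_SharedNousState`, and the choice of ValueError is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedKeyState:
    agent_key: Optional[str] = None
    agent_key_expires_at: Optional[float] = None

    def __post_init__(self) -> None:
        # The key and its expiry must be set together or not at all.
        if (self.agent_key is None) != (self.agent_key_expires_at is None):
            raise ValueError(
                "agent_key and agent_key_expires_at must both be set or both be None"
            )
```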
hermes_provider._maybe_resolve_nous_lm:
When the credential pool entry has access_token but neither
refresh_token NOR agent_key, raise HermesProviderError pointing at
`hermes model` recovery. This is almost certainly a partial OAuth
setup (interrupted hermes model run) that would otherwise let the
caller fall through to direct pass-through and 401 against Nous's
inference endpoint with no breadcrumb. Pool entries with agent_key
set still fall through unchanged (genuine inference-only credentials).
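Taken together with the resolver branch described earlier, the routing decision can be sketched as a small function. `route_nous_entry`, the return labels, and `ProviderSetupError` (standing in for HermesProviderError) are hypothetical; treating an entry with none of the three fields as pass-through is an assumption, since the text does not cover that case.

```python
from typing import Any, Mapping, Optional


class ProviderSetupError(Exception):
    """Stand-in for HermesProviderError."""


def route_nous_entry(entry: Optional[Mapping[str, Any]]) -> str:
    """Decide which path a Nous credential pool entry takes."""
    if entry is None:
        raise ProviderSetupError("no Nous credential pool entry; run `hermes model`")
    if entry.get("refresh_token"):
        return "oauth"  # OAuth-managed flow: refresh + agent_key mint
    if entry.get("agent_key"):
        return "passthrough"  # genuine inference-only credential
    if entry.get("access_token"):
        # access_token alone looks like an interrupted `hermes model` run.
        raise ProviderSetupError("partial OAuth setup detected; run `hermes model`")
    return "passthrough"  # assumption: nothing to classify, leave path unchanged
```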
Comment-rot cleanup:
Stripped six `hermes_cli/auth.py:NNNN-NNNN` line-number references
from nous_lm.py and docs/model_resolution.md. Replaced with symbol
references. Codex's nous_lm-equivalent (codex_lm.py) got this right
and Nous followed the wrong precedent; aligning now.
New test coverage (~20 cases):
Refresh + mint malformed-JSON paths, network-error wrapping (httpx
ConnectError), OAuth 403 / mint 403 status-code-triggers-relogin,
agent_key field-name alias, ISO expires_at parsing, bool expires_in
hits floor, async-path 401 recovery (sync had it; async test
previously mocked the thing under test), partial-OAuth-setup error
in the resolver, _SharedNousState __post_init__ guard, and the new
parse_iso_or_epoch rejection paths.
Summary
* Before this PR, `provider: nous` in `~/.hermes/config.yaml` couldn't actually use the framework. The resolver read the OAuth `access_token` from the credential pool and handed it to LiteLLM as the inference Bearer, but Nous's inference endpoint requires the short-lived agent_key (a separate credential minted from the access_token). Inference would 401 silently.
* `provider: nous` now goes through a new `NousLM` subclass (`evolution/core/nous_lm.py`) that handles the two-stage credential model: refresh OAuth in-memory when the access_token is expiring, mint a fresh agent_key from it, re-mint on inference 401. Mirrors Hermes's own `resolve_nous_runtime_credentials` flow at `hermes_cli/auth.py:3061-3193`.
* Cross-instance state sharing via `_SharedNousState` keyed by initial refresh_token. The four LM roles share the lock + state, so a four-thread evolution doesn't trigger four parallel mints (which would race the portal's single-use refresh-token rotation).
* Pool entries without `refresh_token` (env-var-style `NOUS_API_KEY` setups) fall through to the existing OpenAI-wire direct-pass-through path unchanged.
* Adds `output/` to `.gitignore` so per-run log dirs stop appearing as untracked in every `git status`.

Two-stage credential model
* OAuth access_token: `refresh_token` grant via `POST {portal}/api/oauth/token`
* agent_key: minted via `POST {portal}/api/oauth/agent-key`

`NousLM.forward` orchestrates both: ensure-credentials at start (refresh OAuth if expiring → mint if agent_key expiring), call inference, on 401 force re-mint and retry once.

Manual validation
`tests/manual/nous_smoke.py` is a runnable wire-level smoke against a stdlib `http.server` mocking `portal.nousresearch.com`. Validates five scenarios:
1. Initial mint
2. OAuth-expiring → refresh-then-mint
3. Inference uses the minted agent_key
4. Mid-run inference 401 → re-mint and retry
5. OAuth refresh `invalid_grant` → `HermesProviderError` with `hermes model` recovery hint

All five pass against the local mock. Not part of CI (heavyweight; spins up a server). Run via `uv run python tests/manual/nous_smoke.py`. Documented in the Nous Portal setup section as the recommended validation path when a real Portal account isn't available, which is the case here, so this is the closest we can get to the real-credentials end-to-end smoke we did for Codex via the live spike.

Out of scope
* `OAuthLM` base class extracted from `CodexLM` + `NousLM`: with only two subclasses (and significantly different shapes: Codex needs Cloudflare headers, Nous needs agent-key minting) the abstraction would be premature. Defer until a third OAuth provider lands and the right cut becomes obvious.
* `auth.json` writeback after refresh/mint: in-memory only, mirroring Codex. Long evolutions still need periodic `hermes model` to refresh the on-disk store.
* `auxiliary.vision`, `auxiliary.web_extract`, `auxiliary.session_search`: the only Future-work item left after this PR.

Test plan
* `tests/core/test_nous_lm.py` (construction, mint timing, refresh+mint sequencing, 401-triggers-refresh-retry, mid-run inference 401 recovery, concurrent mint race, async path, error classification, ISO/epoch parsing); 8 in `tests/core/test_nous_provider.py` (OAuth-flow detection, factory wiring, fallback paths, pool exhaustion respect); 2 oauth_helpers cases.

Spike notes
Two recon rounds shaped this PR: