Context
The architectural decision from the May session: Agent-as-MCP-tool, NOT LLM-caller-replacement. The xiaozhi-server LLM (or any MCP-host LLM) keeps the fast/cheap turns on the cheap path; expensive agentic loops are explicit tool calls into Hermes (or OpenClaw — #118).
This issue is the first proof that M3's runtime-neutrality thesis works. Per milestones-roadmap.md §4, M3 needs 3+ runtimes proving the same AgentKeys backend serves them all. Hermes is the first; OpenClaw (#118) is the second; Doubao (already in M2's #112) is the third.
NousResearch hermes-agent is MIT-licensed and battle-tested — self-improving learning loop, Honcho user modeling, FTS5 session search. Wrapping it as an MCP tool means any MCP host can invoke a full agentic runtime without having to embed it.
Scope (M3)
Deploy NousResearch Hermes-agent
One instance (single-region for v0); scale-out in M4 if vendor pilots demand it
Persistent storage for Hermes' session state (SQLite or Postgres; whichever the upstream defaults to)
Cost tracking: every Hermes step that calls an LLM increments the cost counter; the cost constraint (max_cost_usd) is enforced server-side
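The cost-tracking item above could be sketched roughly as follows. This is a minimal illustration, not Hermes' or AgentKeys' actual implementation: the `CostCounter` class, the `record_llm_call` method, and the flat per-1k-token price are all hypothetical names and placeholder values chosen for the example.

```python
from decimal import Decimal


class CostCapExceeded(Exception):
    """Raised once accumulated LLM spend passes the configured cap."""


class CostCounter:
    """Hypothetical server-side counter; one instance per Hermes run."""

    def __init__(self, max_cost_usd, usd_per_1k_tokens):
        # Decimal avoids float rounding drift when summing many small costs.
        self.max_cost_usd = Decimal(str(max_cost_usd))
        self.usd_per_1k_tokens = Decimal(str(usd_per_1k_tokens))
        self.spent_usd = Decimal("0")

    def record_llm_call(self, tokens_used):
        """Add the cost of one LLM call, then check the cap."""
        self.spent_usd += self.usd_per_1k_tokens * tokens_used / 1000
        if self.spent_usd > self.max_cost_usd:
            raise CostCapExceeded(
                f"spent {self.spent_usd} USD, cap is {self.max_cost_usd} USD"
            )
        return self.spent_usd
```

In this sketch the counter lives with the server rather than with the agent loop, so a step cannot skip the check; a real deployment would also need per-model pricing and persistence across restarts.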
Recursive composition
Hermes-agent uses AgentKeys MCP tools internally:
Read memory for context → agentkeys.memory.get
Check permissions for actions → agentkeys.permission.check
Append audit rows for each step → agentkeys.audit.append
This creates a two-layer audit trail: AgentKeys records "Hermes invoked"; Hermes-side audit records each step Hermes took inside the run. Both surface in #115's audit dashboard.
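To make the two-layer audit trail concrete, here is a small in-memory sketch. The three tool names come from the list above; everything else (the `FakeAgentKeys` stand-in, the `run_hermes_task` function, the row fields, the shared `run_id`) is an assumption made for illustration and not the real AgentKeys or Hermes API.

```python
import uuid


class FakeAgentKeys:
    """In-memory stand-in for the three AgentKeys MCP tools named above."""

    def __init__(self, memory, allowed_actions):
        self.memory = dict(memory)
        self.allowed_actions = set(allowed_actions)
        self.audit_rows = []

    def memory_get(self, key):  # stands in for agentkeys.memory.get
        return self.memory.get(key)

    def permission_check(self, action):  # stands in for agentkeys.permission.check
        return action in self.allowed_actions

    def audit_append(self, row):  # stands in for agentkeys.audit.append
        self.audit_rows.append(row)


def run_hermes_task(keys, task, actions):
    """Simulate one Hermes run that writes both audit layers."""
    run_id = str(uuid.uuid4())
    # Layer 1: the outer record that Hermes was invoked at all.
    keys.audit_append({"layer": "agentkeys", "event": "hermes.invoked",
                       "run_id": run_id, "task": task})
    # Layer 2: one row per step Hermes takes inside the run.
    context = keys.memory_get("user_profile")
    keys.audit_append({"layer": "hermes", "event": "memory.get", "run_id": run_id})
    for action in actions:
        allowed = keys.permission_check(action)
        keys.audit_append({"layer": "hermes", "event": "step", "run_id": run_id,
                           "action": action, "allowed": allowed})
    return {"run_id": run_id, "context": context}
```

Tagging every row with the same `run_id` is one way a dashboard could group the inner steps under the outer invocation row; the actual correlation key is not specified in this issue.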
Out of scope (defer)
Multi-instance Hermes (M4 — single instance is enough for M3 proof)
Tuning Hermes' system prompts per vendor (M4)
Streaming responses (MCP spec supports it; defer until vendor demand)
Cross-Hermes-session memory sharing (M4 with delegation work)
Acceptance criteria
A xiaozhi-server LLM (M1's setup) successfully calls hermes.execute_task for a complex task ("plan my 3-day Chengdu trip with ¥5000 budget") and gets a result back
Hermes pulls memory via AgentKeys MCP for context — verified by audit-trail showing memory.get calls during the Hermes run
End-to-end latency for non-real-time tasks is tolerable (30-60s acceptable per the M3 sequencing — these are tasks, not chat turns)
max_duration_s constraint enforced: a deliberately-long task times out at the configured limit and returns a graceful timeout result + audit row
max_cost_usd constraint enforced: a deliberately-expensive task halts at the cost cap + audit row explaining why
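The two constraint criteria above can be illustrated with a small sketch of a task loop. The `task`, `context`, and `constraints` parameters and the `max_duration_s` / `max_cost_usd` keys follow this issue; the `steps`, `audit`, and `clock` parameters, the result shape, and the idea that each step returns its own USD cost are assumptions made so the example is self-contained. It is not the real `hermes.execute_task` implementation.

```python
import time


def execute_task(task, context, constraints, steps, audit, clock=time.monotonic):
    """Run steps in order, stopping gracefully at the time or cost limit.

    Each step is a callable taking `context` and returning its LLM cost in USD
    (an assumption made for this sketch).
    """
    deadline = clock() + constraints["max_duration_s"]
    cap = constraints["max_cost_usd"]
    spent = 0.0
    done = 0

    def finish(status, reason):
        # One audit row per run outcome, explaining why the run ended.
        audit.append({"task": task, "event": status, "reason": reason})
        return {"status": status, "completed_steps": done, "spent_usd": spent}

    for step in steps:
        # The limits are checked between steps; a running step is not interrupted.
        if clock() >= deadline:
            return finish("timeout",
                          f"exceeded max_duration_s={constraints['max_duration_s']}")
        spent += step(context)
        done += 1
        if spent > cap:
            return finish("cost_cap", f"spent {spent:.2f} USD, cap is {cap:.2f} USD")
    return finish("ok", "all steps completed")
```

Because the checks run only at step boundaries, a single very long step can overshoot the time limit; a stricter version would need cancellation inside the step itself. Injecting `clock` keeps the timeout path testable without sleeping.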
MCP server wrapping Hermes
hermes.execute_task(task, context, constraints)
agent-iam-strategy.md "Hermes-as-MCP" discussion
X-AgentKeys-Actor header pattern as in #107 (Phase 1: AgentKeys MCP server, 7 active tools + 3 schema-only)
Risks
References
docs/spec/plans/milestones-roadmap.md §4 (M3 scope)
docs/research/agent-iam-strategy.md: "Agent-as-MCP-tool" architectural decision + Hermes-as-MCP discussion
docs/research/xiaozhi-hermes-architecture.md: architecture diagrams + per-turn flow
docs/research/xiaozhi-hermes-risks.md: R1-R4 risk verification (latency, concurrency, cold-construction, gateway)
Effort
~1-2 weeks. Sequencing:
Pickup notes for the next agent / developer
Read xiaozhi-hermes-architecture.md first: the three-diagram explanation of where Hermes fits
Then xiaozhi-hermes-risks.md: verified risks against actual repo code with file:line citations
The architectural decision is in agent-iam-strategy.md: Hermes is a callable tool, not an LLM-caller replacement. Don't accidentally re-architect it as the LLM the xiaozhi-server calls; that destroys the cheap-path principle.
Use the /agentkeys-issue-create skill for follow-up issues (e.g., per-runtime tuning, M4 multi-instance scaling)