Phase 3: Hermes-MCP server (hermes.execute_task as MCP tool) #117

@hanwencheng

Context

The architectural decision from the May session: Agent-as-MCP-tool, NOT LLM-caller-replacement. The xiaozhi-server LLM (or any MCP-host LLM) keeps the fast/cheap turns on the cheap path; expensive agentic loops are explicit tool calls into Hermes (or OpenClaw — #118).

This issue is the first proof that M3's runtime-neutrality thesis works. Per milestones-roadmap.md §4, M3 needs 3+ runtimes proving the same AgentKeys backend serves them all. Hermes is the first; OpenClaw (#118) is the second; Doubao (already in M2's #112) is the third.

NousResearch hermes-agent is MIT-licensed and battle-tested — self-improving learning loop, Honcho user modeling, FTS5 session search. Wrapping it as an MCP tool means any MCP host can invoke a full agentic runtime without having to embed it.

Scope (M3)

Deploy NousResearch Hermes-agent

  • One instance (single-region for v0); scale-out in M4 if vendor pilots demand it
  • Persistent storage for Hermes' session state (SQLite or Postgres; whichever the upstream defaults to)
  • Hermes connects to the AgentKeys MCP server (#107, "Phase 1: AgentKeys MCP server, 7 active tools + 3 schema-only") as a downstream dependency; Hermes uses AgentKeys tools internally for memory, identity, and audit

MCP server wrapping Hermes

  • One tool: hermes.execute_task(task, context, constraints)
  • Tool signature per agent-iam-strategy.md "Hermes-as-MCP" discussion:
hermes.execute_task(
  task: string,
  context: {
    actor_omni: string,
    session_id: string,
    memory_namespaces: string[],
  },
  constraints: {
    max_duration_s: number,
    max_cost_usd: number,
    tools_allowed: string[],
  }
) → {
  result: string,
  steps_taken: number,
  cost_usd: number,
  audit_trail_id: string,
}
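The signature above could be mirrored as typed Python values on the wrapper side. This is a minimal sketch, not the agreed schema: the class names, `parse_request`, and the rule that caps must be positive are assumptions added for illustration; only the field names come from the signature.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskContext:
    actor_omni: str
    session_id: str
    memory_namespaces: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class TaskConstraints:
    max_duration_s: float
    max_cost_usd: float
    tools_allowed: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed validation rule (not in the issue): caps must be positive.
        if self.max_duration_s <= 0:
            raise ValueError("max_duration_s must be positive")
        if self.max_cost_usd <= 0:
            raise ValueError("max_cost_usd must be positive")


@dataclass(frozen=True)
class TaskResult:
    result: str
    steps_taken: int
    cost_usd: float
    audit_trail_id: str


def parse_request(payload: dict) -> tuple[str, TaskContext, TaskConstraints]:
    """Turn a raw tool-call payload into typed values; raises on missing or unknown fields."""
    task = payload["task"]
    if not isinstance(task, str) or not task.strip():
        raise ValueError("task must be a non-empty string")
    return task, TaskContext(**payload["context"]), TaskConstraints(**payload["constraints"])
```

Rejecting malformed input at the boundary keeps constraint enforcement downstream from having to re-check types.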

Recursive composition

Hermes-agent uses AgentKeys MCP tools internally:

  • Read memory for context → agentkeys.memory.get
  • Check permissions for actions → agentkeys.permission.check
  • Append audit rows for each step → agentkeys.audit.append

This creates a two-layer audit trail: AgentKeys records "Hermes invoked"; Hermes-side audit records each step Hermes took inside the run. Both surface in #115's audit dashboard.
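One way to picture the two-layer trail is an outer row for the invocation with inner step rows that point back at it. The sketch below uses an in-memory list as a stand-in for the real audit store; `AuditLog`, `AuditRow`, `run_with_audit`, and the `parent_id` linkage are hypothetical names and design choices, not the AgentKeys schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuditRow:
    row_id: str
    layer: str                # "agentkeys" (outer) or "hermes" (inner)
    event: str
    parent_id: Optional[str]  # inner rows point at the outer invocation row


@dataclass
class AuditLog:
    """In-memory stand-in for an audit store (hypothetical)."""
    rows: list[AuditRow] = field(default_factory=list)

    def append(self, layer: str, event: str, parent_id: Optional[str] = None) -> str:
        row = AuditRow(uuid.uuid4().hex, layer, event, parent_id)
        self.rows.append(row)
        return row.row_id

    def trail(self, audit_trail_id: str) -> list[AuditRow]:
        """Return the outer row followed by its inner steps, in insertion order."""
        return [
            r for r in self.rows
            if r.row_id == audit_trail_id or r.parent_id == audit_trail_id
        ]


def run_with_audit(log: AuditLog, steps: list[str]) -> str:
    # Outer layer: one row saying Hermes was invoked.
    trail_id = log.append("agentkeys", "hermes.execute_task invoked")
    # Inner layer: one row per step Hermes took inside the run.
    for step in steps:
        log.append("hermes", step, parent_id=trail_id)
    return trail_id
```

Returning the outer row's id as `audit_trail_id` would let a dashboard fetch both layers with a single lookup.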

Out of scope (defer)

  • Multi-instance Hermes (M4 — single instance is enough for M3 proof)
  • Tuning Hermes' system prompts per vendor (M4)
  • Streaming responses (MCP spec supports it; defer until vendor demand)
  • Cross-Hermes-session memory sharing (M4 with delegation work)

Acceptance criteria

  • A xiaozhi-server LLM (M1's setup) successfully calls hermes.execute_task for a complex task ("plan my 3-day Chengdu trip with ¥5000 budget") and gets a result back
  • Hermes pulls memory via AgentKeys MCP for context — verified by audit-trail showing memory.get calls during the Hermes run
  • End-to-end latency for non-real-time tasks stays within 30-60s (acceptable per the M3 sequencing; these are tasks, not chat turns)
  • max_duration_s constraint enforced: a deliberately-long task times out at the configured limit and returns a graceful timeout result + audit row
  • max_cost_usd constraint enforced: a deliberately-expensive task halts at the cost cap + audit row explaining why
  • Hermes invocation is observable in the #115 audit dashboard ("Phase 2: Audit dashboard", two-tier visible: real-time feed + chain anchor) via the two-layer trail
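The max_duration_s criterion could be checked with a wrapper along these lines. It is a sketch under assumptions: `run_with_deadline`, `slow_task`, the list used as an audit sink, and the result shape are all hypothetical, and the real wrapper would call Hermes rather than sleep.

```python
import asyncio
import time


async def run_with_deadline(task_coro, max_duration_s: float, audit: list) -> dict:
    """Run an awaitable under a deadline; on timeout, return a graceful
    result instead of raising, and record an audit row either way."""
    started = time.monotonic()
    try:
        # wait_for cancels the inner task when the timeout expires.
        result = await asyncio.wait_for(task_coro, timeout=max_duration_s)
        status = "ok"
    except asyncio.TimeoutError:
        result = f"Task stopped after reaching the {max_duration_s}s limit"
        status = "timeout"
    audit.append({
        "event": "hermes.execute_task",
        "status": status,
        "elapsed_s": round(time.monotonic() - started, 3),
    })
    return {"result": result, "status": status}


async def slow_task() -> str:
    await asyncio.sleep(5)  # stands in for a deliberately long agentic loop
    return "done"
```

An integration test could call `asyncio.run(run_with_deadline(slow_task(), 0.05, audit))` and assert that the status is "timeout" and that exactly one audit row was written.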

Risks

| Risk | Mitigation |
| --- | --- |
| Hermes' LLM costs explode under bad constraints | Server-side cost tracker is authoritative; vendor's per-actor cost limit is enforced before Hermes-side limit |
| Hermes-side state diverges from AgentKeys session state | Hermes uses AgentKeys memory worker as its persistent context; session_id maps to actor_omni-scoped memory namespace |
| Cold start latency (Hermes process + LLM warmup) is unacceptable for the demo | Pre-warm one Hermes instance per vendor; warm-pool sizing is M4 tuning |
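The cost mitigation implies an ordering: the per-actor limit is consulted before the per-call cap. A minimal sketch of that ordering, assuming a server-side tracker that is charged once per step; `CostTracker`, `CostCapExceeded`, and the reason strings are hypothetical names, not part of AgentKeys or Hermes.

```python
class CostCapExceeded(Exception):
    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason  # suitable for the explanatory audit row


class CostTracker:
    """Server-side running total for one execute_task call (hypothetical helper)."""

    def __init__(self, actor_remaining_usd: float, max_cost_usd: float) -> None:
        self.actor_remaining_usd = actor_remaining_usd  # vendor's per-actor budget left
        self.max_cost_usd = max_cost_usd                # cap passed in constraints
        self.spent_usd = 0.0

    def charge(self, amount_usd: float) -> None:
        """Record spend for one step, or raise before the step would breach a limit."""
        projected = self.spent_usd + amount_usd
        # Per-actor limit is checked first, so it wins when both would be breached.
        if projected > self.actor_remaining_usd:
            raise CostCapExceeded("per-actor cost limit reached")
        if projected > self.max_cost_usd:
            raise CostCapExceeded("max_cost_usd reached")
        self.spent_usd = projected
```

Because the check runs before spend is recorded, a run halts at the cap rather than after overshooting it, and the exception's `reason` can be copied into the audit row.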

Effort

~1-2 weeks. Sequencing:

  1. (Days 1-3) Deploy Hermes-agent + storage + AgentKeys MCP client config
  2. (Days 3-5) MCP server wrapper + tool signature + auth
  3. (Days 5-8) Constraint enforcement (duration + cost) + audit-trail wiring
  4. (Days 8-10) Integration test from xiaozhi-server → Hermes → AgentKeys MCP → S3
  5. (Days 10-14) Performance pass + vendor-demo readiness

Pickup notes for the next agent / developer

  • Read xiaozhi-hermes-architecture.md first — the three-diagram explanation of where Hermes fits
  • Then xiaozhi-hermes-risks.md — verified risks against actual repo code with file:line citations
  • The architectural decision is in agent-iam-strategy.md — Hermes is a callable tool, not an LLM-caller replacement. Don't accidentally re-architect it as the LLM the xiaozhi-server calls; that destroys the cheap-path principle.
  • Hermes-agent upstream lives at github.com/nousresearch/hermes-agent; it's Python; their HTTP gateway is the surface we wrap
  • For the MCP server framework: stick with the same Python SDK choice as #107 ("Phase 1: AgentKeys MCP server, 7 active tools + 3 schema-only")
  • Watch for: the two-layer audit trail is the proof that runtime-neutrality works. If you ship without it, you can't demonstrate the value to a vendor.
  • Use the /agentkeys-issue-create skill for follow-up issues (e.g., per-runtime tuning, M4 multi-instance scaling)

Labels: area/mcp (MCP server, MCP tool integration, MCP protocol work)
