Wire protocol

agent-eval exposes its evaluation logic over a versioned wire protocol so non-TypeScript clients (Python, Rust, Go, …) can drive it without a parallel implementation. The TypeScript runtime is the single source of truth; clients in other languages are transport adapters, not ports.

Mental model

your code (any language)
        │
        ▼
   thin transport client  ──HTTP──▶  agent-eval serve   ──┐
        │                                                  │
        └─────subprocess────────▶  agent-eval rpc        ──┤
                                                           ▼
                                              same TS handlers, same rubrics,
                                              same scoring code

Both transports talk to identical handlers. If you need a sustained connection (live agent paths, high-frequency calls), use HTTP. If you need a one-shot (cron, CI, batch), use stdio RPC. The wire shape is the same.

Two transports, one contract

	HTTP	stdio RPC
Start	`agent-eval serve --port 5005`	per-call: `agent-eval rpc <method>`
Latency	~10 ms	~500 ms (Node startup)
Best for	live calls, agent paths, dashboards	cron, CI, batch evaluation
Requires	running server	binary on PATH

Methods

The current surface is the smallest useful slice. Adding a method is mechanical — see §Adding a method.

`judge` — score content against a rubric

POST /v1/judge
{
  "rubricName": "anti-slop",
  "content": "We just shipped zero-copy IO between sandboxes",
  "context": { "platform": "x", "author": "drew", "impressions": 1240 }
}

{
  "composite": 0.78,
  "dimensions": { "buyer_quality": 0.85, "voice": 0.7, "signal": 0.8 },
  "failureModes": [],
  "wins": ["specific-component", "earned-detail"],
  "rationale": "Specific architectural detail, no AI cadence, technical voice.",
  "rubricVersion": "anti-slop@a4f2b8c1",
  "model": "claude-sonnet-4-6",
  "durationMs": 1840
}

Pass either rubricName (built-in) or rubric (inline definition). Not both. The handler:

Resolves the rubric.
Calls the judging LLM with a JSON-schema-constrained response.
Computes composite = Σ(weight_i × normalized_score_i) / Σ(weight_i).
Returns a typed JudgeResult.

rubricVersion is the stable hash of the rubric used. Scores are only comparable across runs when this matches.

`listRubrics` — discover what's registered

GET /v1/rubrics

{
  "rubrics": [
    {
      "name": "anti-slop",
      "description": "Voice and signal quality for technical-buyer content.",
      "dimensions": [
        { "id": "buyer_quality", "description": "Would the target buyer care?", "weight": 0.5 },
        { "id": "voice", "description": "Builder voice, not AI/marketing?", "weight": 0.3 },
        { "id": "signal", "description": "Non-obvious detail or constraint?", "weight": 0.2 }
      ],
      "failureModes": ["ai-cadence", "marketing-tone", "vague-claim", "no-hook", "engagement-bait", "off-icp", "stale-claim"],
      "rubricVersion": "anti-slop@a4f2b8c1"
    }
  ]
}

`version` — server + wire-protocol versions

GET /v1/version

{
  "package": "@tangle-network/agent-eval",
  "version": "0.20.10",
  "wireVersion": "1.0.0",
  "apiSurface": ["judge", "listRubrics", "version"]
}

version matches the package version. wireVersion bumps independently — only on breaking request/response schema changes. Package versions can differ across releases as long as wireVersion matches.

`GET /healthz` — liveness

For probing whether a server is up. Returns { "status": "ok", "uptimeSec": <number> }.

`GET /openapi.json` — full spec

Auto-generated from the Zod schemas. This is what code generators consume to produce typed clients in other languages.

Errors

Every error response uses the same shape:

{
  "error": {
    "code": "rubric_not_found",
    "message": "No built-in rubric named \"missing-name\".",
    "details": null
  }
}

HTTP	code	meaning
400	`validation_error`	Request didn't match the schema.
404	`rubric_not_found`	Unknown `rubricName`.
500	`judge_error`	LLM returned malformed output.
500	`internal_error`	Unexpected server error.

stdio RPC uses the same shape inside an envelope: {"error": {...}} instead of {"result": {...}}. Exit code is non-zero on error.

Running the server

agent-eval serve --port 5005 --host 127.0.0.1

Defaults to 127.0.0.1:5005. Bind to 0.0.0.0 only if you trust the network.

# health
curl http://localhost:5005/healthz

# discover
curl http://localhost:5005/v1/rubrics | jq

# judge
curl -X POST http://localhost:5005/v1/judge \
  -H 'content-type: application/json' \
  -d '{"rubricName":"anti-slop","content":"We just shipped …"}'

Using stdio RPC

# version
echo '{}' | agent-eval rpc version

# listRubrics
echo '{}' | agent-eval rpc listRubrics

# judge (one-shot)
echo '{"rubricName":"anti-slop","content":"…"}' | agent-eval rpc judge

# JSONL batch — one request per line
cat requests.jsonl | agent-eval rpc-batch judge > results.jsonl

Each invocation is one process — Node startup adds ~500 ms. For more than a few calls, stand up a server.

Clients

Python: source lives in clients/python. Auto-detects HTTP, falls back to subprocess. Version-locked to npm.
TypeScript: import directly from @tangle-network/agent-eval (no wire round-trip needed in-process).
Rust / Go / Other: generate from dist/openapi.json. PRs welcome to add an officially-maintained client.

Adding a method

Schema — define XRequestSchema and XResponseSchema in src/wire/schemas.ts. Every field gets a .describe() so docs flow through to OpenAPI.
Handler — pure function in src/wire/handlers.ts. Throws WireError for caller-fixable issues.
Server route — app.post('/v1/x', …) in src/wire/server.ts.
RPC case — add case 'x': in dispatchRpc in src/wire/rpc.ts.
OpenAPI route — register in src/wire/openapi.ts so it shows up in the spec.
Test — add to tests/wire/. At minimum: schema validation, happy-path, error-path.
Python client — add a method on Client in clients/python/src/agent_eval_rpc/client.py, plus pydantic models in models.py mirroring the new schemas.

The pattern is mechanical. When the surface grows past ~10 methods, swap the hand-written Python models for datamodel-code-generator -i openapi.json -o models.py.

Wire-protocol versioning

WIRE_VERSION (in src/wire/schemas.ts) is a separate semver from the npm/PyPI package version. It bumps on breaking changes to a request/response schema. Additive changes (new optional fields, new methods) don't require a bump.

When WIRE_VERSION bumps, every language client gets a new major version; the dual-publish CI (see .github/workflows/publish.yml) enforces this lock-step.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wire protocol

Mental model

Two transports, one contract

Methods

`judge` — score content against a rubric

`listRubrics` — discover what's registered

`version` — server + wire-protocol versions

`GET /healthz` — liveness

`GET /openapi.json` — full spec

Errors

Running the server

Using stdio RPC

Clients

Adding a method

Wire-protocol versioning

FilesExpand file tree

wire-protocol.md

Latest commit

History

wire-protocol.md

File metadata and controls

Wire protocol

Mental model

Two transports, one contract

Methods

judge — score content against a rubric

listRubrics — discover what's registered

version — server + wire-protocol versions

GET /healthz — liveness

GET /openapi.json — full spec

Errors

Running the server

Using stdio RPC

Clients

Adding a method

Wire-protocol versioning

`judge` — score content against a rubric

`listRubrics` — discover what's registered

`version` — server + wire-protocol versions

`GET /healthz` — liveness

`GET /openapi.json` — full spec