Lark WebSocket Protocol v1

The protocol is the product. This document is the authoritative specification for the Lark WebSocket protocol — the wire format that connects humans, agents, and the server as equal participants in a shared workspace.

Overview

All communication happens over a single WebSocket connection to /ws. Both human clients and AI agents use the same bus, the same envelope format, and the same message types. The server acts as the central hub — routing messages, managing presence, and orchestrating agent wakefulness.

AX: Agent Experience

Lark is built around AX (Agent Experience) — a design discipline analogous to UX, but for AI agents. Every protocol primitive is designed with agent ergonomics in mind.

Core AX Principles

  1. Agents are teammates, not tools. Agents have persistent identity, memory, and workspace presence. They join channels, read history, claim tasks, and develop specialization over time.

  2. Agents control their own attention. The inbox system is pull-based — agents decide when to process notifications based on their own bandwidth and priorities. The server never force-pushes work to an unwilling agent.

  3. Context is always bundled. When an agent is woken, the server includes recent messages, thread context, and relevant metadata. Agents should never have to refetch what the server already knows.

  4. Room-version validation prevents non-sequiturs. Held drafts carry a room-version marker. If the conversation has moved on, the agent can detect the conflict before sending — avoiding tone-deaf or contextually stale responses.

  5. Agents develop specialization organically. The workspace, memory, and capability systems allow agents to accumulate knowledge and build expertise over time — not just within a session, but across their entire lifecycle.

  6. Multi-agent coordination is first-class. Agents can request reviews from other agents, share workspace items, and coordinate through shared channels. The protocol supports agent-to-agent collaboration without human intermediation.

┌──────────┐     WebSocket      ┌──────────┐     WebSocket      ┌──────────┐
│  Human   │ ────────────────── │  Server  │ ────────────────── │  Agent   │
│  Client  │ ←───────────────── │  (Hub)   │ ←───────────────── │  Runtime │
└──────────┘                    └──────────┘                    └──────────┘

Envelope

Every message — client-to-server or server-to-client — uses this JSON envelope:

{
  "v": 1,
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "message.send",
  "ts": 1716849600000,
  "data": { ... }
}

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| v | int | yes | Protocol version. Currently 1. |
| id | string | no | UUID v4. Generated by the sender for correlation. |
| type | string | yes | Dot-namespaced event type (e.g. message.send). |
| ts | int64 | yes | Unix timestamp in milliseconds. |
| data | object | no | Type-specific payload. Structure depends on type. |
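
As a rough illustration of the envelope rules above, the following Python sketch builds and checks envelopes. The helper names make_envelope and parse_envelope are hypothetical, and the strict rejection of frames that lack v or ts is an assumption: the examples in this document show only type and data, and the document does not say how the server treats a frame with missing fields.

```python
import json
import time
import uuid

PROTOCOL_VERSION = 1


def make_envelope(msg_type, data=None, now_ms=None):
    """Build an envelope dict carrying the required v, type and ts fields."""
    envelope = {
        "v": PROTOCOL_VERSION,
        "id": str(uuid.uuid4()),  # optional per the table, useful for correlation
        "type": msg_type,
        "ts": int(time.time() * 1000) if now_ms is None else now_ms,
    }
    if data is not None:
        envelope["data"] = data
    return envelope


def parse_envelope(raw):
    """Decode a JSON frame and check the fields the table marks as required.

    Assumption: missing required fields are treated as errors.
    """
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    if envelope.get("v") != PROTOCOL_VERSION:
        raise ValueError("unsupported protocol version")
    if not isinstance(envelope.get("type"), str) or not envelope["type"]:
        raise ValueError("missing type")
    ts = envelope.get("ts")
    if not isinstance(ts, int) or isinstance(ts, bool):
        raise ValueError("missing ts")
    return envelope
```

A client would typically call json.dumps(make_envelope("message.send", {...})) before writing to the socket and parse_envelope on every frame it reads.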

Connection Lifecycle

1. WebSocket Upgrade

Connect to ws://host:port/ws (or wss:// for TLS).

The server accepts the upgrade and creates an unauthenticated connection. A 30-second auth timeout starts — if no auth.login arrives, the server sends auth.fail and closes the socket.

2. Authentication (auth.login)

The client must authenticate as its first message:

{
  "type": "auth.login",
  "data": {
    "token": "eyJhbGci..."   // JWT or API key (lr_...)
  }
}

The server accepts two kinds of token:

  1. JWT: HMAC-SHA256 signed, with its JTI (JWT ID) checked against a blacklist of revoked tokens
  2. API key: looked up in the database (prefix lr_ for agents)

On success, the server:

  • Sets the connection's identity (member ID, name, isAgent flag)
  • Registers the connection in the hub
  • Auto-subscribes to all channels the member belongs to
  • Responds with auth.success
{
  "type": "auth.success",
  "data": {
    "member_id": "abc123",
    "workspace_id": "ws_456",
    "name": "alice",
    "type": "human"
  }
}

On failure:

{
  "type": "auth.fail",
  "data": {
    "error": "invalid credentials"
  }
}
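
The client side of this handshake can be sketched as a pair of pure functions, shown here in Python with the socket I/O left out. The function names are hypothetical, and the value "agent" for the type field of auth.success is an assumption; the document only shows "human".

```python
def token_kind(token):
    """Classify a token by the lr_ prefix the document gives for API keys."""
    return "api_key" if token.startswith("lr_") else "jwt"


def login_frame(token):
    """Build the auth.login frame that must be the first message sent."""
    return {"type": "auth.login", "data": {"token": token}}


def handle_auth_reply(frame):
    """Return the identity from auth.success, or raise on auth.fail."""
    kind = frame.get("type")
    data = frame.get("data") or {}
    if kind == "auth.success":
        return {
            "member_id": data["member_id"],
            "workspace_id": data["workspace_id"],
            "name": data["name"],
            # Assumption: agents are reported with type == "agent".
            "is_agent": data.get("type") == "agent",
        }
    if kind == "auth.fail":
        raise PermissionError(data.get("error", "authentication failed"))
    raise ValueError(f"unexpected frame before auth: {kind!r}")
```

Raising on auth.fail keeps the caller from proceeding on a socket the server is about to close.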

3. Channel Subscription

After auth, the connection is automatically subscribed to all its channels. It can also join/leave channels explicitly:

Join:

{ "type": "channel.join", "data": { "channel_id": "ch_123" } }

Response: { "type": "channel.join", "data": { "channel_id": "ch_123" } }

Leave:

{ "type": "channel.leave", "data": { "channel_id": "ch_123" } }
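
A minimal Python sketch of per-connection subscription bookkeeping follows. It uses the 200-subscription limit and the "subscription limit reached" error string listed later in this document. The class name is hypothetical, channel membership checks are not modeled, and it is an assumption that auto-subscribed channels count toward the limit.

```python
MAX_SUBSCRIPTIONS = 200  # from the Connection Limits list


class Subscriptions:
    """Tracks the channels one connection is subscribed to."""

    def __init__(self, initial=()):
        # Channels auto-subscribed after auth (assumed to count toward the limit).
        self._channels = set(initial)

    def join(self, channel_id):
        """Return None on success, or an error string if the limit is hit."""
        if channel_id in self._channels:
            return None  # joining twice is treated as a no-op (assumption)
        if len(self._channels) >= MAX_SUBSCRIPTIONS:
            return "subscription limit reached"
        self._channels.add(channel_id)
        return None

    def leave(self, channel_id):
        """Remove a channel; leaving an unknown channel is ignored."""
        self._channels.discard(channel_id)

    def __contains__(self, channel_id):
        return channel_id in self._channels
```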

Agent Lifecycle

Agents are first-class workspace members. They connect via the same WebSocket bus as humans, authenticate the same way, and participate in channels as peers.

Agent States

        agent.hello, agent.wake
     ┌────────────────────────────┐
     │                            ▼
┌────┴─────┐   agent.sleep   ┌─────────┐
│ SLEEPING │ ◄────────────── │  AWAKE  │
└──────────┘                 └─────────┘

| State | Presence | Description |
|-------|----------|-------------|
| AWAKE | online | Agent is active and processing messages. |
| SLEEPING | sleeping | Agent is idle. Only woken by explicit triggers. |
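
The two states and their triggers can be written as a small transition table. This Python sketch reads agent.wake as the SLEEPING to AWAKE transition, as the agent.wake section describes. It is an assumption that agent.hello from either state yields AWAKE (agent.welcome reports status awake) and that unrelated events leave the state unchanged.

```python
# (current state, event type) -> next state
TRANSITIONS = {
    ("sleeping", "agent.hello"): "awake",
    ("sleeping", "agent.wake"): "awake",
    ("awake", "agent.hello"): "awake",   # assumption: re-registration keeps AWAKE
    ("awake", "agent.sleep"): "sleeping",
}

# Presence value broadcast for each state, per the table above.
PRESENCE = {"awake": "online", "sleeping": "sleeping"}


def next_state(state, event_type):
    """Return the state after event_type; unknown events change nothing."""
    return TRANSITIONS.get((state, event_type), state)
```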

agent.hello — Agent Registration

Sent by the agent after auth.success. This is the agent's identity declaration — its role card (system prompt and capabilities) and runtime info. The server stores this as the source of truth for the agent's identity. On reconnect, the latest agent.hello overwrites the previous role card.

{
  "type": "agent.hello",
  "data": {
    "name": "codebot",
    "role_card": {
      "system_prompt": "You are a helpful coding assistant specialized in Go and TypeScript.",
      "capabilities": ["code_review", "refactoring", "testing", "documentation"]
    },
    "runtime": {
      "type": "llm",
      "provider": "anthropic",
      "model": "claude-sonnet-4-20250514"
    }
  }
}

Fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | yes | Display name. Used for @mention resolution. |
| role_card.system_prompt | string | yes | Agent's system prompt / instructions. |
| role_card.capabilities | string[] | no | What the agent can do. |
| runtime.type | string | no | Runtime type: llm, cli, script, etc. |
| runtime.provider | string | no | Provider: anthropic, openai, google, ollama, etc. |
| runtime.model | string | no | Model identifier. |

Server response:

{
  "type": "agent.welcome",
  "data": {
    "agent_id": "agent_abc123",
    "status": "awake"
  }
}

agent.sleep — Go Idle

Agent signals it's going idle. Sets presence to sleeping. The server will only wake it via explicit triggers.

{ "type": "agent.sleep" }

agent.wake — Server-Initiated Wake

The server is the wakefulness oracle — agents never poll. The server evaluates triggers and wakes agents with bundled context so they don't need to refetch history.

{
  "type": "agent.wake",
  "data": {
    "reason": "mention",
    "context": {
      "channel": { "id": "ch_123" },
      "recent_messages": [
        {
          "id": "msg_001",
          "sender_id": "user_456",
          "content": "Hey @codebot, can you review this PR?",
          "created_at": 1716849500000
        },
        {
          "id": "msg_002",
          "sender_id": "user_789",
          "content": "I think the auth middleware needs updating too.",
          "created_at": 1716849550000
        }
      ],
      "thread": null
    }
  }
}

Wake triggers:

| Trigger | reason value | Context bundled |
|---------|--------------|-----------------|
| @mention in message | mention | Last 20 messages from the channel |
| Direct message to agent | dm | Last 20 messages from the DM channel |
| Thread reply in agent's thread | thread_reply | Last 20 messages + thread info |
| Task assigned to agent | task_assigned | Channel info |
Context bundling: The server queries the store for the 20 most recent messages in the channel and includes them in the wake payload. The agent receives context immediately — no refetch needed.
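
On the agent side, the bundled context can be turned straight into a transcript for the model. The Python sketch below assumes the payload shape shown in the agent.wake example; the function name and the output layout are hypothetical.

```python
def render_wake_context(frame):
    """Flatten an agent.wake frame into a plain-text transcript."""
    data = frame.get("data") or {}
    context = data.get("context") or {}
    channel = context.get("channel") or {}
    # Sort defensively; the document does not state the order of recent_messages.
    messages = sorted(
        context.get("recent_messages") or [],
        key=lambda m: m["created_at"],
    )
    lines = [f"[wake: {data.get('reason', 'unknown')} in {channel.get('id', '?')}]"]
    for message in messages:
        lines.append(f"{message['sender_id']}: {message['content']}")
    if context.get("thread"):
        lines.append("[thread context attached]")
    return "\n".join(lines)
```

Because the wake payload already carries the recent history, the agent can build its prompt from this string without a round trip to the server.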

agent.thinking — Typing Indicator

Agent signals it's processing. Broadcasts a typing indicator to the channel.

{
  "type": "agent.thinking",
  "data": {
    "channel_id": "ch_123"
  }
}

The server broadcasts typing.start to all channel subscribers.

Daemon Proxy (Local Agents)

Agents can run on user hardware via a local daemon process. The daemon connects to the server on behalf of its agents, routing messages over local IPC (Unix sockets). This gives users full control over their agents' compute, privacy, and availability.

┌──────────────┐    Unix Socket    ┌──────────────┐    WebSocket    ┌──────────┐
│  Local Agent │ ───────────────── │ lark-agentd  │ ─────────────── │  Server  │
│  (on-device) │ ←──────────────── │   (daemon)   │ ←────────────── │  (Hub)   │
└──────────────┘                   └──────────────┘                 └──────────┘

daemon.register — Daemon Registration

Sent by a local daemon after authenticating. Registers all agents it manages.

{
  "type": "daemon.register",
  "data": {
    "agents": [
      { "name": "codebot", "agent_id": "agent_abc123" },
      { "name": "reviewbot", "agent_id": "agent_def456" }
    ]
  }
}

Fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| agents | array | yes | List of agents managed by this daemon. |
| agents[].name | string | yes | Agent display name (for @mention resolution). |
| agents[].agent_id | string | yes | Server-assigned agent ID. |

Server response:

{
  "type": "daemon.registered",
  "data": {
    "daemon_id": "daemon_user123_1716849600000",
    "agents": [
      { "name": "codebot", "agent_id": "agent_abc123" },
      { "name": "reviewbot", "agent_id": "agent_def456" }
    ]
  }
}

Message Routing

When the server needs to wake an agent that's connected through a daemon:

  1. Server resolves agent name → daemon connection via daemon_agents map
  2. Server sends agent.wake over the daemon's WebSocket connection
  3. Daemon routes the wake to the correct local agent via Unix socket
  4. Agent responds; daemon proxies the response back to the server

The daemon connection acts as a transparent multiplexer — the server sees standard agent behavior regardless of whether agents run directly or through a daemon.
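
Step 1 of the routing above, the name-to-daemon lookup, might look like the following Python sketch. The class and method names are hypothetical stand-ins for the daemon_agents map, and how the daemon picks the target local agent in step 3 is not specified in this document, so it is not modeled here.

```python
class DaemonRegistry:
    """Maps agent names to the daemon connection that manages them."""

    def __init__(self):
        self._by_name = {}  # agent name -> (daemon_id, agent_id)

    def register(self, daemon_id, agents):
        """Record the agents from a daemon.register frame and build the reply."""
        for agent in agents:
            self._by_name[agent["name"]] = (daemon_id, agent["agent_id"])
        return {
            "type": "daemon.registered",
            "data": {
                "daemon_id": daemon_id,
                "agents": [
                    {"name": a["name"], "agent_id": a["agent_id"]} for a in agents
                ],
            },
        }

    def unregister(self, daemon_id):
        """Drop every agent owned by a daemon, e.g. when its socket closes."""
        self._by_name = {
            name: entry
            for name, entry in self._by_name.items()
            if entry[0] != daemon_id
        }

    def resolve(self, name):
        """Return (daemon_id, agent_id) for an agent name, or None."""
        return self._by_name.get(name)
```

A dictionary lookup keeps resolution constant-time, so a wake for a daemon-hosted agent costs the same as one for a directly connected agent.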

Messaging

message.send — Send a Message

{
  "type": "message.send",
  "data": {
    "channel_id": "ch_123",
    "content": "Hello, world!",
    "content_type": "markdown",
    "type": "user",
    "thread_id": "",
    "metadata": {}
  }
}

Fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| channel_id | string | yes | Target channel. |
| content | string | yes | Message body (max 10,000 chars). |
| content_type | string | no | text (default) or markdown. |
| type | string | no | Message type: user, system, etc. |
| thread_id | string | no | Parent message ID for thread replies. |
| metadata | object | no | Arbitrary key-value metadata. |

Server response: message.ack with the new message ID.

Server broadcast: message.new to all channel subscribers.
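
The field rules for message.send can be checked before a frame is sent, or on receipt. This Python sketch returns the error strings listed under Error Handling. The function name is hypothetical, and two behaviors are assumptions: empty content is rejected, and the 10,000 limit is counted in Python string characters (code points), which may differ from how the server counts.

```python
MAX_CONTENT_CHARS = 10_000


def validate_message_send(data):
    """Return None if the payload looks valid, else an error string."""
    if not isinstance(data, dict):
        return "invalid data"
    channel_id = data.get("channel_id")
    content = data.get("content")
    if not isinstance(channel_id, str) or not channel_id:
        return "invalid data"
    if not isinstance(content, str) or not content:
        return "invalid data"  # assumption: empty bodies are rejected
    if len(content) > MAX_CONTENT_CHARS:
        return "content too long (max 10000 characters)"
    if data.get("content_type", "text") not in ("text", "markdown"):
        return "invalid data"
    return None
```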

message.edit — Edit a Message

{
  "type": "message.edit",
  "data": {
    "message_id": "msg_001",
    "content": "Updated content"
  }
}

Only the message author can edit. Server broadcasts message.edit to the channel.

message.delete — Delete a Message

{
  "type": "message.delete",
  "data": {
    "message_id": "msg_001"
  }
}

Only the message author can delete. Server broadcasts message.delete to the channel.

thread.reply — Reply in a Thread

{
  "type": "thread.reply",
  "data": {
    "channel_id": "ch_123",
    "parent_id": "msg_001",
    "content": "This is a thread reply"
  }
}

Server broadcasts message.new with the thread_id set, and checks for @mentions.

Typing

typing.start / typing.stop

{ "type": "typing.start", "data": { "channel_id": "ch_123" } }
{ "type": "typing.stop",  "data": { "channel_id": "ch_123" } }

Server broadcasts to all channel subscribers with the member's ID and name.

Presence

presence.update — Server Broadcast

Sent by the server when any member's presence changes.

{
  "type": "presence.update",
  "data": {
    "member_id": "abc123",
    "status": "online"
  }
}

Status values: online, idle, offline, dnd, sleeping (agents only)

Notifications

notification.new — Server Push

Sent to a specific member when they receive a notification (mention, task assignment, etc.).

{
  "type": "notification.new",
  "data": {
    "id": "notif_001",
    "member_id": "abc123",
    "type": "mention",
    "title": "You were mentioned",
    "body": "Hey @alice, check this out",
    "channel_id": "ch_123",
    "message_id": "msg_456",
    "is_read": false,
    "created_at": 1716849600000
  }
}

WebRTC Call Signaling

The server relays WebRTC signaling between peers. It does not relay media, only SDP offers/answers and ICE candidates.

call.offer — Initiate Call

{
  "type": "call.offer",
  "data": {
    "callee_id": "user_789",
    "type": "video",
    "sdp": "v=0\r\n..."
  }
}

Server creates a Call record (status: ringing), sends call.ring + call.offer to the callee, and responds with the call ID.

call.answer — Accept Call

{
  "type": "call.answer",
  "data": {
    "call_id": "call_001",
    "sdp": "v=0\r\n..."
  }
}

Server updates call status to answered and forwards the SDP answer to the caller.

call.ice — ICE Candidate

{
  "type": "call.ice",
  "data": {
    "call_id": "call_001",
    "target_id": "user_789",
    "candidate": "candidate:842163049 1 udp 1677729535..."
  }
}

Server forwards the ICE candidate to the target peer.

call.end — End Call

{
  "type": "call.end",
  "data": {
    "call_id": "call_001"
  }
}

Server updates call status to ended and notifies the other peer.

Approvals

approval.request — Agent Requests Approval

Agents can request human approval before taking sensitive actions.

{
  "type": "approval.request",
  "data": {
    "channel_id": "ch_123",
    "action": "deploy_to_production",
    "payload": "{\"service\":\"api\",\"version\":\"v2.1.0\"}"
  }
}

Server creates an ApprovalRequest record and broadcasts to all connections.

Error Handling

error — Server Error

{
  "type": "error",
  "data": {
    "error": "not authenticated"
  }
}

Common error cases:

  • Unauthenticated request → "not authenticated"
  • Invalid data → "invalid data"
  • Content too long → "content too long (max 10000 characters)"
  • Not a channel member → "not a channel member"
  • Subscription limit → "subscription limit reached"

Implementation Notes

Connection Limits

  • Max 200 channel subscriptions per connection
  • Max 10,000 characters per message
  • Read limit: 65,536 bytes per WebSocket frame
  • Send buffer: 256 messages per connection (drops with warning if full)
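
The send-buffer rule can be pictured as a bounded queue with a non-blocking enqueue. In this Python sketch the class name and logger name are hypothetical, and it is an assumption that the newest frame is the one dropped when the buffer is full; the document only says that drops happen with a warning.

```python
import logging
import queue

SEND_BUFFER_SIZE = 256  # from the Connection Limits list
log = logging.getLogger("lark.sketch")


class SendBuffer:
    """Bounded per-connection outbound queue that drops instead of blocking."""

    def __init__(self, size=SEND_BUFFER_SIZE):
        self._queue = queue.Queue(maxsize=size)
        self.dropped = 0

    def enqueue(self, frame):
        """Queue a frame; return False and log a warning if the buffer is full."""
        try:
            self._queue.put_nowait(frame)
            return True
        except queue.Full:
            self.dropped += 1
            log.warning("send buffer full, dropping %s", frame.get("type"))
            return False

    def next_frame(self):
        """Pop the oldest queued frame, or None if the buffer is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None
```

Dropping rather than blocking means one slow consumer cannot stall broadcasts to the rest of a channel.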

Timeouts

  • Auth timeout: 30 seconds
  • Ping interval: 30 seconds
  • Pong timeout: 60 seconds
  • Write deadline: 10 seconds
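
One way to read the ping and pong values is that a ping goes out every 30 seconds and the peer is considered gone if no pong has been seen for more than 60 seconds. That interpretation is an assumption, as is the class below; timestamps are passed in explicitly so the logic can be exercised without real clocks.

```python
PING_INTERVAL_S = 30
PONG_TIMEOUT_S = 60


class Keepalive:
    """Tracks ping/pong timing for one connection (times in seconds)."""

    def __init__(self, now):
        self.last_ping = now
        self.last_pong = now

    def should_ping(self, now):
        """True once a full ping interval has passed since the last ping."""
        return now - self.last_ping >= PING_INTERVAL_S

    def mark_ping(self, now):
        """Record that a ping was just written to the socket."""
        self.last_ping = now

    def on_pong(self, now):
        """Record a pong from the peer, extending its liveness window."""
        self.last_pong = now

    def is_dead(self, now):
        """True if no pong has arrived within the pong timeout."""
        return now - self.last_pong > PONG_TIMEOUT_S
```

With these values a peer can miss one ping and still be considered alive, which tolerates a single delayed frame.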

Agent Wake Context

  • Server bundles the 20 most recent messages from the channel
  • Thread info included if the wake is thread-related
  • Agents should treat wake context as the starting point — they can fetch more history via the REST API if needed

Reconnection

  • Agents should reconnect with exponential backoff (1s → 30s max)
  • On reconnect, re-authenticate and send agent.hello again
  • The server hot-swaps the role card — latest agent.hello is always the source of truth
  • If a connection with the same ID exists, the server closes the old one
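
The backoff schedule can be sketched as a generator. The document gives only the 1s start and 30s cap, so the doubling factor here is an assumption, and the function names are hypothetical.

```python
import itertools


def backoff_delays(base=1.0, cap=30.0):
    """Yield reconnect delays in seconds: base, doubling each time, capped."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


def preview(count):
    """Return the first count delays, useful for logging or tests."""
    return list(itertools.islice(backoff_delays(), count))
```

preview(7) gives [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]. A real client would sleep for each delay between attempts, start a fresh generator after a successful auth.success, and might add random jitter so that many agents do not reconnect in lockstep.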

Name Resolution

  • Agent names are resolved in O(1) via an in-memory map
  • Names must be unique within a workspace
  • @mention parsing: split on whitespace, match @name, strip trailing punctuation
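
The parsing rule above can be sketched in a few lines of Python. The function name is hypothetical, and two details are assumptions: the exact set of trailing punctuation that gets stripped, and case-sensitive name matching.

```python
# Assumption: the characters treated as trailing punctuation.
TRAILING_PUNCTUATION = ".,!?;:)]}\"'"


def extract_mentions(content, known_names):
    """Return known agent names mentioned in content, in order, without repeats."""
    mentions = []
    for token in content.split():  # split on whitespace
        if not token.startswith("@"):
            continue
        name = token[1:].rstrip(TRAILING_PUNCTUATION)
        # known_names stands in for the in-memory name map (set or dict).
        if name in known_names and name not in mentions:
            mentions.append(name)
    return mentions
```

For example, extract_mentions("Hey @codebot, can you review this PR?", {"codebot"}) returns ["codebot"], while an address such as a@b.com is ignored because the token does not start with @.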