
# AI Sidecar

A real-time AI observer that listens to a live phone call and streams agent-facing advice events to your application. The sidecar does not participate in the call — it watches the conversation as a third party and produces structured events the agent's UI (or any consumer) can render.

Use it for: live sales coaching, real-time compliance flagging, intent-based UI navigation, voice-of-customer signal extraction, supervisor-on-shoulder workflows.

---

## How it works

A sidecar is `live_transcribe` running in an extended mode. The same media bug, ASR engine, and conversation log are running underneath. On top of that, a separate worker thread:

1. Listens for **customer turn-end** events — final ASR result + idle timeout, or an agent utterance arriving while a customer turn is pending.
2. Sends the running transcript to an LLM with your operator prompt and your SWAIG/MCP tool set.
3. The LLM either produces an `insight` (one-line agent-facing advice) or calls the built-in `sidecar_skip` tool to end the tick silently.
4. Any tool calls (lookups, alerts, intent triggers) flow through the same SWAIG/MCP infrastructure as a regular AI agent.
5. Every step emits a structured event on the `calling.ai.sidecar` relay topic and (optionally) to a webhook URL.

A call runs either plain `live_transcribe` or sidecar mode, never both.

Sidecar mode bills `ai_sidecar_per_minute`. There are no per-LLM-call or per-summary charges; the wall-clock minute is the only billable unit.

---

## Quick start (SWML)

```yaml
version: 1.0.0
sections:
  main:
    - answer: {}
    - ai_sidecar:
        prompt: "You are a real-time sales copilot. After each customer turn, give the agent one concise piece of advice or call sidecar_skip if no advice is needed."
        lang: "en-US"
        url: "https://your-app.example.com/sidecar/events"
        hints: ["ACME", "Globex", "FedRAMP", "SOC 2"]
        SWAIG:
          defaults:
            web_hook_url: "https://your-app.example.com/sidecar/swaig"
          functions:
            - function: lookup_account
              description: "Look up an account record."
              parameters:
                type: object
                properties:
                  customer_id: { type: string }
                required: [customer_id]
    - connect:
        from: "+15555550100"
        to: "+15555550199"
        answer_on_bridge: true
    - hangup: {}
```

That's a working sidecar. The platform routes the verb to mod_openai's `ai-sidecar` API, transcribe starts in sidecar mode, and on each customer turn the LLM produces advice events delivered to your `url` and the relay topic.

---

## SWML reference — `ai_sidecar` verb

The body has two parts: a **structural top level** with a fixed shape (validated by the platform and SDK schemas) and a **`params`** object that holds every tuning knob (validated strictly by mod_openai). New tunables will be added only to `params`; the structural top level is fixed.

### Top-level body (structural)

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string \| object \| `{file: string}` | yes | Operator prompt — POM, plain string, or external file. Built-in sidecar job framing is prepended automatically. |
| `lang` | string (BCP-47) | yes | Conversation language. Sets ASR language and is included in the LLM hint. |
| `model` | string | no | LLM model for the sidecar tick + close-time summaries. Default `gpt-4o-mini`. |
| `direction` | array of `"remote-caller"` and/or `"local-caller"` | no | Call legs to transcribe. The sidecar requires both legs, so a list that omits either one is rejected. Default: both. |
| `customer_role` | enum: `"remote-caller"` \| `"local-caller"` | no | Which leg is the customer (turn-end trigger source). Default `"remote-caller"`. |
| `url` | string (URL) | no | Webhook URL for transcribe events AND sidecar events. When unset, only relay events fire. |
| `SWAIG` | object | no | SWAIG functions and MCP servers. See below. |
| `permissions` | object | no | SWAIG permission overrides. Defaults all-true (matches regular AI). |
| `global_data` | object | no | Initial `global_data`. Available for `${var}` expansion in the prompt and for tool webhook posts. Persisted across AI sessions on the same channel via the SWML var `ai_agents_global_data` — see [Persistence](#persistence). |
| `hints` | array of strings | no | Speech-recognition hints passed to ASR to bias recognition toward specific terms — product names, competitor names, jargon, customer names, anything in your SWAIG enum lists. Strongly recommended. Example: `["ACME", "Globex", "FedRAMP", "SOC 2"]`. |
| `params` | object | no | Tuning knobs — see table below. |
| `action` | reserved | no | Reserved for future runtime sub-actions. |

### `params` (tunable knobs, mod_openai-validated)

All optional. Type/range/enum errors return `-ERR: <field>: <reason>` from the FS API call. Strings that look like numbers (YAML or `${var}` expansion artifacts) are accepted on number fields.

| Field | Type | Default | Description |
|---|---|---|---|
| `idle_timeout_ms` | int (50–5000) | 200 | Customer-leg silence after a final ASR result before the sidecar runs a tick. Lower = more aggressive. An agent utterance that arrives while a customer turn is pending also force-fires the tick (no idle wait). |
| `min_interval_ms` | int (0–60000) | 0 | Hard throttle — minimum time between LLM calls. |
| `max_iters_per_tick` | int (1–20) | 5 | Cap on tool-loop iterations per tick (and per ask). |
| `max_history_tokens` | int (1000–200000) | 8000 | Helper-history token budget. Sidecar prunes oldest non-system entries when over. |
| `act_on_channel` | bool | true | Whether SWAIG actions returned by tools execute against the call (transfer, hangup, etc.) or only emit as events. |
| `final_summary` | bool | false | Run a closing LLM call summarizing the sidecar's helper session. Result lands in `final.summary`. |
| `ai_summary` | bool | false | Run a close-time summary of the **raw call** (distinct from `final_summary`). Fires `calling.ai.transcribe.conversation_log` with the summary on call end. |
| `ai_summary_prompt` | string | none | Custom prompt for the close-time raw-call summary. |
| `live_events` | bool | false | Per-utterance `calling.ai.transcribe.utterance` events. Always fires on the relay topic when true. Webhook POST happens only if `url` is set. Independent of the sidecar event stream. |
| `verbose_utterances` | bool | false | When true, each utterance record carries full engine-side metadata (`words[]`, `alternatives[]`, `start`/`end`, `request_id`, etc.). Bloats record size — leave off unless you need it. |
| `speech_engine` | enum: `"deepgram"` \| `"google"` | `"deepgram"` | ASR engine. |
| `speech_timeout` | int (ms) | engine default | ASR speech timeout. |
| `vad_silence_ms` | int (ms) | engine default | VAD silence threshold. |
| `vad_thresh` | int | engine default | VAD aggressiveness. |
| `debug_level` | int (0–100) | 0 | Engine debug verbosity. |
| `debug` | bool | false | Verbose mod_openai logging. |

Tunables are read from `params` only. Putting any of them at the top level is rejected by the platform/SDK schemas before mod_openai sees them.
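
Because out-of-range values are only rejected when the sidecar starts, a pre-flight check in your own code can catch mistakes earlier. The sketch below mirrors the integer ranges from the table above. `PARAM_RANGES` and `check_params` are hypothetical names, not part of any SDK, and mod_openai's own validation remains authoritative.

```python
# Integer ranges copied from the params table above.
PARAM_RANGES = {
    "idle_timeout_ms": (50, 5000),
    "min_interval_ms": (0, 60000),
    "max_iters_per_tick": (1, 20),
    "max_history_tokens": (1000, 200000),
    "debug_level": (0, 100),
}


def check_params(params):
    """Return a list of readable problems; an empty list means no range errors."""
    problems = []
    for field, (low, high) in PARAM_RANGES.items():
        if field not in params:
            continue  # every knob is optional
        try:
            # Numeric strings are accepted on number fields, so coerce first.
            value = int(params[field])
        except (TypeError, ValueError):
            problems.append(f"{field}: not an integer")
            continue
        if not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")
    return problems
```

Calling `check_params({"max_iters_per_tick": 50})` reports the field as out of range, while `check_params({"idle_timeout_ms": "800"})` passes because numeric strings are coerced.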

### `prompt` field forms

**String:**
```yaml
prompt: "You are a real-time sales copilot..."
```

**POM (Prompt Object Model):**
```yaml
prompt:
  sections:
    - title: Personality
      body: "Real-time sales copilot..."
    - title: Goal
      body: "Detect intent and guide the agent."
    - title: Instructions
      bullets:
        - "Watch for buying signals."
        - "Call lookup_competitor when a competitor is mentioned."
```

**File reference:**
```yaml
prompt:
  file: "/etc/freeswitch/sidecar_prompts/sales.md"
```

Variable expansion in any form: `${global_data.*}`, `${ai_agents_global_data.*}` (persistent), `${caller_id_number}`, `${destination_number}`, `${local_tz}`, `${local_date}`, `${local_time}`, `${session_uuid}`, `${customer_role}`.

### `SWAIG` block

```yaml
SWAIG:
  defaults:
    web_hook_url: "https://your-app.example.com/swaig"
    web_hook_auth_user: "user"      # optional basic auth
    web_hook_auth_pass: "pass"
  functions:
    - function: lookup_competitor
      description: "Look up a competitor by name."
      parameters:
        type: object
        properties:
          competitor: { type: string, description: "Competitor name." }
        required: [competitor]
    - function: flag_signal
      description: "Fire a UI alert."
      parameters:
        type: object
        properties:
          signal_type:
            type: string
            enum: [timeline, budget, authority, urgency, other]
          detail: { type: string }
        required: [signal_type, detail]
  mcp_servers:
    - url: "https://crm.example.com/mcp"
      headers: { Authorization: "Bearer ${global_data.crm_token}" }
      resources: true
      resource_vars:
        customer_id: "${global_data.customer_id}"
```

The function name `sidecar_skip` is reserved — do not declare it. It's auto-registered as a built-in (see below).

The SWAIG schema in this verb is strict: parameter properties only allow JSON Schema fields the platform's SWAIG validator accepts (`type`, `description`, `enum`, `default`). It does NOT accept `minimum`/`maximum` on integers, `pattern` constraints beyond what's documented, etc. — express extra constraints in `description` and validate server-side in your handler.
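
A small lint step can flag unsupported JSON Schema keywords before the platform rejects the document. This is a sketch under the assumption that the allowed set is exactly the four keys listed above and that only top-level properties need checking. `ALLOWED_PROPERTY_KEYS` and `find_disallowed_keys` are illustrative names, not SDK functions.

```python
# Assumed allow-list, taken from the paragraph above.
ALLOWED_PROPERTY_KEYS = {"type", "description", "enum", "default"}


def find_disallowed_keys(function_def):
    """Return (property_name, key) pairs that fall outside the allow-list."""
    properties = function_def.get("parameters", {}).get("properties", {})
    rejected = []
    for prop_name, schema in properties.items():
        for key in schema:
            if key not in ALLOWED_PROPERTY_KEYS:
                rejected.append((prop_name, key))
    return rejected


# Hypothetical function definition with an unsupported `minimum` constraint.
set_seats = {
    "function": "set_seats",
    "description": "Record the requested seat count.",
    "parameters": {
        "type": "object",
        "properties": {
            "seats": {"type": "integer", "minimum": 1, "description": "Seat count, 1 or more."}
        },
        "required": ["seats"],
    },
}
```

Here `find_disallowed_keys(set_seats)` returns `[("seats", "minimum")]`; dropping `minimum` and keeping the constraint in `description` clears the finding.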

### `permissions` block

Defaults match the regular AI agent.

| Field | Type | Default |
|---|---|---|
| `swaig_allow_swml` | bool | true |
| `swaig_allow_settings` | bool | true |
| `swaig_set_global_data` | bool | true |

`act_on_channel: false` short-circuits all permissions — actions become event-only.

---

## Built-in `sidecar_skip` tool

The sidecar registers exactly one built-in tool that's always available to the LLM during ticks:

```
function:    sidecar_skip
description: "Call this when the latest customer turn requires no advice for the agent. Ends this tick immediately."
parameters:
  reason: { type: string, description: "Optional one-line reason for the audit log." }
```

When the LLM calls it, the tick ends silently (no `insight` event). A `skip` event is emitted with the reason for the audit trail.

You don't register this — it's auto-added. Your prompt should instruct the model to call it when there's nothing useful to say (otherwise the model fills silence with low-quality advice).

`sidecar_skip` is **not exposed** during ad-hoc question handling (see [Asking the sidecar a question](#asking-the-sidecar-a-question)) — there's no skipping a direct question.

---

## Persistence

`global_data` survives across AI / sidecar sessions on the same call leg via the SWML var `ai_agents_global_data`. Both this sidecar and the regular `<ai>` agent share the convention.

- **On entry**: mod_openai reads `swml_serialized_vars`, looks up `ai_agents_global_data`, and merges the inherited tree on top of any verb-body `global_data:`. On key collision the inherited value wins, because the channel's runtime state beats declarative defaults.
- **Before any SWML execution**: mod_openai flushes the in-memory tree back to the SWML var so the executing script can read `${ai_agents_global_data.foo}`.
- **After non-transfer SWML execution**: mod_openai re-reads the SWML var so any mutations the script made are reflected in memory.
- **On exit**: the in-memory tree is flushed to the SWML var so the next verb on the same channel inherits.

The SWML var holds the full nested object — `${ai_agents_global_data.deal.mrr}`, `${ai_agents_global_data.lookups[0]}` etc. expand normally.
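
The on-entry merge can be modeled in a few lines. This is only an illustration of the collision rule: the source does not say whether mod_openai merges nested objects key by key or replaces them wholesale, so the recursive behavior here is an assumption, and `merge_inherited` is a hypothetical name.

```python
def merge_inherited(verb_defaults, inherited):
    """Overlay inherited runtime state on verb-body defaults; inherited wins."""
    merged = dict(verb_defaults)
    for key, value in inherited.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested objects are merged key by key.
            merged[key] = merge_inherited(merged[key], value)
        else:
            merged[key] = value
    return merged


verb_body = {"tier": "free", "deal": {"mrr": 0, "stage": "new"}}
from_channel = {"deal": {"mrr": 500}}
effective = merge_inherited(verb_body, from_channel)
```

Under these assumptions `effective` keeps `tier` and `deal.stage` from the verb body and takes `deal.mrr` from the channel.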

The internal flag `app->persist_global_data` defaults to true for both `<ai>` and `<ai_sidecar>`. There's no opt-out at the verb level today.

---

## Anti-loop guards

The tool loop has three defensive bail-outs to keep models from running away on misbehaved tools or hallucinating tool names:

| Guard | Trigger | Behavior |
|---|---|---|
| Total-consecutive cap | More than 4 tool calls within a single tick (or ask) | Return canned reply to the model: *"You have already looked up this information. Use the prior results to craft your answer."* Fire `error` event with `error_reason: tool_loop`. Break the loop. |
| Same name+args repeat | Same `(tool_name, arguments)` called 2+ times in a row | First repeat: short-circuit with `{"response":"DUPLICATE: you already called this tool with these arguments earlier..."}` — no webhook hit. Third repeat: bail with `tool_loop`. |
| Phantom tool | Model calls a tool name that's not in your registered set | Return `{"response":"'<name>' is not a valid tool. Available tools: <list>. Retry with one of these."}` so the model can self-correct. After 3 phantom calls in a tick, bail. |

These run per-tick — counters reset between ticks. Asks (one-off questions) are bounded the same way.

The phantom-tool guard exists primarily because some providers (e.g. Groq) inject server-side tools the model can call that aren't in the request. mod_openai catches those before dispatch.

---

## Tool webhook contract

When the LLM calls one of your SWAIG functions, mod_openai POSTs to your `web_hook_url` with this body:

```json
{
  "function": "lookup_competitor",
  "argument": "{\"competitor\":\"ACME\"}",
  "global_data": { ... current sidecar global_data ... },
  "channel_data": {
    "call_id": "...",
    "caller_id_name": "...",
    "caller_id_number": "...",
    "destination_number": "..."
  },
  "project_id": "...",
  "space_id": "..."
}
```

`argument` is a JSON-encoded string. Some paths send the arguments as an already-parsed object under `arguments` instead, so your handler should accept either.
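
For handlers written without the SDK, a small normalizer keeps the rest of the code indifferent to which shape arrived. `extract_tool_args` is a hypothetical helper; it assumes only the two shapes described above.

```python
import json


def extract_tool_args(body):
    """Return the tool arguments as a dict, whichever shape the POST used."""
    raw = body.get("argument")
    if raw is None:
        raw = body.get("arguments", {})
    if isinstance(raw, str):
        # JSON-encoded string form; treat an empty string as "no arguments".
        return json.loads(raw) if raw.strip() else {}
    if isinstance(raw, dict):
        return raw
    raise ValueError(f"unexpected argument payload: {type(raw).__name__}")
```

Both `{"argument": "{\"competitor\":\"ACME\"}"}` and `{"arguments": {"competitor": "ACME"}}` yield the same dict.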

Your response is the standard `FunctionResult` shape:

```json
{
  "response": "ACME charges $99/seat. We're $79.",
  "action": [
    { "user_event": { "topic": "sidecar.alert", "level": "info" } },
    { "set_global_data": { "last_lookup": "ACME" } }
  ]
}
```

`response` becomes the tool result the LLM sees. `action` is optional — entries are processed by the sidecar's action dispatcher and (if `act_on_channel: true`) executed against the call. Same key (`action`, singular) the regular `<ai>` agent uses; arrays and single objects both work.
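
If you are not using the SDK's `FunctionResult`, the same reply can be built as a plain dict and serialized as JSON. `tool_response` below is a hypothetical helper, shown only to make the shape concrete.

```python
def tool_response(text, *actions):
    """Build the JSON-serializable reply for a SWAIG tool webhook."""
    reply = {"response": text}
    if actions:
        # A list is used even for a single action; both forms are accepted.
        reply["action"] = list(actions)
    return reply


reply = tool_response(
    "ACME charges $99/seat. We're $79.",
    {"user_event": {"topic": "sidecar.alert", "level": "info"}},
    {"set_global_data": {"last_lookup": "ACME"}},
)
```

A tool with nothing to trigger simply omits the actions: `tool_response("Logged.")` produces a body with only `response`.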

### Supported SWAIG actions

| Action | Behavior |
|---|---|
| `user_event` | Fire a `calling.user_event` relay event with your topic — primary mechanism for triggering UI alerts and intent navigation |
| `set_global_data` / `unset_global_data` | Mutate sidecar `global_data`; emits `global_data_change` event. Persisted across AI sessions on the same channel (see [Persistence](#persistence)). |
| `set_meta_data` / `unset_meta_data` | Per-token meta data |
| `transfer` | Transfer the call to a destination — terminates sidecar + transcribe |
| `hangup` | Hang up the call — terminates sidecar + transcribe |
| `stop` | Stop the sidecar only; transcribe continues |
| `say` | Currently emit-only in sidecar mode (no TTS handle); fires an `action` event with `executed: false, reason: "no_tts_in_sidecar"` |
| `back_to_back_functions` | Tool-loop knob — true / "forever" |
| `extensive_data` | Webhook post-data shape knob |
| `settings` | Mutate sidecar LLM settings (model swap, temperature, etc.) — gated by `swaig_allow_settings` |
| `toggle_functions` | Enable/disable SWAIG functions mid-conversation |
| `SWML` | Run SWML against the channel (gated by `swaig_allow_swml`) — `transfer: true` variant terminates sidecar |
| `user_input` | Inject a user message into history and force a re-tick |

`act_on_channel: false` short-circuits all of the above — actions still emit as events but don't execute.

---

## Events

The sidecar emits structured events on two paths:

1. **Relay topic** `calling.ai.sidecar` — always fires. Subscribe via the SignalWire RELAY SDK / browser SDK to consume in real time.
2. **Webhook** — fires only when `url` is set in the verb body. Each event is POSTed to that URL with the event body wrapped under `sidecar_event`.

Both paths carry the same event payload; the webhook additionally wraps it with `call_info`, as shown under "Webhook body shape". Each event fires exactly once on the relay topic whether or not a webhook is configured.

### Webhook body shape

```json
{
  "call_info": {
    "project_id": "...",
    "space_id": "...",
    "call_id": "...",
    "content_type": "text/json",
    "content_disposition": "post_data",
    "conversation_type": "voice"
  },
  "sidecar_event": {
    "type": "insight",
    "ts": 1745870400123456,
    "tick_id": 7,
    "channel_data": { ... },
    "raw": "Confirm the customer's address.",
    "iter": 0,
    "total_iters": 1
  }
}
```

The actual event is under `sidecar_event` — unwrap that in your handler.

### Event types

| `type` | When | Key fields |
|---|---|---|
| `start` | Sidecar attached | `model`, `tools` (array of names), `global_data` |
| `turn` | Customer turn-end detected (a tick is about to fire) | `transcript_delta`, `customer_text`, `agent_text` |
| `request` | Each LLM call within a tick (one per tool-loop iter) | `model`, `iter`, `messages_count`, `messages_token_count`, `tool_choice`, `triggered_by?` |
| `thought` | Intermediate-iter assistant content (model called tools alongside text) | `text`, `iter`, `triggered_by?` |
| `insight` | Final-iter agent-facing advice (tick) | `raw`, `iter`, `total_iters` |
| `ask_request` | Worker started processing an out-of-band agent question | `question` |
| `ask_answer` | Final-iter answer to an agent question (ask) | `raw`, `iter`, `total_iters`, `triggered_by: "ask"` |
| `skip` | Model called `sidecar_skip` to end the tick silently | `reason` |
| `tool_call` | LLM invoked a tool | `name`, `arguments`, `iter`, `triggered_by?` |
| `tool_result` | Tool returned | `name`, `response`, `iter`, `triggered_by?` |
| `action` | SWAIG action returned (and possibly executed) | `action`, `source_function`, `executed` |
| `global_data_change` | `set_global_data` / `unset_global_data` fired | `key`, `old_value`, `new_value` |
| `history_pruned` | Token cap triggered pruning | `tokens_before`, `tokens_after`, `dropped_count` |
| `error` | LLM/tool/parse failure or anti-loop bail | `error_reason`, `detail`, `iter?` |
| `stop` | Sidecar shutting down | `stop_reason` |
| `final` | Last event before teardown — full state dump | `stop_reason`, `summary?`, `history`, `transcript`, `event_log`, `tool_calls`, `insights`, `stats`, `global_data`, `model`, `started_at`, `ended_at`, `duration_ms` |

Events emitted during an ad-hoc question carry `triggered_by: "ask"` on `request`, `thought`, `tool_call`, and `tool_result` — useful for distinguishing tick-driven activity from operator-driven activity in dashboards.
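
A consumer typically routes on `type` and uses `triggered_by` to separate ask-driven activity from everything else. The following is a minimal sketch of that pattern; `unwrap` and `EventRouter` are hypothetical names, and the handler signature is an arbitrary choice for the example.

```python
from collections import defaultdict


def unwrap(body):
    """Return the event itself, whether or not it is wrapped under sidecar_event."""
    inner = body.get("sidecar_event")
    return inner if isinstance(inner, dict) else body


class EventRouter:
    """Route events to handlers by type and flag ask-driven activity."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def dispatch(self, body):
        event = unwrap(body)
        # ask_request lists no triggered_by field in the table, so check the type too.
        from_ask = (
            event.get("triggered_by") == "ask"
            or event.get("type") in ("ask_request", "ask_answer")
        )
        for handler in self._handlers[event.get("type")]:
            handler(event, from_ask)
        return from_ask


router = EventRouter()
seen = []
router.on("insight", lambda event, from_ask: seen.append((event["raw"], from_ask)))
```

Handlers receive the unwrapped event plus a boolean, so a dashboard can count tick-driven and ask-driven tool calls separately without re-inspecting the payload.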

### Stop reasons (`final.stop_reason`)

| Reason | Trigger |
|---|---|
| `transcribe_close` | Call hung up normally; transcribe close handler fired sidecar stop |
| `transferred` | A SWAIG `transfer` action terminated the call |
| `hung_up` | A SWAIG `hangup` action terminated the call |
| `stop_action` | A SWAIG `stop` action stopped the sidecar (transcribe continues) |
| `api_stop` | `ai-sidecar <uuid> stop` was called via FS API |
| `error` | Fatal sidecar error |

### Error reasons (`error.error_reason`)

| Reason | Trigger |
|---|---|
| `tool_loop` | Anti-loop guard tripped (>4 calls in a tick or 3rd same-name+args repeat) |
| `swml_not_allowed` | `SWML` action requested but `swaig_allow_swml: false` |
| Other | Any LLM/parse/webhook error surfaces with a free-form `error_reason` |

---

## Asking the sidecar a question

`ai-sidecar <uuid> ask <question>` lets a backend (typically the agent's UI) submit an out-of-band question to the sidecar while it's observing a call. The sidecar runs an LLM round-trip against an *ephemeral snapshot* of its helper-conversation history and emits the answer as a regular `calling.ai.sidecar` event with `params.type=ask_answer`.

The persistent helper history is **not** mutated by an ask. Snapshot built, run, discarded.

**Fire-and-forget** — the API returns `+OK queued` immediately. Asks are FIFO-queued through the worker thread alongside ticks; the answer arrives over the event stream. Tools daisy-chain during an ask just like during a tick (capped by `max_iters_per_tick`).

`sidecar_skip` is dropped from the tool array exposed during an ask, and the head system message in the snapshot is replaced with an ask-mode framing telling the model to answer the agent's question directly.

Use this for: "What's the customer's main concern?", "Is this customer ready to buy?", "Look up Acme again", "Did the customer mention timeline?"
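
Since the answer arrives asynchronously, a UI usually needs to pair each `ask_answer` with the question that produced it. The sketch below relies on two assumptions: the single worker finishes one ask before starting the next, and each `ask_request` event (which carries `question`) precedes its `ask_answer`. `AskCorrelator` is a hypothetical class name.

```python
class AskCorrelator:
    """Pair each ask_answer event with the question from the preceding ask_request."""

    def __init__(self):
        self._current_question = None
        self.answers = []  # completed (question, answer) tuples, in arrival order

    def feed(self, event):
        """Feed one unwrapped sidecar event; return a pair when an ask completes."""
        event_type = event.get("type")
        if event_type == "ask_request":
            self._current_question = event.get("question")
        elif event_type == "ask_answer":
            pair = (self._current_question, event.get("raw"))
            self._current_question = None
            self.answers.append(pair)
            return pair
        return None  # all other event types are ignored here
```

Feed every event from the stream into `feed`; tick events pass through untouched, and a non-`None` return value is a completed question/answer pair ready to render.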

---

## Runtime control — `ai-sidecar` FS API

After the sidecar is running, control it with the `ai-sidecar` FS API verb:

```
ai-sidecar <uuid> status
ai-sidecar <uuid> poke <text>
ai-sidecar <uuid> ask <question>
ai-sidecar <uuid> stop
```

| Subcommand | Returns |
|---|---|
| `status` | `+OK running=1 ticks=12 insights=9 skips=3 tools=4 errors=0 in_tokens=4521 out_tokens=812 history_size=23 event_log_bytes=18432` |
| `poke <text>` | Inject a free-form user message into the sidecar's *persistent* history and force an immediate tick. Mutates history, future ticks see it. |
| `ask <question>` | Queue an out-of-band question. Returns `+OK queued` immediately; answer arrives as an `ask_answer` event. Does NOT mutate persistent history. |
| `stop` | Graceful stop. Emits `final` event before returning. |
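
The `status` reply is a single line of `key=value` counters, which makes it easy to feed into monitoring. This parser is a sketch that assumes the reply always has the shape of the sample in the table and that every value is an integer; `parse_status` is a hypothetical helper.

```python
def parse_status(line):
    """Parse an `ai-sidecar <uuid> status` reply into a dict of integer counters."""
    if not line.startswith("+OK"):
        raise ValueError(f"status failed: {line.strip()}")
    counters = {}
    for token in line.split()[1:]:
        key, sep, value = token.partition("=")
        # Skip anything that is not a key=integer pair.
        if sep and value.lstrip("-").isdigit():
            counters[key] = int(value)
    return counters


sample = (
    "+OK running=1 ticks=12 insights=9 skips=3 tools=4 errors=0 "
    "in_tokens=4521 out_tokens=812 history_size=23 event_log_bytes=18432"
)
status = parse_status(sample)
```

With the sample line, `status["ticks"]` is `12` and `status["skips"] / status["ticks"]` gives the skip ratio for the call so far.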

`start` is also supported for programmatic invocation when you want to attach a sidecar without using the SWML verb form:

```
ai-sidecar <uuid> start <flat-json-config>
```

Same JSON body as the SWML verb. Returns `+OK started` or one of:
- `-ERR: Invalid Call`
- `-ERR: Invalid JSON config`
- `-ERR: there is already a transcriber attached to this call`
- `-ERR: lang is required`
- `-ERR: sidecar requires both legs (remote-caller and local-caller)`
- `-ERR: failed to start transcribe`
- `-ERR: failed to build sidecar app (model resolution failed?)`

---

## Using the SignalWire Python SDK

The signalwire-python SDK's `SWMLService` natively hosts SWAIG functions and serves SWML — you can build a complete sidecar test app without subclassing `AgentBase`.

### Minimal example

```python
from signalwire.core.swml_service import SWMLService
from signalwire.core.function_result import FunctionResult


class MySidecar(SWMLService):
    def __init__(self, host="0.0.0.0", port=3000, route="/sidecar"):
        super().__init__(name="my-sidecar", route=route, host=host, port=port)

        self.define_tool(
            name="lookup_competitor",
            description="Look up a competitor by name.",
            parameters={
                "type": "object",
                "properties": {
                    "competitor": {"type": "string", "description": "Company name."}
                },
                "required": ["competitor"]
            },
            handler=self._lookup_competitor,
        )

        # Event sink — receives every calling.ai.sidecar event
        self.register_routing_callback(self._on_event, path="/events")

        # Build the SWML doc once
        self.build_swml()

    def _lookup_competitor(self, args, raw_data):
        competitor = args["competitor"]
        return FunctionResult(f"{competitor} charges $99/seat. We're $79.")

    def _on_event(self, request, body):
        # send_debug_data_async wraps the payload
        if isinstance(body.get("sidecar_event"), dict):
            body = body["sidecar_event"]

        evt_type = body.get("type")
        tick = body.get("tick_id")

        if evt_type == "insight":
            print(f"[insight tick={tick}] {body['raw']}")
        elif evt_type == "ask_answer":
            print(f"[ask_answer] {body['raw']}")
        elif evt_type == "skip":
            print(f"[skip tick={tick}] {body.get('reason')}")
        elif evt_type == "final":
            print(f"[final] {body.get('summary', '(no summary)')}")
        return None

    def build_swml(self):
        self.reset_document()
        self.add_section("main")
        self.add_verb_to_section("main", "answer", {})
        self.add_verb_to_section("main", "ai_sidecar", {
            "prompt": "You are a real-time sales copilot. Give the agent one short piece of advice per turn or call sidecar_skip.",
            "lang": "en-US",
            "model": "gpt-4o-mini",
            "url": "https://your-app.example.com/sidecar/events",
            "hints": ["ACME", "Globex", "FedRAMP"],
            "SWAIG": {
                "defaults": {"web_hook_url": "https://your-app.example.com/sidecar/swaig"},
                "functions": [
                    {
                        "function": fn.name,
                        "description": fn.description,
                        "parameters": fn.parameters,
                    }
                    for fn in self._tool_registry._swaig_functions.values()
                ],
            },
        })
        self.add_verb_to_section("main", "connect", {
            "from": "+15555550100",
            "to": "+15555550199",
            "answer_on_bridge": True,
        })
        self.add_verb_to_section("main", "hangup", {})


if __name__ == "__main__":
    MySidecar().serve()
```

That's a complete working sidecar service:
- `GET http://your-host:3000/sidecar` → returns the SWML doc with the `ai_sidecar` verb
- `POST http://your-host:3000/sidecar/swaig` → receives SWAIG calls, dispatches to `_lookup_competitor`, returns `FunctionResult` JSON
- `POST http://your-host:3000/sidecar/events` → receives every sidecar event

### Returning UI alerts

Tools that should drive the agent's browser UI return a `user_event` action — that fires a `calling.user_event` relay event the SDK widget subscribes to:

```python
def _flag_signal(self, args, raw_data):
    return (
        FunctionResult(f"Logged {args['signal_type']} signal.")
        .add_action("user_event", {
            "topic": "sidecar.signal",
            "signal_type": args["signal_type"],
            "detail": args["detail"],
        })
    )
```

Without an action, the tool's text only flows through the LLM — the agent UI sees nothing.

### Hosting SWAIG and event sink under basic auth

If your `url` and `web_hook_url` embed basic-auth credentials (`http://user:pass@host:port/...`), pass the same `(user, pass)` tuple to the SWMLService constructor:

```python
super().__init__(name="my-sidecar", route="/sidecar",
                 basic_auth=("user", "pass"))
```

mod_openai's webhook callbacks will use those credentials when posting back.

---

## Best practices

### Prompt design

- **Tell the model when to skip.** Without explicit instruction to call `sidecar_skip`, the model will produce filler advice on every turn. Add: *"Call sidecar_skip with a one-line reason when no advice is needed. Don't fill silence."*
- **Speak to the agent, not the customer.** Reinforce in the prompt: *"Speak directly to the agent. The customer never sees your output."*
- **Telegraphic tone.** Otherwise the agent has to read paragraphs while talking. *"One sentence. No preamble. No 'consider' or 'you might want to'."*
- **Bind tool calls to specific triggers.** *"If the customer mentions a competitor by name, call lookup_competitor — do not respond from memory."* Concrete triggers produce reliable tool use.
- **Show what good output looks like with examples.** Two or three examples in the prompt sharply improve consistency.

### Tool design

- Tools that produce browser/UI alerts should return a `user_event` SWAIG action with a topic like `sidecar.your_app.intent_name` — your browser code subscribes to that topic. Without a `user_event` action, the tool result is invisible to the UI.
- Keep tool responses short. The LLM uses them as context for the next iteration.
- Mark "quiet" tools (logging, telemetry) by NOT returning a `user_event` action — only the agent-facing audit log records them.
- For long-running lookups, time out at the webhook; the sidecar will surface the timeout as an `error` event.
- Always populate `hints` with your competitor names, product names, and any custom jargon. ASR mishears these otherwise, and your tool triggers (which often match on names) miss.

### Rate / cost control

- `idle_timeout_ms` controls how aggressive the sidecar is. 200ms (the default) is snappy and ensures advice arrives before the agent fully responds; raise to 800–1500ms for slow consultative calls if you want the model to see fuller customer turns before reacting.
- `min_interval_ms` is a hard throttle — set to 1000–2000ms to cap LLM cost on chatty calls.
- `max_iters_per_tick: 5` is usually fine. Increase only if your tools chain (tool A's result drives tool B). The anti-loop guards already prevent runaway tool calls.
- `max_history_tokens: 8000` is the helper-history budget, not the LLM context window. The sidecar prunes oldest messages when over.

### Production deployment

- Run your SWAIG webhook server under TLS with basic auth or token auth. The sidecar embeds credentials in the URL it posts to.
- The **relay path** (browser SDK subscribing to `calling.ai.sidecar`) is the low-latency UI channel and always fires. The **webhook path** (when `url` is set) is for reliable server-side logging — use both, one, or neither based on what your stack needs.
- The `final` event is the right place to push the call summary to your CRM. It contains `summary` (helper-session AI summary, when `final_summary: true`), `transcript` (raw call), `history` (LLM helper conversation), `stats` (token / tool-call counters), `event_log`, and final `global_data`.
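
As an illustration of the last point, the `final` event can be reduced to a compact record before it is pushed to a CRM. The output field names are arbitrary, `crm_record_from_final` is a hypothetical helper, and the code assumes `insights` is a list, which the event table does not spell out.

```python
def crm_record_from_final(event):
    """Flatten an unwrapped `final` sidecar event into a small CRM-ready dict."""
    if event.get("type") != "final":
        raise ValueError("expected a final event")
    insights = event.get("insights") or []  # assumed to be a list
    return {
        "stop_reason": event.get("stop_reason"),
        # summary is only present when final_summary is true
        "summary": event.get("summary"),
        "duration_seconds": (event.get("duration_ms") or 0) / 1000,
        "insight_count": len(insights),
        "global_data": event.get("global_data") or {},
    }
```

Call it from your event handler when `type` is `final`, after unwrapping `sidecar_event`.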

---

## Differences from the agent (`<ai>` verb)

| | `<ai>` agent | `<ai_sidecar>` |
|---|---|---|
| Role | Agent in the call (speaks via TTS) | Observer (no TTS, listens only) |
| Output | Voice to caller via TTS | Structured events to relay/webhook |
| Trigger | Real-time as the AI's own thoughts/responses | Customer turn-end (after final ASR + idle) |
| Prompt audience | The model is the agent | The model is a coach watching two participants |
| Tools | Run on the AI's behalf during the call | Same vocabulary, but actions are advisory by default |
| `global_data` persistence | `ai_agents_global_data` SWML var | `ai_agents_global_data` SWML var (same key, same convention) |
| Wallet | `voice_ai_per_minute` | `ai_sidecar_per_minute` (only) |
| Coexists with `<connect>` | Replaces the call flow | Runs alongside — call bridges normally |

A common pattern: `<ai_sidecar>` to coach a human agent on a `<connect>`-bridged call. The customer talks to the human, the sidecar coaches the human's screen.

---

## Limitations and constraints

- **One transcriber per call.** Plain `live_transcribe` and `ai_sidecar` are mutually exclusive. Calling either while the other is active returns `there is already a transcriber attached to this call`.
- **Both legs required.** Single-leg `direction: ["remote-caller"]` is rejected.
- **Deepgram-compatible engine in practice.** Other engines may work but turn-end detection relies on Deepgram-style `is_final` markers.
- **`say` action is event-only in sidecar mode.** The sidecar has no TTS handle; the action fires as an event with `executed: false`.
- **Sidecar dies with transcribe.** When the call hangs up or transcribe is stopped, the sidecar's `final` event fires and the worker exits.
- **`sidecar_skip` is reserved.** Don't define a function with that name.
- **Asks are fire-and-forget.** No synchronous answer return — answer arrives as an `ask_answer` event on the relay/webhook channel. If you need synchronous Q&A semantics, your front-end has to correlate the ask submission with the answer event.

---

## Troubleshooting

### SWML rejected as invalid

The platform validated your SWML and rejected it. The `calling.script.warning` event on the relay stream names the exact field — subscribe via the SignalWire SDK to see `params.message` and `params.parameter`. Common causes:
- Missing `prompt` or `lang` (both required)
- `direction` doesn't include both legs
- Tool parameter uses a JSON Schema field the SWAIG validator doesn't allow (e.g., `minimum`/`maximum` on an integer — drop it and put the constraint in the description; validate server-side in your handler)
- Unknown top-level field (must be in the documented allowed list)
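
The tool-parameter cause above moves range checks out of the schema and into your handler. A minimal sketch of such a server-side check, assuming a hypothetical tool whose `quantity` argument must be an integer from 1 to 100. The function names, the argument name, and the reply wording are illustrative, and the `{"response": ...}` return shape is assumed to match what your SWAIG handler already returns.

```python
from typing import Any, Dict, Optional


def check_int_range(args: Dict[str, Any], name: str, lo: int, hi: int) -> Optional[str]:
    """Return an error message for the model, or None when the value is valid."""
    value = args.get(name)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return f"{name} must be an integer between {lo} and {hi}."
    if not lo <= value <= hi:
        return f"{name} must be between {lo} and {hi}, got {value}."
    return None


def handle_set_quantity(args: Dict[str, Any]) -> Dict[str, str]:
    # Hypothetical tool handler: validate first, then do the real work.
    error = check_int_range(args, "quantity", 1, 100)
    if error:
        return {"response": error}
    return {"response": f"Quantity set to {args['quantity']}."}
```

Returning the constraint violation as the tool response gives the model a chance to correct its arguments on the next iteration instead of failing the tick.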

### Events firing but type/tick are missing in your handler

You're reading `body.get("type")` at the top level. The webhook payload wraps the event under `sidecar_event`. Unwrap:

```python
if isinstance(body.get("sidecar_event"), dict):
    body = body["sidecar_event"]
```

### Sidecar starts but no insights, only skips

Either:
- The model is correctly deciding the call doesn't warrant advice (short call, no customer turns yet, agent handling it well)
- Your prompt doesn't give the model concrete things to flag — add specific buying-signal / objection / competitor triggers and concrete examples

### `ai-sidecar … status` returns `no sidecar attached`

The sidecar never started, or it has already stopped. If it never started, the `start` API returned a `-ERR`. The most common are `lang is required`, `sidecar requires both legs (remote-caller and local-caller)`, and `there is already a transcriber attached to this call`.
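
When you poll `status` from code, it helps to turn the reply into a dict before acting on it. A minimal sketch, assuming the `+OK key=value ...` reply shape shown for the `status` subcommand in the FS API section. `parse_sidecar_status` is a hypothetical helper, and because this document does not show whether the `no sidecar attached` reply carries a `-ERR` prefix, the sketch accepts both forms.

```python
from typing import Dict, Union


def parse_sidecar_status(reply: str) -> Dict[str, Union[int, str]]:
    """Parse an `ai-sidecar <uuid> status` reply into a dict."""
    reply = reply.strip()
    if not reply.startswith("+OK"):
        # Anything else (for example `no sidecar attached`) is an error.
        if reply.startswith("-ERR"):
            reply = reply[len("-ERR"):]
        return {"error": reply.strip()}
    fields: Dict[str, Union[int, str]] = {}
    for token in reply[len("+OK"):].split():
        key, sep, value = token.partition("=")
        if sep:
            # Counters are numeric; keep anything else as a string.
            fields[key] = int(value) if value.isdigit() else value
    return fields
```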

### `tool_loop` errors in the event stream

The anti-loop guard tripped — the model called the same tool with the same arguments three times in a row, or made more than four total tool calls in one tick. The model is stuck. Check:
- Tool descriptions — are they ambiguous enough that the model thinks it didn't get a useful answer the first time?
- Tool responses — are they too terse? The model may re-call thinking it didn't get data.
- Prompt — does it tell the model what to do *after* a tool returns? E.g., "After lookup_competitor, summarize the result for the agent in one sentence."
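
To see which threshold a stuck tick hit, you can replay its `tool_call` events offline. The sketch below models only the two thresholds described above (three identical calls in a row, or more than four calls in one tick). It is not the platform's guard code, `diagnose_tool_loop` and its return labels are hypothetical, and grouping events by tick is left to the caller.

```python
import json
from typing import Any, Dict, List, Optional


def diagnose_tool_loop(tool_calls: List[Dict[str, Any]]) -> Optional[str]:
    """Classify one tick's `tool_call` events against the documented thresholds."""
    run_length = 0
    previous = None
    for count, call in enumerate(tool_calls, start=1):
        # Serialize arguments with sorted keys so equal dicts compare equal.
        signature = (call.get("name"), json.dumps(call.get("arguments"), sort_keys=True))
        run_length = run_length + 1 if signature == previous else 1
        previous = signature
        if run_length >= 3:
            return "same_name_and_args_repeated"
        if count > 4:
            return "too_many_calls_in_tick"
    return None
```

A `same_name_and_args_repeated` result usually points at the tool response or description; `too_many_calls_in_tick` more often points at a prompt that never tells the model when to stop looking things up.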

### Phantom tool errors

The model called a tool name that is not in your SWAIG list. This most often happens with provider-injected server-side tools (Groq, some Anthropic configurations). The phantom-tool guard catches the call and tells the model to retry with a valid name; if the problem persists, the tick bails. Check provider-specific tool routing settings or switch providers.

### `ai_agents_global_data` not being read by my SWML script

Make sure you're reading from the SWML var, not a regular channel variable. `${ai_agents_global_data.foo}` works in SWML expansion; `${ai_agents_global_data}` as a single channel variable does NOT (the data is a nested object in the SWML var tree, not a stringified channel var).

### High latency between customer turn and insight

Total latency = ASR finalize + `idle_timeout_ms` + LLM response time + any tool call round-trips. `idle_timeout_ms` already defaults to 200 ms, so lower it further only if the rest of the path is fast enough for the saving to matter. Make sure your SWAIG webhook responds in under a second, and consider a faster model only if output quality holds.
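
The latency sum can be worked through with a few lines of arithmetic. The numbers below are illustrative placeholders, not measurements, and `tick_latency_ms` is a hypothetical helper that only adds up the components named in the formula.

```python
from typing import Sequence


def tick_latency_ms(
    asr_finalize_ms: float,
    idle_timeout_ms: float,
    llm_ms: float,
    tool_round_trips_ms: Sequence[float] = (),
) -> float:
    """Sum the latency components between customer turn-end and insight."""
    return asr_finalize_ms + idle_timeout_ms + llm_ms + sum(tool_round_trips_ms)


# Illustrative numbers only: 300 ms ASR finalize, the 200 ms default idle
# timeout, a 900 ms LLM response, and one 400 ms tool round trip.
budget = tick_latency_ms(300, 200, 900, [400])
print(budget)  # 1800
```

In this example the idle timeout is the smallest term, which is why trimming it tends to help less than speeding up the webhook or the model.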


