AI Sidecar
A real-time AI observer that listens to a live phone call and streams agent-facing advice events to your application. The sidecar does not participate in the call — it watches the conversation as a third party and produces structured events the agent's UI (or any consumer) can render.
Use it for: live sales coaching, real-time compliance flagging, intent-based UI navigation, voice-of-customer signal extraction, supervisor-on-shoulder workflows.
How it works
A sidecar is live_transcribe running in an extended mode. The same media bug, ASR engine, and conversation log are running underneath. On top of that, a separate worker thread:
- Listens for customer turn-end events — final ASR result + idle timeout, or an agent utterance arriving while a customer turn is pending.
- Sends the running transcript to an LLM with your operator prompt and your SWAIG/MCP tool set.
- The LLM either produces an insight (one-line agent-facing advice) or calls the built-in sidecar_skip tool to end the tick silently.
- Any tool calls (lookups, alerts, intent triggers) flow through the same SWAIG/MCP infrastructure as a regular AI agent.
- Every step emits a structured event on the calling.ai.sidecar relay topic and (optionally) to a webhook URL.
A call has either plain live_transcribe or sidecar mode running — never both. They're mutually exclusive.
Sidecar mode bills ai_sidecar_per_minute. There are no per-LLM-call or per-summary charges; the wall-clock minute is the only billable unit.
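The turn-end trigger described above can be pictured with a small state machine. This is a minimal sketch, not the mod_openai implementation: the class name TurnEndDetector and its methods are hypothetical, and only the 200 ms default and the two trigger conditions come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnEndDetector:
    """Hypothetical model of the tick trigger (illustrative only)."""
    idle_timeout_ms: int = 200               # documented default
    pending_since_ms: Optional[int] = None   # time of the last final customer ASR result

    def on_customer_final(self, now_ms: int) -> None:
        # A final ASR result on the customer leg opens (or extends) a pending turn.
        self.pending_since_ms = now_ms

    def on_agent_utterance(self) -> bool:
        # An agent utterance while a customer turn is pending fires the tick at once.
        return self._fire() if self.pending_since_ms is not None else False

    def poll(self, now_ms: int) -> bool:
        # Otherwise the tick fires once the customer leg has been idle long enough.
        if self.pending_since_ms is None:
            return False
        if now_ms - self.pending_since_ms >= self.idle_timeout_ms:
            return self._fire()
        return False

    def _fire(self) -> bool:
        # Clear the pending turn so the same turn cannot trigger twice.
        self.pending_since_ms = None
        return True
```

A caller would feed it timestamps from the ASR stream and run one LLM tick each time a method returns True.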
Quick start (SWML)
version: 1.0.0
sections:
  main:
    - answer: {}
    - ai_sidecar:
        prompt: "You are a real-time sales copilot. After each customer turn, give the agent one concise piece of advice or call sidecar_skip if no advice is needed."
        lang: "en-US"
        url: "https://your-app.example.com/sidecar/events"
        hints: ["ACME", "Globex", "FedRAMP", "SOC 2"]
        SWAIG:
          defaults:
            web_hook_url: "https://your-app.example.com/sidecar/swaig"
          functions:
            - function: lookup_account
              description: "Look up an account record."
              parameters:
                type: object
                properties:
                  customer_id: { type: string }
                required: [customer_id]
    - connect:
        from: "+15555550100"
        to: "+15555550199"
        answer_on_bridge: true
    - hangup: {}
That's a working sidecar. The platform routes the verb to mod_openai's ai-sidecar API, transcribe starts in sidecar mode, and on each customer turn the LLM produces advice events delivered to your url and the relay topic.
SWML reference — ai_sidecar verb
The body has two parts: a structural top level with a fixed shape (validated by the platform and SDK schemas) and a params object that holds every tuning knob (validated strictly by mod_openai). All new tunables only land in params going forward — the structural top level is fixed.
Top-level body (structural)
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string, object, or {file: string} | yes | Operator prompt: POM, plain string, or external file. Built-in sidecar job framing is prepended automatically. |
| lang | string (BCP-47) | yes | Conversation language. Sets ASR language and is included in the LLM hint. |
| model | string | no | LLM model for the sidecar tick + close-time summaries. Default gpt-4o-mini. |
| direction | array of "remote-caller" and/or "local-caller" | no | Both legs are required. Default both. |
| customer_role | enum: "remote-caller" or "local-caller" | no | Which leg is the customer (turn-end trigger source). Default "remote-caller". |
| url | string (URL) | no | Webhook URL for transcribe events AND sidecar events. When unset, only relay events fire. |
| SWAIG | object | no | SWAIG functions and MCP servers. See below. |
| permissions | object | no | SWAIG permission overrides. Defaults all-true (matches regular AI). |
| global_data | object | no | Initial global_data. Available for ${var} expansion in the prompt and for tool webhook posts. Persisted across AI sessions on the same channel via the SWML var ai_agents_global_data (see Persistence). |
| hints | array of strings | no | Speech-recognition hints passed to ASR to bias recognition toward specific terms: product names, competitor names, jargon, customer names, anything in your SWAIG enum lists. Strongly recommended. Example: ["ACME", "Globex", "FedRAMP", "SOC 2"]. |
| params | object | no | Tuning knobs; see table below. |
| action | reserved | no | Reserved for future runtime sub-actions. |
params (tunable knobs, mod_openai-validated)
All optional. Type/range/enum errors return -ERR: <field>: <reason> from the FS API call. Strings that look like numbers (YAML or ${var} expansion artifacts) are accepted on number fields.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| idle_timeout_ms | int (50–5000) | 200 | Customer-leg silence after a final ASR result before the sidecar runs a tick. Lower = more aggressive. An agent utterance that arrives while a customer turn is pending also force-fires the tick (no idle wait). |
| min_interval_ms | int (0–60000) | 0 | Hard throttle: minimum time between LLM calls. |
| max_iters_per_tick | int (1–20) | 5 | Cap on tool-loop iterations per tick (and per ask). |
| max_history_tokens | int (1000–200000) | 8000 | Helper-history token budget. Sidecar prunes oldest non-system entries when over. |
| act_on_channel | bool | true | Whether SWAIG actions returned by tools execute against the call (transfer, hangup, etc.) or only emit as events. |
| final_summary | bool | false | Run a closing LLM call summarizing the sidecar's helper session. Result lands in final.summary. |
| ai_summary | bool | false | Run a close-time summary of the raw call (distinct from final_summary). Fires calling.ai.transcribe.conversation_log with the summary on call end. |
| ai_summary_prompt | string | none | Custom prompt for the close-time raw-call summary. |
| live_events | bool | false | Per-utterance calling.ai.transcribe.utterance events. Always fires on the relay topic when true. Webhook POST happens only if url is set. Independent of the sidecar event stream. |
| verbose_utterances | bool | false | When true, each utterance record carries full engine-side metadata (words[], alternatives[], start/end, request_id, etc.). Bloats record size; leave off unless you need it. |
| speech_engine | enum: "deepgram" or "google" | "deepgram" | ASR engine. |
| speech_timeout | int (ms) | engine default | ASR speech timeout. |
| vad_silence_ms | int (ms) | engine default | VAD silence threshold. |
| vad_thresh | int | engine default | VAD aggressiveness. |
| debug_level | int (0–100) | 0 | Engine debug verbosity. |
| debug | bool | false | Verbose mod_openai logging. |
Tunables are read from params only. Putting any of them at the top level is rejected by the platform/SDK schemas before mod_openai sees them.
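Because range errors only surface when the FS API call runs, a client-side pre-flight check can catch them earlier. The sketch below is hypothetical: check_params is not part of any SDK, the reason strings are made up, and only the numeric ranges, the speech_engine enum, and the -ERR: <field>: <reason> shape are taken from the table above.

```python
# Ranges copied from the params table; everything else here is illustrative.
INT_RANGES = {
    "idle_timeout_ms": (50, 5000),
    "min_interval_ms": (0, 60000),
    "max_iters_per_tick": (1, 20),
    "max_history_tokens": (1000, 200000),
    "debug_level": (0, 100),
}
ENUMS = {"speech_engine": {"deepgram", "google"}}


def check_params(params: dict) -> list[str]:
    """Return '-ERR'-style messages for values outside the documented ranges."""
    errors = []
    for field, value in params.items():
        if field in INT_RANGES:
            lo, hi = INT_RANGES[field]
            try:
                number = int(value)  # numeric strings are accepted on number fields
            except (TypeError, ValueError):
                errors.append(f"-ERR: {field}: not an integer")
                continue
            if not lo <= number <= hi:
                errors.append(f"-ERR: {field}: must be between {lo} and {hi}")
        elif field in ENUMS and value not in ENUMS[field]:
            errors.append(f"-ERR: {field}: must be one of {sorted(ENUMS[field])}")
    return errors
```

Running it on the params dict before serving the SWML gives quick feedback; mod_openai remains the authority on what is actually accepted.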
prompt field forms
String:
prompt: "You are a real-time sales copilot..."
POM (Prompt Object Model):
prompt:
  sections:
    - title: Personality
      body: "Real-time sales copilot..."
    - title: Goal
      body: "Detect intent and guide the agent."
    - title: Instructions
      bullets:
        - "Watch for buying signals."
        - "Call lookup_competitor when a competitor is mentioned."
File reference:
prompt:
  file: "/etc/freeswitch/sidecar_prompts/sales.md"
Variable expansion in any form: ${global_data.*}, ${ai_agents_global_data.*} (persistent), ${caller_id_number}, ${destination_number}, ${local_tz}, ${local_date}, ${local_time}, ${session_uuid}, ${customer_role}.
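One way to picture ${...} expansion is as a path lookup into nested data. The following is a rough sketch under stated assumptions, not the platform's expander: the functions lookup and expand are hypothetical, and leaving unknown references untouched is an assumed behavior that this page does not specify.

```python
import re

_TOKEN = re.compile(r"\$\{([^}]+)\}")          # matches ${some.path}
_STEP = re.compile(r"([^.\[\]]+)|\[(\d+)\]")   # a key, or an [n] index


def lookup(path: str, scope: dict):
    """Walk a dotted path with optional [n] indexes, e.g. 'global_data.lookups[0]'."""
    node = scope
    for key, index in _STEP.findall(path):
        node = node[int(index)] if index else node[key]
    return node


def expand(template: str, scope: dict) -> str:
    """Replace each ${path} with its value; unknown paths are left untouched (assumption)."""
    def repl(match):
        try:
            return str(lookup(match.group(1), scope))
        except (KeyError, IndexError, TypeError):
            return match.group(0)
    return _TOKEN.sub(repl, template)
```

For example, with scope = {"global_data": {"deal": {"mrr": 1200}}}, expand("MRR is ${global_data.deal.mrr}", scope) yields "MRR is 1200".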
SWAIG block
SWAIG:
  defaults:
    web_hook_url: "https://your-app.example.com/swaig"
    web_hook_auth_user: "user"   # optional basic auth
    web_hook_auth_pass: "pass"
  functions:
    - function: lookup_competitor
      description: "Look up a competitor by name."
      parameters:
        type: object
        properties:
          competitor: { type: string, description: "Competitor name." }
        required: [competitor]
    - function: flag_signal
      description: "Fire a UI alert."
      parameters:
        type: object
        properties:
          signal_type:
            type: string
            enum: [timeline, budget, authority, urgency, other]
          detail: { type: string }
        required: [signal_type, detail]
  mcp_servers:
    - url: "https://crm.example.com/mcp"
      headers: { Authorization: "Bearer ${global_data.crm_token}" }
      resources: true
      resource_vars:
        customer_id: "${global_data.customer_id}"
The function name sidecar_skip is reserved — do not declare it. It's auto-registered as a built-in (see below).
The SWAIG schema in this verb is strict: parameter properties only allow JSON Schema fields the platform's SWAIG validator accepts (type, description, enum, default). It does NOT accept minimum/maximum on integers, pattern constraints beyond what's documented, etc. — express extra constraints in description and validate server-side in your handler.
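A small lint pass over your function definitions can flag these problems before the platform rejects the SWML. This is a sketch only: find_disallowed_keys and check_function are hypothetical helpers, and the allowed-key set is simply the four fields named above, applied to each entry under properties.

```python
ALLOWED_PROPERTY_KEYS = {"type", "description", "enum", "default"}  # fields named above


def find_disallowed_keys(function_def: dict) -> dict[str, list[str]]:
    """Map each parameter name to the schema keys a strict validator would reject."""
    properties = function_def.get("parameters", {}).get("properties", {})
    problems = {}
    for name, schema in properties.items():
        extra = sorted(set(schema) - ALLOWED_PROPERTY_KEYS)
        if extra:
            problems[name] = extra
    return problems


def check_function(function_def: dict) -> list[str]:
    """Collect human-readable problems, including use of the reserved name."""
    issues = []
    if function_def.get("function") == "sidecar_skip":
        issues.append("sidecar_skip is reserved")
    for name, extra in find_disallowed_keys(function_def).items():
        issues.append(f"{name}: unsupported schema keys {extra}")
    return issues
```

A parameter declared as {"type": "integer", "minimum": 1} would be reported as "unsupported schema keys ['minimum']"; the fix is to move the bound into description and enforce it in the handler.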
permissions block
Defaults match the regular AI agent.
| Field | Type | Default |
| --- | --- | --- |
| swaig_allow_swml | bool | true |
| swaig_allow_settings | bool | true |
| swaig_set_global_data | bool | true |
act_on_channel: false short-circuits all permissions — actions become event-only.
Built-in sidecar_skip tool
The sidecar registers exactly one built-in tool that's always available to the LLM during ticks:
function: sidecar_skip
description: "Call this when the latest customer turn requires no advice for the agent. Ends this tick immediately."
parameters:
  reason: { type: string, description: "Optional one-line reason for the audit log." }
When the LLM calls it, the tick ends silently (no insight event). A skip event is emitted with the reason for the audit trail.
You don't register this — it's auto-added. Your prompt should instruct the model to call it when there's nothing useful to say (otherwise the model fills silence with low-quality advice).
sidecar_skip is not exposed during ad-hoc question handling (see Asking the sidecar a question) — there's no skipping a direct question.
Persistence
global_data survives across AI / sidecar sessions on the same call leg via the SWML var ai_agents_global_data. Both this sidecar and the regular <ai> agent share the convention.
- On entry: mod_openai reads swml_serialized_vars, looks up ai_agents_global_data, and merges the inherited tree on top of any verb-body global_data (existing data wins on key collision; the channel's runtime state beats declarative defaults).
- Before any SWML execution: mod_openai flushes the in-memory tree back to the SWML var so the executing script can read ${ai_agents_global_data.foo}.
- After non-transfer SWML execution: mod_openai re-reads the SWML var so any mutations the script made are reflected in memory.
- On exit: the in-memory tree is flushed to the SWML var so the next verb on the same channel inherits.
The SWML var holds the full nested object — ${ai_agents_global_data.deal.mrr}, ${ai_agents_global_data.lookups[0]} etc. expand normally.
The flag is app->persist_global_data, defaults true on both <ai> and <ai_sidecar>. There's no opt-out at the verb level today.
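The on-entry merge rule (inherited data beats verb-body defaults) can be illustrated as follows. This is a sketch under an assumption: it merges nested objects recursively, whereas this page only states that existing data wins on key collision and does not say whether the merge is deep or top-level. The function name merge_inherited is hypothetical.

```python
import copy


def merge_inherited(verb_defaults: dict, inherited: dict) -> dict:
    """Overlay the inherited tree on the verb-body defaults; inherited wins on collision."""
    merged = copy.deepcopy(verb_defaults)
    for key, value in inherited.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumed behavior: recurse so sibling keys in a subtree survive.
            merged[key] = merge_inherited(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

With defaults {"stage": "new"} and an inherited {"stage": "negotiation"}, the session starts with stage set to "negotiation".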
Anti-loop guards
The tool loop has three defensive bail-outs to keep models from running away on misbehaved tools or hallucinating tool names:
| Guard | Trigger | Behavior |
| --- | --- | --- |
| Total-consecutive cap | More than 4 tool calls within a single tick (or ask) | Return canned reply to the model: "You have already looked up this information. Use the prior results to craft your answer." Fire error event with error_reason: tool_loop. Break the loop. |
| Same name+args repeat | Same (tool_name, arguments) called 2+ times in a row | First repeat: short-circuit with {"response":"DUPLICATE: you already called this tool with these arguments earlier..."} (no webhook hit). Third repeat: bail with tool_loop. |
| Phantom tool | Model calls a tool name that's not in your registered set | Return {"response":"'<name>' is not a valid tool. Available tools: <list>. Retry with one of these."} so the model can self-correct. After 3 phantom calls in a tick, bail. |
These run per-tick — counters reset between ticks. Asks (one-off questions) are bounded the same way.
The phantom-tool guard exists primarily because some providers (e.g. Groq) inject server-side tools the model can call that aren't in the request. mod_openai catches those before dispatch.
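To make the three guards concrete, here is a hedged model of the per-tick bookkeeping. It is not mod_openai's code: TickGuards and its return labels are hypothetical, short-circuited calls are assumed to count toward the total cap, and "after 3 phantom calls" is read as bailing on the third.

```python
import json
from dataclasses import dataclass

MAX_CALLS = 4      # "more than 4 tool calls" trips the total cap
MAX_REPEATS = 3    # third identical repeat in a row bails
MAX_PHANTOMS = 3   # third unknown-tool call bails (assumed reading of "after 3")


@dataclass
class TickGuards:
    """Per-tick counters; create a fresh instance for every tick or ask."""
    tools: set
    calls: int = 0
    repeats: int = 0
    phantoms: int = 0
    last: tuple = ()

    def check(self, name: str, arguments: dict) -> str:
        """Return 'dispatch', 'duplicate', 'phantom', or 'bail' for one tool call."""
        self.calls += 1
        if self.calls > MAX_CALLS:
            return "bail"
        if name not in self.tools:
            self.phantoms += 1
            return "bail" if self.phantoms >= MAX_PHANTOMS else "phantom"
        # Canonical JSON so argument key order does not hide a repeat.
        key = (name, json.dumps(arguments, sort_keys=True))
        self.repeats = self.repeats + 1 if key == self.last else 0
        self.last = key
        if self.repeats >= MAX_REPEATS:
            return "bail"
        return "duplicate" if self.repeats else "dispatch"
```

In this model "dispatch" means the webhook is hit, "duplicate" and "phantom" mean a canned response goes back to the model without a webhook call, and "bail" ends the loop with a tool_loop error.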
Tool webhook contract
When the LLM calls one of your SWAIG functions, mod_openai POSTs to your web_hook_url with this body:
{
  "function": "lookup_competitor",
  "argument": "{\"competitor\":\"ACME\"}",
  "global_data": { ... current sidecar global_data ... },
  "channel_data": {
    "call_id": "...",
    "caller_id_name": "...",
    "caller_id_number": "...",
    "destination_number": "..."
  },
  "project_id": "...",
  "space_id": "..."
}
argument is a JSON-encoded string. Some paths send it as a dict under arguments, so accept either.
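A handler that does not use the SDK can normalize the two forms with a few lines. This is a minimal sketch; tool_arguments is a hypothetical helper name, and treating an empty or malformed value as "no arguments" is a design choice rather than documented behavior.

```python
import json


def tool_arguments(body: dict) -> dict:
    """Return the tool arguments as a dict, whichever key and encoding was used."""
    raw = body.get("argument", body.get("arguments", {}))
    if isinstance(raw, str):
        # The string form is JSON-encoded; an empty string means no arguments.
        raw = json.loads(raw) if raw.strip() else {}
    return raw if isinstance(raw, dict) else {}
```

Calling tool_arguments on the example body above returns {"competitor": "ACME"}.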
Your response is the standard FunctionResult shape:
{
  "response": "ACME charges $99/seat. We're $79.",
  "action": [
    { "user_event": { "topic": "sidecar.alert", "level": "info" } },
    { "set_global_data": { "last_lookup": "ACME" } }
  ]
}
response becomes the tool result the LLM sees. action is optional — entries are processed by the sidecar's action dispatcher and (if act_on_channel: true) executed against the call. Same key (action, singular) the regular <ai> agent uses; arrays and single objects both work.
Supported SWAIG actions
| Action | Behavior |
| --- | --- |
| user_event | Fire a calling.user_event relay event with your topic. Primary mechanism for triggering UI alerts and intent navigation. |
| set_global_data / unset_global_data | Mutate sidecar global_data; emits global_data_change event. Persisted across AI sessions on the same channel (see Persistence). |
| set_meta_data / unset_meta_data | Per-token meta data |
| transfer | Transfer the call to a destination; terminates sidecar + transcribe |
| hangup | Hang up the call; terminates sidecar + transcribe |
| stop | Stop the sidecar only; transcribe continues |
| say | Currently emit-only in sidecar mode (no TTS handle); fires an action event with executed: false, reason: "no_tts_in_sidecar" |
| back_to_back_functions | Tool-loop knob: true / "forever" |
| extensive_data | Webhook post-data shape knob |
| settings | Mutate sidecar LLM settings (model swap, temperature, etc.); gated by swaig_allow_settings |
| toggle_functions | Enable/disable SWAIG functions mid-conversation |
| SWML | Run SWML against the channel (gated by swaig_allow_swml); the transfer: true variant terminates sidecar |
| user_input | Inject a user message into history and force a re-tick |
act_on_channel: false short-circuits all of the above — actions still emit as events but don't execute.
Events
The sidecar emits structured events on two paths:
- Relay topic calling.ai.sidecar: always fires. Subscribe via the SignalWire RELAY SDK / browser SDK to consume in real time.
- Webhook: fires only when url is set in the verb body. Each event is POSTed to that URL with the event body wrapped under sidecar_event.
Both paths carry the same payload. Each event fires exactly once on the relay topic regardless of whether a webhook is configured — the relay always fires; the webhook is opt-in.
Webhook body shape
{
  "call_info": {
    "project_id": "...",
    "space_id": "...",
    "call_id": "...",
    "content_type": "text/json",
    "content_disposition": "post_data",
    "conversation_type": "voice"
  },
  "sidecar_event": {
    "type": "insight",
    "ts": 1745870400123456,
    "tick_id": 7,
    "channel_data": { ... },
    "raw": "Confirm the customer's address.",
    "iter": 0,
    "total_iters": 1
  }
}
The actual event is under sidecar_event — unwrap that in your handler.
Event types
| type | When | Key fields |
| --- | --- | --- |
| start | Sidecar attached | model, tools (array of names), global_data |
| turn | Customer turn-end detected (a tick is about to fire) | transcript_delta, customer_text, agent_text |
| request | Each LLM call within a tick (one per tool-loop iter) | model, iter, messages_count, messages_token_count, tool_choice, triggered_by? |
| thought | Intermediate-iter assistant content (model called tools alongside text) | text, iter, triggered_by? |
| insight | Final-iter agent-facing advice (tick) | raw, iter, total_iters |
| ask_request | Worker started processing an out-of-band agent question | question |
| ask_answer | Final-iter answer to an agent question (ask) | raw, iter, total_iters, triggered_by: "ask" |
| skip | Model called sidecar_skip to end the tick silently | reason |
| tool_call | LLM invoked a tool | name, arguments, iter, triggered_by? |
| tool_result | Tool returned | name, response, iter, triggered_by? |
| action | SWAIG action returned (and possibly executed) | action, source_function, executed |
| global_data_change | set_global_data / unset_global_data fired | key, old_value, new_value |
| history_pruned | Token cap triggered pruning | tokens_before, tokens_after, dropped_count |
| error | LLM/tool/parse failure or anti-loop bail | error_reason, detail, iter? |
| stop | Sidecar shutting down | stop_reason |
| final | Last event before teardown; full state dump | stop_reason, summary?, history, transcript, event_log, tool_calls, insights, stats, global_data, model, started_at, ended_at, duration_ms |
Events emitted during an ad-hoc question carry triggered_by: "ask" on request, thought, tool_call, and tool_result — useful for distinguishing tick-driven activity from operator-driven activity in dashboards.
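A dashboard could separate the two kinds of activity with a classifier along these lines. This is a sketch: event_source and tally are hypothetical names, and the "other" bucket for lifecycle and bookkeeping events is a choice made here, not a platform concept.

```python
from collections import Counter

ASK_ONLY = {"ask_request", "ask_answer"}
TICK_ONLY = {"turn", "insight", "skip"}
SHARED = {"request", "thought", "tool_call", "tool_result"}  # may carry triggered_by


def event_source(event: dict) -> str:
    """Label an unwrapped sidecar event as 'ask', 'tick', or 'other'."""
    kind = event.get("type")
    if kind in ASK_ONLY:
        return "ask"
    if kind in TICK_ONLY:
        return "tick"
    if kind in SHARED:
        return "ask" if event.get("triggered_by") == "ask" else "tick"
    return "other"


def tally(events: list[dict]) -> Counter:
    """Count events per (source, type) pair."""
    return Counter((event_source(e), e.get("type")) for e in events)
```

Feeding the unwrapped sidecar_event payloads into tally gives per-source counts, for example how many tool calls were driven by agent questions rather than by customer turns.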
Stop reasons (final.stop_reason)
| Reason | Trigger |
| --- | --- |
| transcribe_close | Call hung up normally; transcribe close handler fired sidecar stop |
| transferred | A SWAIG transfer action terminated the call |
| hung_up | A SWAIG hangup action terminated the call |
| stop_action | A SWAIG stop action stopped the sidecar (transcribe continues) |
| api_stop | ai-sidecar <uuid> stop was called via FS API |
| error | Fatal sidecar error |
Error reasons (error.error_reason)
| Reason | Trigger |
| --- | --- |
| tool_loop | Anti-loop guard tripped (>4 calls in a tick or 3rd same-name+args repeat) |
| swml_not_allowed | SWML action requested but swaig_allow_swml: false |
| Other | Any LLM/parse/webhook error surfaces with a free-form error_reason |
Asking the sidecar a question
ai-sidecar <uuid> ask <question> lets a backend (typically the agent's UI) submit an out-of-band question to the sidecar while it's observing a call. The sidecar runs an LLM round-trip against an ephemeral snapshot of its helper-conversation history and emits the answer as a regular calling.ai.sidecar event with params.type=ask_answer.
The persistent helper history is not mutated by an ask. Snapshot built, run, discarded.
Fire-and-forget — the API returns +OK queued immediately. Asks are FIFO-queued through the worker thread alongside ticks; the answer arrives over the event stream. Tools daisy-chain during an ask just like during a tick (capped by max_iters_per_tick).
sidecar_skip is dropped from the tool array exposed during an ask, and the head system message in the snapshot is replaced with an ask-mode framing telling the model to answer the agent's question directly.
Use this for: "What's the customer's main concern?", "Is this customer ready to buy?", "Look up Acme again", "Did the customer mention timeline?"
Runtime control — ai-sidecar FS API
After the sidecar is running, control it with the ai-sidecar FS API verb:
ai-sidecar <uuid> status
ai-sidecar <uuid> poke <text>
ai-sidecar <uuid> ask <question>
ai-sidecar <uuid> stop
| Subcommand | Returns |
| --- | --- |
| status | +OK running=1 ticks=12 insights=9 skips=3 tools=4 errors=0 in_tokens=4521 out_tokens=812 history_size=23 event_log_bytes=18432 |
| poke <text> | Inject a free-form user message into the sidecar's persistent history and force an immediate tick. Mutates history, future ticks see it. |
| ask <question> | Queue an out-of-band question. Returns +OK queued immediately; answer arrives as an ask_answer event. Does NOT mutate persistent history. |
| stop | Graceful stop. Emits final event before returning. |
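The status reply is a flat key=value line, so it is easy to turn into a dict for monitoring. The parser below is a sketch based only on the sample reply shown above; parse_status is a hypothetical helper, and non-numeric values are kept as strings in case the real reply carries any.

```python
def parse_status(reply: str) -> dict:
    """Turn '+OK running=1 ticks=12 ...' into {'running': 1, 'ticks': 12, ...}."""
    if not reply.startswith("+OK"):
        raise ValueError(f"unexpected reply: {reply!r}")
    fields = {}
    for token in reply.split()[1:]:
        key, _, value = token.partition("=")
        # Counters become ints; anything else is preserved verbatim.
        fields[key] = int(value) if value.isdigit() else value
    return fields
```

In the sample reply, insights (9) plus skips (3) equals ticks (12), which suggests a simple health ratio such as skips / ticks for spotting prompts that skip too often.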
start is also supported for programmatic invocation when you want to attach a sidecar without using the SWML verb form:
ai-sidecar <uuid> start <flat-json-config>
Same JSON body as the SWML verb. Returns +OK started or one of:
-ERR: Invalid Call
-ERR: Invalid JSON config
-ERR: there is already a transcriber attached to this call
-ERR: lang is required
-ERR: sidecar requires both legs (remote-caller and local-caller)
-ERR: failed to start transcribe
-ERR: failed to build sidecar app (model resolution failed?)
Using the SignalWire Python SDK
The signalwire-python SDK's SWMLService natively hosts SWAIG functions and serves SWML — you can build a complete sidecar test app without subclassing AgentBase.
Minimal example
from signalwire.core.swml_service import SWMLService
from signalwire.core.function_result import FunctionResult


class MySidecar(SWMLService):
    def __init__(self, host="0.0.0.0", port=3000, route="/sidecar"):
        super().__init__(name="my-sidecar", route=route, host=host, port=port)
        self.define_tool(
            name="lookup_competitor",
            description="Look up a competitor by name.",
            parameters={
                "type": "object",
                "properties": {
                    "competitor": {"type": "string", "description": "Company name."}
                },
                "required": ["competitor"]
            },
            handler=self._lookup_competitor,
        )
        # Event sink: receives every calling.ai.sidecar event
        self.register_routing_callback(self._on_event, path="/events")
        # Build the SWML doc once
        self.build_swml()

    def _lookup_competitor(self, args, raw_data):
        competitor = args["competitor"]
        return FunctionResult(f"{competitor} charges $99/seat. We're $79.")

    def _on_event(self, request, body):
        # send_debug_data_async wraps the payload
        if isinstance(body.get("sidecar_event"), dict):
            body = body["sidecar_event"]
        evt_type = body.get("type")
        tick = body.get("tick_id")
        if evt_type == "insight":
            print(f"[insight tick={tick}] {body['raw']}")
        elif evt_type == "ask_answer":
            print(f"[ask_answer] {body['raw']}")
        elif evt_type == "skip":
            print(f"[skip tick={tick}] {body.get('reason')}")
        elif evt_type == "final":
            print(f"[final] {body.get('summary', '(no summary)')}")
        return None

    def build_swml(self):
        self.reset_document()
        self.add_section("main")
        self.add_verb_to_section("main", "answer", {})
        self.add_verb_to_section("main", "ai_sidecar", {
            "prompt": "You are a real-time sales copilot. Give the agent one short piece of advice per turn or call sidecar_skip.",
            "lang": "en-US",
            "model": "gpt-4o-mini",
            "url": "https://your-app.example.com/sidecar/events",
            "hints": ["ACME", "Globex", "FedRAMP"],
            "SWAIG": {
                "defaults": {"web_hook_url": "https://your-app.example.com/sidecar/swaig"},
                "functions": [
                    {
                        "function": fn.name,
                        "description": fn.description,
                        "parameters": fn.parameters,
                    }
                    for fn in self._tool_registry._swaig_functions.values()
                ],
            },
        })
        self.add_verb_to_section("main", "connect", {
            "from": "+15555550100",
            "to": "+15555550199",
            "answer_on_bridge": True,
        })
        self.add_verb_to_section("main", "hangup", {})


if __name__ == "__main__":
    MySidecar().serve()
That's a complete working sidecar service:
GET http://your-host:3000/sidecar → returns the SWML doc with the ai_sidecar verb
POST http://your-host:3000/sidecar/swaig → receives SWAIG calls, dispatches to _lookup_competitor, returns FunctionResult JSON
POST http://your-host:3000/sidecar/events → receives every sidecar event
Returning UI alerts
Tools that should drive the agent's browser UI return a user_event action — that fires a calling.user_event relay event the SDK widget subscribes to:
def _flag_signal(self, args, raw_data):
    return (
        FunctionResult(f"Logged {args['signal_type']} signal.")
        .add_action("user_event", {
            "topic": "sidecar.signal",
            "signal_type": args["signal_type"],
            "detail": args["detail"],
        })
    )
Without an action, the tool's text only flows through the LLM — the agent UI sees nothing.
Hosting SWAIG and event sink under basic auth
If your url and web_hook_url embed basic-auth credentials (http://user:pass@host:port/...), pass the same (user, pass) tuple to the SWMLService constructor:
super().__init__(name="my-sidecar", route="/sidecar",
                 basic_auth=("user", "pass"))
mod_openai's webhook callbacks will use those credentials when posting back.
Best practices
Prompt design
- Tell the model when to skip. Without explicit instruction to call sidecar_skip, the model will produce filler advice on every turn. Add: "Call sidecar_skip with a one-line reason when no advice is needed. Don't fill silence."
- Speak to the agent, not the customer. Reinforce in the prompt: "Speak directly to the agent. The customer never sees your output."
- Telegraphic tone. Otherwise the agent has to read paragraphs while talking. "One sentence. No preamble. No 'consider' or 'you might want to'."
- Bind tool calls to specific triggers. "If the customer mentions a competitor by name, call lookup_competitor — do not respond from memory." Concrete triggers produce reliable tool use.
- Show what good output looks like with examples. Two or three examples in the prompt sharply improve consistency.
Tool design
- Tools that produce browser/UI alerts should return a user_event SWAIG action with a topic like sidecar.your_app.intent_name; your browser code subscribes to that topic. Without a user_event action, the tool result is invisible to the UI.
- Keep tool responses short. The LLM uses them as context for the next iteration.
- Mark "quiet" tools (logging, telemetry) by NOT returning a user_event action; only the agent-facing audit log records them.
- For long-running lookups, time out at the webhook; the sidecar will surface the timeout as an error event.
- Always populate hints with your competitor names, product names, and any custom jargon. ASR mishears these otherwise, and your tool triggers (which often match on names) miss.
Rate / cost control
- idle_timeout_ms controls how aggressive the sidecar is. 200ms (the default) is snappy and ensures advice arrives before the agent fully responds; raise to 800–1500ms for slow consultative calls if you want the model to see fuller customer turns before reacting.
- min_interval_ms is a hard throttle; set to 1000–2000ms to cap LLM cost on chatty calls.
- max_iters_per_tick: 5 is usually fine. Increase only if your tools chain (tool A's result drives tool B). The anti-loop guards already prevent runaway tool calls.
- max_history_tokens: 8000 is the helper-history budget, not the LLM context window. The sidecar prunes oldest messages when over.
Production deployment
- Run your SWAIG webhook server under TLS with basic auth or token auth. The sidecar embeds credentials in the URL it posts to.
- The relay path (browser SDK subscribing to calling.ai.sidecar) is the low-latency UI channel and always fires. The webhook path (when url is set) is for reliable server-side logging. Use both, one, or neither based on what your stack needs.
- The final event is the right place to push the call summary to your CRM. It contains summary (helper-session AI summary, when final_summary: true), transcript (raw call), history (LLM helper conversation), stats (token / tool-call counters), event_log, and final global_data.
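As an illustration of consuming the final event, the sketch below reduces it to a compact record for a CRM note. The function crm_record and the output keys are hypothetical, and it assumes insights is a list; only the input field names come from the event reference on this page.

```python
def crm_record(final_event: dict) -> dict:
    """Pick the fields a CRM note might need from an unwrapped `final` event."""
    if final_event.get("type") != "final":
        raise ValueError("expected a final event")
    return {
        "stop_reason": final_event.get("stop_reason"),
        "summary": final_event.get("summary"),   # None unless final_summary was enabled
        "duration_ms": final_event.get("duration_ms"),
        "insight_count": len(final_event.get("insights", [])),  # assumes a list
        "global_data": final_event.get("global_data", {}),
    }
```

Call it from the event sink after unwrapping sidecar_event, then hand the result to whatever CRM client your stack uses.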
Differences from the agent (<ai> verb)
| | <ai> agent | <ai_sidecar> |
| --- | --- | --- |
| Role | Agent in the call (speaks via TTS) | Observer (no TTS, listens only) |
| Output | Voice to caller via TTS | Structured events to relay/webhook |
| Trigger | Real-time as the AI's own thoughts/responses | Customer turn-end (after final ASR + idle) |
| Prompt audience | The model is the agent | The model is a coach watching two participants |
| Tools | Run on the AI's behalf during the call | Same vocabulary, but actions are advisory by default |
| global_data persistence | ai_agents_global_data SWML var | ai_agents_global_data SWML var (same key, same convention) |
| Wallet | voice_ai_per_minute | ai_sidecar_per_minute (only) |
| Coexists with <connect> | Replaces the call flow | Runs alongside; call bridges normally |
A common pattern: <ai_sidecar> coaching a human agent on a <connect>-bridged call. The customer talks to the human, and the sidecar's advice appears on the human's screen.
Limitations and constraints
- One transcriber per call. Plain live_transcribe and ai_sidecar are mutually exclusive. Calling either while the other is active returns there is already a transcriber attached to this call.
- Both legs required. Single-leg direction: ["remote-caller"] is rejected.
- Deepgram-compatible engine in practice. Other engines may work but turn-end detection relies on Deepgram-style is_final markers.
- say action is event-only in sidecar mode. The sidecar has no TTS handle; the action fires as an event with executed: false.
- Sidecar dies with transcribe. When the call hangs up or transcribe is stopped, the sidecar's final event fires and the worker exits.
- sidecar_skip is reserved. Don't define a function with that name.
- Asks are fire-and-forget. No synchronous answer is returned; the answer arrives as an ask_answer event on the relay/webhook channel. If you need synchronous Q&A semantics, your front-end has to correlate the ask submission with the answer event.
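One hedged way to do that correlation relies on the FIFO queueing described under "Asking the sidecar a question": if asks are processed in order, answers can be paired with submissions in order. The class AskCorrelator and its request_id values are hypothetical front-end constructs, and the pairing breaks down if an ask fails without producing an ask_answer event.

```python
from collections import deque


class AskCorrelator:
    """Pair ask submissions with ask_answer events, assuming answers keep FIFO order."""

    def __init__(self):
        self._pending = deque()

    def submitted(self, request_id: str, question: str) -> None:
        # Call this right after the ask API returns '+OK queued'.
        self._pending.append((request_id, question))

    def on_event(self, event: dict):
        # Returns (request_id, question, answer) for ask_answer events, else None.
        if event.get("type") != "ask_answer" or not self._pending:
            return None
        request_id, question = self._pending.popleft()
        return request_id, question, event.get("raw")
```

A stricter variant could also watch ask_request events, which carry the question text, and match on that instead of on order alone.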
Troubleshooting
SWML rejected as invalid
The platform validated your SWML and rejected it. The calling.script.warning event on the relay stream names the exact field — subscribe via the SignalWire SDK to see params.message and params.parameter. Common causes:
- Missing prompt or lang (both required)
- direction doesn't include both legs
- Tool parameter uses a JSON Schema field the SWAIG validator doesn't allow (e.g., minimum/maximum on an integer; drop it, put the constraint in the description, and validate server-side in your handler)
- Unknown top-level field (must be in the documented allowed list)
Events firing but type/tick are missing in your handler
You're reading body.get("type") at the top level. The webhook payload wraps the event under sidecar_event. Unwrap:
if isinstance(body.get("sidecar_event"), dict):
    body = body["sidecar_event"]
Sidecar starts but no insights, only skips
Either:
- The model is correctly deciding the call doesn't warrant advice (short call, no customer turns yet, agent handling it well)
- Your prompt doesn't give the model concrete things to flag — add specific buying-signal / objection / competitor triggers and concrete examples
ai-sidecar … status returns no sidecar attached
The sidecar didn't start, or already stopped. The start API returned a -ERR — most common: lang is required, sidecar requires both legs (remote-caller and local-caller), or there is already a transcriber attached to this call.
tool_loop errors in the event stream
The anti-loop guard tripped — the model called the same tool with the same arguments three times in a row, or made more than four total tool calls in one tick. The model is stuck. Check:
- Tool descriptions — are they ambiguous enough that the model thinks it didn't get a useful answer the first time?
- Tool responses — are they too terse? The model may re-call thinking it didn't get data.
- Prompt — does it tell the model what to do after a tool returns? E.g., "After lookup_competitor, summarize the result for the agent in one sentence."
Phantom tool errors
The model called a tool name not in your SWAIG list. Most often happens with provider-injected server-side tools (Groq, some Anthropic configurations). The phantom-tool guard catches it and tells the model to retry with a valid name; if it persists, the tick bails. Check provider-specific tool routing settings or switch providers.
ai_agents_global_data not being read by my SWML script
Make sure you're reading from the SWML var, not a regular channel variable. ${ai_agents_global_data.foo} works in SWML expansion; ${ai_agents_global_data} as a single channel variable does NOT (the data is a nested object in the SWML var tree, not a stringified channel var).
High latency between customer turn and insight
Total latency = ASR finalize + idle_timeout_ms + LLM response time + any tool call round-trips. Reduce idle_timeout_ms (already at 200ms by default) only if you're on a faster pre-200ms path; ensure your SWAIG webhook responds in under a second; consider a faster model only if quality holds.
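The latency formula can be turned into a quick budget calculation. This is back-of-the-envelope arithmetic, not a measurement tool: the function name is hypothetical, the sample numbers are invented, and it assumes each tool round trip is followed by one more LLM call (one LLM call per tool-loop iteration).

```python
def insight_latency_ms(asr_finalize_ms, idle_timeout_ms, llm_ms_per_call, tool_round_trips_ms=()):
    """Sum the latency terms for one tick.

    Assumption: every tool round trip triggers one additional LLM call,
    so a tick with n tool round trips makes n + 1 LLM calls.
    """
    llm_calls = 1 + len(tool_round_trips_ms)
    return (
        asr_finalize_ms
        + idle_timeout_ms
        + llm_calls * llm_ms_per_call
        + sum(tool_round_trips_ms)
    )
```

With illustrative inputs of 300 ms ASR finalize, the 200 ms default idle timeout, and 600 ms per LLM call, a tick with no tools costs 1100 ms, and a single 400 ms tool round trip raises that to 2100 ms. Under these numbers the tool loop, not idle_timeout_ms, dominates.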
paramsonly. Putting any of them at the top level is rejected by the platform/SDK schemas before mod_openai sees them.promptfield formsString:
POM (Prompt Object Model):
File reference:
Variable expansion in any form:
${global_data.*},${ai_agents_global_data.*}(persistent),${caller_id_number},${destination_number},${local_tz},${local_date},${local_time},${session_uuid},${customer_role}.SWAIGblockThe function name
sidecar_skipis reserved — do not declare it. It's auto-registered as a built-in (see below).The SWAIG schema in this verb is strict: parameter properties only allow JSON Schema fields the platform's SWAIG validator accepts (
type,description,enum,default). It does NOT acceptminimum/maximumon integers,patternconstraints beyond what's documented, etc. — express extra constraints indescriptionand validate server-side in your handler.permissionsblockDefaults match the regular AI agent.
swaig_allow_swmlswaig_allow_settingsswaig_set_global_dataact_on_channel: falseshort-circuits all permissions — actions become event-only.Built-in
sidecar_skiptoolThe sidecar registers exactly one built-in tool that's always available to the LLM during ticks:
When the LLM calls it, the tick ends silently (no
insightevent). Askipevent is emitted with the reason for the audit trail.You don't register this — it's auto-added. Your prompt should instruct the model to call it when there's nothing useful to say (otherwise the model fills silence with low-quality advice).
sidecar_skipis not exposed during ad-hoc question handling (see Asking the sidecar a question) — there's no skipping a direct question.Persistence
global_datasurvives across AI / sidecar sessions on the same call leg via the SWML varai_agents_global_data. Both this sidecar and the regular<ai>agent share the convention.swml_serialized_vars, looks upai_agents_global_data, and merges the inherited tree on top of any verb-bodyglobal_data:(existing data wins on key collision — the channel's runtime state beats declarative defaults).${ai_agents_global_data.foo}.The SWML var holds the full nested object —
${ai_agents_global_data.deal.mrr},${ai_agents_global_data.lookups[0]}etc. expand normally.The flag is
app->persist_global_data, defaults true on both<ai>and<ai_sidecar>. There's no opt-out at the verb level today.Anti-loop guards
The tool loop has three defensive bail-outs to keep models from running away on misbehaved tools or hallucinating tool names:
errorevent witherror_reason: tool_loop. Break the loop.(tool_name, arguments)called 2+ times in a row{"response":"DUPLICATE: you already called this tool with these arguments earlier..."}— no webhook hit. Third repeat: bail withtool_loop.{"response":"'<name>' is not a valid tool. Available tools: <list>. Retry with one of these."}so the model can self-correct. After 3 phantom calls in a tick, bail.These run per-tick — counters reset between ticks. Asks (one-off questions) are bounded the same way.
The phantom-tool guard exists primarily because some providers (e.g. Groq) inject server-side tools the model can call that aren't in the request. mod_openai catches those before dispatch.
Tool webhook contract
When the LLM calls one of your SWAIG functions, mod_openai POSTs to your
web_hook_urlwith this body:{ "function": "lookup_competitor", "argument": "{\"competitor\":\"ACME\"}", "global_data": { ... current sidecar global_data ... }, "channel_data": { "call_id": "...", "caller_id_name": "...", "caller_id_number": "...", "destination_number": "..." }, "project_id": "...", "space_id": "..." }argumentis a JSON-string. Some paths send it as a dict underarguments— accept either.Your response is the standard
FunctionResultshape:{ "response": "ACME charges $99/seat. We're $79.", "action": [ { "user_event": { "topic": "sidecar.alert", "level": "info" } }, { "set_global_data": { "last_lookup": "ACME" } } ] }responsebecomes the tool result the LLM sees.actionis optional — entries are processed by the sidecar's action dispatcher and (ifact_on_channel: true) executed against the call. Same key (action, singular) the regular<ai>agent uses; arrays and single objects both work.Supported SWAIG actions
user_eventcalling.user_eventrelay event with your topic — primary mechanism for triggering UI alerts and intent navigationset_global_data/unset_global_dataglobal_data; emitsglobal_data_changeevent. Persisted across AI sessions on the same channel (see Persistence).set_meta_data/unset_meta_datatransferhangupstopsayactionevent withexecuted: false, reason: "no_tts_in_sidecar"back_to_back_functionsextensive_datasettingsswaig_allow_settingstoggle_functionsSWMLswaig_allow_swml) —transfer: truevariant terminates sidecaruser_inputact_on_channel: falseshort-circuits all of the above — actions still emit as events but don't execute.Events
The sidecar emits structured events on two paths:
calling.ai.sidecar— always fires. Subscribe via the SignalWire RELAY SDK / browser SDK to consume in real time.urlis set in the verb body. Each event is POSTed to that URL with the event body wrapped undersidecar_event.Both paths carry the same payload. Each event fires exactly once on the relay topic regardless of whether a webhook is configured — the relay always fires; the webhook is opt-in.
Webhook body shape
{ "call_info": { "project_id": "...", "space_id": "...", "call_id": "...", "content_type": "text/json", "content_disposition": "post_data", "conversation_type": "voice" }, "sidecar_event": { "type": "insight", "ts": 1745870400123456, "tick_id": 7, "channel_data": { ... }, "raw": "Confirm the customer's address.", "iter": 0, "total_iters": 1 } }The actual event is under
sidecar_event— unwrap that in your handler.Event types
typestartmodel,tools(array of names),global_dataturntranscript_delta,customer_text,agent_textrequestmodel,iter,messages_count,messages_token_count,tool_choice,triggered_by?thoughttext,iter,triggered_by?insightraw,iter,total_itersask_requestquestionask_answerraw,iter,total_iters,triggered_by: "ask"skipsidecar_skipto end the tick silentlyreasontool_callname,arguments,iter,triggered_by?tool_resultname,response,iter,triggered_by?actionaction,source_function,executedglobal_data_changeset_global_data/unset_global_datafiredkey,old_value,new_valuehistory_prunedtokens_before,tokens_after,dropped_counterrorerror_reason,detail,iter?stopstop_reasonfinalstop_reason,summary?,history,transcript,event_log,tool_calls,insights,stats,global_data,model,started_at,ended_at,duration_msEvents emitted during an ad-hoc question carry
triggered_by: "ask"onrequest,thought,tool_call, andtool_result— useful for distinguishing tick-driven activity from operator-driven activity in dashboards.Stop reasons (
final.stop_reason)transcribe_closetransferredtransferaction terminated the callhung_uphangupaction terminated the callstop_actionstopaction stopped the sidecar (transcribe continues)api_stopai-sidecar <uuid> stopwas called via FS APIerrorError reasons (
error.error_reason)tool_loopswml_not_allowedSWMLaction requested butswaig_allow_swml: falseerror_reasonAsking the sidecar a question
ai-sidecar <uuid> ask <question>lets a backend (typically the agent's UI) submit an out-of-band question to the sidecar while it's observing a call. The sidecar runs an LLM round-trip against an ephemeral snapshot of its helper-conversation history and emits the answer as a regularcalling.ai.sidecarevent withparams.type=ask_answer.The persistent helper history is not mutated by an ask. Snapshot built, run, discarded.
Fire-and-forget — the API returns
+OK queuedimmediately. Asks are FIFO-queued through the worker thread alongside ticks; the answer arrives over the event stream. Tools daisy-chain during an ask just like during a tick (capped bymax_iters_per_tick).sidecar_skipis dropped from the tool array exposed during an ask, and the head system message in the snapshot is replaced with an ask-mode framing telling the model to answer the agent's question directly.Use this for: "What's the customer's main concern?", "Is this customer ready to buy?", "Look up Acme again", "Did the customer mention timeline?"
Runtime control —
ai-sidecarFS APIAfter the sidecar is running, control it with the
ai-sidecarFS API verb:status+OK running=1 ticks=12 insights=9 skips=3 tools=4 errors=0 in_tokens=4521 out_tokens=812 history_size=23 event_log_bytes=18432poke <text>ask <question>+OK queuedimmediately; answer arrives as anask_answerevent. Does NOT mutate persistent history.stopfinalevent before returning.startis also supported for programmatic invocation when you want to attach a sidecar without using the SWML verb form:Same JSON body as the SWML verb. Returns
+OK startedor one of:-ERR: Invalid Call-ERR: Invalid JSON config-ERR: there is already a transcriber attached to this call-ERR: lang is required-ERR: sidecar requires both legs (remote-caller and local-caller)-ERR: failed to start transcribe-ERR: failed to build sidecar app (model resolution failed?)Using the SignalWire Python SDK
The signalwire-python SDK's
SWMLServicenatively hosts SWAIG functions and serves SWML — you can build a complete sidecar test app without subclassingAgentBase.Minimal example
That's a complete working sidecar service:
GET http://your-host:3000/sidecar→ returns the SWML doc with theai_sidecarverbPOST http://your-host:3000/sidecar/swaig→ receives SWAIG calls, dispatches to_lookup_competitor, returnsFunctionResultJSONPOST http://your-host:3000/sidecar/events→ receives every sidecar eventReturning UI alerts
Tools that should drive the agent's browser UI return a
user_eventaction — that fires acalling.user_eventrelay event the SDK widget subscribes to:Without an action, the tool's text only flows through the LLM — the agent UI sees nothing.
Hosting SWAIG and event sink under basic auth
If your
urlandweb_hook_urlembed basic-auth credentials (http://user:pass@host:port/...), pass the same(user, pass)tuple to the SWMLService constructor:mod_openai's webhook callbacks will use those credentials when posting back.
Best practices
Prompt design
sidecar_skip, the model will produce filler advice on every turn. Add: "Call sidecar_skip with a one-line reason when no advice is needed. Don't fill silence."Tool design
user_eventSWAIG action with a topic likesidecar.your_app.intent_name— your browser code subscribes to that topic. Without auser_eventaction, the tool result is invisible to the UI.user_eventaction — only the agent-facing audit log records them.errorevent.hintswith your competitor names, product names, and any custom jargon. ASR mishears these otherwise, and your tool triggers (which often match on names) miss.Rate / cost control
idle_timeout_mscontrols how aggressive the sidecar is. 200ms (the default) is snappy and ensures advice arrives before the agent fully responds; raise to 800–1500ms for slow consultative calls if you want the model to see fuller customer turns before reacting.min_interval_msis a hard throttle — set to 1000–2000ms to cap LLM cost on chatty calls.max_iters_per_tick: 5is usually fine. Increase only if your tools chain (tool A's result drives tool B). The anti-loop guards already prevent runaway tool calls.max_history_tokens: 8000is the helper-history budget, not the LLM context window. The sidecar prunes oldest messages when over.Production deployment
calling.ai.sidecar) is the low-latency UI channel and always fires. The webhook path (whenurlis set) is for reliable server-side logging — use both, one, or neither based on what your stack needs.finalevent is the right place to push the call summary to your CRM. It containssummary(helper-session AI summary, whenfinal_summary: true),transcript(raw call),history(LLM helper conversation),stats(token / tool-call counters),event_log, and finalglobal_data.Differences from the agent (
<ai>verb)<ai>agent<ai_sidecar>global_datapersistenceai_agents_global_dataSWML varai_agents_global_dataSWML var (same key, same convention)voice_ai_per_minuteai_sidecar_per_minute(only)<connect>A common pattern:
<ai_sidecar>to coach a human agent on a<connect>-bridged call. The customer talks to the human, the sidecar coaches the human's screen.Limitations and constraints
live_transcribeandai_sidecarare mutually exclusive. Calling either while the other is active returnsthere is already a transcriber attached to this call.direction: ["remote-caller"]is rejected.is_finalmarkers.sayaction is event-only in sidecar mode. The sidecar has no TTS handle; the action fires as an event withexecuted: false.finalevent fires and the worker exits.sidecar_skipis reserved. Don't define a function with that name.ask_answerevent on the relay/webhook channel. If you need synchronous Q&A semantics, your front-end has to correlate the ask submission with the answer event.Troubleshooting
SWML rejected as invalid
The platform validated your SWML and rejected it. The
calling.script.warningevent on the relay stream names the exact field — subscribe via the SignalWire SDK to seeparams.messageandparams.parameter. Common causes:promptorlang(both required)directiondoesn't include both legsminimum/maximumon an integer — drop it and put the constraint in the description; validate server-side in your handler)Events firing but type/tick are missing in your handler
You're reading
body.get("type")at the top level. The webhook payload wraps the event undersidecar_event. Unwrap:Sidecar starts but no insights, only skips
Either:
ai-sidecar … statusreturnsno sidecar attachedThe sidecar didn't start, or already stopped. The
startAPI returned a-ERR— most common:lang is required,sidecar requires both legs (remote-caller and local-caller), orthere is already a transcriber attached to this call.tool_looperrors in the event streamThe anti-loop guard tripped — the model called the same tool with the same arguments three times in a row, or made more than four total tool calls in one tick. The model is stuck. Check:
Phantom tool errors
The model called a tool name not in your SWAIG list. Most often happens with provider-injected server-side tools (Groq, some Anthropic configurations). The phantom-tool guard catches it and tells the model to retry with a valid name; if it persists, the tick bails. Check provider-specific tool routing settings or switch providers.
ai_agents_global_datanot being read by my SWML scriptMake sure you're reading from the SWML var, not a regular channel variable.
${ai_agents_global_data.foo}works in SWML expansion;${ai_agents_global_data}as a single channel variable does NOT (the data is a nested object in the SWML var tree, not a stringified channel var).High latency between customer turn and insight
Total latency = ASR finalize +
idle_timeout_ms+ LLM response time + any tool call round-trips. Reduceidle_timeout_ms(already at 200ms by default) only if you're on a faster pre-200ms path; ensure your SWAIG webhook responds in under a second; consider a faster model only if quality holds.