deepclaw

Call your OpenClaw over the phone using the Deepgram Voice Agent API.

Why Deepgram?

Feature	ElevenLabs	Deepgram
Turn detection	VAD-based	Semantic (Flux)
TTS latency	~200ms TTFB	90ms TTFB
TTS price	$0.050/1K chars	$0.030/1K chars
Barge-in	Basic VAD	Native StartOfTurn

Deepgram Flux understands when you're done talking semantically and acoustically—not just when you stop making noise. This means fewer awkward interruptions and faster responses.

Provider Comparison: Twilio vs Telnyx

Feature	Twilio	Telnyx
Setup Complexity	Moderate	Easy
Phone Number Cost	~$1/month	~$0.50-$2/month
Call Pricing	$0.085/min	$0.005-$0.025/min
Media Streaming	WebSocket + TwiML	WebSocket + REST API
Authentication	Account SID + Auth Token	API Key + Public Key
Documentation	Extensive	Growing
Global Coverage	Excellent	Excellent

Recommendation:

Twilio: Better for production apps with extensive docs and ecosystem
Telnyx: More cost-effective, simpler API, better for experimenting

How It Works

deepclaw uses the Deepgram Voice Agent API—a single WebSocket that handles STT, TTS, turn-taking, and barge-in together.

Phone Call → Twilio/Telnyx → deepclaw ←──WebSocket──→ Deepgram Voice Agent API
                                │                      (Flux STT + Aura-2 TTS)
                                │
                                ↓
                           OpenClaw (LLM)

You call your phone number
Twilio/Telnyx streams audio to deepclaw via WebSocket
deepclaw forwards audio to Deepgram Voice Agent API
Flux transcribes with semantic turn detection
Deepgram calls your LLM endpoint (OpenClaw via deepclaw proxy)
Aura-2 speaks the response, streamed back through your phone provider
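
The audio-forwarding steps above can be sketched for the Twilio case. Twilio Media Streams sends JSON frames over the WebSocket, and frames with event "media" carry base64-encoded 8 kHz mu-law audio in media.payload. The function name extract_audio is hypothetical, and deepclaw's actual bridging code may be organized differently; Telnyx frames are not covered here.

```python
import base64
import json
from typing import Optional


def extract_audio(message: str) -> Optional[bytes]:
    """Return raw audio bytes from one Twilio Media Streams frame.

    Returns None for frames that carry no audio, such as
    "connected", "start", "mark", and "stop".
    """
    frame = json.loads(message)
    if frame.get("event") != "media":
        return None
    # The payload is base64 text; decode it to the raw mu-law bytes
    # that would then be forwarded to the Voice Agent WebSocket.
    return base64.b64decode(frame["media"]["payload"])
```

A bridge would call this on every inbound WebSocket message and forward any non-None result as a binary frame.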

Barge-in support: Start talking while the assistant is speaking and it stops immediately—handled natively by the Voice Agent API.
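
Even with native barge-in, audio that was already sent to the phone provider may still be queued for playback. A minimal sketch of how a bridge could flush it, assuming the Voice Agent API emits a UserStartedSpeaking event and that the call runs over Twilio Media Streams, which accepts a "clear" message. The function name on_agent_event is hypothetical, and whether deepclaw does exactly this is an assumption.

```python
import json
from typing import Optional


def on_agent_event(raw_event: str, stream_sid: str) -> Optional[str]:
    """Map a Voice Agent JSON event to a Twilio control message, if any.

    When the caller starts speaking, return a Twilio "clear" message so
    that audio already buffered on Twilio's side stops playing.
    """
    event = json.loads(raw_event)
    if event.get("type") == "UserStartedSpeaking":
        return json.dumps({"event": "clear", "streamSid": stream_sid})
    return None  # Nothing to send to Twilio for other event types
```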

Quick Setup (Let OpenClaw Do It)

The easiest way to set up deepclaw is to let your OpenClaw do it for you:

# Copy the skill to your OpenClaw
cp -r skills/deepclaw-voice ~/.openclaw/skills/

Then tell your OpenClaw: "I want to call you on the phone"

OpenClaw will walk you through:

Creating a Deepgram account (free $200 credit)
Setting up a Twilio phone number (~$1/month)
Configuring everything automatically

Manual Setup

Prerequisites

Python 3.10+
Deepgram account (free tier available, $200 credit)
Phone Provider (choose one):
- Twilio account with a phone number (~$1/month)
- Telnyx account with a phone number (~$0.50-$2/month)
OpenClaw running locally
ngrok for exposing your local server

1. Clone and install

git clone https://github.com/deepgram/deepclaw.git
cd deepclaw
pip install -e .

2. Configure environment

cp .env.example .env

Edit .env with your credentials:

For Twilio (default):

DEEPGRAM_API_KEY=your_deepgram_api_key
VOICE_PROVIDER=twilio
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
OPENCLAW_GATEWAY_URL=http://127.0.0.1:18789
OPENCLAW_GATEWAY_TOKEN=your_openclaw_gateway_token

For Telnyx:

DEEPGRAM_API_KEY=your_deepgram_api_key
VOICE_PROVIDER=telnyx
TELNYX_API_KEY=your_telnyx_api_key
TELNYX_PUBLIC_KEY=your_telnyx_public_key
OPENCLAW_GATEWAY_URL=http://127.0.0.1:18789
OPENCLAW_GATEWAY_TOKEN=your_openclaw_gateway_token
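
A missing variable tends to surface only once a call comes in, so it can help to check the environment at startup. The sketch below uses the variable names from the two examples above; the helper missing_settings is hypothetical and not part of deepclaw.

```python
import os
from typing import Mapping

# Variables needed regardless of phone provider
COMMON = ("DEEPGRAM_API_KEY", "OPENCLAW_GATEWAY_URL", "OPENCLAW_GATEWAY_TOKEN")

# Provider-specific variables, keyed by VOICE_PROVIDER value
PER_PROVIDER = {
    "twilio": ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN"),
    "telnyx": ("TELNYX_API_KEY", "TELNYX_PUBLIC_KEY"),
}


def missing_settings(env: Mapping[str, str] = os.environ) -> list:
    """Return the names of required variables that are unset or empty."""
    provider = env.get("VOICE_PROVIDER", "twilio").lower()  # Twilio is the default
    if provider not in PER_PROVIDER:
        raise ValueError(f"unknown VOICE_PROVIDER: {provider!r}")
    required = COMMON + PER_PROVIDER[provider]
    return [name for name in required if not env.get(name)]
```

Calling missing_settings() with no arguments checks the current process environment; an empty list means every required variable is set.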

3. Configure OpenClaw

Enable the chat completions endpoint in your OpenClaw configuration (openclaw.json):

openclaw config set gateway.http.endpoints.chatCompletions.enabled true

deepclaw automatically creates a dedicated voice agent in OpenClaw that uses a fast model (Claude Haiku 4.5) for low-latency phone conversations. This happens the first time you run python -m deepclaw. To customize the model:

# Use a different model for voice (optional)
export OPENCLAW_VOICE_MODEL=anthropic/claude-sonnet-4-5-20250929

Or create the agent manually:

openclaw agents add voice --model anthropic/claude-haiku-4-5-20251001

4. Start the tunnel

ngrok http 8000

Note your ngrok URL (e.g., https://abc123.ngrok-free.app).

5. Configure Your Phone Provider

Option A: Configure Twilio

Go to your Twilio Console
Navigate to Phone Numbers → Manage → Active Numbers
Click your number
Under "Voice Configuration":
- Set "A Call Comes In" to Webhook
- URL: https://your-ngrok-url.ngrok-free.app/twilio/incoming
- Method: POST
Save
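
For context on what the webhook does: when Twilio posts to /twilio/incoming, the server is expected to answer with TwiML that tells Twilio to open a media stream. A minimal sketch of such a response, using Twilio's Connect and Stream verbs. The WebSocket path /twilio/media and the function name are assumptions for illustration; check the deepclaw source for the real route.

```python
from xml.etree import ElementTree as ET


def incoming_call_twiml(public_url: str, stream_path: str = "/twilio/media") -> str:
    """Build TwiML that connects the call to a WebSocket media stream.

    public_url is the https ngrok URL; stream_path is a hypothetical route.
    """
    # Media Streams require a secure WebSocket URL
    ws_url = public_url.replace("https://", "wss://", 1).rstrip("/") + stream_path
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=ws_url)
    return ET.tostring(response, encoding="unicode")
```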

Option B: Configure Telnyx

Go to your Telnyx Mission Control Portal
Navigate to Voice → Programmable Voice
Create a new Voice API Application:
- Application Name: deepclaw-voice
- Webhook URL: https://your-ngrok-url.ngrok-free.app/telnyx/webhook
- Webhook API Version: API v2 (recommended)
- Webhook Failover URL: (optional) same as webhook URL
Click Create
Go to Numbers → My Numbers
Click your phone number
Under Voice Settings:
- Connection: Select your deepclaw-voice application
Save configuration

Getting Telnyx API Keys

In Mission Control Portal, go to API Keys
Create a new API Key or copy existing one
For Public Key: Go to Account → Public Key and copy the key

6. Start deepclaw

python -m deepclaw

7. Call your number

Pick up the phone and talk to your OpenClaw!

Architecture

┌─────────────┐     ┌──────────────────────────────────────────────────────┐
│   Caller    │     │                   Your Machine                        │
│  (Phone)    │     │                                                       │
└──────┬──────┘     │  ┌───────────┐   ┌───────────┐   ┌───────────────┐   │
       │            │  │ Twilio or │   │ deepclaw  │   │   OpenClaw    │   │
       │ PSTN       │  │  Telnyx   │──▶│  Server   │──▶│   Gateway     │   │
       │            │  │ Webhook   │   └─────┬─────┘   └───────────────┘   │
       ▼            │  └───────────┘         │                              │
┌──────────────┐    │                        │ WebSocket                    │
│   Twilio/    │◀───┼────────────────────────┤                              │
│   Telnyx     │    │                        ▼                              │
│ (SIP/Media)  │    │              ┌───────────────────┐                    │
└──────────────┘    │              │ Deepgram Voice    │                    │
       │            │              │ Agent API         │                    │
       │  Audio     │              │ • Flux (STT)      │                    │
       └────────────┼─────────────▶│ • Aura-2 (TTS)    │                    │
                    │              │ • Turn detection  │                    │
                    │              │ • Barge-in        │                    │
                    │              └───────────────────┘                    │
                    └──────────────────────────────────────────────────────┘

The Voice Agent API handles the entire speech pipeline in a single WebSocket connection. deepclaw bridges the phone provider's media stream (Twilio or Telnyx) to the Voice Agent API and proxies LLM requests to your local OpenClaw.
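
The proxy half of that bridge can be sketched as follows: take the chat completions payload received from Deepgram and re-issue it against the local gateway with the gateway token. This assumes the gateway accepts a Bearer token in the Authorization header; the function name build_gateway_request is hypothetical.

```python
import json
import urllib.request


def build_gateway_request(gateway_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST to the OpenClaw chat completions endpoint.

    Assumption: the gateway authenticates with "Authorization: Bearer <token>".
    The request is only constructed here, not sent.
    """
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it; a real proxy would also stream the response body back to the caller rather than buffering it.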

Customizing Voice

deepclaw uses Deepgram Aura-2 TTS, which offers 80+ voices in 7 languages. To change the voice, edit the speak settings in voice_agent_server.py:

"speak": {
    "provider": {
        "type": "deepgram",
        "model": "aura-2-orion-en",  # Change voice here
    },
},

Popular voices:

Voice	Style
`aura-2-thalia-en`	Feminine, American (default)
`aura-2-orion-en`	Masculine, American
`aura-2-draco-en`	Masculine, British
`aura-2-estrella-es`	Feminine, Mexican Spanish
`aura-2-fabian-de`	Masculine, German

See skills/deepclaw-voice/SKILL.md for the complete voice list (80+ voices in 7 languages), or test voices at https://playground.deepgram.com/
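
If you would rather pick the voice without editing the file each time, the speak block can be generated by a small helper. This is a sketch only: the function speak_settings is hypothetical, and the name pattern is inferred from the voices listed above rather than taken from Deepgram's documentation.

```python
import re

# Pattern inferred from the examples above: aura-2-<voice>-<language code>
VOICE_PATTERN = re.compile(r"^aura-2-[a-z]+-[a-z]{2}$")


def speak_settings(model: str = "aura-2-thalia-en") -> dict:
    """Return the "speak" portion of the agent settings for a given voice."""
    if not VOICE_PATTERN.match(model):
        raise ValueError(f"unexpected voice name: {model!r}")
    return {"speak": {"provider": {"type": "deepgram", "model": model}}}
```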

Security Considerations

deepclaw exposes a local server to the internet, so review the risks below before you use it. Like the rest of OpenClaw, use at your own risk.

1. LLM proxy endpoint has no authentication

The /v1/chat/completions endpoint is unauthenticated
Anyone who discovers your ngrok URL can use your OpenClaw/Anthropic API credits
Mitigation: Keep your ngrok URL private. Consider using a fixed ngrok domain.
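
A stronger mitigation is to require a shared secret on the proxy endpoint. The sketch below checks a Bearer token in constant time. It assumes the caller can be configured to send an Authorization header with each LLM request; the function name is_authorized is hypothetical and this check is not part of deepclaw today.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True only if the header carries the expected Bearer token."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(presented.encode("utf-8"), expected_token.encode("utf-8"))
```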

2. No Twilio signature validation

Incoming webhook requests are not verified as coming from Twilio
Mitigation: For production, add Twilio request validation
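
Twilio signs each webhook with the X-Twilio-Signature header: the full request URL is concatenated with the POST parameters sorted by name (each name followed by its value), signed with HMAC-SHA1 using your auth token, and base64-encoded. A standard-library sketch of that check follows; the function names are hypothetical. In practice the RequestValidator class in the official twilio Python package is the safer choice, since it also handles edge cases this sketch ignores.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the signature Twilio would send for a form-encoded POST."""
    # URL followed by each parameter name and value, sorted by name
    data = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), data.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: Mapping[str, str], header_signature: str
) -> bool:
    """Compare the computed signature with the X-Twilio-Signature header."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode("utf-8"), header_signature.encode("utf-8"))
```

The url argument must be the exact public URL Twilio called (the ngrok URL, not localhost), otherwise the signatures will not match.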

3. Credentials in .env file

API keys and tokens are stored in plaintext
Mitigation: The file is gitignored. Set restrictive permissions: chmod 600 .env

4. ngrok exposes your local machine

Your server is accessible from the internet while running
Mitigation: Only run when needed. Use ngrok's IP allowlist on paid plans.

For production deployments, consider:

Adding Twilio signature validation
Running behind a reverse proxy with rate limiting
Using a dedicated server instead of ngrok
Implementing proper authentication on the LLM proxy

Known Limitations

OpenClaw streaming latency: OpenClaw's /v1/chat/completions endpoint currently buffers responses for ~5 seconds before streaming begins. This adds latency to LLM responses regardless of which voice provider you use (Deepgram, ElevenLabs, etc.).

The initial greeting is instant (generated by Deepgram), but subsequent responses wait for OpenClaw's buffer.

This is an upstream limitation. OpenClaw's native WebSocket agent endpoint streams properly, but external voice APIs require the OpenAI-compatible chat completions endpoint.
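
To see how much of the delay comes from this buffer in your own setup, you can time how long a streamed response takes to produce its first chunk. A minimal sketch, where seconds_to_first_chunk is a hypothetical helper that accepts any iterable of chunks, for example the body iterator of a streaming HTTP response:

```python
import time
from typing import Iterable


def seconds_to_first_chunk(chunks: Iterable[bytes]) -> float:
    """Return the elapsed time until the stream yields its first chunk."""
    started = time.monotonic()
    for _ in chunks:
        # Stop at the first chunk; the rest of the stream is not consumed
        return time.monotonic() - started
    raise ValueError("stream ended without producing any data")
```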

Coming Soon

Local wake-word mode — Talk to OpenClaw hands-free at your desk, no phone needed
One-click desktop installer — No terminal required
Native OpenClaw plugin — Install with one command

Getting Help

If you run into issues:

Check existing issues: Search GitHub Issues to see if your problem has been reported
Open a new issue: Include:
- What you were trying to do
- What happened instead
- Server logs (redact any API keys)
- Your environment (OS, Python version, OpenClaw version)
Deepgram support: For Deepgram-specific issues, visit Deepgram's Community

Contributing

Contributions are welcome! Here's how to help:

Reporting Bugs

Open an issue with:

Clear description of the bug
Steps to reproduce
Expected vs actual behavior
Logs and environment info

Suggesting Features

Open an issue describing:

The problem you're trying to solve
Your proposed solution
Any alternatives you've considered

Pull Requests

Fork the repo
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Test thoroughly
Commit with clear messages
Push to your fork
Open a PR against main

Note: The main branch is protected. All changes require a pull request and review.

Code Style

Follow existing code patterns
Add comments for complex logic
Update documentation for user-facing changes

License

MIT

Credits

Built with:

Deepgram Voice Agent API — Real-time conversational AI pipeline
Deepgram Flux — Semantic speech recognition
Deepgram Aura-2 — Low-latency text-to-speech
OpenClaw — Open-source AI assistant
Twilio — Phone infrastructure
