Build realtime AI voice agents using FastRTC for low-latency streaming, Superlinked for vector search, Twilio for live phone calls, and Runpod for scalable GPU deployment.


☎️ Phone Calling Agents Course ☎️

How to build an Agent Call Center using FastRTC, Superlinked, Twilio, Opik & RunPod


Architecture

Table of Contents

Course Overview

This isn't your typical plug-and-play tutorial where you spin up a demo in five minutes and call it a day.

Instead, we're building a real estate company, but with a twist … the employees will be realtime voice agents!

By the end of this course, you'll have a system capable of:

  • ☎️ Receiving inbound calls with Twilio
  • 📞 Making outbound calls through Twilio
  • 🏠 Searching live property data using Superlinked
  • ⚡ Running realtime conversations powered by FastRTC
  • 🗣️ Transcribing speech instantly with Moonshine + Faster Whisper
  • 🎙️ Generating lifelike voices using Kokoro + Orpheus 3B
  • 🚀 Deploying open-source models on Runpod for GPU acceleration
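The pieces above chain into one realtime loop: caller audio is transcribed (STT), the transcript goes to the agent (LLM), and the reply is synthesized back to speech (TTS). As a minimal mental model of that loop, with toy stand-ins instead of the course's real FastRTC and model wiring (all names here are illustrative):

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical pipeline stages; the real course wires these through
# FastRTC, Twilio media streams, and Runpod-hosted models.
@dataclass
class VoicePipeline:
    stt: Callable[[bytes], str]   # audio chunk -> transcript
    llm: Callable[[str], str]     # transcript -> agent reply
    tts: Callable[[str], bytes]   # reply text -> audio

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply = self.llm(transcript)
        return self.tts(reply)

# Toy stand-ins so the sketch runs end to end
pipeline = VoicePipeline(
    stt=lambda audio: audio.decode(),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode(),
)
print(pipeline.handle_turn(b"hello"))  # b'You said: hello'
```

Each lesson below swaps one of these toy stages for a production-grade component.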

Excited? Let's get started!


The Neural Maze Logo

📬 Stay Updated

Join The Neural Maze and learn to build AI Systems that actually work, from principles to production. Every Wednesday, directly to your inbox. Don't miss out!

Subscribe Now

Jesús Copado YouTube Channel

🎥 Watch More Content

Join Jesús Copado on YouTube to explore how to build real AI projects, from voice agents to creative tools. Weekly videos with code, demos, and ideas that push what's possible with AI. Don't miss the next drop!

Subscribe Now


Who is this course for?

This course is for Software Engineers, ML Engineers, and AI Engineers who want to level up by building complex end-to-end apps. It's not just a basic "Hello World" tutorial; it's a deep dive into making production-ready voice agents.

Course Breakdown: Week by Week

Each week, you'll unlock a new chapter of the journey. You'll get:

  • 🧾 A Substack article that walks through the concepts and code in detail
  • 💻 A new batch of code pushed directly to this repo
  • 🎥 A Live Session where we explore everything together

Here's what the upcoming weeks look like 👇

| Lesson | Title | Article | Code | Live Session |
|--------|-------|---------|------|--------------|
| 0 | Project overview and architecture | Diagram 0 | Week 0 | Thumbnail 0 |
| 1 | Building Realtime Voice Agents with FastRTC | Diagram 1 | Week 1 | Thumbnail 1 |
| 2 | The Missing Layer in Modern AI Retrieval | Diagram 2 | Week 2 | Thumbnail 2 |
| 3 | Improving STT and TTS Systems | Diagram 3 | Week 3 | Thumbnail 3 |
| 4 | Deploying a multi-avatar Voice Agent with Full Tracing | Diagram 4 | Week 4 | Thumbnail 4 |

Getting Started

Before diving into the lessons, make sure you have everything set up properly:

  1. 📋 Initial Setup: Follow the instructions in docs/GETTING_STARTED.md to configure your environment and install dependencies.
  2. 📚 Learn Lesson by Lesson: Once setup is complete, come back here and follow the lessons in order.

Each lesson builds on the previous one, so it's important to follow them sequentially!


Lesson 0: Project Overview and Architecture

Lesson 0 Diagram

Goal: Understand the big picture and architecture of the realtime phone agent system.

Steps:

  1. 📖 Read the Substack article to understand the overall architecture
  2. 🎥 Watch the Live Session recording for a deeper dive

This lesson sets the foundation for everything that follows!


Lesson 1: Building Realtime Voice Agents with FastRTC

Lesson 1 Diagram

Goal: Build your first working voice agent using FastRTC and integrate it with Twilio.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand FastRTC fundamentals
  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_1_fastrtc_agents.ipynb to get hands-on experience
  3. 💻 Explore the Code: Dive into the repository code to see how everything is implemented
  4. 🚀 Run the Applications: Try both deployment options:

Option A: Gradio Application (Quick Demo)

Run the Gradio interface (check out demo videos in the Substack article):

make start-gradio-application

This starts an interactive web interface where you can test the voice agent locally.

NOTE: If you get the error "No such file or directory: 'ffprobe'", install ffmpeg on your system to fix it.
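If you want to verify the fix before relaunching, here is a quick preflight check that the tools are on your PATH (Gradio's audio handling shells out to ffmpeg/ffprobe):

```python
import shutil

# Check that the ffmpeg toolchain is reachable before starting the app
missing = [tool for tool in ("ffmpeg", "ffprobe") if shutil.which(tool) is None]
if missing:
    print(f"Missing {missing}: install ffmpeg (e.g. `brew install ffmpeg` or `apt install ffmpeg`)")
else:
    print("ffmpeg toolchain found")
```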

Option B: FastAPI Call Center (Production-Ready)

For a production-ready setup that can receive real phone calls:

Step 1: Start the call center application

make start-call-center

This starts a FastAPI application using Docker Compose on port 8000.

Step 2: Expose your local server to the internet

make start-ngrok-tunnel

Or manually:

ngrok http 8000

Step 3: Connect to Twilio

Follow the instructions in the article to:

  • Configure your Twilio account
  • Connect your ngrok URL to Twilio
  • Start receiving real phone calls!
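For context on what the Twilio connection involves: when a call comes in, Twilio hits your webhook and expects TwiML back, and for realtime audio that TwiML typically opens a Media Stream over WebSocket to your server. A minimal sketch of building such a response with the standard library (the stream URL and path are placeholders; the course's FastAPI app already produces this for you):

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def incoming_call_twiml(stream_url: str) -> str:
    """Build TwiML that tells Twilio to stream call audio
    to our server over a WebSocket (Twilio Media Streams)."""
    response = Element("Response")
    connect = SubElement(response, "Connect")
    SubElement(connect, "Stream", url=stream_url)
    return tostring(response, encoding="unicode")

# e.g. the hostname printed by `make start-ngrok-tunnel` (placeholder URL)
print(incoming_call_twiml("wss://your-ngrok-id.ngrok.app/media"))
```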

Lesson 2: The Missing Layer in Modern AI Retrieval

Lesson 2 Diagram

Goal: Learn how to implement advanced search capabilities for realtime voice agents using Superlinked to handle complex, multi-attribute queries.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand:

    • Why traditional vector search isn't enough for multi-attribute queries
    • How Superlinked combines different data types (text, numbers, categories) into a unified search space
    • The limitations of metadata filters, multiple searches, and re-ranking approaches
  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_2_superlinked_property_search.ipynb to learn:

    • How to define different Space types (TextSimilaritySpace, NumberSpace, CategoricalSimilaritySpace)
    • How to combine spaces into a single searchable index
    • How to dynamically adjust weights at query time
  3. 💻 Explore the Code: Dive into the repository to see how Superlinked integrates with our voice agent:

    • Check out src/realtime_phone_agents/infrastructure/superlinked/ for the implementation
    • Review src/realtime_phone_agents/agent/tools/property_search.py to see how the search tool is exposed to the agent

    We'll explore the code in detail during the Live Session!

  4. 🚀 Test the Complete System: Now it's time to see everything work together!

    Step 1: Start the call center application

    make start-call-center

    Step 2: Expose your local server (if not already running)

    make start-ngrok-tunnel

    Step 3: Call your Twilio number and test the property search

    Try asking the agent:

    "Do you have apartments in Barrio de Salamanca of at most 900,000 euros?"

    Wait for the response. The agent should find and return information about the only apartment in the dataset (data/properties.csv) that meets these criteria!

    This demonstrates how the voice agent can now handle complex queries combining location (Barrio de Salamanca) and price constraints (≤ €900,000) in real time.
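To make the idea concrete, here is a toy illustration (deliberately not Superlinked's API) of weighted multi-attribute scoring: each "space" produces its own score, and query-time weights combine them into one ranking.

```python
# Toy stand-ins for per-space similarity; Superlinked does this with
# real embeddings, not word overlap or linear price penalties.
def text_score(query: str, doc: str) -> float:
    # Crude word-overlap stand-in for text embedding similarity
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def price_score(max_price: float, price: float) -> float:
    # 1.0 well under budget, falling to 0.0 at or over budget
    return max(0.0, 1.0 - price / max_price) if price <= max_price else 0.0

properties = [
    {"desc": "apartment in Barrio de Salamanca", "price": 850_000},
    {"desc": "villa on the outskirts", "price": 1_200_000},
]

weights = {"text": 0.7, "price": 0.3}  # adjustable at query time
query, budget = "apartment Barrio de Salamanca", 900_000

ranked = sorted(
    properties,
    key=lambda p: weights["text"] * text_score(query, p["desc"])
    + weights["price"] * price_score(budget, p["price"]),
    reverse=True,
)
print(ranked[0]["desc"])  # apartment in Barrio de Salamanca
```

Shifting the weights at query time ("cheap and roughly central" vs. "exactly this neighbourhood, price secondary") re-ranks results without re-indexing, which is the behaviour the lesson builds with Superlinked's real spaces.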


Lesson 3: Improving STT and TTS Systems

Lesson 3 Diagram

Goal: Improve the quality of STT and TTS systems used in the voice agent.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand the fundamentals of STT and TTS systems, and how to deploy them on Runpod.

  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_3_stt_tts.ipynb to see what the new faster-whisper and Orpheus 3B deployments look like.

  3. 💻 Explore the Code: It's time to see the additions for week 3. Check out the new stt/ and tts/ modules in src/realtime_phone_agents/:

    • STT (Speech-to-Text):

      • local/: Implementation using Moonshine for local inference.
      • groq/: Integration with Groq's fast inference API.
      • runpod/: Self-hosted Faster Whisper implementation.
    • TTS (Text-to-Speech):

      • local/: Implementation using Kokoro for high-quality local synthesis.
      • togetherai/: Integration with Together AI.
      • runpod/: Self-hosted Orpheus 3B implementation.
  4. 🐳 New Docker Images: We've added two new Dockerfiles to deploy our custom models on RunPod:

    • Dockerfile.faster_whisper: Builds a container for the Faster Whisper model (large-v3). It uses the speaches-ai/speaches base image and pre-downloads the model for faster startup.
    • Dockerfile.orpheus: Builds a container for the Orpheus 3B model using llama.cpp server with CUDA support, optimized for real-time speech generation.
  5. 🚀 Deploy & Interact: Ready to test these models? Follow these steps:

    ⚠️ IMPORTANT: Before proceeding, ensure you have completed the setup in docs/GETTING_STARTED.md. This includes setting up your API keys and environment variables (especially for RunPod).

    Step 1: Deploy to RunPod

    Use the Makefile commands to spin up your GPU pods:

    # Deploy Faster Whisper
    make create-faster-whisper-pod
    
    # Deploy Orpheus 3B
    make create-orpheus-pod

    Note: These scripts will automatically print the endpoint URLs once the pods are ready. Make sure to update your .env file with these URLs!

    Step 2: Start the Gradio App

    Launch the interactive interface to test different combinations:

    make start-gradio-application

    Step 3: Experiment!

    In the Gradio interface, you can mix and match different implementations:

    • STT Options:

      • Moonshine (Local)
      • Whisper (Groq API)
      • Faster Whisper (RunPod - requires Step 1)
    • TTS Options:

      • Kokoro (Local)
      • Orpheus (Together AI API)
      • Orpheus (RunPod - requires Step 1)
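The reason you can mix and match backends like this is that each one satisfies the same small interface, so the app only decides which implementation to construct. A sketch of that pattern (class names, return strings, and the factory are illustrative, not the repo's actual code):

```python
from typing import Protocol

class SpeechToText(Protocol):
    """Every STT backend exposes the same transcribe() method."""
    def transcribe(self, audio: bytes) -> str: ...

# Illustrative stand-ins for the local and RunPod backends
class MoonshineLocal:
    def transcribe(self, audio: bytes) -> str:
        return "transcript from local Moonshine"   # real model call here

class FasterWhisperRunpod:
    def __init__(self, endpoint_url: str) -> None:
        self.endpoint_url = endpoint_url           # from your .env
    def transcribe(self, audio: bytes) -> str:
        return "transcript from RunPod"            # real HTTP call here

def make_stt(backend: str) -> SpeechToText:
    if backend == "local":
        return MoonshineLocal()
    if backend == "runpod":
        return FasterWhisperRunpod("https://your-pod-id.proxy.runpod.net")
    raise ValueError(f"unknown STT backend: {backend}")

stt = make_stt("local")
print(stt.transcribe(b"...audio..."))
```

The TTS backends follow the same shape with a synthesize-style method in place of transcribe.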

Lesson 4: Deploying a multi-avatar Voice Agent with Full Tracing

Lesson 4 Diagram

Goal: Deploy a production-ready call center with multiple avatars, full tracing, and Twilio integration for inbound and outbound calls.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand:

    • How to build a multi-avatar system with different personas
    • How to implement full tracing of every interaction using Opik
    • How to version prompts and store transcribed conversations
    • How to deploy to Runpod and integrate with Twilio
  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_4_avatar_system.ipynb to explore:

    • How to define and work with different avatars
    • How each avatar has its own personality, style, and voice
    • How to fetch and use avatars in your application
  3. 💻 Explore the Code: Check out the new additions for week 4:

    • Avatar System (src/realtime_phone_agents/avatars/):

      • base.py: Base Avatar class with system prompt generation and versioning
      • registry.py: Utility to list, fetch, and manage avatars
      • definitions/: YAML files defining each avatar's personality (dan, jess, leah, leo, mia, tara, zac, zoe)
    • Observability (src/realtime_phone_agents/observability/):

      • opik_utils.py: Utilities for tracing with Opik
      • prompt_versioning.py: System for versioning all prompts
    • Updated Agent (src/realtime_phone_agents/agent/fastrtc_agent.py):

      • Added @opik.track decorators to trace every method in the pipeline
      • Tracks STT transcription, LLM responses, tool calls, and TTS generation
      • Stores complete conversation threads in Opik
  4. 🚀 Deploy to Production: Time to deploy your call center to the cloud!

    ⚠️ IMPORTANT: Make sure your .env file includes all required variables from docs/GETTING_STARTED.md, including:

    • Opik API key for tracing
    • Qdrant Cloud credentials
    • Twilio credentials
    • Runpod API key
    • All STT/TTS model configurations

    Step 1: Deploy the Call Center to Runpod

    make create-call-center-pod

    This will deploy your FastAPI application to Runpod and give you a URL like:

    https://your-pod-id.proxy.runpod.net
    

    Step 2: Ingest Properties to Qdrant Cloud

    make ingest-properties

    This populates your Qdrant Cloud cluster with property data for the agent to search.

    Step 3: Configure Twilio

    • Go to your Twilio TwiML App
    • Replace your ngrok URL with your Runpod URL:
      https://your-pod-id.proxy.runpod.net/voice/telephone/incoming
      
    • Save the configuration

    Step 4: Test Inbound Calls

    Call your Twilio number and interact with your deployed agent! The system will:

    • Answer with the avatar you configured (AVATAR_NAME in .env)
    • Search properties using Superlinked
    • Trace every interaction in Opik
    • Store the full conversation

    Step 5: Make Outbound Calls

    You can also make outbound calls programmatically:

    make outbound-call

    This will trigger a call from your agent to the specified number!

  5. 📊 Monitor with Opik: Open your Opik dashboard to see:

    • Traces: Every step of the conversation pipeline with timing information
    • Threads: Complete transcribed conversations stored for analysis
    • Prompts: Versioned system prompts for each avatar

    You'll be able to track:

    • How long transcription takes
    • LLM response times
    • Tool call performance
    • TTS generation speed
    • Complete conversation flow
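As a rough standard-library illustration of what a tracing decorator like @opik.track does under the hood (this is the general pattern, not Opik's implementation; Opik additionally ships traces to a backend and links them into threads):

```python
import functools
import time

TRACES: list[dict] = []  # Opik stores traces server-side; we use a list

def track(fn):
    """Record the name and duration of each traced call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper

@track
def transcribe(audio: bytes) -> str:
    return "hello agent"  # stand-in for the STT call

@track
def generate_reply(text: str) -> str:
    return f"reply to: {text}"  # stand-in for the LLM call

generate_reply(transcribe(b"..."))
print([t["name"] for t in TRACES])  # ['transcribe', 'generate_reply']
```

Because the decorator wraps each pipeline method, you get per-stage timing (STT, LLM, tool calls, TTS) without changing the pipeline logic itself.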

The tech stack

| Technology | Description |
|------------|-------------|
| FastRTC | The Python library for real-time communication. |
| Superlinked | A Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data. |
| Runpod | The end-to-end AI cloud that simplifies building and deploying models. |
| Opik | Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. |
| Twilio | A cloud communications platform that enables developers to build, manage, and automate voice, text, video, and other communication services through APIs. |

Contributors

Miguel Otero Pedrido | Senior ML / AI Engineer
Founder of The Neural Maze. Rick and Morty fan.

LinkedIn
YouTube
The Neural Maze Newsletter
Jesús Copado | Senior ML / AI Engineer
Equal parts cinema fan and AI enthusiast.

YouTube
LinkedIn

License

This project is licensed under the MIT License - see the LICENSE file for details.
