- Table of Contents
- Course Overview
- Who is this course for?
- Course Breakdown: Week by Week
- Getting Started
- Lesson 0: Project Overview and Architecture
- Lesson 1: Building Realtime Voice Agents with FastRTC
- Lesson 2: The Missing Layer in Modern AI Retrieval
- Lesson 3: Improving STT and TTS Systems
- Lesson 4: Deploying a multi-avatar Voice Agent with Full Tracing
- The tech stack
- Contributors
- License
This isn't your typical plug-and-play tutorial where you spin up a demo in five minutes and call it a day.
Instead, we're building a real estate company, but with a twist: the employees will be realtime voice agents!
By the end of this course, you'll have a system capable of:
- Receiving inbound calls with Twilio
- Making outbound calls through Twilio
- Searching live property data using Superlinked
- Running realtime conversations powered by FastRTC
- Transcribing speech instantly with Moonshine + Faster Whisper
- Generating lifelike voices using Kokoro + Orpheus 3B
- Deploying open-source models on RunPod for GPU acceleration
Excited? Let's get started!
Join The Neural Maze and learn to build AI systems that actually work, from principles to production. Every Wednesday, directly to your inbox. Don't miss out!
Join Jesús Copado on YouTube to explore how to build real AI projects, from voice agents to creative tools. Weekly videos with code, demos, and ideas that push what's possible with AI. Don't miss the next drop!
This course is for Software Engineers, ML Engineers, and AI Engineers who want to level up by building complex end-to-end apps. It's not just a basic "Hello World" tutorial; it's a deep dive into making production-ready voice agents.
Each week, you'll unlock a new chapter of the journey. You'll get:
- A Substack article that walks through the concepts and code in detail
- A new batch of code pushed directly to this repo
- A live session where we explore everything together
Here's what the upcoming weeks look like:
| Lesson Number | Title | Code |
|---|---|---|
| 0 | Project Overview and Architecture | Week 0 |
| 1 | Building Realtime Voice Agents with FastRTC | Week 1 |
| 2 | The Missing Layer in Modern AI Retrieval | Week 2 |
| 3 | Improving STT and TTS Systems | Week 3 |
| 4 | Deploying a multi-avatar Voice Agent with Full Tracing | Week 4 |
Before diving into the lessons, make sure you have everything set up properly:
- Initial Setup: Follow the instructions in `docs/GETTING_STARTED.md` to configure your environment and install dependencies.
- Learn Lesson by Lesson: Once setup is complete, come back here and follow the lessons in order.
Each lesson builds on the previous one, so it's important to follow them sequentially!
Goal: Understand the big picture and architecture of the realtime phone agent system.
- Read the Substack article to understand the overall architecture
- Watch the live session recording for a deeper dive
This lesson sets the foundation for everything that follows!
Goal: Build your first working voice agent using FastRTC and integrate it with Twilio.
- Read the Article: Start with the Substack article to understand FastRTC fundamentals
- Work Through the Notebook: Open and run `notebooks/lesson_1_fastrtc_agents.ipynb` to get hands-on experience
- Explore the Code: Dive into the repository code to see how everything is implemented
- Run the Applications: Try both deployment options:

Run the Gradio interface (check out the demo videos in the Substack article):

```bash
make start-gradio-application
```

This starts an interactive web interface where you can test the voice agent locally.
NOTE: If you get the error `No such file or directory: 'ffprobe'`, install ffmpeg on your system to fix it.
For a production-ready setup that can receive real phone calls:
Step 1: Start the call center application

```bash
make start-call-center
```

This starts a FastAPI application using Docker Compose on port 8000.

Step 2: Expose your local server to the internet

```bash
make start-ngrok-tunnel
```

Or manually:

```bash
ngrok http 8000
```

Step 3: Connect to Twilio
Follow the instructions in the article to:
- Configure your Twilio account
- Connect your ngrok URL to Twilio
- Start receiving real phone calls!
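To make the Twilio hookup more concrete, here is a rough sketch of the kind of TwiML document an incoming-call webhook returns to stream call audio back to your server. It is hand-rolled with the standard library purely for illustration (a real app would typically use the `twilio` helper package), and the `wss://` URL is a hypothetical placeholder for your ngrok host, not a path from this repo.

```python
# Sketch: build the TwiML that tells Twilio to open a media stream to
# your server. The stream URL below is a made-up placeholder.
import xml.etree.ElementTree as ET

def incoming_call_twiml(stream_url: str) -> str:
    # TwiML is plain XML: <Response><Connect><Stream url="..."/></Connect></Response>
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=stream_url)
    return ET.tostring(response, encoding="unicode")

twiml = incoming_call_twiml("wss://your-ngrok-host.ngrok.app/voice/stream")
print(twiml)
```

Your FastAPI endpoint would return this XML with a `application/xml` content type; Twilio then connects to the given WebSocket URL with the live call audio.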
Goal: Learn how to implement advanced search capabilities for realtime voice agents using Superlinked to handle complex, multi-attribute queries.
- Read the Article: Start with the Substack article to understand:
- Why traditional vector search isn't enough for multi-attribute queries
- How Superlinked combines different data types (text, numbers, categories) into a unified search space
- The limitations of metadata filters, multiple searches, and re-ranking approaches
- Work Through the Notebook: Open and run `notebooks/lesson_2_superlinked_property_search.ipynb` to learn:
- How to define different Space types (TextSimilaritySpace, NumberSpace, CategoricalSimilaritySpace)
- How to combine spaces into a single searchable index
- How to dynamically adjust weights at query time
- Explore the Code: Dive into the repository to see how Superlinked integrates with our voice agent:
- Check out `src/realtime_phone_agents/infrastructure/superlinked/` for the implementation
- Review `src/realtime_phone_agents/agent/tools/property_search.py` to see how the search tool is exposed to the agent

We'll explore the code in detail during the live session!
- Test the Complete System: Now it's time to see everything working together!

Step 1: Start the call center application

```bash
make start-call-center
```

Step 2: Expose your local server (if not already running)

```bash
make start-ngrok-tunnel
```

Step 3: Call your Twilio number and test the property search

Try asking the agent:

"Do you have apartments in Barrio de Salamanca for at most 900,000 euros?"

Wait for the response. The agent should find and return information about the only apartment in the dataset (`data/properties.csv`) that meets these criteria!

This demonstrates how the voice agent can now handle complex queries combining location (Barrio de Salamanca) and price constraints (at most €900,000) in real time.
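The idea behind this kind of multi-attribute search can be sketched without the library: give each attribute its own similarity score (one per "space") and combine them with query-time weights. This is an illustrative toy, not the Superlinked API; the data, weights, and falloff function are all made up.

```python
# Toy multi-space scoring: each attribute contributes its own similarity,
# combined with adjustable weights. Not the Superlinked API.
import math

def price_similarity(price: float, target: float, scale: float = 200_000) -> float:
    # Closer prices score nearer to 1.0 (simple exponential falloff).
    return math.exp(-abs(price - target) / scale)

def category_similarity(cat: str, target: str) -> float:
    return 1.0 if cat == target else 0.0

# Made-up listings standing in for data/properties.csv rows.
properties = [
    {"id": 1, "neighborhood": "Barrio de Salamanca", "price": 850_000},
    {"id": 2, "neighborhood": "Barrio de Salamanca", "price": 1_400_000},
    {"id": 3, "neighborhood": "Chamberi", "price": 880_000},
]

def score(prop, target_hood, target_price, w_hood=0.6, w_price=0.4):
    # Weighted sum over "spaces"; weights can change per query.
    return (w_hood * category_similarity(prop["neighborhood"], target_hood)
            + w_price * price_similarity(prop["price"], target_price))

ranked = sorted(properties,
                key=lambda p: score(p, "Barrio de Salamanca", 900_000),
                reverse=True)
print([p["id"] for p in ranked])  # property 1 should rank first
```

A metadata filter would hard-exclude property 2 entirely; the weighted-space approach instead ranks it lower while keeping the query a single vector search, which is the trade-off the article discusses.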
Goal: Improve the quality of STT and TTS systems used in the voice agent.
- Read the Article: Start with the Substack article to understand the fundamentals of STT and TTS systems and how to deploy them on RunPod.
- Work Through the Notebook: Open and run `notebooks/lesson_3_stt_tts.ipynb` to see what the new `faster-whisper` and `Orpheus 3B` deployments look like.
- Explore the Code: It's time to see the additions for week 3. Check out the new `stt/` and `tts/` modules in `src/realtime_phone_agents/`:
  - STT (Speech-to-Text):
    - `local/`: Implementation using Moonshine for local inference.
    - `groq/`: Integration with Groq's fast inference API.
    - `runpod/`: Self-hosted Faster Whisper implementation.
  - TTS (Text-to-Speech):
    - `local/`: Implementation using Kokoro for high-quality local synthesis.
    - `togetherai/`: Integration with Together AI.
    - `runpod/`: Self-hosted Orpheus 3B implementation.
- New Docker Images: We've added two new Dockerfiles to deploy our custom models on RunPod:
  - `Dockerfile.faster_whisper`: Builds a container for the Faster Whisper model (large-v3). It uses the `speaches-ai/speaches` base image and pre-downloads the model for faster startup.
  - `Dockerfile.orpheus`: Builds a container for the Orpheus 3B model using the `llama.cpp` server with CUDA support, optimized for real-time speech generation.
- Deploy & Interact: Ready to test these models? Follow these steps:

IMPORTANT: Before proceeding, make sure you have completed the setup in `docs/GETTING_STARTED.md`. This includes setting up your API keys and environment variables (especially for RunPod).

Step 1: Deploy to RunPod

Use the Makefile commands to spin up your GPU pods:

```bash
# Deploy Faster Whisper
make create-faster-whisper-pod

# Deploy Orpheus 3B
make create-orpheus-pod
```

Note: These scripts will automatically print the endpoint URLs once the pods are ready. Make sure to update your `.env` file with these URLs!

Step 2: Start the Gradio App
Launch the interactive interface to test different combinations:
```bash
make start-gradio-application
```
Step 3: Experiment!
In the Gradio interface, you can mix and match different implementations:
- STT Options:
  - Moonshine (local)
  - Whisper (Groq API)
  - Faster Whisper (RunPod, requires Step 1)
- TTS Options:
  - Kokoro (local)
  - Orpheus (Together AI API)
  - Orpheus (RunPod, requires Step 1)
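The mix-and-match behavior above boils down to a simple pattern: every STT backend implements the same "audio bytes in, text out" contract, and a registry picks one by name at call time. Here is a minimal sketch of that pattern with stand-in functions; the names are illustrative, not the repo's actual classes.

```python
# Sketch of the backend-swapping pattern behind the stt/ (and tts/)
# modules. All names here are illustrative stand-ins.
from typing import Callable, Dict

# Every STT backend shares one contract: audio bytes in, text out.
STTBackend = Callable[[bytes], str]

def moonshine_stt(audio: bytes) -> str:          # local inference (stand-in)
    return "<moonshine transcript>"

def groq_whisper_stt(audio: bytes) -> str:       # hosted API (stand-in)
    return "<groq transcript>"

def runpod_faster_whisper_stt(audio: bytes) -> str:  # self-hosted (stand-in)
    return "<faster-whisper transcript>"

STT_REGISTRY: Dict[str, STTBackend] = {
    "local": moonshine_stt,
    "groq": groq_whisper_stt,
    "runpod": runpod_faster_whisper_stt,
}

def transcribe(audio: bytes, backend: str = "local") -> str:
    # The backend is chosen by configuration (e.g. an env var) at call time.
    return STT_REGISTRY[backend](audio)

print(transcribe(b"\x00\x01", backend="groq"))  # -> <groq transcript>
```

Swapping from Moonshine to the RunPod Faster Whisper deployment then means changing one config value, not any calling code.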
Goal: Deploy a production-ready call center with multiple avatars, full tracing, and Twilio integration for inbound and outbound calls.
- Read the Article: Start with the Substack article to understand:
  - How to build a multi-avatar system with different personas
  - How to implement full tracing of every interaction using Opik
  - How to version prompts and store transcribed conversations
  - How to deploy to RunPod and integrate with Twilio
- Work Through the Notebook: Open and run `notebooks/lesson_4_avatar_system.ipynb` to explore:
  - How to define and work with different avatars
  - How each avatar has its own personality, style, and voice
  - How to fetch and use avatars in your application
- Explore the Code: Check out the new additions for week 4:
  - Avatar System (`src/realtime_phone_agents/avatars/`):
    - `base.py`: Base Avatar class with system prompt generation and versioning
    - `registry.py`: Utility to list, fetch, and manage avatars
    - `definitions/`: YAML files defining each avatar's personality (dan, jess, leah, leo, mia, tara, zac, zoe)
  - Observability (`src/realtime_phone_agents/observability/`):
    - `opik_utils.py`: Utilities for tracing with Opik
    - `prompt_versioning.py`: System for versioning all prompts
  - Updated Agent (`src/realtime_phone_agents/agent/fastrtc_agent.py`):
    - Added `@opik.track` decorators to trace every method in the pipeline
    - Tracks STT transcription, LLM responses, tool calls, and TTS generation
    - Stores complete conversation threads in Opik
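To illustrate the avatar registry idea, here is a minimal sketch. The real definitions live in YAML files under `definitions/`, but the inline dicts, field names, and generated prompt below are assumptions for illustration, not the repo's actual schema.

```python
# Sketch of an avatar registry. Field names and personalities below are
# made up; real definitions are YAML files, one per avatar.
from dataclasses import dataclass

@dataclass
class Avatar:
    name: str
    personality: str
    voice: str

    def system_prompt(self) -> str:
        # A versioned system prompt would be generated from these fields.
        return f"You are {self.name}, a {self.personality} real estate agent."

# Stand-in for parsed definitions/ YAML files (schema assumed).
_DEFINITIONS = {
    "tara": {"personality": "warm and chatty", "voice": "tara"},
    "leo": {"personality": "direct and efficient", "voice": "leo"},
}

def get_avatar(name: str) -> Avatar:
    spec = _DEFINITIONS[name]
    return Avatar(name=name, **spec)

print(get_avatar("tara").system_prompt())
```

Keeping persona data declarative like this is what lets the call center pick an avatar from a single `AVATAR_NAME` setting at startup.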
- Deploy to Production: Time to deploy your call center to the cloud!

IMPORTANT: Make sure your `.env` file includes all required variables from `docs/GETTING_STARTED.md`, including:
- Opik API key for tracing
- Qdrant Cloud credentials
- Twilio credentials
- RunPod API key
- All STT/TTS model configurations
Step 1: Deploy the Call Center to RunPod

```bash
make create-call-center-pod
```

This will deploy your FastAPI application to RunPod and give you a URL like `https://your-pod-id.proxy.runpod.net`.

Step 2: Ingest Properties to Qdrant Cloud
```bash
make ingest-properties
```
This populates your Qdrant Cloud cluster with property data for the agent to search.
Step 3: Configure Twilio
- Go to your Twilio TwiML App
- Replace your ngrok URL with your RunPod URL: `https://your-pod-id.proxy.runpod.net/voice/telephone/incoming`
- Save the configuration
Step 4: Test Inbound Calls
Call your Twilio number and interact with your deployed agent! The system will:
- Answer with the avatar you configured (`AVATAR_NAME` in `.env`)
- Search properties using Superlinked
- Trace every interaction in Opik
- Store the full conversation
Step 5: Make Outbound Calls
You can also make outbound calls programmatically:
```bash
make outbound-call
```
This will trigger a call from your agent to the specified number!
- Monitor with Opik: Open your Opik dashboard to see:
- Traces: Every step of the conversation pipeline with timing information
- Threads: Complete transcribed conversations stored for analysis
- Prompts: Versioned system prompts for each avatar
You'll be able to track:
- How long transcription takes
- LLM response times
- Tool call performance
- TTS generation speed
- Complete conversation flow
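As a rough illustration of what a tracing decorator buys you, here is a homemade stand-in for what `@opik.track` does at its simplest: record each pipeline step's name and duration. Opik itself does far more (threads, prompt versioning, dashboards), and the step functions below are made up; this only sketches the timing idea.

```python
# Homemade stand-in for a tracing decorator: collect step name + duration
# for every call. The pipeline steps are toy stand-ins.
import functools
import time

TRACES: list = []

def track(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record the span even if the step raises.
            TRACES.append({"step": fn.__name__,
                           "seconds": time.perf_counter() - start})
    return wrapper

@track
def transcribe(audio: bytes) -> str:      # stand-in for the STT step
    return "hello"

@track
def generate_reply(text: str) -> str:     # stand-in for the LLM step
    return f"agent says: {text}"

generate_reply(transcribe(b"..."))
print([t["step"] for t in TRACES])  # -> ['transcribe', 'generate_reply']
```

Decorating every pipeline method this way is what makes per-step latency (STT vs. LLM vs. TTS) visible in the dashboard without touching the pipeline logic itself.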
- Miguel Otero Pedrido: Senior ML / AI Engineer. Founder of The Neural Maze. Rick and Morty fan. (YouTube, The Neural Maze Newsletter)
- Jesús Copado: Senior ML / AI Engineer. Equal parts cinema fan and AI enthusiast. (YouTube)
This project is licensed under the MIT License - see the LICENSE file for details.