Build realtime AI voice agents using FastRTC for low-latency streaming, Superlinked for vector search, Twilio for live phone calls, and Runpod for scalable GPU deployment.


☎️ Phone Calling Agents Course ☎️

How to build an Agent Call Center using FastRTC, Superlinked, Twilio, Opik & RunPod


Architecture

Table of Contents

Course Overview

This isn't your typical plug-and-play tutorial where you spin up a demo in five minutes and call it a day.

Instead, we're building a real estate company, but with a twist … the employees will be realtime voice agents!

By the end of this course, you'll have a system capable of:

  • ☎️ Receiving inbound calls with Twilio
  • 📞 Making outbound calls through Twilio
  • 🏠 Searching live property data using Superlinked
  • ⚡ Running realtime conversations powered by FastRTC
  • 🗣️ Transcribing speech instantly with Moonshine + Faster Whisper
  • 🎙️ Generating lifelike voices using Kokoro + Orpheus 3B
  • 🚀 Deploying open-source models on Runpod for GPU acceleration
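The pieces above chain into one realtime loop: caller audio is transcribed (STT), the transcript goes to the agent (LLM), and the reply is synthesized back to speech (TTS). As a minimal mental model of that loop, with toy stand-ins instead of the course's real FastRTC and model wiring (all names here are illustrative):

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical pipeline stages; the real course wires these through
# FastRTC, Twilio media streams, and Runpod-hosted models.
@dataclass
class VoicePipeline:
    stt: Callable[[bytes], str]   # audio chunk -> transcript
    llm: Callable[[str], str]     # transcript -> agent reply
    tts: Callable[[str], bytes]   # reply text -> audio

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply = self.llm(transcript)
        return self.tts(reply)

# Toy stand-ins so the sketch runs end to end
pipeline = VoicePipeline(
    stt=lambda audio: audio.decode(),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode(),
)
print(pipeline.handle_turn(b"hello"))  # b'You said: hello'
```

Each lesson below swaps one of these toy stages for a production-grade component.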

Excited? Let's get started!


The Neural Maze Logo

📬 Stay Updated

Join The Neural Maze and learn to build AI Systems that actually work, from principles to production. Every Wednesday, directly to your inbox. Don't miss out!

Subscribe Now

Jesús Copado YouTube Channel

🎥 Watch More Content

Join Jesús Copado on YouTube to explore how to build real AI projects, from voice agents to creative tools. Weekly videos with code, demos, and ideas that push what's possible with AI. Don't miss the next drop!

Subscribe Now


Who is this course for?

This course is for Software Engineers, ML Engineers, and AI Engineers who want to level up by building complex end-to-end apps. It's not just a basic "Hello World" tutorial; it's a deep dive into making production-ready voice agents.

Course Breakdown: Week by Week

Each week, you'll unlock a new chapter of the journey. You'll get:

  • 🧾 A Substack article that walks through the concepts and code in detail
  • 💻 A new batch of code pushed directly to this repo
  • 🎥 A Live Session where we explore everything together

Here's what the upcoming weeks look like 👇

| Lesson | Title | Article | Code | Live Session |
|--------|-------|---------|------|--------------|
| 0 | Project overview and architecture | Diagram 0 | Week 0 | Thumbnail 0 |
| 1 | Building Realtime Voice Agents with FastRTC | Diagram 1 | Week 1 | Thumbnail 1 |
| 2 | The Missing Layer in Modern AI Retrieval | Diagram 2 | Week 2 | Thumbnail 2 |
| 3 | Improving STT and TTS Systems | Diagram 3 | Week 3 | Thumbnail 3 |
| 4 | Deploying a multi-avatar Voice Agent with Full Tracing | Diagram 4 | Week 4 | Thumbnail 4 |

Getting Started

Before diving into the lessons, make sure you have everything set up properly:

  1. 📋 Initial Setup: Follow the instructions in docs/GETTING_STARTED.md to configure your environment and install dependencies.
  2. 📚 Learn Lesson by Lesson: Once setup is complete, come back here and follow the lessons in order.

Each lesson builds on the previous one, so it's important to follow them sequentially!


Lesson 0: Project Overview and Architecture

Lesson 0 Diagram

Goal: Understand the big picture and architecture of the realtime phone agent system.

Steps:

  1. 📖 Read the Substack article to understand the overall architecture
  2. 🎥 Watch the Live Session recording for a deeper dive

This lesson sets the foundation for everything that follows!


Lesson 1: Building Realtime Voice Agents with FastRTC

Lesson 1 Diagram

Goal: Build your first working voice agent using FastRTC and integrate it with Twilio.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand FastRTC fundamentals
  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_1_fastrtc_agents.ipynb to get hands-on experience
  3. 💻 Explore the Code: Dive into the repository code to see how everything is implemented
  4. 🚀 Run the Applications: Try both deployment options:

Option A: Gradio Application (Quick Demo)

Run the Gradio interface (check out demo videos in the Substack article):

make start-gradio-application

This starts an interactive web interface where you can test the voice agent locally.

NOTE: If you get the error "No such file or directory: 'ffprobe'", install ffmpeg on your system to fix it.
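If you want to verify the fix before relaunching, here is a quick preflight check that the tools are on your PATH (Gradio's audio handling shells out to ffmpeg/ffprobe):

```python
import shutil

# Check that the ffmpeg toolchain is reachable before starting the app
missing = [tool for tool in ("ffmpeg", "ffprobe") if shutil.which(tool) is None]
if missing:
    print(f"Missing {missing}: install ffmpeg (e.g. `brew install ffmpeg` or `apt install ffmpeg`)")
else:
    print("ffmpeg toolchain found")
```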

Option B: FastAPI Call Center (Production-Ready)

For a production-ready setup that can receive real phone calls:

Step 1: Start the call center application

make start-call-center

This starts a FastAPI application using Docker Compose on port 8000.

Step 2: Expose your local server to the internet

make start-ngrok-tunnel

Or manually:

ngrok http 8000

Step 3: Connect to Twilio

Follow the instructions in the article to:

  • Configure your Twilio account
  • Connect your ngrok URL to Twilio
  • Start receiving real phone calls!
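For context on what the Twilio connection involves: when a call comes in, Twilio hits your webhook and expects TwiML back, and for realtime audio that TwiML typically opens a Media Stream over WebSocket to your server. A minimal sketch of building such a response with the standard library (the stream URL and path are placeholders; the course's FastAPI app already produces this for you):

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def incoming_call_twiml(stream_url: str) -> str:
    """Build TwiML that tells Twilio to stream call audio
    to our server over a WebSocket (Twilio Media Streams)."""
    response = Element("Response")
    connect = SubElement(response, "Connect")
    SubElement(connect, "Stream", url=stream_url)
    return tostring(response, encoding="unicode")

# e.g. the hostname printed by `make start-ngrok-tunnel` (placeholder URL)
print(incoming_call_twiml("wss://your-ngrok-id.ngrok.app/media"))
```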

Lesson 2: The Missing Layer in Modern AI Retrieval

Lesson 2 Diagram

Goal: Learn how to implement advanced search capabilities for realtime voice agents using Superlinked to handle complex, multi-attribute queries.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand:

    • Why traditional vector search isn't enough for multi-attribute queries
    • How Superlinked combines different data types (text, numbers, categories) into a unified search space
    • The limitations of metadata filters, multiple searches, and re-ranking approaches
  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_2_superlinked_property_search.ipynb to learn:

    • How to define different Space types (TextSimilaritySpace, NumberSpace, CategoricalSimilaritySpace)
    • How to combine spaces into a single searchable index
    • How to dynamically adjust weights at query time
  3. 💻 Explore the Code: Dive into the repository to see how Superlinked integrates with our voice agent:

    • Check out src/realtime_phone_agents/infrastructure/superlinked/ for the implementation
    • Review src/realtime_phone_agents/agent/tools/property_search.py to see how the search tool is exposed to the agent

    We'll explore the code in detail during the Live Session!

  4. 🚀 Test the Complete System: Now it's time to see everything work together!

    Step 1: Start the call center application

    make start-call-center

    Step 2: Expose your local server (if not already running)

    make start-ngrok-tunnel

    Step 3: Call your Twilio number and test the property search

    Try asking the agent:

    "Do you have apartments in Barrio de Salamanca of at most 900,000 euros?"

    Wait for the response. The agent should find and return information about the only apartment in the dataset (data/properties.csv) that meets these criteria!

    This demonstrates how the voice agent can now handle complex queries combining location (Barrio de Salamanca) and price constraints (≤ €900,000) in real time.
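To make the idea concrete, here is a toy illustration (deliberately not Superlinked's API) of weighted multi-attribute scoring: each "space" produces its own score, and query-time weights combine them into one ranking.

```python
# Toy stand-ins for per-space similarity; Superlinked does this with
# real embeddings, not word overlap or linear price penalties.
def text_score(query: str, doc: str) -> float:
    # Crude word-overlap stand-in for text embedding similarity
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def price_score(max_price: float, price: float) -> float:
    # 1.0 well under budget, falling to 0.0 at or over budget
    return max(0.0, 1.0 - price / max_price) if price <= max_price else 0.0

properties = [
    {"desc": "apartment in Barrio de Salamanca", "price": 850_000},
    {"desc": "villa on the outskirts", "price": 1_200_000},
]

weights = {"text": 0.7, "price": 0.3}  # adjustable at query time
query, budget = "apartment Barrio de Salamanca", 900_000

ranked = sorted(
    properties,
    key=lambda p: weights["text"] * text_score(query, p["desc"])
    + weights["price"] * price_score(budget, p["price"]),
    reverse=True,
)
print(ranked[0]["desc"])  # apartment in Barrio de Salamanca
```

Shifting the weights at query time ("cheap and roughly central" vs. "exactly this neighbourhood, price secondary") re-ranks results without re-indexing, which is the behaviour the lesson builds with Superlinked's real spaces.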


Lesson 3: Improving STT and TTS Systems

Lesson 3 Diagram

Goal: Improve the quality of STT and TTS systems used in the voice agent.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand the fundamentals of STT and TTS systems, and how to deploy them on Runpod.

  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_3_stt_tts.ipynb to see what the new faster-whisper and Orpheus 3B deployments look like.

  3. 💻 Explore the Code: It's time to see the additions for week 3. Check out the new stt/ and tts/ modules in src/realtime_phone_agents/:

    • STT (Speech-to-Text):

      • local/: Implementation using Moonshine for local inference.
      • groq/: Integration with Groq's fast inference API.
      • runpod/: Self-hosted Faster Whisper implementation.
    • TTS (Text-to-Speech):

      • local/: Implementation using Kokoro for high-quality local synthesis.
      • togetherai/: Integration with Together AI.
      • runpod/: Self-hosted Orpheus 3B implementation.
  4. 🐳 New Docker Images: We've added two new Dockerfiles to deploy our custom models on RunPod:

    • Dockerfile.faster_whisper: Builds a container for the Faster Whisper model (large-v3). It uses the speaches-ai/speaches base image and pre-downloads the model for faster startup.
    • Dockerfile.orpheus: Builds a container for the Orpheus 3B model using llama.cpp server with CUDA support, optimized for real-time speech generation.
  5. 🚀 Deploy & Interact: Ready to test these models? Follow these steps:

    ⚠️ IMPORTANT: Before proceeding, ensure you have completed the setup in docs/GETTING_STARTED.md. This includes setting up your API keys and environment variables (especially for RunPod).

    Step 1: Deploy to RunPod

    Use the Makefile commands to spin up your GPU pods:

    # Deploy Faster Whisper
    make create-faster-whisper-pod
    
    # Deploy Orpheus 3B
    make create-orpheus-pod

    Note: These scripts will automatically print the endpoint URLs once the pods are ready. Make sure to update your .env file with these URLs!

    Step 2: Start the Gradio App

    Launch the interactive interface to test different combinations:

    make start-gradio-application

    Step 3: Experiment!

    In the Gradio interface, you can mix and match different implementations:

    • STT Options:

      • Moonshine (Local)
      • Whisper (Groq API)
      • Faster Whisper (RunPod - requires Step 1)
    • TTS Options:

      • Kokoro (Local)
      • Orpheus (Together AI API)
      • Orpheus (RunPod - requires Step 1)
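The reason you can mix and match backends like this is that each one satisfies the same small interface, so the app only decides which implementation to construct. A sketch of that pattern (class names, return strings, and the factory are illustrative, not the repo's actual code):

```python
from typing import Protocol

class SpeechToText(Protocol):
    """Every STT backend exposes the same transcribe() method."""
    def transcribe(self, audio: bytes) -> str: ...

# Illustrative stand-ins for the local and RunPod backends
class MoonshineLocal:
    def transcribe(self, audio: bytes) -> str:
        return "transcript from local Moonshine"   # real model call here

class FasterWhisperRunpod:
    def __init__(self, endpoint_url: str) -> None:
        self.endpoint_url = endpoint_url           # from your .env
    def transcribe(self, audio: bytes) -> str:
        return "transcript from RunPod"            # real HTTP call here

def make_stt(backend: str) -> SpeechToText:
    if backend == "local":
        return MoonshineLocal()
    if backend == "runpod":
        return FasterWhisperRunpod("https://your-pod-id.proxy.runpod.net")
    raise ValueError(f"unknown STT backend: {backend}")

stt = make_stt("local")
print(stt.transcribe(b"...audio..."))
```

The TTS backends follow the same shape with a synthesize-style method in place of transcribe.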

Lesson 4: Deploying a multi-avatar Voice Agent with Full Tracing

Lesson 4 Diagram

Goal: Deploy a production-ready call center with multiple avatars, full tracing, and Twilio integration for inbound and outbound calls.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand:

    • How to build a multi-avatar system with different personas
    • How to implement full tracing of every interaction using Opik
    • How to version prompts and store transcribed conversations
    • How to deploy to Runpod and integrate with Twilio
  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_4_avatar_system.ipynb to explore:

    • How to define and work with different avatars
    • How each avatar has its own personality, style, and voice
    • How to fetch and use avatars in your application
  3. 💻 Explore the Code: Check out the new additions for week 4:

    • Avatar System (src/realtime_phone_agents/avatars/):

      • base.py: Base Avatar class with system prompt generation and versioning
      • registry.py: Utility to list, fetch, and manage avatars
      • definitions/: YAML files defining each avatar's personality (dan, jess, leah, leo, mia, tara, zac, zoe)
    • Observability (src/realtime_phone_agents/observability/):

      • opik_utils.py: Utilities for tracing with Opik
      • prompt_versioning.py: System for versioning all prompts
    • Updated Agent (src/realtime_phone_agents/agent/fastrtc_agent.py):

      • Added @opik.track decorators to trace every method in the pipeline
      • Tracks STT transcription, LLM responses, tool calls, and TTS generation
      • Stores complete conversation threads in Opik
  4. 🚀 Deploy to Production: Time to deploy your call center to the cloud!

    ⚠️ IMPORTANT: Make sure your .env file includes all required variables from docs/GETTING_STARTED.md, including:

    • Opik API key for tracing
    • Qdrant Cloud credentials
    • Twilio credentials
    • Runpod API key
    • All STT/TTS model configurations

    Step 1: Deploy the Call Center to Runpod

    make create-call-center-pod

    This will deploy your FastAPI application to Runpod and give you a URL like:

    https://your-pod-id.proxy.runpod.net
    

    Step 2: Ingest Properties to Qdrant Cloud

    make ingest-properties

    This populates your Qdrant Cloud cluster with property data for the agent to search.

    Step 3: Configure Twilio

    • Go to your Twilio TwiML App
    • Replace your ngrok URL with your Runpod URL:
      https://your-pod-id.proxy.runpod.net/voice/telephone/incoming
      
    • Save the configuration

    Step 4: Test Inbound Calls

    Call your Twilio number and interact with your deployed agent! The system will:

    • Answer with the avatar you configured (AVATAR_NAME in .env)
    • Search properties using Superlinked
    • Trace every interaction in Opik
    • Store the full conversation

    Step 5: Make Outbound Calls

    You can also make outbound calls programmatically:

    make outbound-call

    This will trigger a call from your agent to the specified number!

  5. 📊 Monitor with Opik: Open your Opik dashboard to see:

    • Traces: Every step of the conversation pipeline with timing information
    • Threads: Complete transcribed conversations stored for analysis
    • Prompts: Versioned system prompts for each avatar

    You'll be able to track:

    • How long transcription takes
    • LLM response times
    • Tool call performance
    • TTS generation speed
    • Complete conversation flow
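As a rough standard-library illustration of what a tracing decorator like @opik.track does under the hood (this is the general pattern, not Opik's implementation; Opik additionally ships traces to a backend and links them into threads):

```python
import functools
import time

TRACES: list[dict] = []  # Opik stores traces server-side; we use a list

def track(fn):
    """Record the name and duration of each traced call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper

@track
def transcribe(audio: bytes) -> str:
    return "hello agent"  # stand-in for the STT call

@track
def generate_reply(text: str) -> str:
    return f"reply to: {text}"  # stand-in for the LLM call

generate_reply(transcribe(b"..."))
print([t["name"] for t in TRACES])  # ['transcribe', 'generate_reply']
```

Because the decorator wraps each pipeline method, you get per-stage timing (STT, LLM, tool calls, TTS) without changing the pipeline logic itself.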

The tech stack

| Technology | Description |
|------------|-------------|
| FastRTC | The Python library for real-time communication. |
| Superlinked | A Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data. |
| Runpod | The end-to-end AI cloud that simplifies building and deploying models. |
| Opik | Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. |
| Twilio | A cloud communications platform that enables developers to build, manage, and automate voice, text, video, and other communication services through APIs. |

Contributors

Miguel Otero Pedrido | Senior ML / AI Engineer
Founder of The Neural Maze. Rick and Morty fan.

LinkedIn
YouTube
The Neural Maze Newsletter
Jesús Copado | Senior ML / AI Engineer
Equal parts cinema fan and AI enthusiast.

YouTube
LinkedIn

License

This project is licensed under the MIT License - see the LICENSE file for details.
