(SpeakBot: Could you repeat that?)
🔗 Demo Link: https://parlabot.io 🔗
ParlaBot is a voice-enabled app that gives you real-time feedback on your Italian pronunciation. Speak into your mic, and ParlaBot will transcribe what you said, compare it to a target phrase, and return constructive feedback — powered by modern open-source AI and traditional DSP filtering techniques.
My first real speech recognition project was my 2007 Master’s thesis: a vowel recognition frontend built using FFTs, Mel filters, and CMU Sphinx. It’s old-school compared to today’s AI toolkits, but that line of research (not my own work so much as the work I studied) laid the foundation for the models that power ParlaBot.
More on that here →
Nearly two decades later, I’ve been studying Italian seriously for three years and wanted to build something that merges:
- A return to my past studies in speech recognition
- Hands-on exploration of modern STT and AI toolkits
- My passion for learning Italian
Entra ParlaBot
(Enter ParlaBot)
- Build a practical voice-powered Italian pronunciation coach
- Showcase my ability to design, develop, and deploy AI-based microservices
- Reinforce skills in Python, Go, C/C++, and container-based architecture
ParlaBot is composed of several Dockerized microservices:
- **Frontend UI** in React
  - Displays the target phrase from the Phrase Service
  - Records mic input and sends audio to the Orchestrator (an example request is sketched below)
  - Displays multiple transcriptions and feedback
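
The browser recording ultimately arrives at the orchestrator as an HTTP POST to `/transcribe`. The exact request shape isn't documented here, so the field names, port, and pipeline identifiers below are assumptions; this is just a minimal Python sketch of an equivalent call made outside the browser:

```python
# Minimal sketch of calling the orchestrator's /transcribe endpoint from Python.
# Field names, the port (8080), and pipeline names are assumptions, not the
# documented API contract.
import requests

ORCHESTRATOR_URL = "http://localhost:8080/transcribe"  # assumed host/port

with open("recording.wav", "rb") as f:
    resp = requests.post(
        ORCHESTRATOR_URL,
        files={"audio": ("recording.wav", f, "audio/wav")},
        data={
            "phrase_id": "42",                     # hypothetical target-phrase id
            "pipelines": "highpass,telephone",     # hypothetical pipeline names
        },
        timeout=60,
    )

resp.raise_for_status()
for result in resp.json():  # assumed: one entry per (pipeline, model) pair
    print(result)
```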
- **API Orchestrator** in Go/Gin
  - Fetches all target phrases from the Phrase Service
  - Fetches all pipelines from the Audio Preprocessing Service
  - Exposes a `/transcribe` endpoint
  - Concurrently, via goroutines (see the Python sketch below):
    - Forwards the user’s audio to each selected preprocessing pipeline
    - Forwards the filtered audio to the STT Service for transcription and scoring
  - (Planned) Routes results to the Feedback Service
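
The orchestrator itself is written in Go and does this fan-out with goroutines. Purely as an illustration of the pattern (not the project’s code), here is the same idea in Python with a thread pool: each selected pipeline gets the raw audio, and each filtered result is then forwarded to the STT service. The service URLs and payload fields are assumptions.

```python
# Illustration only: the real orchestrator is Go + goroutines. This Python
# sketch shows the same concurrent fan-out with a thread pool. URLs and
# payload fields are assumptions.
from concurrent.futures import ThreadPoolExecutor

import requests

PREPROCESS_URL = "http://preprocessing:8000/process"  # assumed
STT_URL = "http://stt:8000/transcribe"                # assumed


def run_pipeline(audio_bytes: bytes, pipeline: str, target: str) -> dict:
    """Send raw audio through one preprocessing pipeline, then to the STT service."""
    filtered = requests.post(
        PREPROCESS_URL,
        files={"audio": ("in.wav", audio_bytes, "audio/wav")},
        data={"pipeline": pipeline},
        timeout=60,
    ).content
    scored = requests.post(
        STT_URL,
        files={"audio": ("filtered.wav", filtered, "audio/wav")},
        data={"target": target},
        timeout=120,
    ).json()
    return {"pipeline": pipeline, **scored}


def transcribe_all(audio_bytes: bytes, pipelines: list[str], target: str) -> list[dict]:
    # One worker per pipeline, mirroring "one goroutine per pipeline" in the Go service.
    with ThreadPoolExecutor(max_workers=len(pipelines)) as pool:
        futures = [pool.submit(run_pipeline, audio_bytes, p, target) for p in pipelines]
        return [f.result() for f in futures]
```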
- **Audio Preprocessing Service** in Python/FastAPI + Torch Transformers
  - Accepts `.wav` audio
  - Runs audio through specified preprocessing pipelines (see the sketch below)
  - (Planned) Consume/integrate compiled C++ shared-object filter chains for audio preprocessing from a registry
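
A rough idea of what a preprocessing pipeline could look like, assuming torchaudio’s built-in biquad filters; the pipeline names and filter parameters here are illustrative, not the service’s actual registry:

```python
# Sketch of a pipeline registry built on torchaudio's biquad filters.
# Pipeline names and filter parameters are illustrative assumptions.
import torchaudio
import torchaudio.functional as F

# Each pipeline is an ordered list of (filter_fn, kwargs) applied to the waveform.
PIPELINES = {
    "highpass": [(F.highpass_biquad, {"cutoff_freq": 80.0})],
    "telephone": [
        (F.highpass_biquad, {"cutoff_freq": 300.0}),
        (F.lowpass_biquad, {"cutoff_freq": 3400.0}),
    ],
}


def apply_pipeline(in_path: str, out_path: str, name: str) -> None:
    """Load a .wav file, run it through one named pipeline, and write the result."""
    waveform, sample_rate = torchaudio.load(in_path)
    for filter_fn, kwargs in PIPELINES[name]:
        waveform = filter_fn(waveform, sample_rate, **kwargs)
    torchaudio.save(out_path, waveform, sample_rate)
```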
- **STT Service** in Python/FastAPI + HuggingFace language model transcribers
  - Accepts filtered `.wav` audio
  - Transcribes speech using a language model; currently only `wav2vec2-large-xlsr-53-italian` is supported (example below)
  - Scores the transcription against the target phrase
  - Returns the model, preprocessing info, and transcript
  - (Planned) Add support for multiple models
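
A minimal transcription-and-scoring sketch, assuming the `jonatasgrosman/wav2vec2-large-xlsr-53-italian` checkpoint on Hugging Face. The checkpoint id, the 16 kHz resampling, and the similarity-ratio scoring are my assumptions, not necessarily what the service does:

```python
# Sketch: wav2vec2 transcription plus a crude similarity score.
# The checkpoint id and the scoring method are assumptions.
from difflib import SequenceMatcher

import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-italian"  # assumed checkpoint
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.eval()


def transcribe(wav_path: str) -> str:
    waveform, sr = torchaudio.load(wav_path)
    if sr != 16_000:  # wav2vec2 expects 16 kHz input
        waveform = torchaudio.functional.resample(waveform, sr, 16_000)
    mono = waveform.mean(dim=0)  # downmix to mono
    inputs = processor(mono.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(ids)[0].lower().strip()


def score(transcript: str, target: str) -> float:
    """Crude 0-1 similarity between what was said and the target phrase."""
    return SequenceMatcher(None, transcript.lower(), target.lower()).ratio()
```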
- **Phrase Service** in Python/FastAPI + MongoDB + Coqui TTS (with Mozilla and personal speaker training files) + Google TTS API
  - Accepts a text phrase and a TTS speaker, and generates audio using TTS (sketched below)
  - (Planned) Tracks user progress
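
As a sketch of the phrase-generation side, here is how a phrase could be synthesized with Coqui TTS and its metadata stored in MongoDB. The model name, database/collection names, and document fields are assumptions (Coqui ships several Italian models; `tts --list_models` shows what is available):

```python
# Sketch: phrase audio generation with Coqui TTS plus MongoDB bookkeeping.
# Model name, database/collection names, and document fields are assumptions.
import os
import uuid

from pymongo import MongoClient
from TTS.api import TTS

tts = TTS(model_name="tts_models/it/mai_female/glow-tts")  # assumed Italian model
phrases = MongoClient("mongodb://localhost:27017")["parlabot"]["phrases"]


def add_phrase(text: str, speaker: str = "mai_female") -> str:
    """Synthesize a target phrase and record where its audio lives."""
    os.makedirs("audio", exist_ok=True)
    out_path = f"audio/{uuid.uuid4().hex}.wav"
    tts.tts_to_file(text=text, file_path=out_path)
    phrases.insert_one({"text": text, "speaker": speaker, "audio_path": out_path})
    return out_path


print(add_phrase("Vorrei un caffè, per favore."))
```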
All services are containerized and connected via docker-compose.
To run ParlaBot locally:

- `git clone https://github.com/richvigorito/parlabot.git`
- `cd parlabot`
- `docker-compose up --build`
- `open http://localhost:3000`

See milestones for the project roadmap, issues, etc.
MIT License
