richvigorito/parlabot

☕ ParlaBot: potrebbe ripeterlo?

(SpeakBot: Could you repeat that?)

🔗 Demo Link: https://parlabot.io 🔗

ParlaBot is a voice-enabled app that gives you real-time feedback on your Italian pronunciation. Speak into your mic, and ParlaBot will transcribe what you said, compare it to a target phrase, and return constructive feedback — powered by modern open-source AI and traditional DSP filtering techniques.


First, a Nod to the Past

My first real speech recognition project was my 2007 Master’s thesis: a vowel recognition frontend built using FFTs, Mel filters, and CMU Sphinx. It’s old-school compared to today’s AI toolkits, but that body of research (not necessarily mine, but the work I studied) laid the foundation for the models that power ParlaBot.
More on that here →


Fast-Forward to Now

Nearly two decades later, I’ve been studying Italian seriously for three years and wanted to build something that merges:

  • A revisit of my past studies in speech recognition
  • Hands-on exploration of modern STT and AI toolkits
  • My passion for learning Italian

Entrare il ParlaBot
(Enter ParlaBot)


Project Goals

  • Build a practical voice-powered Italian pronunciation coach
  • Showcase my ability to design, develop, and deploy AI-based microservices
  • Reinforce skills in Python, Go, C/C++, and container-based architecture

Architecture Overview

ParlaBot is composed of several Dockerized microservices:

  1. Frontend UI in React

    • Displays the target phrase from the Phrase Service
    • Records mic input and sends audio to the Orchestrator
    • Displays multiple transcriptions and feedback
  2. API Orchestrator in Go/Gin

    • Fetches all target phrases from the Phrase Service
    • Fetches all pipelines from the Audio Preprocessing Service
    • Exposes a /transcribe endpoint
    • Concurrently via goroutines:
      • Forwards the user’s audio to each selected preprocessing pipeline
      • Forwards the filtered audio to the STT Service for transcription and scoring
    • (Planned) Routes results to the Feedback service
  3. Audio Preprocessing Service in Python/FastAPI + Torch Transformers

    • Accepts .wav audio
    • Runs audio through specified preprocessing pipelines
    • (Planned) Consume/integrate compiled C++ shared-object filter chains for audio preprocessing from a registry
  4. STT Service in Python/FastAPI + HuggingFace Language Model Transcribers

    • Accepts filtered .wav audio
    • Transcribes speech using a language model; currently only wav2vec2-large-xlsr-53-italian is supported
    • Scores the transcription against the target phrase
    • Returns the model, preprocessing info, and transcript
    • (Planned) Add support for multiple models
  5. Phrase Service in Python/FastAPI + MongoDB + Coqui (with Mozilla and personal speaker training files) + Google TTS API

    • Accepts a text phrase and a TTS speaker, and generates audio using TTS
    • (Planned) Tracks user progress
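
To make the data flow above concrete, here is a minimal sketch of the Orchestrator's fan-out: one recording is sent through several preprocessing pipelines in parallel, and each filtered result is passed on for transcription. The real Orchestrator is written in Go and uses goroutines; this sketch swaps in Python threads only to illustrate the pattern. The pipeline names, the two filters, and the fake_transcribe stub are all hypothetical stand-ins and are not taken from the ParlaBot codebase.

```python
from concurrent.futures import ThreadPoolExecutor


def pre_emphasis(samples, coeff=0.97):
    """Classic DSP pre-emphasis filter: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[i] - coeff * samples[i - 1] for i in range(1, len(samples))
    ]


def peak_normalize(samples):
    """Scale samples so the largest absolute value becomes 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)
    return [s / peak for s in samples]


# Hypothetical registry: pipeline name -> ordered chain of filters.
PIPELINES = {
    "raw": [],
    "normalize": [peak_normalize],
    "pre_emphasis+normalize": [pre_emphasis, peak_normalize],
}


def run_pipeline(name, samples):
    """Apply each filter of the named pipeline in order."""
    out = list(samples)
    for audio_filter in PIPELINES[name]:
        out = audio_filter(out)
    return out


def fake_transcribe(samples):
    """Stand-in for the HTTP call to the STT Service (assumption)."""
    return f"<transcript of {len(samples)} samples>"


def process(name, samples):
    """Preprocess with one pipeline, then transcribe the filtered audio."""
    filtered = run_pipeline(name, samples)
    return {
        "pipeline": name,
        "transcript": fake_transcribe(filtered),
        "peak": max((abs(s) for s in filtered), default=0.0),
    }


def fan_out(samples, pipeline_names):
    """Run every selected pipeline concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(pipeline_names))) as pool:
        futures = [pool.submit(process, name, samples) for name in pipeline_names]
        return [future.result() for future in futures]


if __name__ == "__main__":
    for result in fan_out([0.0, 0.5, -0.25, 0.1], list(PIPELINES)):
        print(result)
```

Each pipeline yields its own transcript, which is why the UI can display multiple transcriptions for a single recording.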

All services are containerized and connected via docker-compose.
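
This README does not say how the STT Service scores a transcription against the target phrase. As one plausible approach, the sketch below compares the two at the word level using Python's standard difflib module. The function names, the normalization rules, and the result shape are assumptions made for illustration, not the service's actual API.

```python
import difflib
import string
import unicodedata

# Punctuation to ignore when comparing; accented letters are kept as-is.
_PUNCT = string.punctuation + "’«»"
_PUNCT_TABLE = str.maketrans(_PUNCT, " " * len(_PUNCT))


def normalize(text):
    """Lowercase, replace punctuation with spaces, and split into words."""
    text = unicodedata.normalize("NFC", text).lower()
    return text.translate(_PUNCT_TABLE).split()


def score(target, transcript):
    """Return a 0..1 word-level similarity plus the mismatched spans."""
    expected = normalize(target)
    heard = normalize(transcript)
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    diffs = [
        (op, expected[i1:i2], heard[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
    return {"score": matcher.ratio(), "diffs": diffs}


if __name__ == "__main__":
    print(score("Potrebbe ripeterlo, per favore?", "potrebbe ripeterla per favore"))
```

The diffs list records which words were replaced, dropped, or added, which is the kind of detail a feedback step could turn into advice for the learner.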

Current System Architecture

... Where it's going


How to Run

git clone https://github.com/richvigorito/parlabot.git
cd parlabot
docker-compose up --build
open http://localhost:3000

Roadmap

See milestones for the project roadmap and open issues.


License

MIT License


Want to read this in Italian?


Rich Vigorito | Portland, OR | LinkedIn | GitHub

About

Common Italian Phrase Pronunciation Checker
