A production-ready multilingual Speech-to-Text gateway that intelligently routes requests to language-specific backend services. Built with FastAPI and deployed on Modal, this gateway provides unified API access to Hausa, Yoruba, and Igbo transcription services.
- 🌍 Multilingual Support: Hausa, Yoruba, and Igbo languages with easy extensibility
- 🔄 Intelligent Routing: Automatic request routing to language-specific backend services
- 🛡️ Production-Ready Resilience: Circuit breaker pattern, retry logic with exponential backoff, and rate limiting
- 📊 Comprehensive Observability: Real-time metrics, latency percentiles, and structured logging
- ⚡ High Performance: Keep-warm instances for fast response times
- 🔒 Robust Validation: File size limits, format checking, and parameter validation
- Python 3.11+
- Modal account (sign up at https://modal.com)
- Modal CLI installed:
pip install modal
- Clone the repository:
git clone https://github.com/yourusername/stt-gateway.git
cd stt-gateway
- Install dependencies:
pip install -r requirements.txt
- Authenticate with Modal:
modal token new
Deploy the gateway with a single command:
modal deploy gateway_stt_modal.py
After deployment, Modal will provide a public HTTPS endpoint URL.
https://your-gateway-url
Transcribe audio files to text.
Request:
curl -X POST "https://your-gateway-url/transcribe" \
-F "file=@audio.wav" \
-F "language=hausa"
Parameters:
- file (required): Audio file (WAV, MP3, M4A, OGG, FLAC, WebM)
- language (required): Target language (hausa, yoruba, or igbo)
Response:
{
"request_id": "req_1234567890",
"language": "hausa",
"routed_to": "https://voicebreeze--hausa-stt-endpoint-api.modal.run/transcribe",
"transcription": "Sannu, yaya kuke?",
"backend_response": {...},
"latency_seconds": 2.345,
"timestamp": "2025-11-26T14:00:00"
}
Check gateway health and capabilities.
Request:
curl "https://your-gateway-url/health"
Response:
{
"status": "healthy",
"version": "1.0.0",
"supported_languages": ["hausa", "yoruba", "igbo"],
"timestamp": "2025-11-26T14:00:00"
}
Get supported languages and their backend endpoints.
Request:
curl "https://your-gateway-url/languages"
Response:
{
"hausa": "https://voicebreeze--hausa-stt-endpoint-api.modal.run/transcribe",
"yoruba": "https://voicebreeze--yoruba-stt-endpoint-api.modal.run/transcribe",
"igbo": "https://voicebreeze--multilingual-speech-translation-endpoint-api.modal.run/transcribe"
}
Retrieve comprehensive gateway metrics and observability data.
Request:
curl "https://your-gateway-url/metrics"
Response:
{
"total_requests": 1523,
"total_errors": 12,
"error_rate": 0.0079,
"uptime_seconds": 86400,
"latency_p50": 2.1,
"latency_p90": 4.5,
"latency_p95": 6.2,
"latency_p99": 12.8,
"language_traffic": {
"hausa": 645,
"yoruba": 523,
"igbo": 355
},
"backend_errors": {
"hausa": 3,
"yoruba": 5,
"igbo": 4
},
"requests_per_minute": 1.06,
"circuit_breaker": {
"open_circuits": [],
"failures": {}
}
}
The gateway implements a microservices architecture pattern where a central gateway routes requests to specialized backend services based on the requested language.
      ┌─────────────┐
      │   Client    │
      └──────┬──────┘
             │
             ▼
    ┌─────────────────┐
    │   STT Gateway   │
    │ (This Service)  │
    └─┬──────┬──────┬─┘
      │      │      │
      ▼      ▼      ▼
┌──────┐ ┌──────┐ ┌──────┐
│Hausa │ │Yoruba│ │ Igbo │
│ STT  │ │ STT  │ │ STT  │
└──────┘ └──────┘ └──────┘
| Language | Backend Service | Endpoint URL |
|---|---|---|
| Hausa | hausa-stt-endpoint-api | https://voicebreeze--hausa-stt-endpoint-api.modal.run/transcribe |
| Yoruba | yoruba-stt-endpoint-api | https://voicebreeze--yoruba-stt-endpoint-api.modal.run/transcribe |
| Igbo | multilingual-speech-translation-endpoint-api | https://voicebreeze--multilingual-speech-translation-endpoint-api.modal.run/transcribe |
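
The routing table above amounts to a dictionary lookup. The following is a minimal sketch of that lookup, not the gateway's confirmed code: the helper name `resolve_backend` is hypothetical, and the exact-match (case-sensitive) behaviour is an assumption based on the troubleshooting advice to send the language in lowercase.

```python
# Endpoint URLs copied from the routing table above.
LANGUAGE_ENDPOINTS = {
    "hausa": "https://voicebreeze--hausa-stt-endpoint-api.modal.run/transcribe",
    "yoruba": "https://voicebreeze--yoruba-stt-endpoint-api.modal.run/transcribe",
    "igbo": "https://voicebreeze--multilingual-speech-translation-endpoint-api.modal.run/transcribe",
}


def resolve_backend(language: str) -> str:
    """Return the backend URL for a language (hypothetical helper).

    Assumes an exact, case-sensitive match against LANGUAGE_ENDPOINTS.
    """
    try:
        return LANGUAGE_ENDPOINTS[language]
    except KeyError:
        supported = ", ".join(sorted(LANGUAGE_ENDPOINTS))
        raise ValueError(
            f"Unsupported language {language!r}; supported: {supported}"
        ) from None
```

A caller would typically translate the `ValueError` into a 400 Bad Request that lists the supported languages.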
Prevents cascading failures by opening the circuit after consecutive failures.
- Failure Threshold: 5 consecutive failures
- Timeout: 60 seconds
- Recovery: Automatic after timeout period
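
The behaviour listed above can be sketched as a small per-service state tracker. This is an illustrative sketch only: the class name `CircuitBreaker`, its method names, and the injectable `clock` parameter are assumptions, not the gateway's actual implementation. The threshold and timeout defaults come from the values above.

```python
import time


class CircuitBreaker:
    """Tracks consecutive failures per backend service (illustrative sketch)."""

    def __init__(self, failure_threshold=5, timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.clock = clock      # injectable so the timeout can be tested
        self.failures = {}      # service -> consecutive failure count
        self.opened_at = {}     # service -> time the circuit opened

    def is_open(self, service):
        """Return True while requests to the service should be rejected."""
        opened = self.opened_at.get(service)
        if opened is None:
            return False
        if self.clock() - opened >= self.timeout:
            # Timeout elapsed: close the circuit and allow traffic again.
            del self.opened_at[service]
            self.failures[service] = 0
            return False
        return True

    def record_success(self, service):
        # Any success resets the consecutive failure count.
        self.failures[service] = 0

    def record_failure(self, service):
        self.failures[service] = self.failures.get(service, 0) + 1
        if self.failures[service] >= self.failure_threshold:
            self.opened_at[service] = self.clock()
```

In this sketch the gateway would call `is_open` before forwarding a request and `record_success` or `record_failure` after the backend responds.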
Automatically retries failed requests with increasing delays.
- Max Retries: 3 attempts
- Initial Delay: 1.0 second
- Backoff Strategy: Exponential (2^attempt)
- Max Delay: 4.0 seconds
Example retry sequence:
Request → Fail → Wait 1s → Retry → Fail → Wait 2s → Retry → Fail → Wait 4s → Final result
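
The delay schedule above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names `backoff_delays` and `call_with_retry` are hypothetical, and "3 attempts" is read here as three retries after the initial request, which matches the 1s, 2s, 4s waits in the example sequence.

```python
import time


def backoff_delays(max_retries=3, initial_delay=1.0, max_delay=4.0):
    """Delays before each retry: initial_delay * 2^attempt, capped at max_delay."""
    return [min(initial_delay * 2 ** attempt, max_delay) for attempt in range(max_retries)]


def call_with_retry(func, max_retries=3, initial_delay=1.0, max_delay=4.0, sleep=time.sleep):
    """Call func(), retrying on any exception (hypothetical helper).

    A real gateway would likely retry only on transient errors such as
    timeouts or 5xx responses; catching Exception keeps the sketch short.
    """
    delays = backoff_delays(max_retries, initial_delay, max_delay)
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                sleep(delays[attempt])  # wait before the next retry
    raise last_error
```

Passing `sleep` as a parameter is a design choice for testability; production code can rely on the default `time.sleep`.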
Protects backend services from overload.
- Max Requests: 10 requests per client
- Time Window: 60 seconds (1 minute)
- Scope: Per client IP address
- Response Code: 429 Too Many Requests
- Request Timeout: 300 seconds (5 minutes)
- Suitable for processing audio files up to 10 minutes
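
The per-client rate limit described above (10 requests per 60 seconds, keyed by IP address) can be sketched with a sliding window. Whether the gateway uses a sliding or a fixed window is not stated, so the sliding window here is an assumption, as are the class name `SlidingWindowRateLimiter` and the injectable `clock`.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allows at most max_requests per client within window_seconds (sketch)."""

    def __init__(self, max_requests=10, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def allow(self, client_ip):
        """Return True if the request may proceed, False if it should get a 429."""
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

When `allow` returns False, the gateway would respond with 429 Too Many Requests.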
| Rule | Requirement | Error Response |
|---|---|---|
| File Size | 500 MB maximum | 400 Bad Request with file size details |
| Audio Duration | 10 minutes maximum | 400 Bad Request if duration likely exceeds limit |
| Audio Format | WAV, MP3, M4A, OGG, FLAC, WebM | 400 Bad Request for unsupported formats |
| Language Parameter | Must be 'hausa', 'yoruba', or 'igbo' | 400 Bad Request with list of supported languages |
| Missing Language | Language parameter is required | 400 Bad Request indicating missing parameter |
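
The rules in the table can be checked before a request is forwarded. The sketch below is illustrative and makes several assumptions: the function name `validate_request` is hypothetical, the format is checked by file extension only, 500 MB is interpreted as 500 × 1024 × 1024 bytes, and the duration check is omitted because the source does not say how duration is estimated.

```python
from pathlib import Path

MAX_FILE_SIZE_BYTES = 500 * 1024 * 1024  # assumption: MB means MiB
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".flac", ".webm"}
SUPPORTED_LANGUAGES = ("hausa", "yoruba", "igbo")


def validate_request(filename, size_bytes, language):
    """Return an error message for a 400 response, or None if the request is valid."""
    if not language:
        return "Missing required parameter: language"
    if language not in SUPPORTED_LANGUAGES:
        return (
            f"Unsupported language '{language}'. "
            f"Supported languages: {', '.join(SUPPORTED_LANGUAGES)}"
        )
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        return f"Unsupported audio format: '{Path(filename).suffix}'"
    if size_bytes > MAX_FILE_SIZE_BYTES:
        return f"File too large: {size_bytes} bytes (limit {MAX_FILE_SIZE_BYTES} bytes)"
    return None
```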
All routing decisions, errors, and important events are logged with structured data:
- Request details (language, file size, client IP)
- Routing decisions and target backend service
- Retry attempts with delays
- Circuit breaker state changes
- Backend response status
- Final metrics (latency, status, request ID)
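
One common way to produce structured logs like those listed above is to emit one JSON object per event. The sketch below assumes JSON lines on a standard `logging` logger; the logger name `stt_gateway`, the helper `log_event`, and the field names in the usage example are all hypothetical.

```python
import json
import logging

logger = logging.getLogger("stt_gateway")  # hypothetical logger name


def log_event(event, **fields):
    """Log one event as a single JSON line and return the line (sketch)."""
    line = json.dumps({"event": event, **fields}, sort_keys=True)
    logger.info(line)
    return line


# Example: record a routing decision for one request.
log_event(
    "routing_decision",
    request_id="req_1234567890",
    language="hausa",
    file_size_bytes=204800,
    client_ip="203.0.113.7",
)
```

Keeping each event on one line with stable keys makes the logs easy to filter by `request_id` or `language` in a log aggregator.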
| Metric | Description | Use Case |
|---|---|---|
| p50 (Median) | 50% of requests complete within this time | Typical user experience |
| p90 | 90% of requests complete within this time | Most users' experience |
| p95 | 95% of requests complete within this time | Performance SLA monitoring |
| p99 | 99% of requests complete within this time | Outlier detection |
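
For reference, the percentiles in the table can be computed from recorded latencies with the nearest-rank method. This is a sketch, not the gateway's confirmed calculation: the function names are hypothetical and the gateway may use a different interpolation. The output keys mirror the field names in the /metrics response.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100 (sketch)."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_summary(samples):
    """Return the four latency percentiles under the /metrics field names."""
    return {f"latency_p{pct}": percentile(samples, pct) for pct in (50, 90, 95, 99)}
```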
| Alert Trigger | Log Message | Recommended Action |
|---|---|---|
| Circuit Breaker Opens | 🚨 ALERT: Circuit breaker opened for {service} | Backend service is unavailable. Investigate service health. |
| All Retries Failed | 🚨 ALERT: All {N} attempts failed for {service} | Backend consistently failing. Check service logs. |
| Service Unavailable | 🚨 ALERT: Service {service} is unavailable | Wait for circuit timeout or fix backend. |
| Gateway Error | 🚨 ALERT: Unexpected error in request {request_id} | Review stack trace and fix bug. |
| Parameter | Value | Purpose |
|---|---|---|
| Modal App Name | stt-gateway | Unique identifier |
| Python Version | 3.11 | Python runtime version |
| Base Image | debian_slim | Minimal image for fast cold starts |
| Keep Warm | 1 instance | Maintains warm container |
| Container Timeout | 300 seconds | Idle timeout |
| Request Timeout | 600 seconds | Maximum processing time |

Create a .env file (optional):
# CORS settings
ALLOWED_ORIGINS=https://yourdomain.com,https://app.yourdomain.com
# Logging level
LOG_LEVEL=INFO
Adding support for a new language is simple. Update the LANGUAGE_ENDPOINTS dictionary in gateway_stt_modal.py:
LANGUAGE_ENDPOINTS = {
    "hausa": "https://voicebreeze--hausa-stt-endpoint-api.modal.run/transcribe",
    "yoruba": "https://voicebreeze--yoruba-stt-endpoint-api.modal.run/transcribe",
    "igbo": "https://voicebreeze--multilingual-speech-translation-endpoint-api.modal.run/transcribe",
    "swahili": "https://your-new-backend-endpoint.modal.run/transcribe",  # Add new language
}
No other code changes are required!
Run the gateway locally for development:
modal serve gateway_stt_modal.py
This starts a local development server with hot reload.
pytest tests/
# Format code
black gateway_stt_modal.py
# Type checking
mypy gateway_stt_modal.py
# Linting
flake8 gateway_stt_modal.py
Issue: Circuit breaker keeps opening
- Check backend service health at the direct endpoint URL
- Review backend service logs in Modal dashboard
- Verify network connectivity between services
Issue: High latency (p99 > 30s)
- Check audio file sizes (larger files take longer)
- Monitor backend service performance
- Consider scaling backend services
Issue: Rate limit errors (429)
- Increase rate limit in configuration if legitimate traffic
- Implement exponential backoff in client
- Distribute load across multiple client IPs if possible
Issue: "Language not supported" errors
- Verify language parameter is lowercase
- Check supported languages via the /languages endpoint
- Ensure the language is in the LANGUAGE_ENDPOINTS dictionary
- File Size: Keep audio files under 100 MB for best performance
- Audio Format: WAV and FLAC provide best transcription quality
- Batch Processing: Use multiple concurrent requests for batch transcription
- Caching: Implement client-side caching for repeated transcriptions
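
The batch-processing tip above can be sketched with a thread pool. This is an assumption-laden example: `transcribe_batch` is a hypothetical helper, and `transcribe_one` stands in for whatever function uploads a single file to the /transcribe endpoint and returns its result. Because the gateway limits each client IP to 10 requests per 60 seconds, large batches also need pacing on the client side.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_batch(paths, transcribe_one, max_workers=4):
    """Run transcribe_one(path) concurrently; results keep the input order.

    transcribe_one is supplied by the caller (hypothetical); any exception it
    raises is re-raised when the results are collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe_one, paths))
```

Keeping `max_workers` small is a deliberate choice here so that a burst of uploads is less likely to trigger 429 responses.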