Minimal Whisper-based speech-to-text HTTP server.
Pre-built Docker container: `rctl/stt-server:latest`

```
docker run -p 8081:8000 rctl/stt-server:latest
```

(The `-p` flag must precede the image name; this maps host port 8081 to the server's port 8000 inside the container.)
- Loads an OpenAI Whisper model on startup (`turbo` by default).
- Exposes a single transcription endpoint: `POST /transcribe`.
- Accepts raw PCM `s16le` mono audio in the request body.
- Returns Whisper JSON output (including `text` and `segments` timing).
- `GET /`: returns `ok`
- `POST /transcribe`
  - Headers:
    - `X-Sample-Rate`: sample rate (default `16000`)
    - `X-Lang`: language code (default `en`)
  - Body: raw `int16` PCM bytes
  - Response: Whisper transcription JSON
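The endpoint above can be called from plain Python. The sketch below uses only the standard library and assumes the server is running locally on port 8000; `build_request` and `transcribe` are hypothetical helper names, not part of this project.

```python
# Minimal client sketch for the API described above (stdlib only).
# Assumes a local server at http://localhost:8000; helper names are ours.
import json
import urllib.request


def build_request(pcm_bytes: bytes,
                  url: str = "http://localhost:8000/transcribe",
                  sample_rate: int = 16000,
                  lang: str = "en") -> urllib.request.Request:
    """Build a POST request carrying raw s16le mono PCM bytes."""
    return urllib.request.Request(
        url,
        data=pcm_bytes,
        method="POST",
        headers={
            "X-Sample-Rate": str(sample_rate),
            "X-Lang": lang,
            "Content-Type": "application/octet-stream",
        },
    )


def transcribe(pcm_bytes: bytes, **kwargs) -> dict:
    """Send the audio and return the parsed Whisper JSON response."""
    with urllib.request.urlopen(build_request(pcm_bytes, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `transcribe(open("sample.pcm", "rb").read())["text"]` would return the recognized text.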
```
pip install -r requirements.txt
python main.py --host 0.0.0.0 --port 8000 --model turbo
```

Optional flags:

- `--device cuda|cpu`
- `--no-fp16`
- `--debug` (writes incoming audio samples as `*.wav`)
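For orientation, here is a sketch of how the documented flags could be declared with `argparse`. This mirrors only the defaults stated above; `main.py`'s actual parser may differ in details.

```python
# Hypothetical argparse declaration matching the flags documented above.
# Not the project's actual code; defaults follow the README.
import argparse


def make_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Whisper STT server")
    p.add_argument("--host", default="0.0.0.0")
    p.add_argument("--port", type=int, default=8000)
    p.add_argument("--model", default="turbo")
    p.add_argument("--device", choices=["cuda", "cpu"], default=None)
    p.add_argument("--no-fp16", action="store_true",
                   help="disable fp16 inference")
    p.add_argument("--debug", action="store_true",
                   help="write incoming audio samples as *.wav")
    return p
```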
```
curl -X POST "http://localhost:8000/transcribe" \
  -H "X-Sample-Rate: 16000" \
  -H "X-Lang: en" \
  --data-binary @sample.pcm
```
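The curl example posts a `sample.pcm` file, which must be raw `s16le` mono audio. One way to synthesize a compatible test clip with the Python standard library (here a one-second 440 Hz tone; the filename and tone are just an example, not something the server requires):

```python
# Generate 1 s of raw s16le mono PCM at 16 kHz: a 440 Hz sine test tone.
import math
import struct

SAMPLE_RATE = 16000   # Hz, matches the X-Sample-Rate default
DURATION_S = 1.0
FREQ_HZ = 440.0
AMPLITUDE = 0.5       # fraction of int16 full scale

n = int(SAMPLE_RATE * DURATION_S)
samples = (
    int(AMPLITUDE * 32767 * math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE))
    for i in range(n)
)
# "<h" packs each sample as little-endian signed 16-bit, i.e. s16le.
pcm = struct.pack(f"<{n}h", *samples)

with open("sample.pcm", "wb") as f:
    f.write(pcm)
```

Note that a pure tone will not produce meaningful transcription text; substitute real speech audio to exercise Whisper properly.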