stt-server is a basic standalone HTTP API server for the OpenAI Whisper model. It allows simple sharing of GPU resources across local applications that need speech-to-text (STT) integration.


stt-server

Minimal Whisper-based speech-to-text HTTP server.

Usage

Pre-built docker container: rctl/stt-server:latest

docker run -p 8081:8000 rctl/stt-server:latest

What It Does

  • Loads an OpenAI Whisper model on startup (turbo by default).
  • Exposes a single transcription endpoint: POST /transcribe.
  • Accepts raw PCM s16le mono audio in the request body.
  • Returns Whisper's JSON output (including the transcribed text and per-segment timings).

API

  • GET / returns ok
  • POST /transcribe
    • Headers:
      • X-Sample-Rate: sample rate (default 16000)
      • X-Lang: language code (default en)
    • Body:
      • raw int16 PCM bytes
    • Response:
      • Whisper transcription JSON
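The endpoint above can be exercised from Python with only the standard library. A minimal client sketch, assuming the server is running on localhost:8000; the synthesized 440 Hz tone is just placeholder audio, so the transcription of it will be empty or meaningless:

```python
import math
import struct
import urllib.request

SAMPLE_RATE = 16000

# Synthesize one second of a 440 Hz tone as raw s16le mono PCM,
# the body format the server expects.
samples = [
    int(0.3 * 32767 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]
pcm = struct.pack("<%dh" % len(samples), *samples)

req = urllib.request.Request(
    "http://localhost:8000/transcribe",
    data=pcm,
    headers={"X-Sample-Rate": str(SAMPLE_RATE), "X-Lang": "en"},
    method="POST",
)
# With the server running, send the request and print the Whisper JSON:
# print(urllib.request.urlopen(req).read().decode())
```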

Run

pip install -r requirements.txt
python main.py --host 0.0.0.0 --port 8000 --model turbo

Optional flags:

  • --device cuda|cpu
  • --no-fp16
  • --debug (writes incoming audio samples as *.wav)

Quick Test

curl -X POST "http://localhost:8000/transcribe" \
  -H "X-Sample-Rate: 16000" \
  -H "X-Lang: en" \
  --data-binary @sample.pcm
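If you don't have raw PCM on hand, a WAV file can be stripped down to the raw request body with stdlib Python. A sketch assuming a 16 kHz, mono, 16-bit WAV input (the `sample.wav` filename is hypothetical); it drops the WAV container header and returns the bare s16le frames:

```python
import wave

def wav_to_pcm(path):
    """Return the raw s16le mono PCM frames from a WAV file."""
    with wave.open(path, "rb") as w:
        # The server expects mono 16-bit audio; reject anything else.
        assert w.getnchannels() == 1, "expected mono audio"
        assert w.getsampwidth() == 2, "expected 16-bit samples"
        return w.readframes(w.getnframes())

# Usage: open("sample.pcm", "wb").write(wav_to_pcm("sample.wav"))
```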
