Whisper Transcriber

Android floating overlay app for voice-to-text using a self-hosted WhisperLiveKit server. Tap the bubble, speak, and the transcription streams directly into the active text field.

Works over Tailscale / ZeroTier — just point it at your server's VPN IP.

How it works

- A floating bubble sits over all apps (like Messenger chat heads).
- Tap to start recording. WhisperLiveKit silence detection stops the capture, or tap again to stop manually.
- Audio is streamed to WhisperLiveKit over a native WebSocket (/asr) when PCM input is enabled.
- Partial transcripts replace the in-progress text in the focused input field in real time.
- If no editable field is focused, the final transcript is copied to the clipboard once silence ends the utterance.
- Optional Kokoro TTS can read clipboard text aloud through the overlay.
- If live streaming is unavailable, the app falls back to the OpenAI-compatible REST API (/v1/audio/transcriptions).
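
The "partial transcripts replace the in-progress text" behaviour can be illustrated with a small sketch. This is not the app's Kotlin code; the class and method names below are hypothetical, and the only idea taken from this README is that each partial result overwrites the previous one instead of being appended.

```python
class InProgressText:
    """Hypothetical model of a text field: committed text plus one partial span."""

    def __init__(self, committed: str = "") -> None:
        self.committed = committed  # text that is already final
        self.partial = ""           # latest in-progress transcript

    def update_partial(self, text: str) -> str:
        """Replace (not append) the in-progress span and return the field text."""
        self.partial = text
        return self.render()

    def finalize(self, text: str) -> str:
        """Commit the final transcript and clear the in-progress span."""
        self.committed = self._join(self.committed, text)
        self.partial = ""
        return self.committed

    def render(self) -> str:
        """Text as it would appear in the field right now."""
        return self._join(self.committed, self.partial)

    @staticmethod
    def _join(left: str, right: str) -> str:
        # Avoid stray separators when either side is empty.
        if not left or not right:
            return left or right
        separator = "" if left[-1].isspace() else " "
        return left + separator + right
```

Calling update_partial repeatedly while the server revises its guess keeps the committed text untouched; only finalize extends it.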

Setup

Server

Run WhisperLiveKit on your machine. For native Android streaming, start it with PCM input enabled:

whisperlivekit-server --host 0.0.0.0 --port 8090 --pcm-input
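
With a server started this way, both endpoints named in this README (/asr for streaming and /v1/audio/transcriptions for the REST fallback) hang off the same base URL. The helper below is a sketch of that mapping for use in your own scripts; the function name is hypothetical and the app's actual URL handling may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def endpoint_urls(base_url: str) -> dict:
    """Derive the streaming and REST URLs from an http(s) base URL."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http or https URL, got {base_url!r}")
    # http maps to ws, https maps to wss.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    # Keep any path prefix (for example behind a reverse proxy).
    prefix = parts.path.rstrip("/")
    return {
        "stream": urlunsplit((ws_scheme, parts.netloc, prefix + "/asr", "", "")),
        "rest": urlunsplit(
            (parts.scheme, parts.netloc, prefix + "/v1/audio/transcriptions", "", "")
        ),
    }
```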

For TTS, run Kokoro-FastAPI on your machine or tailnet. The app discovers healthy TTS servers on port 8880 by default and uses the OpenAI-compatible /v1/audio/speech endpoint.

LiteLLM/OpenAI-compatible proxies are supported for REST STT and TTS. Set the proxy URL manually, add the matching STT/TTS API key in Settings, and use the proxy model name for TTS, for example kokoro-tts. Some proxies do not expose /v1/audio/voices; in that case enter the voice name manually.
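
To check a proxy or Kokoro-FastAPI endpoint outside the app, a request against /v1/audio/speech can be built with the standard library. This is a sketch under assumptions: the payload fields follow the OpenAI speech API (model, input, voice, speed), the voice name af_heart is only a placeholder, and sending the TTS key as a Bearer token is assumed by analogy with the STT key.

```python
import json
import urllib.request


def build_speech_request(base_url, text, model="kokoro-tts", voice="af_heart",
                         speed=1.0, api_key=None):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {"model": model, "input": text, "voice": voice, "speed": speed}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the proxy expects the key as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it; the response body is the encoded audio.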

App

1. Install the APK (download it from the Actions artifacts or build it yourself).
2. Open the app. Leave the server URL blank to auto-discover WhisperLiveKit on port 8090 across local networks and Tailscale, or set a URL manually in Settings. If the endpoint requires auth, fill in STT API Key; the app sends it as Authorization: Bearer ....
3. Grant permissions when prompted:
   - Microphone: for recording audio
   - Display over other apps: for the floating bubble
   - Notifications: for the foreground service
4. Enable the Whisper Transcriber accessibility service in Android Settings → Accessibility (needed to type into other apps' text fields).
5. Tap Start Overlay. The floating bubble appears.
6. Optional: in Settings → Text To Speech, discover your Kokoro server, test the connection to load voices, pick a model/voice/speed, and play sample text. If the endpoint requires auth, fill in TTS API Key.
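
To confirm that an STT API key is accepted before entering it in the app, the REST endpoint can be exercised by hand. The sketch below builds a multipart upload for /v1/audio/transcriptions with the key sent as a Bearer token, as described in the steps above. The function name and the whisper-1 model value are placeholders; use whatever model name your server or proxy expects.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, wav_bytes, api_key=None,
                                model="whisper-1", filename="recording.wav"):
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + wav_bytes + tail,
        headers=headers,
        method="POST",
    )
```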

Permissions

| Permission | Why |
| --- | --- |
| `RECORD_AUDIO` | Capture voice from the microphone |
| `BLUETOOTH_CONNECT` | Use a connected headset microphone on Android 12+ |
| `MODIFY_AUDIO_SETTINGS` | Route recording through the active communication device |
| `SYSTEM_ALERT_WINDOW` | Floating bubble overlay |
| `FOREGROUND_SERVICE` | Keep the overlay alive |
| `INTERNET` | Send audio to the Whisper server |
| `POST_NOTIFICATIONS` | Foreground service notification (Android 13+) |
| Accessibility Service | Type transcription into focused text fields |

Building

With Nix (CI uses this)

nix develop --command ./gradlew assembleDebug

The flake.nix provides JDK 17 + Android SDK (platform 34, build-tools 34.0.0).

Without Nix

Requires JDK 17 and Android SDK with platform 34:

export ANDROID_HOME=/path/to/android/sdk
./gradlew assembleDebug

APKs end up in app/build/outputs/apk/.

CI

GitHub Actions builds debug and release APKs on every push using a self-hosted NixOS runner. Artifacts are retained for 7 days; older ones are cleaned up automatically.

Project structure

app/src/main/java/com/whispertranscriber/
├── MainActivity.kt              # Home screen, nav, permissions
├── audio/
│   └── AudioRecorder.kt         # Mic recording → WAV conversion
├── data/
│   ├── SettingsStore.kt          # DataStore-backed preferences
│   └── TranscriptionLog.kt      # Transcription history (last 100)
├── network/
│   ├── WhisperApiClient.kt      # REST fallback via OpenAI-compatible API
│   └── WhisperLiveKitClient.kt  # Native WebSocket streaming client
├── update/                      # GitHub Release update checker/downloader/installer
├── service/
│   ├── FloatingOverlayService.kt          # Bubble UI + record/transcribe flow
│   └── TranscriberAccessibilityService.kt # Types text into focused fields
└── ui/
    ├── LogScreen.kt              # Transcription history viewer
    ├── SettingsScreen.kt         # Server URL + audio quality config
    └── theme/Theme.kt            # Material 3 theme

whisper-client (Rust crate)

whisper-client/ contains an async Rust library for calling a Whisper API with either API key auth or Cashu ecash payment (using cdk 0.8). This is a standalone library, not used by the Android app.

let client = WhisperClient::new("https://whisper.example.com".into());

// With API key
let result = client.transcribe_with_key(
    "sk-...", audio_bytes, "recording.wav", TranscribeOptions::default()
).await?;

// With Cashu payment (10 sats/minute)
let result = client.transcribe_with_cashu(
    &wallet, 10, audio_bytes, "recording.wav", TranscribeOptions::default()
).await?;

println!("{}", result.text);

Updates

The app checks a rolling GitHub Release manifest at app-latest. When a newer versionCode is available, it downloads app.apk, verifies size and SHA-256, then hands off to Android's package installer. Android still requires user approval, and APK signing must stay consistent between builds. CI publishes the release-signed APK directly as app.apk.
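
The update check described above boils down to a version comparison followed by a size and SHA-256 check. The sketch below shows that logic in Python. Only versionCode is named in this README; the size and sha256 manifest keys and both function names are assumptions made for illustration.

```python
import hashlib


def should_update(installed_version_code: int, manifest: dict) -> bool:
    """True when the manifest advertises a newer versionCode."""
    return int(manifest["versionCode"]) > installed_version_code


def verify_download(data: bytes, manifest: dict) -> bool:
    """Check the downloaded bytes against the manifest's size and SHA-256."""
    # Hypothetical manifest keys: "size" (bytes) and "sha256" (hex digest).
    if len(data) != int(manifest["size"]):
        return False
    digest = hashlib.sha256(data).hexdigest()
    return digest == str(manifest["sha256"]).lower()
```

Checking the size first is cheap and catches truncated downloads before hashing.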

Network notes

- Cleartext http:// and ws:// work out of the box to any IP (cleartext traffic is allowed via the network security config).
- HTTPS with self-signed certs works because the client trusts all certificates (this is a private VPN tool, not a public app).
- Works over Tailscale, ZeroTier, or any VPN. Use the VPN IP as the server URL.
- Recording prefers the active headset microphone when Android exposes one, then wired/USB headsets, then the built-in mic.
- Long-press the overlay to open the panel, then tap SPEAK to read the current clipboard with the selected Kokoro voice.
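
The Android client does its networking through OkHttp, so the following is not its code. It is a Python sketch of the same "trust all certificates" idea, useful when a helper script has to reach the same self-signed HTTPS endpoint. The function name is hypothetical. Only use this on a private network you control.

```python
import ssl


def insecure_tls_context() -> ssl.SSLContext:
    """Return a TLS context that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

The context can be passed to urllib.request.urlopen(url, context=insecure_tls_context()).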

Live endpoint probe

Use the probe script to verify a deployed WhisperLiveKit server with a known 16 kHz mono WAV:

python3 scripts/whisperlivekit_live_probe.py \
  --url http://100.101.157.56:8090 \
  --wav /tmp/jfk.wav \
  --expect country
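
Before running the probe, it can help to confirm that the WAV really is 16 kHz mono. The sketch below uses Python's wave module. Both function names are hypothetical, and make_silent_wav exists only to produce a correctly formatted file for checking the format logic (silence will not satisfy --expect).

```python
import io
import wave


def make_silent_wav(seconds: float = 1.0, rate: int = 16000, channels: int = 1) -> bytes:
    """Return a 16-bit PCM WAV of silence, for format checks only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)
    return buf.getvalue()


def is_16k_mono(source) -> bool:
    """Accept a path or a binary file object; True for 16 kHz mono audio."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1
```

For a file on disk, is_16k_mono("/tmp/jfk.wav") performs the same check.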

For real-time Android PCM streaming, the app needs the server's WebSocket config message to report "useAudioWorklet": true. Start WhisperLiveKit with --pcm-input to enable that mode.
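
A client-side check for that flag might look like the sketch below. It assumes the config arrives as a JSON object with useAudioWorklet at the top level; the function name and the "type": "config" field in the usage example are illustrative rather than confirmed by this README.

```python
import json


def supports_pcm_streaming(message: str) -> bool:
    """True when a server config message reports useAudioWorklet as true."""
    try:
        data = json.loads(message)
    except json.JSONDecodeError:
        return False
    # Require a literal JSON true, not just any truthy value.
    return isinstance(data, dict) and data.get("useAudioWorklet") is True
```

For example, supports_pcm_streaming('{"type": "config", "useAudioWorklet": true}') passes the check, while a server started without --pcm-input would be expected to fail it.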

Tech stack

- Kotlin + Jetpack Compose + Material 3
- OkHttp for networking
- DataStore for preferences
- Target SDK 34, min SDK 26
- Gradle 8.5, AGP 8.2.2
