Android floating overlay app for voice-to-text using a self-hosted WhisperLiveKit server. Tap the bubble, speak, and the transcription streams directly into the active text field.
Works over Tailscale / ZeroTier — just point it at your server's VPN IP.
- A floating bubble sits over all apps (like Messenger chat heads)
- Tap to start recording; WhisperLiveKit silence detection stops the capture, or tap again to stop manually
- Audio is streamed to WhisperLiveKit over a native WebSocket (`/asr`) when PCM input is enabled
- Partial transcripts replace the in-progress text in the focused input field in real time
- If no editable field is focused, the final transcript is copied to the clipboard once the utterance is silent
- Optional Kokoro TTS can read clipboard text aloud through the overlay
- If live streaming is unavailable, the app falls back to the OpenAI-compatible REST API (`/v1/audio/transcriptions`)
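Both endpoints above can be derived from a single server URL. This is a hypothetical sketch (the name `derive_endpoints` is not from the app). It assumes that `http://` maps to `ws://` and `https://` to `wss://`, and that `/asr` and `/v1/audio/transcriptions` are appended to whatever path the base URL already has.

```python
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit


def derive_endpoints(server_url: str) -> Tuple[str, str]:
    """Hypothetical helper: map a server URL to (WebSocket URL, REST URL)."""
    parts = urlsplit(server_url.strip())
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Assumption: TLS on the HTTP side implies TLS on the WebSocket side.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    base_path = parts.path.rstrip("/")
    ws_url = urlunsplit((ws_scheme, parts.netloc, base_path + "/asr", "", ""))
    rest_url = urlunsplit(
        (parts.scheme, parts.netloc, base_path + "/v1/audio/transcriptions", "", "")
    )
    return ws_url, rest_url


# 100.64.0.5 is a placeholder Tailscale-style address.
print(derive_endpoints("http://100.64.0.5:8090"))
# ('ws://100.64.0.5:8090/asr', 'http://100.64.0.5:8090/v1/audio/transcriptions')
```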
Run WhisperLiveKit on your machine. For native Android streaming, start it with PCM input enabled:
```bash
whisperlivekit-server --host 0.0.0.0 --port 8090 --pcm-input
```

For TTS, run Kokoro-FastAPI on your machine or tailnet. The app discovers healthy TTS servers on port 8880 by default and uses the OpenAI-compatible `/v1/audio/speech` endpoint.
LiteLLM/OpenAI-compatible proxies are supported for REST STT and TTS. Set the proxy URL manually, add the matching STT/TTS API key in Settings, and use the proxy model name for TTS, for example kokoro-tts. Some proxies do not expose /v1/audio/voices; in that case enter the voice name manually.
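As a rough illustration of the TTS call, this sketch builds (but does not send) a POST to `/v1/audio/speech` using the OpenAI-style JSON fields `model`, `input`, `voice`, and `speed`. The helper name is hypothetical, `af_heart` is only an example Kokoro voice name, and the Bearer header is assumed to follow the same scheme the app uses for its STT key.

```python
import json
import urllib.request
from typing import Optional


def build_speech_request(
    base_url: str,
    text: str,
    model: str,
    voice: str,
    speed: float = 1.0,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Hypothetical helper: prepare an OpenAI-compatible text-to-speech request."""
    payload = {"model": model, "input": text, "voice": voice, "speed": speed}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed to match the Bearer scheme used for the STT API key.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder host on the default TTS port; "kokoro-tts" is the proxy model name example.
req = build_speech_request(
    "http://100.64.0.5:8880", "Hello from the overlay", model="kokoro-tts", voice="af_heart"
)
# Sending it would return encoded audio bytes: urllib.request.urlopen(req).read()
```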
- Install the APK (grab from Actions artifacts or build yourself)
- Open the app. Leave the server URL blank to auto-discover WhisperLiveKit on port 8090 across local networks and Tailscale, or set a URL manually in Settings. If the endpoint requires auth, fill in STT API Key; the app sends it as `Authorization: Bearer ...`.
- Grant permissions when prompted:
  - Microphone: for recording audio
  - Display over other apps: for the floating bubble
  - Notifications: for the foreground service
- Enable the Whisper Transcriber accessibility service in Android Settings → Accessibility (needed to type into other apps' text fields)
- Tap Start Overlay — the floating bubble appears
- Optional: in Settings → Text To Speech, discover your Kokoro server, test the connection to load voices, pick a model/voice/speed, and play sample text. If the endpoint requires auth, fill in TTS API Key.
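For the REST fallback, an OpenAI-compatible transcription call is a `multipart/form-data` POST. This sketch assembles one by hand with the STT API key as a Bearer token. The helper name, the `model` value you pass in, and the `audio/wav` part type are assumptions; the exact fields the app sends are not documented here.

```python
import urllib.request
import uuid
from typing import Optional


def build_transcription_request(
    base_url: str,
    wav_bytes: bytes,
    model: str,
    api_key: Optional[str] = None,
    filename: str = "recording.wav",
) -> urllib.request.Request:
    """Hypothetical helper: multipart POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex  # random, so it will not realistically collide with the audio
    lines = [
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: audio/wav",
        b"",
        wav_bytes,
        f"--{boundary}--".encode(),
        b"",  # trailing CRLF after the closing boundary
    ]
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"\r\n".join(lines),
        headers=headers,
        method="POST",
    )
```

Usage would look like `build_transcription_request("http://100.64.0.5:8090", wav, model="whisper-1", api_key="sk-...")`, where the address and `whisper-1` are placeholders, then `urllib.request.urlopen(req)` to send it.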
| Permission | Why |
|---|---|
| `RECORD_AUDIO` | Capture voice from the microphone |
| `BLUETOOTH_CONNECT` | Use a connected headset microphone on Android 12+ |
| `MODIFY_AUDIO_SETTINGS` | Route recording through the active communication device |
| `SYSTEM_ALERT_WINDOW` | Floating bubble overlay |
| `FOREGROUND_SERVICE` | Keep the overlay alive |
| `INTERNET` | Send audio to the Whisper server |
| `POST_NOTIFICATIONS` | Foreground service notification (Android 13+) |
| Accessibility Service | Type transcription into focused text fields |
```bash
nix develop --command ./gradlew assembleDebug
```

The `flake.nix` provides JDK 17 + Android SDK (platform 34, build-tools 34.0.0).

Requires JDK 17 and the Android SDK with platform 34:

```bash
export ANDROID_HOME=/path/to/android/sdk
./gradlew assembleDebug
```

APKs end up in `app/build/outputs/apk/`.
GitHub Actions builds debug and release APKs on every push using a self-hosted NixOS runner. Artifacts are retained for 7 days; older ones are cleaned up automatically.
```
app/src/main/java/com/whispertranscriber/
├── MainActivity.kt                  # Home screen, nav, permissions
├── audio/
│   └── AudioRecorder.kt             # Mic recording → WAV conversion
├── data/
│   ├── SettingsStore.kt             # DataStore-backed preferences
│   └── TranscriptionLog.kt          # Transcription history (last 100)
├── network/
│   ├── WhisperApiClient.kt          # REST fallback via OpenAI-compatible API
│   └── WhisperLiveKitClient.kt      # Native WebSocket streaming client
├── update/                          # GitHub Release update checker/downloader/installer
├── service/
│   ├── FloatingOverlayService.kt    # Bubble UI + record/transcribe flow
│   └── TranscriberAccessibilityService.kt  # Types text into focused fields
└── ui/
    ├── LogScreen.kt                 # Transcription history viewer
    ├── SettingsScreen.kt            # Server URL + audio quality config
    └── theme/Theme.kt               # Material 3 theme
```
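`AudioRecorder.kt` converts microphone recordings to WAV. The same idea in Python, assuming 16-bit little-endian PCM at 16 kHz mono (the app's actual rate depends on the audio quality setting, which is not specified here): wrap the raw samples in a standard RIFF/WAVE header.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 16_000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM samples in a WAV container."""
    frame_size = 2 * channels  # 2 bytes per sample, one sample per channel per frame
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    # wave does not close a file object it was handed, so buf stays readable.
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


one_second_of_silence = bytes(2 * 16_000)
print(len(pcm16_to_wav(one_second_of_silence)))  # 32044: 44-byte header + 32000 data bytes
```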
`whisper-client/` contains an async Rust library for calling a Whisper API with either API key auth or Cashu ecash payment (using `cdk` 0.8). This is a standalone library, not used by the Android app.

```rust
// Runs inside an async fn that returns a Result; `audio_bytes` and `wallet` are assumed to exist.
let client = WhisperClient::new("https://whisper.example.com".into());

// With API key
let result = client.transcribe_with_key(
    "sk-...", audio_bytes, "recording.wav", TranscribeOptions::default()
).await?;

// Or with Cashu payment (10 sats/minute)
let result = client.transcribe_with_cashu(
    &wallet, 10, audio_bytes, "recording.wav", TranscribeOptions::default()
).await?;

println!("{}", result.text);
```

The app checks a rolling GitHub Release manifest at `app-latest`. When a newer `versionCode` is available, it downloads `app.apk`, verifies its size and SHA-256, then hands off to Android's package installer. Android still requires user approval, and APK signing must stay consistent between builds. CI publishes the release-signed APK directly as `app.apk`.
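The size and SHA-256 check can be sketched as below. The function name and its inputs are hypothetical; the manifest's actual field names are not documented here. Checking the size first rejects truncated downloads without hashing, and hashing in fixed-size chunks keeps memory flat for large APKs.

```python
import hashlib
from pathlib import Path
from typing import Union


def verify_download(path: Union[str, Path], expected_size: int, expected_sha256: str) -> bool:
    """Return True only if the file matches both the expected size and digest."""
    p = Path(path)
    # Cheap check first: a truncated or oversized download fails without hashing.
    if p.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with p.open("rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            digest.update(chunk)
    # Compare against the hex digest case-insensitively.
    return digest.hexdigest() == expected_sha256.strip().lower()
```

In a flow like the one described above, a check of this kind would run before the APK is handed to the package installer.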
- HTTP / ws:// works out of the box to any IP (cleartext traffic is allowed via network security config)
- HTTPS with self-signed certs works — the client trusts all certificates (this is a private VPN tool, not a public app)
- Works over Tailscale, ZeroTier, or any VPN — just use the VPN IP as the server URL
- Recording prefers the active headset microphone when Android exposes one, then wired/USB headsets, then the built-in mic
- Long-press the overlay to open the panel, then tap SPEAK to read the current clipboard with the selected Kokoro voice
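The microphone preference order in these notes amounts to a simple priority search. This Python sketch models devices with a hypothetical `InputDevice` type and made-up `kind` labels standing in for Android's audio device types; it illustrates the ordering only and is not the app's Kotlin code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

HEADSET_KINDS = ("bluetooth", "wired", "usb")  # hypothetical labels


@dataclass(frozen=True)
class InputDevice:
    name: str
    kind: str  # one of HEADSET_KINDS or "builtin"
    active: bool = False  # True if this is the current communication device


def pick_input(devices: Sequence[InputDevice]) -> Optional[InputDevice]:
    # 1. An active headset microphone, if the system reports one.
    for d in devices:
        if d.active and d.kind in HEADSET_KINDS:
            return d
    # 2. Any wired or USB headset.
    for d in devices:
        if d.kind in ("wired", "usb"):
            return d
    # 3. The built-in microphone.
    for d in devices:
        if d.kind == "builtin":
            return d
    return None
```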
Use the probe script to verify a deployed WhisperLiveKit server with a known 16 kHz mono WAV:
```bash
python3 scripts/whisperlivekit_live_probe.py \
  --url http://100.101.157.56:8090 \
  --wav /tmp/jfk.wav \
  --expect country
```

The app needs the server WebSocket config to report `"useAudioWorklet": true` for real-time Android PCM streaming. Start WhisperLiveKit with `--pcm-input` for that mode.
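Streaming a known test file in PCM mode boils down to checking the WAV format and slicing it into short raw PCM chunks. This sketch assumes that `--pcm-input` expects raw 16-bit little-endian mono PCM at 16 kHz and uses a 100 ms chunk size; the function name and chunk size are illustrative and not taken from the probe script.

```python
import io
import wave
from typing import Iterator

SAMPLE_RATE = 16_000  # assumed rate expected by --pcm-input


def pcm_chunks_from_wav(wav_bytes: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw PCM chunks of chunk_ms milliseconds from a 16 kHz mono 16-bit WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        fmt = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if fmt != (SAMPLE_RATE, 1, 2):
            raise ValueError(f"expected 16 kHz mono 16-bit WAV, got {fmt}")
        frames_per_chunk = SAMPLE_RATE * chunk_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            # In this sketch, each chunk would be sent as one binary WebSocket message.
            yield chunk
```

At 16 kHz mono 16-bit, a 100 ms chunk is 1600 frames, or 3200 bytes; the final chunk may be shorter.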
- Kotlin + Jetpack Compose + Material 3
- OkHttp for network
- DataStore for preferences
- Target SDK 34, min SDK 26
- Gradle 8.5, AGP 8.2.2