examples/360-kotlin-android-live-transcription/.env.example:
# Deepgram — https://console.deepgram.com/
DEEPGRAM_API_KEY=

examples/360-kotlin-android-live-transcription/README.md:

# Kotlin Android Live Transcription

A native Android app built with Kotlin and Jetpack Compose that captures microphone audio and streams it to Deepgram for real-time speech-to-text transcription. Interim results appear as you speak, and finalized text accumulates on screen.

## What you'll build

A Kotlin Android app with a single-screen Jetpack Compose UI: tap "Start Recording" to capture 16 kHz mono PCM audio from the device microphone, stream it over WebSocket to Deepgram's live transcription API using the official Java SDK, and see both interim (partial) and final transcription results displayed in real time.

## Prerequisites

- Android Studio Hedgehog (2023.1.1) or later
- Android SDK 26+ (Android 8.0 Oreo)
- A physical Android device (emulator microphone input is unreliable)
- Deepgram account — [get a free API key](https://console.deepgram.com/)

## Environment variables

Copy `.env.example` to `.env` and fill in your key:

| Variable | Where to find it |
|----------|-----------------|
| `DEEPGRAM_API_KEY` | [Deepgram console](https://console.deepgram.com/) |

Pass the key to the build via Gradle property or environment variable:

```bash
# Option 1: environment variable
export DEEPGRAM_API_KEY="your-key-here"

# Option 2: Gradle property (the build script reads this via project.findProperty)
echo "DEEPGRAM_API_KEY=your-key-here" >> gradle.properties
```

> **Production note:** Never ship API keys in mobile binaries. Use a backend proxy that issues short-lived Deepgram temporary keys.
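
To make the proxy pattern concrete, here is a minimal sketch of the client side: the app asks your own backend for a short-lived key at startup instead of embedding one. The endpoint URL and the `"key"` JSON field are hypothetical placeholders for whatever your token service returns.

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical backend endpoint — replace with your own token service.
private const val TOKEN_ENDPOINT = "https://your-backend.example.com/deepgram-token"

// Pulls a "key" field out of a flat JSON body using only the stdlib.
fun parseTokenResponse(json: String): String {
    val match = Regex("\"key\"\\s*:\\s*\"([^\"]+)\"").find(json)
    return match?.groupValues?.get(1) ?: error("No key in response")
}

// Fetches a short-lived Deepgram key from your backend; call off the main thread.
fun fetchTemporaryKey(): String {
    val conn = URL(TOKEN_ENDPOINT).openConnection() as HttpURLConnection
    return try {
        conn.inputStream.bufferedReader().use { parseTokenResponse(it.readText()) }
    } finally {
        conn.disconnect()
    }
}
```

The fetched key would then replace the `BuildConfig.DEEPGRAM_API_KEY` value when constructing the client.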

## Install and run

```bash
# Clone and open in Android Studio
cd examples/360-kotlin-android-live-transcription
# Set your API key
export DEEPGRAM_API_KEY="your-key-here"
# Build and install on connected device
./gradlew installDebug
```

Or open the project in Android Studio, set `DEEPGRAM_API_KEY` in `gradle.properties`, and press Run.

## Key parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `model` | `nova-3` | Deepgram's flagship STT model — best accuracy for general audio |
| `encoding` | `linear16` | Raw PCM 16-bit — matches Android's AudioRecord output directly |
| `sample_rate` | `16000` | 16 kHz mono — optimal for speech, keeps bandwidth low |
| `interim_results` | `true` | Shows partial transcripts while the user is still speaking |
| `smart_format` | `true` | Adds punctuation, capitalization, and number formatting |
| `tag` | `deepgram-examples` | Identifies example traffic in the Deepgram console |
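
The table above maps directly onto the query string of the `/v1/listen` WebSocket URL (see "How it works" below). As a rough illustration of what the SDK assembles on your behalf, not its actual internals:

```kotlin
// Builds the live-transcription URL from the parameter table above.
// Parameter names match Deepgram's documented query parameters.
fun buildListenUrl(
    model: String = "nova-3",
    encoding: String = "linear16",
    sampleRate: Int = 16_000,
    interimResults: Boolean = true,
    smartFormat: Boolean = true,
    tag: String = "deepgram-examples",
): String {
    val params = listOf(
        "model" to model,
        "encoding" to encoding,
        "sample_rate" to sampleRate.toString(),
        "interim_results" to interimResults.toString(),
        "smart_format" to smartFormat.toString(),
        "tag" to tag,
    )
    return "wss://api.deepgram.com/v1/listen?" +
        params.joinToString("&") { (k, v) -> "$k=$v" }
}
```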

## How it works

1. The app requests `RECORD_AUDIO` permission via the Jetpack Compose permission launcher
2. `AudioRecorder` creates an `AudioRecord` instance at 16 kHz mono LINEAR16 and emits ~100ms chunks as a Kotlin `Flow<ByteArray>`
3. `TranscriptionViewModel` creates a `DeepgramClient` using the API key from `BuildConfig`
4. The SDK's `V1WebSocketClient` opens a WebSocket to `wss://api.deepgram.com/v1/listen` with nova-3 model, linear16 encoding, and `tag=deepgram-examples`
5. Audio chunks from the `Flow` are sent to the WebSocket via `ws.send(bytes)`
6. The `onResults` callback receives both interim and final transcription results
7. Interim text is shown in italics; finalized text is appended to the main transcript
8. Tapping "Stop" cancels the coroutine, closes the WebSocket, and stops `AudioRecord`
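
Steps 6 and 7 boil down to a small state transition: interim results overwrite a preview field, final results accumulate. A pure-Kotlin sketch of that logic (names are illustrative simplifications of what `TranscriptionViewModel` holds, not its exact API):

```kotlin
// UI state for the transcript: finalized text accumulates, interim text is replaced.
data class TranscriptState(
    val transcript: String = "",
    val interimText: String = "",
)

// Applies one transcription result: an interim result replaces the italic
// preview; a final result is appended to the transcript and clears the preview.
fun TranscriptState.applyResult(text: String, isFinal: Boolean): TranscriptState =
    if (isFinal) {
        copy(
            transcript = if (transcript.isBlank()) text else "$transcript $text",
            interimText = "",
        )
    } else {
        copy(interimText = text)
    }
```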

## Project structure

| File | Purpose |
|------|---------|
| `app/src/main/.../MainActivity.kt` | Entry point — sets Compose content |
| `app/src/main/.../TranscriptionScreen.kt` | Compose UI — record button, live transcript display |
| `app/src/main/.../TranscriptionViewModel.kt` | Connects AudioRecorder → Deepgram WebSocket → UI state |
| `app/src/main/.../AudioRecorder.kt` | Wraps Android AudioRecord as a Kotlin Flow of PCM chunks |
| `app/build.gradle.kts` | Dependencies including `deepgram-java-sdk:0.2.0` |

## Starter templates

[deepgram-starters](https://github.com/orgs/deepgram-starters/repositories)

examples/360-kotlin-android-live-transcription/app/build.gradle.kts:
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
    id("org.jetbrains.kotlin.plugin.compose")
}

android {
    namespace = "com.deepgram.example.livetranscription"
    compileSdk = 35

    defaultConfig {
        applicationId = "com.deepgram.example.livetranscription"
        minSdk = 26
        targetSdk = 35
        versionCode = 1
        versionName = "1.0"

        val deepgramApiKey: String = project.findProperty("DEEPGRAM_API_KEY") as? String
            ?: System.getenv("DEEPGRAM_API_KEY")
            ?: ""
        buildConfigField("String", "DEEPGRAM_API_KEY", "\"$deepgramApiKey\"")
    }

    buildFeatures {
        compose = true
        buildConfig = true
    }

    compileOptions {
        sourceCompatibility = JavaVersion.VERSION_17
        targetCompatibility = JavaVersion.VERSION_17
    }

    kotlinOptions {
        jvmTarget = "17"
    }
}

dependencies {
    implementation("com.deepgram:deepgram-java-sdk:0.2.0")

    implementation(platform("androidx.compose:compose-bom:2024.12.01"))
    implementation("androidx.compose.ui:ui")
    implementation("androidx.compose.material3:material3")
    implementation("androidx.compose.ui:ui-tooling-preview")
    implementation("androidx.activity:activity-compose:1.9.3")
    implementation("androidx.lifecycle:lifecycle-viewmodel-compose:2.8.7")
    implementation("androidx.lifecycle:lifecycle-runtime-compose:2.8.7")
    implementation("androidx.core:core-ktx:1.15.0")
}

app/src/main/AndroidManifest.xml:
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android">

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />

    <application
        android:allowBackup="false"
        android:label="@string/app_name"
        android:supportsRtl="true"
        android:theme="@style/Theme.Material3.DayNight.NoActionBar">
        <activity
            android:name=".MainActivity"
            android:exported="true">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>

app/src/main/.../AudioRecorder.kt:
package com.deepgram.example.livetranscription

import android.Manifest
import android.content.Context
import android.content.pm.PackageManager
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import androidx.core.content.ContextCompat
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.isActive
import kotlin.coroutines.coroutineContext

// 16 kHz mono LINEAR16 — matches Deepgram's most efficient input format
private const val SAMPLE_RATE = 16_000
private const val CHANNEL_CONFIG = AudioFormat.CHANNEL_IN_MONO
private const val AUDIO_FORMAT = AudioFormat.ENCODING_PCM_16BIT

class AudioRecorder(private val context: Context) {

    private var audioRecord: AudioRecord? = null

    fun hasPermission(): Boolean =
        ContextCompat.checkSelfPermission(context, Manifest.permission.RECORD_AUDIO) ==
            PackageManager.PERMISSION_GRANTED

    // Streams raw PCM chunks as byte arrays — each chunk is ~100ms of audio
    fun recordChunks(): Flow<ByteArray> = flow {
        // Constructing AudioRecord without the permission yields an unusable
        // recorder, so fail fast with a clear message instead.
        check(hasPermission()) { "RECORD_AUDIO permission not granted" }

        val bufferSize = maxOf(
            AudioRecord.getMinBufferSize(SAMPLE_RATE, CHANNEL_CONFIG, AUDIO_FORMAT),
            SAMPLE_RATE * 2 // 1 second of 16-bit mono at 16 kHz = 32 000 bytes
        )

        val recorder = AudioRecord(
            MediaRecorder.AudioSource.MIC,
            SAMPLE_RATE,
            CHANNEL_CONFIG,
            AUDIO_FORMAT,
            bufferSize
        ).also { audioRecord = it }

        // ~100ms read chunks keep latency low without overwhelming the WebSocket
        val chunkSize = SAMPLE_RATE * 2 / 10
        val buffer = ByteArray(chunkSize)

        recorder.startRecording()
        try {
            while (coroutineContext.isActive) {
                val read = recorder.read(buffer, 0, chunkSize)
                if (read > 0) {
                    emit(buffer.copyOf(read))
                }
            }
        } finally {
            recorder.stop()
            recorder.release()
            audioRecord = null
        }
    }.flowOn(Dispatchers.IO)

    fun stop() {
        audioRecord?.stop()
    }
}

app/src/main/.../MainActivity.kt:
package com.deepgram.example.livetranscription

import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Surface
import androidx.compose.ui.Modifier

class MainActivity : ComponentActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContent {
            MaterialTheme {
                Surface(modifier = Modifier.fillMaxSize()) {
                    TranscriptionScreen()
                }
            }
        }
    }
}

app/src/main/.../TranscriptionScreen.kt:
package com.deepgram.example.livetranscription

import android.Manifest
import androidx.activity.compose.rememberLauncherForActivityResult
import androidx.activity.result.contract.ActivityResultContracts
import androidx.compose.foundation.layout.Arrangement
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.Row
import androidx.compose.foundation.layout.Spacer
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.fillMaxWidth
import androidx.compose.foundation.layout.height
import androidx.compose.foundation.layout.padding
import androidx.compose.foundation.rememberScrollState
import androidx.compose.foundation.verticalScroll
import androidx.compose.material3.Button
import androidx.compose.material3.ButtonDefaults
import androidx.compose.material3.Card
import androidx.compose.material3.CardDefaults
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.OutlinedButton
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.text.SpanStyle
import androidx.compose.ui.text.buildAnnotatedString
import androidx.compose.ui.text.font.FontStyle
import androidx.compose.ui.text.withStyle
import androidx.compose.ui.unit.dp
import androidx.lifecycle.compose.collectAsStateWithLifecycle
import androidx.lifecycle.viewmodel.compose.viewModel

@Composable
fun TranscriptionScreen(viewModel: TranscriptionViewModel = viewModel()) {
    val state by viewModel.uiState.collectAsStateWithLifecycle()

    val permissionLauncher = rememberLauncherForActivityResult(
        ActivityResultContracts.RequestPermission()
    ) { granted ->
        if (granted) viewModel.toggleRecording()
    }

    Column(
        modifier = Modifier
            .fillMaxSize()
            .padding(24.dp),
        horizontalAlignment = Alignment.CenterHorizontally
    ) {
        Text(
            text = "Deepgram Live Transcription",
            style = MaterialTheme.typography.headlineSmall
        )

        Spacer(modifier = Modifier.height(24.dp))

        Row(horizontalArrangement = Arrangement.spacedBy(12.dp)) {
            Button(
                onClick = {
                    if (!state.isRecording && !AudioRecorder(viewModel.getApplication()).hasPermission()) {
                        permissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
                    } else {
                        viewModel.toggleRecording()
                    }
                },
                colors = if (state.isRecording) {
                    ButtonDefaults.buttonColors(containerColor = MaterialTheme.colorScheme.error)
                } else {
                    ButtonDefaults.buttonColors()
                }
            ) {
                Text(if (state.isRecording) "Stop" else "Start Recording")
            }

            OutlinedButton(onClick = { viewModel.clearTranscript() }) {
                Text("Clear")
            }
        }

        state.error?.let { error ->
            Spacer(modifier = Modifier.height(12.dp))
            Text(text = error, color = MaterialTheme.colorScheme.error)
        }

        Spacer(modifier = Modifier.height(24.dp))

        Card(
            modifier = Modifier
                .fillMaxWidth()
                .weight(1f),
            colors = CardDefaults.cardColors(
                containerColor = MaterialTheme.colorScheme.surfaceVariant
            )
        ) {
            val scrollState = rememberScrollState()

            Text(
                text = buildAnnotatedString {
                    append(state.transcript)
                    if (state.interimText.isNotBlank()) {
                        append(" ")
                        withStyle(SpanStyle(fontStyle = FontStyle.Italic, color = MaterialTheme.colorScheme.outline)) {
                            append(state.interimText)
                        }
                    }
                },
                modifier = Modifier
                    .padding(16.dp)
                    .verticalScroll(scrollState),
                style = MaterialTheme.typography.bodyLarge
            )
        }
    }
}