Record, transcribe, and restructure audio via CLI - microphone/loopback capture, automatic chunking, parallel transcription, and template-based formatting.
- Installation
- Quick Start
- Features
- How it Works
- CLI Reference
- Environment Variables
- Configuration
- Templates
- Troubleshooting
- Known Limitations
- Contributing
## Installation

```bash
go install github.com/alnah/go-transcript/cmd/transcript@latest
```

Other installation methods:

```bash
git clone https://github.com/alnah/go-transcript.git
cd go-transcript
make build
```

Or download pre-built binaries from GitHub Releases.

Requirements:
- Go 1.25+
- FFmpeg (downloaded automatically on first run)
- OpenAI API key
Note: FFmpeg is auto-downloaded for macOS (arm64/amd64), Linux (amd64), and Windows (amd64). Set `FFMPEG_PATH` to use a custom binary.
Create a .env file in your working directory (auto-loaded on startup):
```bash
# Copy the example file
cp .env.example .env
# Then edit with your API keys
```

The `.env.example` file contains:

```
OPENAI_API_KEY=sk-your-key-here   # Required for transcription
DEEPSEEK_API_KEY=sk-your-key-here # Required for restructuring (default provider)
```

Or export directly:

```bash
export OPENAI_API_KEY=sk-...
export DEEPSEEK_API_KEY=sk-... # Only needed if using --template
```

## Quick Start

```bash
# Set your API keys
export OPENAI_API_KEY=sk-...
export DEEPSEEK_API_KEY=sk-... # For restructuring

# Record and transcribe a meeting
transcript live -d 1h -o meeting.md -t meeting

# Transcribe an existing recording
transcript transcribe recording.ogg -o notes.md -t brainstorm

# Record system audio (video call)
transcript record -d 30m -s -o call.ogg

# Restructure an existing transcript
transcript structure raw_notes.md -t lecture -o lecture.md
```

## Features

- Audio recording - Microphone, system audio (loopback), or both mixed
- Automatic chunking - Splits at silences to respect OpenAI's 25MB limit
- Parallel transcription - Concurrent API requests (configurable 1-10)
- Template restructuring - `brainstorm`, `meeting`, `lecture`, `notes` formats
- Multi-provider support - OpenAI or DeepSeek for restructuring
- Language support - Specify audio language, translate output
- Graceful interrupts - Ctrl+C stops recording, continues transcription
## How it Works

```
┌─────────┐    ┌─────────┐    ┌────────────┐    ┌─────────────┐    ┌────────┐
│  Audio  │───▶│ FFmpeg  │───▶│  Chunking  │───▶│ Transcribe  │───▶│ Output │
│  Input  │    │ Record  │    │ (silences) │    │  (OpenAI)   │    │  .md   │
└─────────┘    └─────────┘    └────────────┘    └─────────────┘    └────────┘
     │                                                 │                │
     │                                                 ▼                │
     │                                          ┌─────────────┐         │
     │                                          │ Restructure │─────────┘
     │                                          │ (DeepSeek/  │
     │                                          │   OpenAI)   │
     │                                          └─────────────┘
     │
     ├── Microphone (default)
     ├── System audio (-s/--system-record)
     └── Both mixed (--mix)
```
- Record: Capture audio via FFmpeg (mic, system audio, or mixed)
- Chunk: Split at natural silences to respect OpenAI's 25MB limit
- Transcribe: Parallel API calls to OpenAI (`gpt-4o-mini-transcribe`)
- Restructure (optional): Format with template via DeepSeek or OpenAI
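The `live` command runs this whole pipeline in one go; the same stages can also be run as separate commands when you want to keep intermediate files. A minimal sketch using the documented commands (file names and durations are illustrative):

```bash
# One-shot: record -> chunk -> transcribe -> restructure
transcript live -d 45m -t meeting -o standup.md

# Stage by stage: record first, transcribe later, restructure last
transcript record -d 45m -o standup.ogg
transcript transcribe standup.ogg -o standup_raw.md
transcript structure standup_raw.md -t meeting -o standup.md
```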
## CLI Reference

```
transcript <command> [flags]

Commands:
  record       Record audio to file
  transcribe   Transcribe audio file to text
  live         Record and transcribe in one step
  structure    Restructure an existing transcript
  config       Manage configuration
  help         Help about any command
  version      Show version information
```

### record

Record audio from microphone, system audio, or both.
```bash
transcript record -d 2h -o session.ogg       # Microphone
transcript record -d 30m -s -o system.ogg    # System audio
transcript record -d 1h --mix -o meeting.ogg # Both mixed
```

All flags:
| Flag | Short | Default | Description |
|---|---|---|---|
| `--duration` | `-d` | required | Recording duration (e.g., 30s, 5m, 2h) |
| `--output` | `-o` | `recording_<timestamp>.ogg` | Output file path |
| `--device` | | system default | Specific audio input device |
| `--system-record` | `-s` | `false` | Capture system audio instead of microphone |
| `--mix` | | `false` | Capture both microphone and system audio |
--system-record and --mix are mutually exclusive.
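If the system default is not the input you want, `--device` selects a specific one. A sketch (the device name here is only an example; use whatever name your OS reports for the input):

```bash
# Record 10 minutes from a named input device (name is illustrative)
transcript record -d 10m --device "USB Microphone" -o memo.ogg
```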
### transcribe

Transcribe an existing audio file.
```bash
transcript transcribe audio.ogg -o notes.md
transcript transcribe lecture.mp3 -o notes.md -t lecture
transcript transcribe french.ogg -o notes.md -l fr -T en -t meeting
```

All flags:
| Flag | Short | Default | Description |
|---|---|---|---|
| `--output` | `-o` | `<input>.md` | Output file path |
| `--template` | `-t` | | Restructure template: `brainstorm`, `meeting`, `lecture`, `notes` |
| `--provider` | | `deepseek` | LLM provider for restructuring: `deepseek`, `openai` |
| `--language` | `-l` | auto-detect | Audio language (ISO 639-1: en, fr, pt-BR) |
| `--translate` | `-T` | same as input | Translate output to language (requires --template) |
| `--parallel` | `-p` | `10` | Max concurrent API requests (1-10) |
| `--diarize` | | `false` | Enable speaker identification |
--translate requires --template.
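For example, a run that lowers concurrency and enables speaker identification could look like this (file names are illustrative):

```bash
# Fewer parallel requests plus speaker labels
transcript transcribe interview.m4a -o interview.md -t notes --diarize -p 3
```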
### live

Record and transcribe in one step. Press Ctrl+C to stop recording early and continue with transcription. Press Ctrl+C twice within 2 seconds to abort entirely.
```bash
transcript live -d 30m -o notes.md
transcript live -d 1h -o meeting.md -t meeting -k   # Keep audio
transcript live -d 1h -s -t meeting                 # System audio
transcript live -d 1h -t meeting -K                 # Keep audio + raw transcript
```

All flags:
Inherits all flags from record and transcribe, plus:
| Flag | Short | Default | Description |
|---|---|---|---|
| `--keep-audio` | `-k` | `false` | Preserve the audio file after transcription |
| `--keep-raw-transcript` | `-r` | `false` | Keep raw transcript before restructuring (requires --template) |
| `--keep-all` | `-K` | `false` | Keep both audio and raw transcript (equivalent to -k -r) |
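To keep only the raw transcript (but not the audio) for later re-restructuring, a sketch might be:

```bash
# -r requires --template; the raw transcript is written alongside lecture.md
transcript live -d 1h -t lecture -r -o lecture.md
```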
### structure

Restructure an existing transcript file using a template. Useful for re-processing raw transcripts generated without --template.
```bash
transcript structure meeting_raw.md -t meeting -o meeting.md
transcript structure notes.md -t brainstorm
transcript structure lecture.md -t lecture -T fr        # Translate to French
transcript structure raw.md -t notes --provider openai
```

All flags:
| Flag | Short | Default | Description |
|---|---|---|---|
| `--output` | `-o` | `<input>_structured.md` | Output file path |
| `--template` | `-t` | required | Restructure template: `brainstorm`, `meeting`, `lecture`, `notes` |
| `--provider` | | `deepseek` | LLM provider for restructuring: `deepseek`, `openai` |
| `--translate` | `-T` | same as input | Translate output to language (ISO 639-1: en, fr) |
### config

Manage persistent configuration.
```bash
transcript config set output-dir ~/Documents/transcripts
transcript config get output-dir
transcript config list
```

Exit codes:
| Code | Name | Description |
|---|---|---|
| 0 | Success | Operation completed successfully |
| 1 | General | Unexpected or unclassified error |
| 2 | Usage | Invalid flags or arguments |
| 3 | Setup | FFmpeg not found, API key missing, no audio device |
| 4 | Validation | Unsupported format, file not found, invalid language |
| 5 | Transcription | Rate limit, quota exceeded, auth failed |
| 6 | Restructure | Transcript exceeds token limit |
| 130 | Interrupt | Aborted via Ctrl+C |
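These codes make the CLI straightforward to script. A minimal sketch (messages are illustrative, codes are from the table above):

```bash
transcript transcribe talk.ogg -o talk.md -t lecture
status=$?
case "$status" in
  0) echo "done" ;;
  3) echo "setup problem: check FFmpeg and API keys" ;;
  5) echo "transcription error: rate limit, quota, or auth" ;;
  *) echo "failed with exit code $status" ;;
esac
```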
## Environment Variables

Priority: CLI flags > environment variables > config file > defaults.
| Variable | Required | Default | Description |
|---|---|---|---|
| `OPENAI_API_KEY` | Yes | | OpenAI API key for transcription (and restructuring with --provider openai) |
| `DEEPSEEK_API_KEY` | No | | DeepSeek API key (required when using --template with default provider) |
| `TRANSCRIPT_OUTPUT_DIR` | No | `.` | Default output directory |
| `FFMPEG_PATH` | No | auto | Path to FFmpeg binary (skips auto-download) |
Tip: Place a `.env` file in your working directory with these variables. It will be auto-loaded on startup via godotenv. See `.env.example` for reference.
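The two optional variables rarely appear in examples; a sketch of setting them (paths are placeholders, and the same variables can equally be set in `.env`):

```bash
export TRANSCRIPT_OUTPUT_DIR="$HOME/Documents/transcripts"  # default output directory
export FFMPEG_PATH=/opt/homebrew/bin/ffmpeg                 # use an existing FFmpeg, skip the auto-download
```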
## Configuration

Config files are stored in the user config directory:
| OS | Config Directory |
|---|---|
| Linux | ~/.config/go-transcript/ |
| macOS | ~/.config/go-transcript/ |
| Windows | %APPDATA%\go-transcript\ |
Respects XDG_CONFIG_HOME if set.
| Key | Description |
|---|---|
| `output-dir` | Default directory for output files |
Example config file:

```
# ~/.config/go-transcript/config
output-dir=/Users/john/Documents/transcripts
```

## Templates

Templates transform raw transcripts into structured markdown.
| Template | Purpose | Output Structure |
|---|---|---|
| `brainstorm` | Idea generation sessions | H1 topic, H2 themes, bullet points, key insights, actions |
| `meeting` | Meeting notes | H1 subject, participants, topics discussed, decisions, action items |
| `lecture` | Course/conference lectures | Readable prose with H1/H2/H3 headers, bold key terms |
| `notes` | Bullet-point lecture notes | H2 thematic headers, hierarchical bullet points, bold terms |
Templates output English by default. Use --translate / -T to translate:
```bash
transcript transcribe audio.ogg -t meeting -T fr
```

Restructuring uses DeepSeek (`deepseek-reasoner`) by default because it delivers excellent results at a fraction of the cost. Use OpenAI (`o4-mini`) for faster processing:
```bash
# Default: DeepSeek (slower, cheaper, excellent quality)
transcript transcribe audio.ogg -t lecture

# OpenAI (faster, more expensive)
transcript transcribe audio.ogg -t lecture --provider openai
```

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| `gpt-4o-mini-transcribe` | $2.50 | $10.00 | Transcription |
| `o4-mini` | $1.10 | $4.40 | OpenAI restructuring (100K max output) |
| `deepseek-reasoner` | $0.21 | $0.32 | DeepSeek restructuring (64K max output) |
Cost estimates (assuming ~150 words/minute, ~200 tokens/minute):
| Operation | Tokens (1-hour recording) | Cost estimate |
|---|---|---|
| Transcription only | ~12K tokens | ~$0.15 |
| Transcription + restructuring (DeepSeek) | ~12K + ~15K tokens | ~$0.16 |
| Transcription + restructuring (OpenAI) | ~12K + ~15K tokens | ~$0.22 |
DeepSeek is ~10x cheaper for restructuring with comparable quality. It's slower (can take several minutes for long transcripts), but the cost savings are significant for heavy usage.
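As a rough back-of-envelope check of the restructuring deltas above (token counts and per-token rates are the same assumptions as in the tables, so treat the result as an approximation):

```bash
# Restructuring cost for ~12K input / ~15K output tokens
awk 'BEGIN {
  in_tok = 12000; out_tok = 15000
  printf "DeepSeek: $%.3f\n", in_tok/1e6*0.21 + out_tok/1e6*0.32   # ~$0.007
  printf "OpenAI:   $%.3f\n", in_tok/1e6*1.10 + out_tok/1e6*4.40   # ~$0.079
}'
```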
Use -K (or --keep-all) to preserve intermediate files:
```bash
# Keep both audio and raw transcript for re-processing
transcript live -d 1h -t meeting -K -o meeting.md
```

This produces three files:

- `meeting.md` - the restructured output
- `meeting.ogg` - the audio recording (from `-k`)
- `meeting_raw.md` - the raw transcript before restructuring (from `-r`)
This allows you to:
- Re-transcribe if the initial transcription quality is poor
- Re-restructure with a different template without re-transcribing
- Try multiple templates on the same transcript (e.g., `lecture` vs `notes`)
- Switch providers to compare DeepSeek vs OpenAI results
```bash
# Re-restructure an existing transcript with a different template
transcript structure meeting_raw.md -t notes -o meeting_notes.md

# Try OpenAI instead of DeepSeek
transcript structure meeting_raw.md -t meeting --provider openai -o meeting_openai.md
```

Supported input formats: OpenAI accepts ogg, mp3, wav, m4a, flac, mp4, mpeg, mpga, and webm.
Recording output is always OGG Vorbis (16kHz mono, ~50kbps) optimized for voice.
## Troubleshooting

FFmpeg is auto-downloaded on first run. If the download fails, install it manually:
```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows
winget install ffmpeg
```

Or set `FFMPEG_PATH` to your binary location.
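If you already have FFmpeg installed, pointing the CLI at it might look like this (the path is an example):

```bash
export FFMPEG_PATH=/usr/local/bin/ffmpeg
"$FFMPEG_PATH" -version   # confirm the binary runs
```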
System audio capture requires a virtual audio driver:
macOS - BlackHole
```bash
brew install --cask blackhole-2ch
```

Important: BlackHole is a "black hole" - audio sent to it is NOT audible. To hear audio while recording:
- Open "Audio MIDI Setup" (Spotlight search)
- Click "+" > "Create Multi-Output Device"
- Check both your speakers AND BlackHole 2ch
- Set this Multi-Output as your system output
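Once the Multi-Output Device is active, a quick sanity check is to capture a few seconds of system audio while something is playing (duration and file name are arbitrary):

```bash
transcript record -d 15s -s -o loopback_test.ogg
```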
Linux - PulseAudio/PipeWire
Usually pre-installed. Loopback uses the monitor device of your default sink.
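To see which monitor source that corresponds to, something like the following should work (source names vary per system):

```bash
# The loopback device is "<default-sink>.monitor"
pactl list sources short | grep monitor
```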
```bash
# Verify PulseAudio is working
pactl get-default-sink

# Install if missing
sudo apt install pulseaudio pulseaudio-utils
```

Windows - Stereo Mix or VB-Cable
Option 1 - Enable Stereo Mix (recommended):
- Right-click speaker icon > Sound settings > More sound settings
- Recording tab > Right-click > Show Disabled Devices
- Enable "Stereo Mix" if present
Option 2 - Install VB-Audio Virtual Cable:
Download from: https://vb-audio.com/Cable/
| Error | Cause | Solution |
|---|---|---|
| "OPENAI_API_KEY not set" | Missing API key | export OPENAI_API_KEY=sk-... |
| "DEEPSEEK_API_KEY not set" | Missing key for DeepSeek | export DEEPSEEK_API_KEY=sk-... |
| "rate limit exceeded" | Too many requests | Reduce --parallel or wait |
| "quota exceeded" | Billing issue | Check OpenAI/DeepSeek account billing |
| "authentication failed" | Invalid API key | Verify your API key |
Output token limits depend on the restructuring provider:
| Provider | Max output tokens |
|---|---|
| `o4-mini` (OpenAI) | 100,000 |
| `deepseek-reasoner` | 64,000 |
For very long recordings:

- Skip restructuring (no `--template`) and run the `structure` command later (see the sketch below)
- Split the audio file manually
- Use shorter recording sessions
- Try `--provider openai` for a higher output token limit
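A sketch of that two-step workflow (file names are illustrative):

```bash
# Transcribe without restructuring, then restructure later with the higher-limit provider
transcript transcribe conference.ogg -o conference_raw.md
transcript structure conference_raw.md -t lecture --provider openai -o conference.md
```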
## Known Limitations

| Not Supported | Why |
|---|---|
| Real-time streaming | Uses batch API, not Realtime API |
| Offline mode | Requires internet (cloud APIs only) |
| Video input | Audio extraction not implemented |
| Issue | Solution |
|---|---|
| No loopback on Linux without PulseAudio | Install pulseaudio |
| BlackHole mutes audio on macOS | Create Multi-Output Device |
| Stereo Mix disabled on Windows | Enable in Sound settings |
## Contributing

This project is currently in active development and not accepting external contributions.
Feel free to:
- Open issues for bug reports
- Suggest features via issues
- Fork for personal use
See CONTRIBUTING.md for development setup, or docs/ for architecture and project layout.
License: BSD-3-Clause.