Offline speech-to-text for ham radio and MARS (Military Auxiliary Radio System) audio. Processes pre-recorded files and live audio streams, optimized for narrowband HF SSB voice.
- Fully offline — no cloud dependencies
- Progressive JSON streaming output (JSONL)
- Optimized for narrowband ham/MARS radio audio (8kHz-16kHz SSB)
- WebRTC VAD with energy-based fallback for reliable speech detection
- Optional spectral gating denoiser (noisereduce)
- SoX preprocessing with configurable EQ, bandpass, and compression
- Conversational context carries between segments for better accuracy
- Cross-platform: macOS, Windows, Linux
- Configurable via TOML files or CLI flags
```sh
# macOS
brew install sox

# Ubuntu/Debian
sudo apt install sox

# Windows
choco install sox
```

Clone the repo and run directly with `uv run` — no install step needed:
```sh
git clone https://github.com/shadowcodex/ham-to-text.git
cd ham-to-text

# Transcribe a file
uv run ham-to-text file audio.wav

# With noisereduce denoiser (recommended)
uv run --extra noisereduce ham-to-text file audio.wav --denoiser noisereduce

# Transcribe with JSON output
uv run ham-to-text file audio.wav --json

# Stream from default microphone (requires stream extra)
uv run --extra stream ham-to-text stream --json

# List audio devices
uv run --extra stream ham-to-text devices

# Use a different model
uv run ham-to-text file audio.wav --model small
```

```sh
# With noisereduce denoiser (recommended for ham/MARS audio)
uv run --extra noisereduce ham-to-text file audio.wav --denoiser noisereduce

# With live streaming support
uv run --extra stream ham-to-text stream

# With all extras
uv run --extra all ham-to-text file audio.wav
```

| Denoiser | Install | Best For |
|---|---|---|
| `none` | Built-in | Clean signals, no processing needed |
| `noisereduce` | `--extra noisereduce` | Recommended. Narrowband ham/MARS audio (8-16 kHz). Spectral gating, lightweight |
| `deepfilter` | `--extra deepfilter` | Wideband (48 kHz) speech. Not recommended for narrowband radio audio |
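What spectral gating actually does can be shown with a toy, numpy-only sketch: estimate a per-frequency noise floor from the whole clip, then attenuate STFT bins that sit near that floor. This is an illustration of the technique, not the noisereduce implementation; the parameter names loosely mirror the `[noisereduce]` config section.

```python
import numpy as np

def spectral_gate(audio, sr, n_fft=512, prop_decrease=0.75, thresh_mult=1.5):
    """Toy spectral gate: attenuate STFT bins near the estimated noise floor."""
    hop = n_fft // 2
    window = np.hanning(n_fft)
    frames = [audio[i:i + n_fft] * window
              for i in range(0, len(audio) - n_fft, hop)]
    spec = np.array([np.fft.rfft(f) for f in frames])   # (frames, bins)
    mag = np.abs(spec)
    noise_floor = np.median(mag, axis=0)                # per-bin noise estimate
    keep = mag > thresh_mult * noise_floor              # bins clearly above the floor
    gain = np.where(keep, 1.0, 1.0 - prop_decrease)     # soft-attenuate the rest
    spec *= gain
    # Overlap-add resynthesis
    out = np.zeros(len(audio))
    for i, frame in enumerate(spec):
        start = i * hop
        out[start:start + n_fft] += np.fft.irfft(frame, n=n_fft) * window
    return out
```

The real library adds a time-varying (non-stationary) noise estimate and smoothing, which is why it handles drifting HF noise better than this fixed-floor toy.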
Set the denoiser via CLI flag or config:

```sh
uv run --extra noisereduce ham-to-text file audio.wav --denoiser noisereduce
```

This project uses faster-whisper (CTranslate2). The default model is distil-large-v3. Models are downloaded automatically on first use (~1-3 GB depending on size).
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| `tiny` | ~75 MB | Fastest | Low | Quick testing |
| `base` | ~150 MB | Very fast | Fair | Low-resource machines |
| `small` | ~500 MB | Fast | Good | General use |
| `medium` | ~1.5 GB | Moderate | Very good | Better accuracy |
| `large-v3` | ~3 GB | Slow | Best | Maximum accuracy |
| `distil-large-v3` | ~1.5 GB | Fast | Very good | Default — best speed/accuracy tradeoff |
Set the model via CLI flag or config file:

```sh
uv run ham-to-text file audio.wav --model small
```

Create a `hamstt.toml` in your working directory or `~/.config/hamstt/config.toml` for global settings.
Precedence (highest wins): CLI flags > `--config` file > `./hamstt.toml` > `~/.config/hamstt/config.toml` > defaults
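Layering like this is typically implemented as a recursive dict merge where later layers win and nested tables merge key by key. A hypothetical sketch (not the tool's actual loader):

```python
def merge(*layers):
    """Merge config layers; later layers win. Nested tables merge recursively."""
    out = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(out.get(key), dict):
                out[key] = merge(out[key], value)
            else:
                out[key] = value
    return out

defaults   = {"whisper": {"model": "distil-large-v3", "beam_size": 5}}
global_cfg = {"whisper": {"model": "small"}}      # ~/.config/hamstt/config.toml
local_cfg  = {"whisper": {"beam_size": 10}}       # ./hamstt.toml
cli        = {"whisper": {"model": "large-v3"}}   # --model large-v3

cfg = merge(defaults, global_cfg, local_cfg, cli)
# cfg["whisper"] == {"model": "large-v3", "beam_size": 10}
```

Note that a layer only overrides the keys it sets: `./hamstt.toml` changing `beam_size` does not undo the global file's `model` choice.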
```toml
[whisper]
model = "distil-large-v3"   # See model table above
language = "en"
beam_size = 5
best_of = 5
temperature = 0.0
compute_type = "int8"       # "int8", "float16", "float32"
device = "cpu"              # "cpu" or "cuda"
context_segments = 5        # Prior segments fed as context (0 to disable)

[denoiser]
name = "noisereduce"        # "none", "noisereduce", or "deepfilter"

[noisereduce]
stationary = false          # false = non-stationary mode (better for varying radio noise)
prop_decrease = 0.75        # Noise reduction strength (0.0-1.0)
n_fft = 512                 # FFT size
time_constant_s = 2.0       # Smoothing window

[sox]
highpass_hz = 200           # High-pass filter cutoff
lowpass_hz = 3400           # Low-pass filter cutoff
eq_center_hz = 1800         # Clarity EQ center frequency (0 to disable the boost)
eq_boost_db = 6.0           # Clarity EQ boost in dB
norm_level_db = -3.0        # Normalization level

[vad]
filter = true               # Enable voice activity detection
aggressiveness = 0          # 0 = least aggressive (more speech), 3 = most aggressive
frame_ms = 30               # Frame size: 10, 20, or 30 ms
min_silence_ms = 300        # Min silence to split segments
speech_pad_ms = 300         # Padding around speech segments
energy_threshold = 0.02     # RMS threshold for energy-based gap recovery

[deepfilter]
attenuation_limit = 80.0    # Max noise suppression in dB
post_filter = true          # Extra suppression of noisy bins

[streaming]
chunk_duration_s = 0.5
buffer_duration_s = 30.0
silence_timeout_s = 1.5
sample_rate = 44100
# input_device = 0          # Audio device index (from `devices` command)
```

You can also point to a specific config file:
```sh
uv run ham-to-text file audio.wav --config my-config.toml
```

Use `--debug-audio` to save intermediate WAV files after each pipeline stage:
```sh
uv run --extra noisereduce ham-to-text file audio.wav --denoiser noisereduce --debug-audio /tmp/debug
```

This produces:
```
/tmp/debug/
├── 00_input.wav               # Raw input audio
├── 01_sox_preprocess.wav      # After bandpass/EQ/compand/normalize
├── 02_noisereduce_seg000.wav  # After denoiser (per VAD segment)
├── 02_noisereduce_seg001.wav
└── ...
```
See `docs/audio-processing-guide.md` for detailed tuning guidance.
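When tuning `energy_threshold` in the `[vad]` section, it helps to know that an energy-based fallback is conceptually just an RMS gate over fixed-size frames. A toy sketch of the idea (not the tool's implementation):

```python
import numpy as np

def rms_speech_mask(audio, sr, frame_ms=30, energy_threshold=0.02):
    """Mark each frame as speech if its RMS energy exceeds the threshold."""
    n = int(sr * frame_ms / 1000)                   # samples per frame
    frames = audio[: len(audio) // n * n].reshape(-1, n)
    rms = np.sqrt((frames ** 2).mean(axis=1))       # per-frame RMS
    return rms > energy_threshold                   # boolean speech mask

sr = 16000
silence = np.zeros(sr)                              # 1 s of silence
tone = 0.3 * np.sin(2 * np.pi * 800 * np.arange(sr) / sr)
mask = rms_speech_mask(np.concatenate([silence, tone]), sr)
```

Lowering the threshold recovers quieter speech at the cost of admitting more noise, which is why it pairs with the WebRTC VAD rather than replacing it.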
Output is newline-delimited JSON (JSONL). Each line has a "type" field:
```
{"type":"transcription","text":"CQ CQ this is W1AW","is_valid":true,...}
{"type":"error","error":"Device not found","code":"STREAM_ERROR"}
```

```sh
# Run tests
uv run pytest                              # fast tests
uv run pytest -m slow                      # include model-loading tests
uv run pytest -m requires_sox              # include SoX integration tests
uv run pytest --audio-file recording.wav   # test with real audio files
```
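Downstream consumers of the `--json` output can read the JSONL stream line by line and dispatch on the `type` field. A minimal Python sketch (only the fields shown in the examples above are assumed):

```python
import io
import json

def read_events(stream):
    """Yield (type, event) pairs from a JSONL stream, one JSON object per line."""
    for line in stream:
        line = line.strip()
        if not line:
            continue                    # tolerate blank lines
        event = json.loads(line)
        yield event["type"], event

# Stand-in for the tool's stdout; in practice this would be a subprocess pipe.
sample = io.StringIO(
    '{"type":"transcription","text":"CQ CQ this is W1AW","is_valid":true}\n'
    '{"type":"error","error":"Device not found","code":"STREAM_ERROR"}\n'
)
for kind, event in read_events(sample):
    if kind == "transcription":
        print(event["text"])            # prints: CQ CQ this is W1AW
    elif kind == "error":
        print(f'{event["code"]}: {event["error"]}')
```

Because each line is a complete JSON object, this works equally well on a finished file or on a live stream that is still being written.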