GitHub - ttlequals0/MinusPod: MinusPod is a self-hosted server that removes ads before you ever hit play

MinusPod is a self-hosted server that removes ads before you ever hit play. It transcribes episodes with Whisper, uses an LLM to detect and cut ad segments, and gets smarter over time by building cross-episode ad patterns and learning from your corrections. Bring your own LLM -- Claude, Ollama, OpenRouter, or any OpenAI-compatible provider.

How It Works
Advanced Features (Quick Reference)
Requirements
Quick Start
Upgrading to 2.0.0+
Web Interface
- Ad Editor Workflow
- Screenshots
Configuration
Finding Podcast RSS Feeds
Usage
- Audiobookshelf
Environment Variables
- Using Claude Code Wrapper (Max Subscription)
Using Ollama (Local or Cloud)
Whisper / Transcription
Using OpenRouter
LLM Pricing
API
Webhooks
Remote Access / Security
Data Storage
Custom Assets (Optional)
Disclaimer

How It Works

Transcription - Whisper converts audio to text with timestamps (local GPU via faster-whisper, or remote API via OpenAI-compatible endpoint)
Ad Detection - An LLM analyzes the transcript to identify ad segments, with an automatic verification pass
Audio Processing - FFmpeg removes detected ads and inserts short audio markers
Serving - Flask serves modified RSS feeds and processed audio files

Processing happens on-demand when you play an episode, or automatically when new episodes appear. An episode is processed once; processing time depends on episode length, hardware, and chosen models. After processing, the output is stored on disk and served directly on subsequent plays.

Advanced Features (Quick Reference)

Feature	Description	Enable In
Verification Pass	Post-cut re-detection catches missed ads by re-transcribing processed audio	Automatic
Audio Enforcement	Volume and transition signals programmatically validate and extend ad detections	Automatic
Pattern Learning	System learns from corrections, patterns promote from podcast to network to global scope	Automatic
Confidence Thresholds	>=80% confidence: cut (configurable); 50-79%: kept for review; <50%: rejected	Automatic

See detailed sections below for configuration and usage.

Verification Pass

After the first pass detects and removes ads, a verification pipeline runs on the processed audio:

Re-transcribe - The processed audio is re-transcribed on CPU using Whisper
Audio Analysis - Volume analysis and transition detection run on the processed audio
LLM Detection - A "what doesn't belong" prompt detects any remaining ad content
Audio Enforcement - Programmatic signal matching validates and extends detections
Re-cut - If missed ads are found, the pass 1 output is re-cut directly

Each detected ad shows a badge indicating which stage found it:

First Pass (blue) - Found during first pass detection
Audio Enforced (orange) - Found by programmatic audio signal matching
Verification (purple) - Found by the post-cut verification pass

The verification model can be configured separately from the first pass model in Settings.

Sliding Window Processing

For long episodes, transcripts are processed in overlapping 10-minute windows:

Window Size - 10 minutes of transcript per API call
Overlap - 3 minutes between windows ensures ads at boundaries aren't missed
Deduplication - Ads detected in multiple windows are automatically merged

A 60-minute episode is processed as 9 overlapping windows, with duplicate detections merged.

Processing Queue

To prevent memory issues from concurrent processing, episodes are processed one at a time:

Only one episode processes at a time (Whisper + FFmpeg are memory-intensive)
Processing runs in a background thread, keeping the UI responsive
Episodes stuck in "processing" status reset automatically on server restart
View and cancel processing episodes in Settings

When you request an episode that needs processing:

If nothing is processing, it starts in the background and returns HTTP 503 with Retry-After: 30
If another episode is currently processing, it returns HTTP 503 with Retry-After: 30
If the queue is busy and the episode gets queued behind another, it returns HTTP 503 with Retry-After: 60
Once processed, subsequent requests serve the stored file directly from disk

HEAD requests (sent by podcast apps like Pocket Casts during feed refresh) proxy headers from the upstream audio source without triggering processing. This prevents feed refreshes from flooding the processing queue.

Post-Detection Validation

After ad detection, a validation layer reviews each detection before audio processing:

Duration checks - Rejects ads outside configurable duration limits
Confidence thresholds - Rejects very low confidence detections; only cuts ads above the minimum confidence threshold (adjustable in Settings)
Position heuristics - Boosts confidence for typical ad positions (pre-roll, mid-roll, post-roll)
Transcript verification - Checks for sponsor names and ad signals in the transcript
Auto-correction - Merges ads with tiny gaps, clamps boundaries to valid range

Ads are classified as:

ACCEPT - High confidence, removed from audio
REVIEW - Medium confidence, removed but flagged for review
REJECT - Too short/long, low confidence, or missing ad signals - kept in audio

Rejected ads appear in a separate "Rejected Detections" section in the UI so you can verify the validator's decisions.

Pattern Learning

When an ad is detected and validated, text patterns are extracted and stored for future matching.

Pattern Hierarchy:

Global Patterns - Match across all podcasts (e.g., common sponsors like Squarespace, BetterHelp)
Network Patterns - Match within a podcast network (TWiT, Relay FM, Gimlet, etc.)
Podcast Patterns - Match only for a specific podcast

When processing new episodes, the system first checks for known patterns before sending to the LLM. Patterns with high confirmation counts and low false positive rates are matched with high confidence.

Pattern Sources:

Audio Fingerprinting - Identifies DAI-inserted ads using Chromaprint acoustic fingerprints
Text Pattern Matching - TF-IDF similarity and fuzzy matching against learned patterns
LLM Analysis - Falls back to AI analysis for uncovered segments

User Corrections: In the ad editor, you can confirm, reject, or adjust detected ads:

Confirm - Creates/updates patterns in the database, incrementing confirmation count
Adjust Boundaries - Corrects start/end times for an ad; also creates patterns from adjusted boundaries (like confirm), so the learned pattern text matches the corrected range
Mark as Not Ad - Flags as false positive and stores the transcript text. Similar text is automatically excluded in future episodes of the same podcast using TF-IDF similarity matching (cross-episode false positive learning)

Pattern Management: Access the Patterns page from the navigation bar to:

View all patterns with their scope, sponsor, and statistics
Filter by scope (Global, Network, Podcast) or search by sponsor name
Toggle patterns active/inactive
View confirmation and false positive counts

Real-Time Processing Status

A global status bar shows real-time processing progress via Server-Sent Events. It displays the current episode title, processing stage (Transcribing, Detecting Ads, Processing Audio), a progress bar, and queue depth. Click it to navigate to the processing episode.

Reprocessing Modes

When reprocessing an episode from the UI, two modes are available:

Reprocess (default) -- uses learned patterns from the pattern database plus LLM analysis
Full Analysis -- skips the pattern database entirely for a fresh LLM-only analysis

Full Analysis is useful when you want to re-evaluate an episode without learned patterns (e.g., after disabling patterns that caused false positives).

Audio Analysis

Audio analysis runs automatically on every episode (lightweight, uses only ffmpeg):

Volume Analysis - Detects loudness anomalies using EBU R128 measurement. Identifies sections mastered at different levels than the content baseline.
Transition Detection - Finds abrupt frame-to-frame loudness jumps that indicate dynamically inserted ad (DAI) boundaries. Pairs up/down transitions into candidate ad regions.
Audio Enforcement - After LLM detection, uncovered audio signals with ad language in the transcript are promoted to ads. DAI transitions with high confidence (>=0.8) or sponsor matches are also promoted. Existing ad boundaries are extended when signals partially overlap.

Requirements

Docker with NVIDIA GPU support (for local Whisper), or a remote Whisper backend (no GPU needed)
Anthropic API key, OpenRouter API key, Ollama for local inference, or any OpenAI-compatible endpoint

Memory Requirements

GPU VRAM:

Whisper Model	VRAM Required
tiny	~1 GB
base	~1 GB
small	~2 GB
medium	~4 GB
large-v3	~5-6 GB
turbo	~5 GB

System RAM:

Episode Length	RAM Required
< 1 hour	8 GB
1-2 hours	8 GB
2-4 hours	12 GB
> 4 hours	16 GB

Quick Start

# 1. Create environment file
cat > .env << EOF
ANTHROPIC_API_KEY=your-key-here
BASE_URL=http://localhost:8000
APP_PASSWORD=your-password
MINUSPOD_MASTER_PASSPHRASE=long-random-string-you-will-not-lose
EOF

# 2. Create data directory
mkdir -p data

# 3. Run
docker-compose up -d

Access the web UI at http://localhost:8000/ui/ to add and manage feeds.

MINUSPOD_MASTER_PASSPHRASE is strongly recommended for production. Without it, provider API keys go into the database as plaintext. Setting it later migrates existing plaintext rows to enc:v1: encrypted storage on the next boot, with a mandatory pre-migration SQLite snapshot in data/backups/. Restoring a backup requires the same passphrase that created it, so pick a long random value and keep it somewhere separate from the database.

Upgrading to 2.0.0+

2.0.0 is a security hardening release. A docker pull && restart on a 1.x data volume boots without config changes, but several defaults tightened so a few setups need env-var tweaks. Full detail in CHANGELOG.md.

Likely to bite you if you do nothing:

Plain HTTP: SESSION_COOKIE_SECURE now defaults to true, so browsers drop the session cookie. Login looks like it works then the next request is anonymous. Set SESSION_COOKIE_SECURE=false if you're not on HTTPS.
Behind a reverse proxy (Cloudflare, cloudflared, nginx, Traefik): set MINUSPOD_TRUSTED_PROXY_COUNT=1. Without it, login lockout and per-IP rate limits silently never fire -- the container sees the proxy hop instead of the client. A startup WARN flags it. Full impact in Remote Access / Security > Client IP for login lockout.
External API clients (cron scripts, homegrown tools, third-party integrations): every POST / PUT / DELETE on /api/v1/* now needs an X-CSRF-Token header matching the minuspod_csrf cookie. The built-in UI handles it; raw curl scripts have to echo the cookie.
/docs and /openapi.yaml bookmarks / health checks: moved to /api/v1/docs and /api/v1/openapi.yaml. The old paths return 404.
OpenAI-compatible provider relying on the ANTHROPIC_API_KEY fallback: that fallback is gone. Set OPENAI_API_KEY explicitly or ad detection 401s. A startup WARN fires when the old shape is detected.

Quieter changes worth knowing:

SESSION_COOKIE_SAMESITE now Strict. Flip to Lax only if a specific cross-site integration breaks.
Frontend and API must share an origin (flask-cors was removed). Put them behind the same reverse proxy.
Password minimum is now 12 characters. Existing hashes verify fine; the next password change picks up the new minimum.
Saving provider keys via PUT /api/v1/settings/ad-detection returns 409 unless MINUSPOD_MASTER_PASSPHRASE is set. Existing plaintext keys keep working; the setting endpoint refuses to write a new secret in cleartext.
Container runs as UID 1000. First boot chowns the data volume in place. Override with APP_UID / APP_GID or bypass with docker run --user <N> if the host volume belongs to a different UID.

Web Interface

The server includes a web-based management UI at /ui/:

Dashboard with feed artwork and episode counts
Add feeds by RSS URL with optional episode cap
Feed management: refresh, delete, copy URLs, set network override, per-feed episode cap
Episode discovery: all episodes surface on refresh, process any episode from the feed detail page
Bulk actions: select multiple episodes to process, reprocess, reprocess (full), or delete
Sort by publish date, episode number, or creation date; paginated (25/50/100/500 per page)
Pattern management: view and manage cross-episode ad patterns with sponsor names
Processing history with stats, filtering by podcast, and CSV/JSON export
Stats dashboard with charts: avg/min/max metrics, top podcasts by ads, episodes by day, token usage, sortable podcast table
Settings for LLM provider, AI models, ad detection prompts, retention, system stats, token usage and cost
Real-time status bar showing processing progress across all pages
OPML export with original or ad-free (modified) feed URLs
Per-feed auto-process control (enable, disable, or use global default)
Webhook notifications for processed episodes and auth failures
Podcast search via PodcastIndex.org
Multiple dark themes (Tokyo Night, Dracula, Catppuccin, Nord, Gruvbox, Solarized, and more) with light/dark toggle
Installable as Progressive Web App (PWA)

Ad Editor Workflow

The ad editor supports two review modes, selected by a toggle above the ads list:

Processed (default) -- plays the post-cut output so you can verify what the final listener will hear. Ad timestamps map onto the new timeline.
Original -- plays the pre-cut download at the ad's original timestamps, so you can hear exactly what was removed.

Original mode requires the pre-cut audio to have been retained. That's controlled by the "Keep original audio for ad boundary review" toggle under Settings > Storage & Retention (default on). Keeping originals roughly doubles per-episode storage; disable it if disk is tight. Episodes processed before v1.6.0 have no retained original -- the toggle is disabled (with a tooltip) until you reprocess.

The Original Transcript panel on the Episode Detail page shows the full pre-cut transcript so you can see exactly what text was identified and removed.

Ad Editor

The ad editor lets you review and adjust ad detections in the browser. The layout is mobile-first since that's where most reviewing happens.

Each ad shows why it was flagged, confidence percentage, and detection stage. You can adjust start/end boundaries with per-second steppers, navigate between ads by timestamp, and play audio inline (auto-seeks to ad start when switching). Boundary adjustments and actions trigger haptic feedback on mobile.

On mobile, start/end controls stack full-width with a bottom sheet for playback and prev/next navigation. Action row: Not Ad, Reset, Confirm, Save.

On desktop, keyboard shortcuts are available: Space play/pause, J/K nudge end, Shift+J/K nudge start, C confirm, X reject, Esc reset. Start/end controls sit inline with keyboard hints.

Screenshots

Dashboard

Desktop	Mobile

Feed Detail

Desktop	Mobile

Episode Detail

Desktop	Mobile

Detected Ads

Desktop	Mobile

Ad Editor

Desktop	Mobile

Ad Patterns

Desktop	Mobile

History

Desktop	Mobile

Stats

Desktop	Mobile

Settings

Desktop	Mobile

API Documentation

Configuration

All configuration is managed through the web UI or REST API. No config files needed.

Adding Feeds

Open http://your-server:8000/ui/
Click "Add Feed"
Enter the podcast RSS URL
Optionally set a custom slug (URL path)

Ad Detection Settings

Customize ad detection in Settings:

LLM Provider - Switch between Anthropic (direct API), OpenRouter, Ollama (local), or OpenAI-compatible endpoints at runtime without restarting the container
AI Model - Model for first pass ad detection
Verification Model - Separate model for the post-cut verification pass
Chapters Model - Model for chapter generation (a small model like Haiku works well here)
Audio Bitrate - Output bitrate for processed audio (default 128k)
System Prompts - Customizable prompts for first pass and verification detection

Provider API Keys

You can set the Anthropic, OpenAI-compatible, OpenRouter, Ollama, and remote Whisper keys from the UI (Settings > LLM Provider and Settings > Transcription) or via PUT /api/v1/settings/providers/<name>. No container restart needed. Keys are encrypted with AES-256-GCM.

Two things have to be in place first:

MINUSPOD_MASTER_PASSPHRASE set in the container environment. PBKDF2 derives the encryption key from it, so treat it like any other production secret: back it up, keep it stable, don't commit it. To rotate, use Settings > Security > Provider Key Encryption (or POST /api/v1/settings/providers/rotate-passphrase). The call re-encrypts every stored key in one transaction, then you must update the env var to the new value before the next restart — otherwise the next boot won't decrypt anything.
An admin password set in the UI, so Settings is reachable. The password gates the surface only; it isn't part of the crypto. Changing it leaves stored keys untouched.

If the passphrase is missing, the key inputs collapse to a "Setup required" note, the API returns 409 provider_crypto_unavailable, and env-var credentials keep working. GET responses never include key values, only booleans plus a db/env/none source marker.

Finding Podcast RSS Feeds

MinusPod includes a built-in podcast search powered by PodcastIndex.org. Search by name directly from the Add Feed page. To enable search, get free API credentials at api.podcastindex.org/signup and add them in Settings > Podcast Search.

You can also find RSS feeds manually:

Podcast website - Look for "RSS" link in footer or subscription options
Apple Podcasts - Search on podcastindex.org using the Apple Podcasts URL
Spotify-exclusive - Not available (Spotify doesn't expose RSS feeds)
Hosting platforms - Common patterns:
- Libsyn: https://showname.libsyn.com/rss
- Spreaker: https://www.spreaker.com/show/{id}/episodes/feed
- Omny: Check page source for omnycontent.com URLs

Usage

Add your modified feed URL to any podcast app:

http://your-server:8000/your-feed-slug

The feed URL is shown in the web UI and can be copied to clipboard.

Audiobookshelf

If using Audiobookshelf as your podcast client, its SSRF protection will block requests to MinusPod when running on a local/private network. Add your MinusPod hostname or IP to Audiobookshelf's whitelist:

SSRF_REQUEST_FILTER_WHITELIST=minuspod.local,192.168.1.100

This is a comma-separated list of domains excluded from Audiobookshelf's SSRF filter. See Audiobookshelf Security docs for details.

Environment Variables

Grouped by how often you'll touch them. Standard is what a typical deployment sets; Security is the 2.0.0+ hardening surface; Advanced are tuning knobs for edge cases; Optional are opt-in features.

Standard

Variable	Default	Description
`ANTHROPIC_API_KEY`	(none)	Claude API key (required when `LLM_PROVIDER=anthropic`, not needed for Ollama)
`LLM_PROVIDER`	`anthropic`	LLM backend: `anthropic`, `openrouter`, `openai-compatible`, or `ollama`
`OPENROUTER_API_KEY`	(none)	OpenRouter API key (required when `LLM_PROVIDER=openrouter`)
`OPENAI_BASE_URL`	`http://localhost:8000/v1`	Base URL for OpenAI-compatible API (only used with non-anthropic providers)
`OPENROUTER_BASE_URL`	`https://openrouter.ai/api/v1`	OpenRouter API base URL. Override for private proxies or regional endpoints.
`ANTHROPIC_BASE_URL`	(anthropic default)	Anthropic API base URL. Override for private proxies.
`OPENAI_API_KEY`	`not-needed`	API key for OpenAI-compatible endpoint (not required for Ollama or local wrappers)
`OPENAI_MODEL`	(none)	Model for OpenAI-compatible/Ollama providers. Required for Ollama (e.g. `qwen3:14b`). Defaults to `claude-sonnet-4-5-20250929` for openai-compatible if unset.
`BASE_URL`	`http://localhost:8000`	Public URL for generated feed links
`UI_BASE_URL`	(falls back to BASE_URL)	Public URL for UI links in webhooks (set if UI is on a different domain than feeds)
`WHISPER_MODEL`	`small`	Whisper model size. `tiny`, `base`, `small`, `medium`, `large-v3`, `turbo`, plus `.en` variants
`WHISPER_DEVICE`	`cuda`	`cuda` or `cpu`. Set to `cpu` when using API backend to skip GPU init.
`WHISPER_BACKEND`	`local`	`local` (faster-whisper) or `openai-api` (remote HTTP)
`WHISPER_API_BASE_URL`	(none)	Base URL for OpenAI-compatible whisper API
`WHISPER_API_KEY`	(none)	API key for whisper API
`WHISPER_API_MODEL`	`whisper-1`	Model name sent to whisper API
`WHISPER_LANGUAGE`	`en`	ISO 639-1 language code, or `auto`. Seeds fresh installs only; runtime value is in Settings > Transcription.
`APP_PASSWORD`	(none)	Initial password for web UI (can also be set in Settings > Security)
`OLLAMA_API_KEY`	(none)	Ollama Cloud key. Leave unset for local.
`PODCAST_INDEX_API_KEY`	(none)	PodcastIndex.org API key for podcast search
`PODCAST_INDEX_API_SECRET`	(none)	PodcastIndex.org API secret
`LOG_LEVEL`	`INFO`	`DEBUG`, `INFO`, `WARNING`, or `ERROR`
`LOG_FORMAT`	`text`	`text` or `json`. JSON output plays nicely with log aggregators (Loki, CloudWatch).
`DATA_DIR`	`/app/data`	Data storage directory. Aliases `DATA_PATH` and `MINUSPOD_DATA_DIR` are also honored.

Security

Variable	Default	Description
`MINUSPOD_MASTER_PASSPHRASE`	(unset)	Unlocks encrypted provider-key store. Strongly recommended for production; first boot migrates any plaintext rows to `enc:v1:`. Losing it makes stored keys unrecoverable (env fallback still works).
`SESSION_COOKIE_SECURE`	`true`	Set to `false` only when serving over plain HTTP.
`SESSION_COOKIE_SAMESITE`	`Strict`	Override to `Lax` only if a specific integration breaks.
`MINUSPOD_ENABLE_HSTS`	`false`	Set to `true` once the deployment is HTTPS-only. HSTS traps browsers so don't flip this on a dual-protocol setup.
`MINUSPOD_TRUSTED_PROXY_COUNT`	`0`	Reverse-proxy hops to trust when reading `X-Forwarded-For`. `1` behind Cloudflare / cloudflared / nginx / Traefik, higher for a multi-proxy chain. Leaving this at `0` behind a proxy breaks login lockout (the proxy IP is private/loopback, which the lockout excludes) and per-IP rate limits (they key on the proxy instead of the client); audit logs + auth-failure webhooks also carry the wrong IP. Startup logs a WARN when unset.

Advanced

Variable	Default	Description
`PROCESSING_SOFT_TIMEOUT`	`3600`	Seconds before a stuck job is auto-cleared. Seeds fresh installs; runtime value lives in Settings > Transcription.
`PROCESSING_HARD_TIMEOUT`	`7200`	Seconds before the processing lock is force-released. Must exceed the soft timeout.
`AD_DETECTION_MAX_TOKENS`	`2000`	Max tokens for LLM ad detection responses.
`MINUSPOD_MAX_ARTWORK_BYTES`	`5242880` (5 MB)	Cap on podcast artwork download size. Clamped to `[65536, 52428800]`.
`MINUSPOD_MAX_RSS_BYTES`	`209715200` (200 MB)	Cap on RSS response body size. Floor is 1 MB.
`RATE_LIMIT_STORAGE_URI`	`memory://`	Flask-limiter storage backend. Default is per-worker; set to `redis://host:6379` + run a Redis sidecar for exact declared limits across workers.
`APP_UID`	`1000`	UID gunicorn runs as inside the container. Override to match host volume ownership.
`APP_GID`	`1000`	GID counterpart to `APP_UID`.
`GUNICORN_WORKERS`	`2`	Worker count. Lower means single-threaded UI blocking during RSS refresh; higher multiplies per-worker rate-limit counters (when using `memory://`).
`GUNICORN_TIMEOUT`	`600`	Per-request hard timeout.
`GUNICORN_GRACEFUL_TIMEOUT`	`330`	Seconds between SIGTERM and SIGKILL on shutdown.
`SECRET_KEY`	(auto-generated)	Flask session signing key. If unset, a random value is generated on first boot and persisted at `$DATA_DIR/.secret_key`. Set explicitly only for multi-instance deployments sharing a session store. Rotating invalidates all existing sessions.
`SESSION_LIFETIME_HOURS`	`24`	How long authenticated sessions stay valid, in hours.

Optional

Variable	Default	Description
`TUNNEL_TOKEN`	(none)	Cloudflare tunnel token. See Remote Access / Security > Before enabling the tunnel profile.
`SENTRY_DSN`	(none)	Opt-in Sentry. Requires `sentry-sdk` installed. Headers, cookies, CSRF tokens, and credential-like query params are scrubbed before send; no performance tracing.
`MINUSPOD_RELEASE`	(none)	Optional release tag forwarded to Sentry.
`SENTRY_ENVIRONMENT`	`production`	Environment tag forwarded to Sentry.

Deprecated

Variable	Description
`RETENTION_PERIOD`	Legacy minutes-based retention. Auto-converted to days on first startup. Use Settings UI or `PUT /api/v1/settings/retention` instead.

Using Claude Code Wrapper (Max Subscription)

Instead of using API credits, you can use the Claude Code OpenAI Wrapper to use your Claude Max subscription instead.

Quick Start:

Start the wrapper service:
```
docker compose --profile wrapper up -d
```

Authenticate with Claude (first time only):

docker compose --profile wrapper run --rm claude-wrapper claude auth login

Configure minuspod to use the wrapper by updating your .env:

LLM_PROVIDER=openai-compatible
OPENAI_BASE_URL=http://claude-wrapper:8000/v1
OPENAI_API_KEY=not-needed

Restart minuspod:
```
docker compose up -d minuspod
```

Other OpenAI-Compatible Endpoints:

The openai-compatible provider can work with other endpoints by configuring OPENAI_BASE_URL and OPENAI_API_KEY accordingly. The model is selected via the Settings UI.

Example .env for OpenAI-compatible mode:

# LLM Configuration (OpenAI-compatible)
LLM_PROVIDER=openai-compatible
OPENAI_BASE_URL=http://claude-wrapper:8000/v1
OPENAI_API_KEY=not-needed

# Server Configuration
BASE_URL=http://localhost:8000

Note: The AI model is configured via the Settings UI, not environment variables.

Using Ollama (Local or Cloud)

Ollama is an alternative to the Anthropic API. MinusPod supports both flavors:

Local (http://host:11434) — no auth, no API costs, nothing leaves the machine.
Cloud (https://ollama.com/api) — same OpenAI-compatible endpoints, just with Authorization: Bearer <key> on every request. Free tier works for this pipeline. Grab a key at ollama.com/settings/keys.

Configuration is identical either way: pick Ollama in Settings > LLM Provider, set the Base URL, and (for Cloud) paste the key. The key is stored encrypted. Leave it blank for local.

Heads up about Ollama Cloud model selection

Ollama Cloud's /v1/models advertises the full Ollama library, including previews and local-only tags that Cloud won't actually route. The dropdown shows whatever the endpoint returns, so entries like gemma4:31b, kimi-k2:1t, and gpt-oss:120b can appear but 404 when called.

If an episode processes with zero ads and the logs show {"type":"error","error":{"type":"not_found_error"}}, the model isn't really on Cloud. The reliable list is at ollama.com/search?c=cloud. Cross-check the base name (before the :) before saving.

Setup

Install and start Ollama on your host machine
Pull a model (see recommendations below): ollama pull qwen3:14b
Update your docker-compose.yml:

environment:
  - LLM_PROVIDER=ollama
  - OPENAI_BASE_URL=http://host.docker.internal:11434/v1
  - OPENAI_MODEL=qwen3:14b

Linux users: host.docker.internal doesn't resolve by default on Linux. Add extra_hosts: ["host.docker.internal:host-gateway"] to your Docker service definition.

The OPENAI_API_KEY variable is not required for Ollama. Token counts will still be tracked in the UI but cost will always show as $0.00, which is accurate since local inference is free.

Recommended Models

Models are loaded sequentially, not concurrently -- VRAM requirements are not additive between passes.

Note: Detection quality varies between models and between runs of the same model. LLMs are non-deterministic, so identical inputs can yield different outputs. Expect variation, and treat the recommendations below as starting points rather than guarantees.

Pass 1 -- First Pass Detection

Hardest task. Contextual reasoning, host-read ads, new sponsors. Use your best model here.

VRAM	Model	Quantization	Notes
8GB	`qwen3:8b`	Q4_K_M	Entry level. Handles standard sponsor reads well.
12GB	`qwen3:14b`	Q4_K_M	Best quality-to-VRAM ratio. Recommended.
16GB	`qwen3:14b`	Q5_K_M	Higher quality quant; use if you have headroom.
24GB	`qwen3.5:27b`	Q4_K_M	Strong contextual reasoning. 256K context.
24GB	`qwen3.5:35b`	Q4_K_M	Best quality under 40GB. 256K context.
40GB+	`qwen3.5:122b`	Q4_K_M	In the same tier as Claude Sonnet for this task in author testing.

Verification Pass

Easier task. Looks for remnants in already-cut audio. Speed matters more than raw accuracy.

VRAM	Model	Quantization	Notes
8GB	`qwen3:4b`	Q8_0	Fast, good JSON compliance. Verification prompt is simpler.
12GB	`qwen3:8b`	Q5_K_M	Strong JSON compliance, faster than 14B.
16GB	`mistral-nemo:12b`	Q4_K_M	Excellent JSON reliability, fast inference.
24GB	`qwen3:14b`	Q5_K_M	Overkill for verification but uses available VRAM productively.

Chapters

Simplest task. Summarization only -- no structured detection. Minimize VRAM usage and latency.

VRAM	Model	Quantization	Notes
Any	`qwen3:4b`	Q4_K_M	Sufficient for summarization. Fast.
Any	`phi4-mini`	Q4_K_M	Lean alternative, strong instruction following.
Any	`llama3.2:3b`	Q4_K_M	Smallest viable option if VRAM is tight.

Example split for 16GB VRAM: Pass 1 -> qwen3:14b Q5_K_M / Verification -> qwen3:8b Q5_K_M / Chapters -> qwen3:4b Q4_K_M

Avoid models under 7B for production use. JSON reliability degrades significantly at smaller sizes, which causes silent detection failures rather than recoverable errors. See JSON Reliability Risks.

Accuracy vs. Claude

Switching to a local model will reduce detection accuracy. The impact depends on the content and model size.

What is unaffected: Audio fingerprinting, text pattern matching, pre/post-roll heuristics, and audio signal enforcement all run without the LLM. These catch a substantial portion of ads regardless of which model is used.

What is affected: The LLM passes (first pass and verification) handle the hard cases -- host-read ads that blend into content, new sponsors not yet in the pattern database, and ambiguous mid-rolls without explicit promo codes. This is where open-weights models fall short of Claude.

Content Type	Expected Impact
Podcasts with standard sponsor reads and promo codes	Minimal -- patterns and fingerprinting cover most of these
Podcasts with heavy host-read / conversational ad integrations	Noticeable -- these require strong contextual reasoning
New sponsors not yet in the pattern database	Moderate -- depends heavily on model capability

As a rough guide: a capable model like qwen3:14b will perform well on most podcasts. The gap becomes more apparent on shows where hosts weave sponsor content naturally into conversation without clear transitions.

JSON Reliability Risks

MinusPod's ad detection pipeline requires models to return structured JSON. The Anthropic API enforces this reliably. With Ollama, enforcement is model-dependent and failures are more likely.

How failures manifest:

Malformed JSON -- Missing brackets, trailing commas, or unquoted keys. The parser has multiple fallback strategies (direct parse, markdown code block extraction, regex scan) but structurally broken JSON will fall through all of them.
Truncated output -- Models under memory pressure or processing long transcript windows may cut off mid-response, producing valid-looking but incomplete JSON that fails to parse.
Preamble text -- Some models prefix their JSON with conversational text ("Sure, here are the ads I found:"). The parser handles this in most cases, but it adds fragility.

When a window fails to parse, those ads are silently missed. There is no error surfaced to the UI -- the episode will process normally but with gaps in detection coverage.

How to reduce this risk:

Use a model of at least 7B parameters
Prefer the Qwen3 or Mistral model families, which have strong JSON compliance
Avoid running other GPU workloads concurrently -- memory pressure increases truncation risk
Check processing logs for parse failures if detection quality seems lower than expected

How to check for failures:

Look for json_parse_failed or extraction_method entries in the application logs. A healthy run will show json_array_direct as the extraction method. Fallback methods (markdown_code_block, regex variants) indicate the model isn't returning clean JSON and you should consider upgrading to a larger model.

Whisper / Transcription

By default, MinusPod uses faster-whisper with a local NVIDIA GPU for transcription. If you don't have an NVIDIA GPU (e.g. Apple Silicon Mac), you can use any OpenAI-compatible whisper API as the transcription backend.

whisper.cpp with Docker (NVIDIA GPU)

A ready-to-use compose file is provided at docker-compose.whisper.yml. It runs whisper.cpp as a standalone GPU-accelerated transcription server.

1. Download the model:

git clone --depth 1 https://github.com/ggml-org/whisper.cpp
bash whisper.cpp/models/download-ggml-model.sh large-v3-turbo
mkdir -p models && mv whisper.cpp/models/ggml-large-v3-turbo.bin models/

Other models are available -- replace large-v3-turbo with tiny, base, small, medium, or large-v3. See the whisper.cpp models README for the full list.

2. Start the server:

docker compose -f docker-compose.whisper.yml up -d

3. Configure MinusPod (.env or docker-compose.yml):

WHISPER_BACKEND=openai-api
WHISPER_API_BASE_URL=http://whisper-server:8765/v1
WHISPER_DEVICE=cpu

If MinusPod and whisper-server are on the same Docker network, use the container name (whisper-server). If they are on separate hosts, use the host IP and the exposed port (http://your-server:8765/v1).

The --dtw large.v3.turbo flag enables word-level timestamps for precise ad boundary detection. On CUDA GPUs, --no-flash-attn is required alongside --dtw -- flash attention silently disables DTW, causing word-level timestamps to be missing from the API response. On Apple Silicon (Metal), this flag is not needed. WHISPER_DEVICE=cpu prevents MinusPod from attempting to initialize a local CUDA GPU. MinusPod already preprocesses audio to 16kHz mono WAV before sending it to the API, so the whisper.cpp --convert flag is not needed.

Warning: If you add --convert for use with other clients, be aware that whisper.cpp writes temporary converted files to the current working directory. In Docker, the default CWD may not be writable, causing whisper.cpp to silently return empty transcription results (200 with 0 segments). Set working_dir: /tmp in your compose file or mount a writable volume if you need --convert.

whisper.cpp on Apple Silicon (native)

whisper.cpp runs natively on Apple Silicon with Metal acceleration. Build from source or use Homebrew:

# Download model
git clone --depth 1 https://github.com/ggml-org/whisper.cpp
bash whisper.cpp/models/download-ggml-model.sh large-v3-turbo

# Build and run the server
cd whisper.cpp && make -j
./build/bin/whisper-server \
  --host 0.0.0.0 --port 8765 \
  --model models/ggml-large-v3-turbo.bin \
  --inference-path /v1/audio/transcriptions \
  --dtw large.v3.turbo

# Configure MinusPod
WHISPER_BACKEND=openai-api
WHISPER_API_BASE_URL=http://host.docker.internal:8765/v1
WHISPER_DEVICE=cpu

Linux users: Replace host.docker.internal with your host IP, or add extra_hosts: ["host.docker.internal:host-gateway"] to your Docker service definition.

Groq

Groq offers fast cloud-based whisper transcription:

WHISPER_BACKEND=openai-api
WHISPER_API_BASE_URL=https://api.groq.com/openai/v1
WHISPER_API_KEY=gsk_your_key_here
WHISPER_API_MODEL=whisper-large-v3-turbo
WHISPER_DEVICE=cpu

OpenAI Whisper API

WHISPER_BACKEND=openai-api
WHISPER_API_BASE_URL=https://api.openai.com/v1
WHISPER_API_KEY=sk-your_key_here
WHISPER_API_MODEL=whisper-1
WHISPER_DEVICE=cpu

All settings can also be configured via the Settings UI under the Transcription section.

Transcription language

Whisper is pinned to English by default. That keeps it from misdetecting on music intros or cold opens (a common failure mode on podcasts). If you run a non-English show, pick the language in Settings > Transcription or set WHISPER_LANGUAGE on first boot. Use auto for multilingual feeds -- Whisper will detect per request. Full list: supported languages.

Processing timeouts

Two knobs for long-running jobs, both in the same panel:

Soft timeout (default 60 min): how long a job can sit in the queue before it's treated as stuck and cleared. Jobs killed by a worker restart are cleared in seconds regardless, via the queue's flock probe.
Hard timeout (default 120 min): how long before the processing lock is force-released even when a worker still holds it. Backstop for a hung ffmpeg or runaway Whisper call. Must be greater than the soft timeout.

Three-hour CPU runs with the largest Whisper model hit these. When they fire, the log line names the setting to raise. Values live in the DB and take effect immediately; PROCESSING_SOFT_TIMEOUT and PROCESSING_HARD_TIMEOUT only seed fresh installs.

Using OpenRouter

OpenRouter is a unified API that routes to 200+ models (Claude, GPT, Gemini, open-weights) with one API key. OpenRouter is supported as an LLM provider only -- it does not support the /v1/audio/transcriptions endpoint required for Whisper transcription. For transcription without a GPU, use a remote Whisper backend such as Groq.

Setup

Get an API key from openrouter.ai/keys
Use the pre-configured compose file:

# Create .env
echo "OPENROUTER_API_KEY=sk-or-v1-your-key-here" > .env

# Start
docker compose -f docker-compose.openrouter.yml up -d

Or add OpenRouter to an existing setup:

LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key-here

Model Selection

Change the model in the Settings UI or with the OPENAI_MODEL env var. Any OpenRouter model ID works:

anthropic/claude-sonnet-4-5 -- Claude Sonnet via OpenRouter
openai/gpt-4o -- GPT-4o via OpenRouter
google/gemini-2.5-flash-preview -- Gemini Flash via OpenRouter

All of these can be changed at runtime from the Settings UI -- no container restart needed.

See docker-compose.openrouter.yml for a full working example.

LLM Pricing

MinusPod tracks token usage and cost for every LLM call. The Settings page and GET /api/v1/system/token-usage show per-model breakdowns.

Where pricing data comes from

Pricing is fetched automatically based on your configured provider:

Provider	Source	Method
Anthropic	pricepertoken.com	HTML scrape
OpenRouter	OpenRouter API (`/api/v1/models`)	JSON API
OpenAI, Groq, Mistral, DeepSeek, xAI, Together, Fireworks, Perplexity, Google	pricepertoken.com	HTML scrape
Ollama / localhost	N/A	Always $0

Pricing refreshes once every 24 hours in the background. You can also force a refresh from the API:

curl -X POST http://your-server:8000/api/v1/system/model-pricing/refresh

Or view current pricing:

curl http://your-server:8000/api/v1/system/model-pricing

How model matching works

Different sources use different names for the same model. A normalization step strips provider prefixes, date suffixes, and punctuation so that claude-sonnet-4-5-20250929 (Anthropic API), anthropic/claude-sonnet-4-5 (OpenRouter), and Claude Sonnet 4.5 (pricepertoken.com display name) all resolve to the same pricing entry.

Offline / air-gapped installs

If the pricing fetch fails on startup and no pricing data exists in the database, MinusPod seeds from a built-in table of Anthropic model prices. Non-Anthropic models will show $0 until the next successful fetch. Existing cached pricing in the database is never lost on fetch failure.

Pricing accuracy

Pricing data comes from third-party sources and may lag behind provider announcements. Check your provider's billing dashboard for authoritative cost figures. MinusPod's cost tracking is an estimate for convenience, not a billing system.

API

REST API available at /api/v1/. Interactive docs at /api/v1/docs. Full specification: openapi.yaml.

Key endpoints:

GET /api/v1/health - Readiness check (database, storage); returns 503 if either is down
GET /api/v1/health/live - Liveness probe (process up); always 200, safe for frequent polling
GET /api/v1/feeds - List all feeds
POST /api/v1/feeds - Add a new feed (supports maxEpisodes for RSS cap)
POST /api/v1/feeds/import-opml - Import feeds from OPML file
GET /api/v1/feeds/export-opml?mode=original|modified - Export feeds as OPML (original or ad-free URLs)
GET /api/v1/podcast-search?q=query - Search podcasts via PodcastIndex.org
GET /api/v1/feeds/{slug}/episodes - List episodes (supports sort_by, sort_dir, status filter, pagination)
POST /api/v1/feeds/{slug}/episodes/bulk - Bulk episode actions (process, reprocess, reprocess_full, delete)
GET /api/v1/feeds/{slug}/episodes/{id} - Get episode detail with ad markers and transcript
POST /api/v1/feeds/{slug}/episodes/{id}/reprocess - Force reprocess (supports mode: reprocess/full)
POST /api/v1/feeds/{slug}/episodes/{id}/cancel - Cancel processing for a stuck episode
POST /api/v1/feeds/{slug}/episodes/{id}/regenerate-chapters - Regenerate chapter markers
POST /api/v1/feeds/{slug}/reprocess-all - Batch reprocess all episodes
POST /api/v1/feeds/{slug}/episodes/{id}/retry-ad-detection - Retry ad detection only
POST /api/v1/feeds/{slug}/episodes/{id}/corrections - Submit ad corrections
GET /api/v1/patterns - List ad patterns (filter by scope)
GET /api/v1/patterns/stats - Pattern database statistics
GET /api/v1/sponsors - List/create/update/delete sponsors (full CRUD)
GET /api/v1/search?q=query - Full-text search across all content
GET /api/v1/episodes/processing - List episodes currently processing
GET /api/v1/history - Processing history with pagination and export
GET /api/v1/stats/dashboard - Aggregate stats (avg/min/max time saved, ads, cost, tokens) with optional podcast filter
GET /api/v1/stats/by-day - Episodes processed by day of week
GET /api/v1/stats/by-podcast - Per-podcast stats (ads, time saved, tokens, cost)
GET /api/v1/status - Current processing status
GET /api/v1/status/stream - SSE endpoint for real-time status updates
GET /api/v1/system/token-usage - LLM token usage and cost breakdown by model
GET /api/v1/system/model-pricing - All known LLM model pricing rates
POST /api/v1/system/model-pricing/refresh - Force refresh pricing from provider source
GET /api/v1/system/queue - Auto-process queue status
POST /api/v1/system/vacuum - Trigger SQLite VACUUM to reclaim disk space
GET /api/v1/system/backup - Download SQLite database backup
GET /api/v1/settings - Get current settings (includes LLM provider, API key status)
GET/PUT /api/v1/settings/retention - Get or update retention configuration (days, enabled/disabled)
GET/PUT /api/v1/settings/audio - Toggle whether originals are kept for ad editor review (keepOriginalAudio)
GET/PUT /api/v1/settings/processing-timeouts - Soft and hard processing timeouts in seconds
GET /api/v1/feeds/{slug}/episodes/{id}/original.mp3 - Stream the retained pre-cut audio (used by ad editor Review mode)
PUT /api/v1/settings/ad-detection - Update ad detection config (model, provider, prompts)
GET /api/v1/settings/models - List available AI models from current provider
POST /api/v1/settings/models/refresh - Force refresh model list from provider
GET/POST/PUT/DELETE /api/v1/settings/webhooks - Webhook CRUD
POST /api/v1/settings/webhooks/{id}/test - Fire test webhook
POST /api/v1/settings/webhooks/validate-template - Validate and preview a payload template

Webhooks

MinusPod fires an HTTP POST to configured URLs when episodes complete processing, permanently fail, or when LLM authentication fails. Works with any HTTP endpoint. Use a custom Jinja2 payload template to match the receiver's expected format.

Configure webhooks in Settings > Webhooks in the web UI, or via the REST API.

Events

Event	Fires when
`Episode Processed`	Episode completes processing successfully
`Episode Failed`	Episode reaches permanently failed status
`Auth Failure`	LLM provider returns 401/403 (rate-limited to one per 5 minutes)

Template Variables

Custom payload templates are Jinja2 strings rendered against these variables:

Variable	Type	Description
`event`	string	`Episode Processed`, `Episode Failed`, or `Auth Failure`
`timestamp`	string	ISO 8601 UTC timestamp
`podcast.name`	string	Podcast title (falls back to slug if unavailable)
`podcast.slug`	string	Feed slug
`episode.id`	string	Episode ID
`episode.title`	string	Episode title
`episode.slug`	string	Feed slug
`episode.url`	string	Full UI URL to episode
`episode.ads_removed`	int	Number of ads removed
`episode.processing_time_secs`	float	Processing duration in seconds
`episode.processing_time`	string	Processing duration formatted as M:SS or H:MM:SS
`episode.llm_cost`	float	LLM cost in USD
`episode.llm_cost_display`	string	LLM cost formatted as $X.XX
`episode.time_saved_secs`	float/null	Seconds of audio removed
`episode.time_saved`	string/null	Time saved formatted as M:SS or H:MM:SS
`episode.error_message`	string/null	Error message (failed events only)
`test`	bool	`true` only on test webhook fires; absent on real events

Auth Failure events use a different payload:

Variable	Type	Description
`event`	string	`Auth Failure`
`timestamp`	string	ISO 8601 UTC timestamp
`provider`	string	LLM provider name (anthropic, openrouter, etc.)
`model`	string	Model that failed authentication
`error_message`	string	Error details from the provider
`status_code`	int/null	HTTP status code (401 or 403)

Default Payloads

When no custom template is configured, MinusPod sends these JSON payloads.

Episode Processed:

{
  "event": "Episode Processed",
  "timestamp": "2026-04-12T00:15:42Z",
  "podcast": {
    "name": "My Favorite Podcast",
    "slug": "my-favorite-podcast"
  },
  "episode": {
    "id": "a1b2c3d4e5f6",
    "title": "Episode 42: The Answer",
    "slug": "my-favorite-podcast",
    "url": "http://your-server:8000/ui/feeds/my-favorite-podcast/episodes/a1b2c3d4e5f6",
    "ads_removed": 3,
    "processing_time_secs": 42.5,
    "processing_time": "0:42",
    "llm_cost": 0.0035,
    "llm_cost_display": "$0.00",
    "time_saved_secs": 187.0,
    "time_saved": "3:07",
    "error_message": null
  }
}

Episode Failed:

{
  "event": "Episode Failed",
  "timestamp": "2026-04-12T00:15:42Z",
  "podcast": {
    "name": "My Favorite Podcast",
    "slug": "my-favorite-podcast"
  },
  "episode": {
    "id": "a1b2c3d4e5f6",
    "title": "Episode 42: The Answer",
    "slug": "my-favorite-podcast",
    "url": "http://your-server:8000/ui/feeds/my-favorite-podcast/episodes/a1b2c3d4e5f6",
    "ads_removed": 0,
    "processing_time_secs": 12.3,
    "processing_time": "0:12",
    "llm_cost": 0.001,
    "llm_cost_display": "$0.00",
    "time_saved_secs": null,
    "time_saved": null,
    "error_message": "Transcription failed: audio file is corrupt or unsupported format"
  }
}

Auth Failure:

{
  "event": "Auth Failure",
  "timestamp": "2026-04-12T00:15:42Z",
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "error_message": "Invalid API key provided",
  "status_code": 401
}

Example: Pushover

Pushover supports native webhook ingestion with data extraction selectors. No custom payload template needed -- MinusPod's default JSON payload works directly.

Log in to pushover.net/dashboard, scroll to "Your Webhooks", click "Create a Webhook". Name it MinusPod.
Copy the unique webhook URL.
In MinusPod Settings > Webhooks: paste the URL, select events, leave payload template blank.
Click Test in MinusPod to fire a sample payload to Pushover.
In Pushover dashboard: click "Check for Update" in Last Payload to load MinusPod's JSON.
Configure data extraction selectors:

Field	Selector
Title	`{{podcast.name}} - {{event}}`
Body	`{{episode.title}}` `{{episode.ads_removed}} ads removed. Saved {{episode.time_saved}}. Cost {{episode.llm_cost_display}}`
URL	`{{episode.url}}`
URL Title	`Open in MinusPod`

Click "Test Selectors on Last Payload" to preview, then Save.

Pushover's {{...}} selector syntax is evaluated on Pushover's side -- these are not Jinja2 templates.

Example: ntfy

ntfy requires a custom payload template to match its expected JSON format.

Self-hosted or ntfy.sh -- set your topic name

Add a webhook in Settings > Webhooks:

URL: https://ntfy.sh/your-topic (or your self-hosted instance)

Payload template:

{
  "topic": "your-topic",
  "title": "{{ podcast.name }} - {{ episode.title }}",
  "message": "Removed {{ episode.ads_removed }} ads in {{ episode.processing_time }}. Cost {{ episode.llm_cost_display }}",
  "actions": [{"action": "view", "label": "Open Episode", "url": "{{ episode.url }}"}]
}

ntfy also supports header-based delivery (X-Title, X-Message, X-Click headers with plain text body) -- either approach works with MinusPod's template system.

Request Signing

If a webhook has a secret configured, MinusPod adds an X-MinusPod-Signature: sha256=<hmac> header to each POST, computed with HMAC-SHA256 over the request body.

Remote Access / Security

The docker-compose includes an optional Cloudflare tunnel service for secure remote access without port forwarding:

Create a tunnel at Cloudflare Zero Trust
Add TUNNEL_TOKEN to your .env file
Configure the tunnel to point to http://minuspod:8000

Before enabling the tunnel profile

The tunnel exposes the admin interface to the public internet. Without all of these set, anyone who reaches the tunnel URL can hit unauthenticated paths and attempt to log in:

Set a password via Settings > Security (or seed with APP_PASSWORD).
SESSION_COOKIE_SECURE=true (default in 2.0.0+).
MINUSPOD_MASTER_PASSPHRASE set so provider API keys are encrypted at rest.
Cloudflare WAF rule blocking /ui and /api (see below). The docs and OpenAPI spec live under /api/v1/docs and /api/v1/openapi.yaml, so they're already covered by the /api block.
MINUSPOD_TRUSTED_PROXY_COUNT=1 so login lockout keys on the real client IP, not the tunnel loopback.

Client IP for login lockout

The 2.0.0+ login lockout feature (5 fails / 15 min / 15 min block) keys on request.remote_addr. Depending on how traffic reaches the container, that address may or may not be the real client:

Direct exposure (no proxy, ports published): remote_addr is the client. No config needed.
Docker with published ports and no reverse proxy: remote_addr is the Docker bridge gateway; lockout will not fire. A startup WARN surfaces this. Deploy behind a proxy or switch to network_mode: host.
Behind Cloudflare, nginx, Traefik, or cloudflared: set MINUSPOD_TRUSTED_PROXY_COUNT=1. Cloudflare sets X-Forwarded-For automatically.
Multi-proxy chain (e.g., Cloudflare -> nginx -> MinusPod): set the count to the number of proxies you actually trust. Setting it too high lets an attacker spoof their client IP by prepending entries to X-Forwarded-For.

What happens if you leave MINUSPOD_TRUSTED_PROXY_COUNT=0 on a proxy-fronted deployment:

Login lockout never fires. Every failed login appears to come from the proxy's IP, which is private or loopback (Cloudflare tunnel loopback, Docker bridge gateway, nginx on 127.0.0.1, ...). The lockout excludes private IPs on purpose so NAT neighbors can't DoS each other, so it never triggers. Attackers can brute-force with no rate limit.
Per-IP rate limits degrade to per-proxy. POST /feeds (3/min), POST /system/cleanup (1/h), DELETE /system/queue (6/h), and the rest all key on the proxy as one client. One user can exhaust them for everyone; an attacker can't.
Audit logs carry the wrong IP. Every [ip] bracket in the access log is the proxy hop. Forensics are much harder.
Auth-failure webhooks carry the wrong IP in the clientIp field, so any Auth Failure alerting points at the proxy.

Startup logs a WARN (Running in a container without MINUSPOD_TRUSTED_PROXY_COUNT set ...) when the variable is unset. Treat the WARN as load-bearing: if you're behind a reverse proxy and it's still firing after a deploy, your lockout and rate limits are not working.

Security Recommendations

Operator checklist:

Serve over HTTPS (SESSION_COOKIE_SECURE=true is the default).
MINUSPOD_TRUSTED_PROXY_COUNT=1 if behind a reverse proxy.
MINUSPOD_MASTER_PASSPHRASE set so provider keys encrypt at rest.
MINUSPOD_ENABLE_HSTS=true once the deployment is HTTPS-only.
WAF block on /ui and /api. Public feed paths must stay reachable: /<slug>, /episodes/<slug>/<episode>.mp3, .vtt, /chapters.json, and /api/v1/feeds/<slug>/artwork.

2.0.0+ ships the rest by default: CSRF, login lockout, SSRF guards, artwork magic-number validation, XXE defense, baseline security headers, non-root container, rate limits on destructive endpoints. See CHANGELOG.md for the full list.

Cloudflare WAF example. Allow only Pocket Casts on the feed host, block admin paths:

(http.request.full_uri wildcard r"http*://feed.example.com/*" and not http.user_agent wildcard "*Pocket*Casts*") or starts_with(http.request.uri.path, "/ui") or starts_with(http.request.uri.path, "/api")

Swap the User-Agent pattern for your app (*Overcast*, *Castro*, *AntennaPod*, ...).

Rate limiting storage

Rate limits are tracked per worker (memory-backed), so with the default two workers each declared limit is effectively doubled. For exact limits or multi-host scaling, set RATE_LIMIT_STORAGE_URI=redis://redis:6379 and add a Redis sidecar. Don't drop below two workers -- the UI freezes during bulk RSS refresh with only one.

Request correlation

Every response carries an X-Request-ID header. If you supply one on the request (up to 128 chars), it's preserved; otherwise a 16-char hex value is generated. When reporting a bug, including the X-Request-ID from the affected response makes log lookup one grep instead of a guessing game. Aggregated log viewers can filter by the request_id field on the JSON log records.

Data Storage

All data is stored in the ./data directory:

podcast.db - SQLite database with feeds, episodes, and settings
{slug}/ - Per-feed directories with cached RSS and processed audio
backups/ - Pre-migration SQLite snapshots + periodic cleanup backups

Container user

Runs as UID 1000 (minuspod). First boot chowns the data volume, then drops privileges via gosu. Override with APP_UID / APP_GID if your host volume belongs to a different UID, or bypass entirely with docker run --user <N>.

Database backup sensitivity

The SQLite backup files produced by GET /api/v1/system/backup and by the periodic cleanup task contain:

Provider API keys (encrypted with MINUSPOD_MASTER_PASSPHRASE when set; plaintext legacy rows otherwise)
Flask session signing key
Webhook HMAC secrets
Password hash (scrypt)

Treat the file like a credential. When MINUSPOD_MASTER_PASSPHRASE is set, both backup paths produce AES-GCM encrypted .db.enc files, and restoring requires the same passphrase the source used. Without a passphrase, unencrypted .db files are written and a WARN is logged at creation time.

Decrypting a backup

Encrypted backup files (*.db.enc) use AES-GCM with a key derived from MINUSPOD_MASTER_PASSPHRASE via PBKDF2 (600k iterations, SHA-256). The per-instance salt is stored in the running DB's settings table (provider_crypto_salt), not in the backup envelope. Decryption needs three things:

The encrypted file.
The same MINUSPOD_MASTER_PASSPHRASE that produced it.
The live container's DB (for the salt row).

The ship-in-repo CLI handles this:

MINUSPOD_MASTER_PASSPHRASE=your-passphrase \
    python scripts/decrypt_backup.py /path/to/backup.db.enc /path/to/backup.db

Runs from inside the container or any host with the repo checked out and cryptography installed. DATA_PATH points at the running instance's data dir (default /app/data).

Important caveat: the salt is per-DB, not per-passphrase. If you rotate the passphrase (POST /api/v1/settings/providers/rotate-passphrase), old backups made under the previous passphrase are still decryptable, because rotation re-encrypts rows under a new salt + new DEK in place. But if you lose the DB entirely (full-volume loss, fresh install) and only have backup files, you cannot decrypt them even with the original passphrase — the salt is gone. Treat the passphrase and the DB together as the recovery bundle.

Unencrypted .db files are regular SQLite databases. Restore them with sqlite3 or by copying into place on a stopped instance.

Pattern import / export

Before doing a replace import, export first so there's a round-trip backup:

curl -b cookies.txt https://your-minuspod/api/v1/patterns/export?include_corrections=true > patterns-backup.json

POST /api/v1/patterns/import runs validation on the entire payload before any writes and wraps the delete/update/insert pass in a single BEGIN IMMEDIATE transaction. A malformed entry or mid-transaction error rolls back to the pre-import state; replace mode can no longer leave an empty pattern table on a bad payload. Modes are merge (update matches, add new), replace (wipe then import), or supplement (add new only).

Custom Assets (Optional)

By default, a short audio marker is played where ads were removed. You can customize this by providing your own replacement audio:

Create an assets directory next to your docker-compose.yml
Place your custom replace.mp3 file in the assets directory

Uncomment the assets volume mount in docker-compose.yml:

volumes:
  - ./data:/app/data
  - ./assets:/app/assets:ro  # Uncomment this line

Restart the container

The replace.mp3 file will be inserted at each ad break. Keep it short (1-3 seconds). If no custom asset is provided, the built-in default marker is used.

Disclaimer

This tool is for personal use only. Only use it with podcasts you have permission to modify or where such modification is permitted under applicable laws. Respect content creators and their terms of service.

LLM accuracy notice: Most testing and development has been done with Anthropic Claude models. Detection accuracy may vary when using other LLM providers (Ollama, OpenRouter with non-Claude models, OpenAI-compatible endpoints). See the Accuracy vs. Claude section for details.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 626 Commits
.githooks		.githooks
.github		.github
assets		assets
docs		docs
frontend		frontend
scripts		scripts
smoke		smoke
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.openrouter.yml		docker-compose.openrouter.yml
docker-compose.whisper.yml		docker-compose.whisper.yml
docker-compose.yml		docker-compose.yml
entrypoint.sh		entrypoint.sh
gunicorn.conf.py		gunicorn.conf.py
openapi.yaml		openapi.yaml
pytest.ini		pytest.ini
requirements.in		requirements.in
requirements.txt		requirements.txt
version.py		version.py

Folders and files

Latest commit

History

Repository files navigation

Table of Contents

How It Works

Advanced Features (Quick Reference)

Verification Pass

Sliding Window Processing

Processing Queue

Post-Detection Validation

Pattern Learning

Real-Time Processing Status

Reprocessing Modes

Audio Analysis

Requirements

Memory Requirements

Quick Start

Upgrading to 2.0.0+

Web Interface

Ad Editor Workflow

Ad Editor

Screenshots

Dashboard

Feed Detail

Episode Detail

Detected Ads

Ad Editor

Ad Patterns

History

Stats

Settings

API Documentation

Configuration

Adding Feeds

Ad Detection Settings

Provider API Keys

Finding Podcast RSS Feeds

Usage

Audiobookshelf

Environment Variables

Standard

Security

Advanced

Optional

Deprecated

Using Claude Code Wrapper (Max Subscription)

Using Ollama (Local or Cloud)

Heads up about Ollama Cloud model selection

Setup

Recommended Models

Pass 1 -- First Pass Detection

Verification Pass

Chapters

Accuracy vs. Claude

JSON Reliability Risks

Whisper / Transcription

whisper.cpp with Docker (NVIDIA GPU)

whisper.cpp on Apple Silicon (native)

Groq

OpenAI Whisper API

Transcription language

Processing timeouts

Using OpenRouter

Setup

Model Selection

LLM Pricing

Where pricing data comes from

How model matching works

Offline / air-gapped installs

Pricing accuracy

API

Webhooks

Events

Template Variables

Default Payloads

Example: Pushover

Example: ntfy

Request Signing

Remote Access / Security

Packages

Contributors