
building speech segmentation heuristics with silero vad #194

@lukeocodes

Description


Integration: Silero VAD Speech Segmentation with Deepgram STT

What this should show

A Python example demonstrating how to use Silero VAD (Voice Activity Detection) to segment audio into speech regions, then send those segments to Deepgram STT for transcription. This covers a common pre-processing pipeline: detect speech boundaries with Silero VAD, extract speech segments, and transcribe each segment with Deepgram.
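Silero VAD scores audio in short fixed-size windows (512 samples at 16 kHz in recent versions), so the detection step comes down to framing the signal and collecting one speech probability per window. The sketch below assumes 16 kHz mono float samples and keeps the scorer pluggable. With the real model it would probably be something like `lambda chunk: model(torch.tensor(chunk), 16000).item()`, where `model` comes from `silero_vad.load_silero_vad()` and `model.reset_states()` is called before each new file; treat that wiring as an assumption to check against the installed version. The helper names `frame_probabilities` and `slice_segment` are hypothetical.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16000  # assumed input rate (mono)
FRAME_SIZE = 512     # samples per VAD window; 32 ms at 16 kHz


def frame_probabilities(
    samples: Sequence[float],
    score_frame: Callable[[List[float]], float],
    frame_size: int = FRAME_SIZE,
) -> List[float]:
    """Split audio into fixed-size windows and score each one.

    The final partial window is zero-padded so every call to the scorer
    sees exactly `frame_size` samples.
    """
    probs: List[float] = []
    for start in range(0, len(samples), frame_size):
        chunk = list(samples[start:start + frame_size])
        chunk += [0.0] * (frame_size - len(chunk))
        probs.append(score_frame(chunk))
    return probs


def slice_segment(
    samples: Sequence[float], start_s: float, end_s: float, sample_rate: int = SAMPLE_RATE
) -> List[float]:
    """Cut out the samples between two timestamps (in seconds) for transcription."""
    # round() avoids off-by-one errors from float products like 0.032 * 16000
    return list(samples[round(start_s * sample_rate):round(end_s * sample_rate)])
```

Keeping the scorer as a plain callable lets the framing logic be tested with a toy energy detector before the Silero model is wired in.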

Key features to demonstrate:

  • Loading and running the Silero VAD model (via torch.hub or the silero-vad pip package)
  • Processing audio to detect speech vs. silence boundaries
  • Applying segmentation heuristics (min speech duration, min silence gap, padding)
  • Sending detected speech segments to Deepgram for transcription
  • Reconstructing a timeline of transcribed segments
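The segmentation heuristics listed above (minimum speech duration, minimum silence gap, padding) can be sketched as a pure-Python pass over per-window speech probabilities. This is a simplified illustration, not Silero's own implementation. The silero-vad package's `get_speech_timestamps` exposes comparable parameters (`threshold`, `min_speech_duration_ms`, `min_silence_duration_ms`, `speech_pad_ms`) and is likely what a finished example should call. The function name `segments_from_probs` and its default values here are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def segments_from_probs(
    probs: List[float],
    frame_s: float,
    threshold: float = 0.5,
    min_speech_s: float = 0.25,
    min_silence_s: float = 0.1,
    pad_s: float = 0.03,
) -> List[Segment]:
    """Turn per-frame speech probabilities into padded speech segments."""
    total_s = len(probs) * frame_s

    # 1. Threshold: collect runs of consecutive frames at or above `threshold`.
    runs: List[Segment] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            runs.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:  # speech continues to the end of the audio
        runs.append((start * frame_s, total_s))

    # 2. Merge runs separated by a silence gap shorter than `min_silence_s`.
    merged: List[Segment] = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_silence_s:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # 3. Drop segments shorter than `min_speech_s` (clicks, coughs, noise).
    kept = [(s, e) for s, e in merged if e - s >= min_speech_s]

    # 4. Pad each side, clamp to the audio bounds, and merge any new overlaps.
    padded: List[Segment] = []
    for s, e in kept:
        s, e = max(0.0, s - pad_s), min(total_s, e + pad_s)
        if padded and s <= padded[-1][1]:
            padded[-1] = (padded[-1][0], e)
        else:
            padded.append((s, e))
    return padded
```

Merging before dropping short runs means two brief bursts separated by a tiny pause can survive as one segment, which is usually the desired behavior for speech with short hesitations.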

Credentials likely needed

  • DEEPGRAM_API_KEY (Silero VAD runs locally, no additional API key needed)
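Sending one detected segment to Deepgram and placing its words back on the original audio's timeline could look like the following stdlib-only sketch. It assumes Deepgram's pre-recorded REST endpoint (`POST https://api.deepgram.com/v1/listen` with an `Authorization: Token <key>` header) and the `results.channels[0].alternatives[0]` response shape; the `nova-2` model and `smart_format` query parameters are illustrative choices, and a finished example would more likely use the official Deepgram Python SDK. The helper names are hypothetical.

```python
import io
import json
import os
import struct
import urllib.request
import wave
from typing import Dict, List, Optional

# Assumed endpoint and query parameters; adjust model/options as needed.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true"


def pcm16_wav_bytes(samples: List[float], sample_rate: int = 16000) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as an in-memory 16-bit PCM WAV file."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()


def transcribe_segment(wav_bytes: bytes, api_key: Optional[str] = None) -> Dict:
    """POST one WAV segment to Deepgram and return the parsed JSON response."""
    key = api_key or os.environ["DEEPGRAM_API_KEY"]
    req = urllib.request.Request(
        DEEPGRAM_URL,
        data=wav_bytes,
        headers={"Authorization": f"Token {key}", "Content-Type": "audio/wav"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


def to_timeline_entry(segment_start_s: float, response: Dict) -> Dict:
    """Shift segment-relative word times onto the original audio's clock."""
    alt = response["results"]["channels"][0]["alternatives"][0]
    words = [
        {"word": w["word"], "start": segment_start_s + w["start"], "end": segment_start_s + w["end"]}
        for w in alt.get("words", [])
    ]
    return {"start": segment_start_s, "transcript": alt["transcript"], "words": words}
```

Because Deepgram reports word times relative to the uploaded clip, adding each segment's start time is what lets the per-segment results be concatenated into one timeline, e.g. `to_timeline_entry(start_s, transcribe_segment(pcm16_wav_bytes(segment_samples)))` for each VAD segment in order.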

Original request:

What's on your mind?

building speech segmentation heuristics with silero vad

Any extra context? (optional)

No response

Metadata


Assignees

No one assigned

Labels

  • action:generate (Action: ready for code generation)
  • priority:user (User-submitted suggestion; builds before bot-queued examples)
  • queue:new-example (Queue: build a new example)
