Feature request: Add FunASR for ASR with built-in speaker diarization #10

@LauraGPT

Feature Request

I'd like to suggest considering FunASR as an ASR option for Offmute, particularly because it provides a complete speech processing pipeline with built-in speaker diarization.

Why this fits Offmute

Offmute handles meeting transcription and diarization. FunASR provides all the needed components in a single toolkit, so there is no need to stitch together separate VAD, ASR, and diarization systems:

  1. VAD (FSMN-VAD): Robust voice activity detection
  2. ASR (Paraformer): Fast, accurate speech recognition
  3. Speaker Diarization (CAM++): State-of-the-art speaker clustering
  4. Punctuation: Automatic punctuation restoration
  5. Timestamps: Word-level and sentence-level timing

Key technical advantages

  • End-to-end pipeline: All components work together out of the box
  • CAM++ diarization: Competitive with pyannote on meeting scenarios
  • Fast inference: Non-autoregressive Paraformer is significantly faster than autoregressive models
  • Simple API: install with pip install funasr and use one unified interface for the full pipeline
  • No API keys: Fully local, aligns with privacy-conscious meeting tools
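As a rough illustration of the unified interface, here is a minimal sketch based on the `AutoModel` usage shown in the FunASR README. The model aliases (`paraformer-zh`, `fsmn-vad`, `ct-punc`, `cam++`) come from that README; note that `paraformer-zh` targets Mandarin, so another ASR model may be needed for other languages. The `sentence_info` result key and its `text`/`start`/`end`/`spk` fields (times in milliseconds) are my understanding of the output when a speaker model is attached and should be verified against the installed version. The helper names `transcribe_with_speakers` and `format_transcript` are hypothetical.

```python
from typing import Any


def transcribe_with_speakers(audio_path: str) -> list[dict[str, Any]]:
    """Run VAD + ASR + punctuation + speaker diarization on one audio file."""
    # Imported lazily so the formatting helper below works without funasr installed.
    from funasr import AutoModel

    model = AutoModel(
        model="paraformer-zh",  # Paraformer ASR (Mandarin-focused alias)
        vad_model="fsmn-vad",   # voice activity detection
        punc_model="ct-punc",   # punctuation restoration
        spk_model="cam++",      # speaker embeddings for diarization
    )
    result = model.generate(input=audio_path, batch_size_s=300)
    # Assumption: with spk_model set, each result carries per-sentence entries
    # under "sentence_info"; fall back to an empty list if the key is absent.
    return result[0].get("sentence_info", [])


def format_transcript(sentences: list[dict[str, Any]]) -> list[str]:
    """Render sentence entries as '[start-end] Speaker N: text' lines."""
    lines = []
    for s in sentences:
        start_s = s["start"] / 1000  # assumed milliseconds -> seconds
        end_s = s["end"] / 1000
        lines.append(f"[{start_s:.1f}s-{end_s:.1f}s] Speaker {s['spk']}: {s['text']}")
    return lines
```

Calling `format_transcript(transcribe_with_speakers("meeting.wav"))` would then yield one line per sentence, where `meeting.wav` is a placeholder path.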

Integration potential

Could serve as an alternative or complement to the current LLM-based approach, potentially reducing compute requirements while providing structured speaker-attributed transcripts.
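To make "structured speaker-attributed transcripts" concrete, here is a small sketch of how per-sentence diarization output could be collapsed into speaker turns before being handed to a downstream summarizer. The input shape (dicts with `spk`, `start`, `end`, `text`) is an assumption about what a diarizing ASR pipeline returns, and `merge_turns` is a hypothetical helper, not part of FunASR or Offmute.

```python
from typing import Any


def merge_turns(sentences: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge consecutive sentences from the same speaker into one turn."""
    turns: list[dict[str, Any]] = []
    for s in sentences:
        if turns and turns[-1]["spk"] == s["spk"]:
            # Same speaker continues: extend the current turn.
            turns[-1]["text"] += " " + s["text"]
            turns[-1]["end"] = s["end"]
        else:
            # Speaker changed (or first sentence): start a new turn.
            turns.append(
                {"spk": s["spk"], "start": s["start"], "end": s["end"], "text": s["text"]}
            )
    return turns
```

Turn-level records like these are compact enough to serialize as JSON and pass to an LLM for summarization, which keeps the LLM out of the transcription step itself.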

Link
