Feature Request: Add SenseVoice/FunASR as STT option #40

@LauraGPT

Hi! Verbi's modular architecture for experimenting with different STT/LLM/TTS components is excellent.

I'd like to suggest adding SenseVoice as a new STT option. It fits Verbi's modular philosophy well:

Why SenseVoice:

  • 5x faster than Whisper large-v3 (234M non-autoregressive model)
  • Emotion detection built-in — useful for adjusting assistant behavior
  • Audio event detection — laugh, applause, music, cough, etc.
  • 50+ languages with strong Chinese/English/Japanese/Korean support
  • Simple pip install funasr — no extra dependencies

Integration example:

from funasr import AutoModel

model = AutoModel(model="iic/SenseVoiceSmall")
result = model.generate(input="audio.wav")
text = result[0]["text"]
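
For the emotion and audio event labels mentioned above, here is a minimal sketch of how they might be separated from the transcript. It assumes the raw `text` field carries inline tags of the form `<|en|><|HAPPY|><|Speech|><|withitn|>` ahead of the spoken text; the helper name `split_sensevoice_tags` and the sample string are hypothetical and only for illustration.

```python
import re

# Assumption: raw SenseVoice text looks like
# "<|en|><|HAPPY|><|Speech|><|withitn|>Nice to meet you."
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_sensevoice_tags(raw_text):
    """Hypothetical helper: return (tags, plain_text) from a tagged transcript."""
    tags = TAG_PATTERN.findall(raw_text)                 # tag names without the <| |> wrappers
    plain_text = TAG_PATTERN.sub("", raw_text).strip()   # transcript with all tags removed
    return tags, plain_text


# Illustrative input only, not real model output.
sample = "<|en|><|HAPPY|><|Speech|><|withitn|>Nice to meet you."
tags, plain_text = split_sensevoice_tags(sample)
print(tags)
print(plain_text)
```

Verbi could pass `plain_text` to the LLM and use the tag list to adjust the assistant's tone, if the maintainers want that behavior.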

Or via an OpenAI-compatible API:

funasr-server --device cuda
# POST http://localhost:8000/v1/audio/transcriptions
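
A client for that endpoint could be sketched with only the standard library, as below. It assumes the server accepts the same multipart form fields as the OpenAI transcription API (`file` and `model`); the helper name `build_transcription_request` and the `model` value are placeholders, not confirmed server settings.

```python
import uuid
import urllib.request

# Endpoint taken from the comment above.
ENDPOINT = "http://localhost:8000/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename="audio.wav", model="SenseVoiceSmall"):
    """Hypothetical helper: build a multipart/form-data POST in the OpenAI transcription shape."""
    boundary = uuid.uuid4().hex
    # Form field "model" (value is an assumed placeholder).
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    )
    # Form field "file" header; the raw audio bytes follow it.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    )
    closing = f"\r\n--{boundary}--\r\n"
    body = model_part.encode() + file_header.encode() + audio_bytes + closing.encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage (requires a running server, so it is left commented out):
# with open("audio.wav", "rb") as f:
#     req = build_transcription_request(f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

An OpenAI-style response would normally be JSON with a `text` field, but the exact response shape of this server should be checked before relying on it.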

This would be a great addition to the existing Deepgram/AssemblyAI/Groq STT options.
