Feature Request: Add SenseVoice/FunASR as STT option

Hi! Verbi's modular architecture for experimenting with different STT/LLM/TTS components is excellent.

I'd like to suggest adding [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) as a new STT option. It fits Verbi's modular philosophy well:

**Why SenseVoice:**
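- **Fast inference**: SenseVoice-Small is a 234M-parameter non-autoregressive model; its README reports roughly 15x faster inference than Whisper-Large (about 5x faster than Whisper-Small)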
- **Emotion detection** built-in — useful for adjusting assistant behavior
- **Audio event detection** — laugh, applause, music, cough, etc.
- **50+ languages** with strong Chinese/English/Japanese/Korean support
- Simple `pip install funasr` (assuming PyTorch and torchaudio are already installed)
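As far as I can tell from the SenseVoice README, the language, emotion, and event labels come back as inline `<|...|>` tags at the start of the transcript. A minimal sketch of how Verbi could separate them from the text, assuming that tag format (the sample string and the `split_tags` helper are hypothetical, for illustration only):

```python
import re

# Assumed tag format: <|label|> markers prefixed to the transcript.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_tags(raw_text):
    """Return (list of tag labels, transcript with the tags removed)."""
    tags = TAG_PATTERN.findall(raw_text)
    text = TAG_PATTERN.sub("", raw_text).strip()
    return tags, text


# Illustrative input, not real model output.
tags, text = split_tags("<|en|><|HAPPY|><|Speech|><|woitn|>hello there")
```

The assistant could then branch on the emotion label (for example, softening its reply when the tag is not neutral) while passing only `text` to the LLM.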

**Integration example:**

```python
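from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# The SenseVoiceSmall checkpoint is downloaded on first use.
model = AutoModel(model="iic/SenseVoiceSmall")
result = model.generate(input="audio.wav")
# Raw output includes language/emotion/event tags; post-process for readable text.
text = rich_transcription_postprocess(result[0]["text"])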
```
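To make the snippet above fit a pluggable STT layer, it could be wrapped in a small adapter. This is only a sketch: the function names `transcribe_with_sensevoice` and `_load_model` are hypothetical and not taken from Verbi's codebase, and the optional `model` argument exists so the adapter can be exercised without loading FunASR.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def _load_model(model_id="iic/SenseVoiceSmall"):
    # Imported lazily so other STT backends do not require funasr.
    from funasr import AutoModel
    return AutoModel(model=model_id)


def transcribe_with_sensevoice(audio_file_path, model=None):
    """Transcribe one audio file and return the raw text from the first result."""
    if model is None:
        model = _load_model()  # cached after the first call
    result = model.generate(input=audio_file_path)
    return result[0]["text"]
```

Caching the model matters here because loading the checkpoint on every request would likely dominate the latency of a short voice turn.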

Or via an OpenAI-compatible API:

```bash
funasr-server --device cuda
# POST http://localhost:8000/v1/audio/transcriptions
```

This would be a useful addition alongside the existing Deepgram/AssemblyAI/Groq STT options.

- FunASR: https://github.com/modelscope/FunASR (16K+ stars)
- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8K+ stars)
