Hi! Verbi's modular architecture for experimenting with different STT/LLM/TTS components is excellent.
I'd like to suggest adding SenseVoice as a new STT option. It fits Verbi's modular philosophy well:
Why SenseVoice:
- 5x faster than Whisper large-v3 (234M-parameter non-autoregressive model)
- Built-in emotion detection, useful for adjusting the assistant's behavior
- Audio event detection — laugh, applause, music, cough, etc.
- 50+ languages with strong Chinese/English/Japanese/Korean support
- Simple install: `pip install funasr`, no extra dependencies
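Because the emotion and event labels come back inline with the transcript, a Verbi integration would need to separate them from the text before it goes to the LLM. A minimal sketch, assuming SenseVoice's raw output (`result[0]["text"]`, before any post-processing) prefixes each segment with `<|...|>` tags such as `<|en|><|HAPPY|><|Laughter|><|withitn|>`. The label sets below and the names `parse_sensevoice` and `SenseVoiceResult` are my assumptions, not part of funasr, so check them against the model version you install:

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed SenseVoice label sets; verify against the model version you use.
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL", "DISGUSTED", "SURPRISED", "EMO_UNKNOWN"}
EVENTS = {"Speech", "BGM", "Applause", "Laughter", "Cry", "Sneeze", "Breath", "Cough"}
ITN_FLAGS = {"withitn", "woitn"}  # inverse-text-normalization markers, ignored here

TAG_RE = re.compile(r"<\|([^|]*)\|>")


@dataclass
class SenseVoiceResult:  # hypothetical container, not a funasr type
    text: str
    language: Optional[str] = None
    emotion: Optional[str] = None
    events: List[str] = field(default_factory=list)


def parse_sensevoice(raw: str) -> SenseVoiceResult:
    """Split SenseVoice's inline <|tag|> markers from the transcript text."""
    # Replace every tag with a space, then collapse runs of whitespace.
    result = SenseVoiceResult(text=" ".join(TAG_RE.sub(" ", raw).split()))
    for tag in TAG_RE.findall(raw):
        if tag in EMOTIONS:
            result.emotion = tag  # with several segments, the last emotion wins
        elif tag in EVENTS:
            if tag not in result.events:  # keep each event once, in order seen
                result.events.append(tag)
        elif tag in ITN_FLAGS:
            continue
        elif result.language is None:
            result.language = tag  # first unrecognized tag is taken as the language
    return result


r = parse_sensevoice("<|en|><|HAPPY|><|Laughter|><|withitn|>That's hilarious!")
print(r.language, r.emotion, r.events, r.text)  # en HAPPY ['Laughter'] That's hilarious!
```

In an assistant loop, only `r.text` would be sent to the LLM, while `r.emotion` and `r.events` could, for example, be added as context to the system prompt.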
Integration example:
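```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Downloads iic/SenseVoiceSmall on first use
model = AutoModel(model="iic/SenseVoiceSmall")
result = model.generate(input="audio.wav", language="auto", use_itn=True)
# Raw text carries <|lang|><|emotion|><|event|> tags; this helper formats them for display
text = rich_transcription_postprocess(result[0]["text"])
```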
Or via an OpenAI-compatible API:

```shell
funasr-server --device cuda
# POST http://localhost:8000/v1/audio/transcriptions
```
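The request shape for that endpoint is an assumption: it mirrors OpenAI's transcription API (a multipart form with `file` and `model` fields, and a JSON reply containing a `text` key). A minimal standard-library sketch, so the client side needs no extra HTTP dependency; `build_multipart`, `transcribe_via_http`, and the model name `"sensevoice"` are hypothetical:

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple

# Endpoint from the example above; assumed to follow OpenAI's request/response format.
DEFAULT_URL = "http://localhost:8000/v1/audio/transcriptions"


def build_multipart(fields: Dict[str, str], file_field: str,
                    filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # File part: headers, raw bytes, then the CRLF that precedes the next boundary.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'.encode()
        + data
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_via_http(path: str, url: str = DEFAULT_URL, model: str = "sensevoice") -> str:
    """POST an audio file and return the "text" field of the JSON reply (assumed format)."""
    p = Path(path)
    body, content_type = build_multipart({"model": model}, "file", p.name, p.read_bytes())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["text"]
```

With the server running, usage would be `transcribe_via_http("audio.wav")`. If the server does follow OpenAI's API closely, the official `openai` Python client pointed at `base_url="http://localhost:8000/v1"` should work as well, which would let existing OpenAI-client code be reused.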
It would be a great addition to the existing Deepgram, AssemblyAI, and Groq STT options.