Hi! Verbi's modular architecture is perfect for experimenting with different STT backends.
I'd like to suggest SenseVoice as a transcription module option:
Why SenseVoice?
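- Fast: over 5x faster than Whisper-Small according to the SenseVoice authors' benchmarks, because the model is non-autoregressive (a single forward pass instead of token-by-token decoding)
- 234M parameters: lightweight enough to run alongside your LLM
- 50+ languages, with automatic language detection
- Emotion detection: useful for more natural voice assistant responses
- OpenAI-compatible API: minimal integration effort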
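To use the emotion signal, the raw `result[0]["text"]` (before any post-processing) can be split into its tags and the plain transcript. The sketch below assumes the raw output looks like `<|en|><|HAPPY|><|Speech|><|withitn|>transcript`: the first tag is the language and the emotion label set is the one I believe SenseVoice emits. Both assumptions should be checked against real model output. `split_sensevoice_tags` is a hypothetical helper, not part of FunASR.

```python
import re
from typing import Dict, List, Optional

# Matches tags such as "<|en|>" or "<|HAPPY|>" and captures the label inside.
_TAG_RE = re.compile(r"<\|([^|]+)\|>")

# Assumed SenseVoice emotion labels (verify against actual model output).
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL",
            "DISGUSTED", "SURPRISED", "EMO_UNKNOWN"}


def split_sensevoice_tags(raw: str) -> Dict[str, object]:
    """Hypothetical helper: separate SenseVoice tags from the transcript."""
    tags: List[str] = _TAG_RE.findall(raw)
    # Everything that is not a tag is treated as transcript text.
    text = _TAG_RE.sub("", raw).strip()
    # Assumption: the first tag is the language code.
    language: Optional[str] = tags[0] if tags else None
    # Take the first emotion label found (multi-segment output may hold several).
    emotion: Optional[str] = next((t for t in tags if t in EMOTIONS), None)
    return {"language": language, "emotion": emotion, "tags": tags, "text": text}
```

The assistant could then, for example, pass `emotion` to the LLM prompt alongside `text` so the reply tone can adapt.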
Integration (fits your modular design)
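from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

class SenseVoiceTranscriber:
    def __init__(self):
        # SenseVoiceSmall plus the FSMN VAD model to segment long recordings
        self.model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")

    def transcribe(self, audio):
        # language="auto" enables language detection; use_itn adds punctuation/normalization
        result = self.model.generate(input=audio, language="auto", use_itn=True)
        # The raw text carries language/emotion/event tags; convert it to readable text
        return rich_transcription_postprocess(result[0]["text"])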
Links