Feature: Add SenseVoice as STT module — 5x faster, non-autoregressive #41

@LauraGPT

Hi! Verbi's modular architecture is perfect for experimenting with different STT backends.

I'd like to suggest SenseVoice as a transcription module option:

Why SenseVoice?

  • 5x faster than Whisper — non-autoregressive (single forward pass)
  • 234M params — lightweight, can run alongside your LLM
  • 50+ languages — auto-detection
  • Emotion detection — useful for more natural voice assistant responses
  • OpenAI-compatible API — minimal integration effort
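To make the emotion-detection point concrete, here is a minimal sketch of how an assistant could pull the emotion label out of a raw transcript. It assumes the raw output prefixes the text with tags of the form `<|en|><|HAPPY|><|Speech|><|withitn|>`; that format, the label set in `EMOTIONS`, and the helper name `parse_tagged_transcript` are assumptions for illustration and should be checked against real model output.

```python
import re
from typing import NamedTuple, Optional

# Assumed tag syntax: "<|LABEL|>" markers placed before the spoken text.
TAG_RE = re.compile(r"<\|([^|]*)\|>")

# Assumed emotion label set; adjust to whatever the model really emits.
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL", "DISGUSTED", "SURPRISED"}


class ParsedTranscript(NamedTuple):
    text: str
    emotion: Optional[str]


def parse_tagged_transcript(raw: str) -> ParsedTranscript:
    """Split a tagged transcript into plain text and the first emotion label."""
    tags = TAG_RE.findall(raw)
    # Take the first tag that is a known emotion, or None if there is none.
    emotion = next((tag for tag in tags if tag in EMOTIONS), None)
    # Remove every tag so only the spoken text remains.
    text = TAG_RE.sub("", raw).strip()
    return ParsedTranscript(text=text, emotion=emotion)
```

The response generator could then branch on `emotion`, for example by softening the reply when the label is `SAD` or `ANGRY`.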

Integration (fits your modular design)

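from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

class SenseVoiceTranscriber:
    def __init__(self):
        # Load SenseVoiceSmall plus the FSMN VAD model, which segments long audio.
        self.model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")

    def transcribe(self, audio):
        # `audio` is a path to an audio file; language is auto-detected.
        result = self.model.generate(input=audio, language="auto", use_itn=True)
        # The raw text carries language/emotion/event tags; convert it to clean text.
        return rich_transcription_postprocess(result[0]["text"])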
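One way such a class could be wired into a modular backend switch is sketched below. This is not Verbi's actual code: the registry, `register_backend`, and this `transcribe_audio` signature are hypothetical names chosen for illustration. The backend is built on first use, so the model would only be loaded if it is selected.

```python
from typing import Callable, Dict, Protocol


class Transcriber(Protocol):
    def transcribe(self, audio: str) -> str: ...


# Hypothetical registry: backend name -> zero-argument factory.
_FACTORIES: Dict[str, Callable[[], Transcriber]] = {}
# Cache of already-constructed backends, keyed by name.
_INSTANCES: Dict[str, Transcriber] = {}


def register_backend(name: str, factory: Callable[[], Transcriber]) -> None:
    """Make a backend selectable by name (e.g. from a config value)."""
    _FACTORIES[name] = factory


def transcribe_audio(backend: str, audio_path: str) -> str:
    """Transcribe with the named backend, constructing it on first use."""
    if backend not in _FACTORIES:
        raise ValueError(f"Unknown STT backend: {backend!r}")
    if backend not in _INSTANCES:
        # Build lazily so heavy models load only when actually selected.
        _INSTANCES[backend] = _FACTORIES[backend]()
    return _INSTANCES[backend].transcribe(audio_path)
```

With this in place, `register_backend("sensevoice", SenseVoiceTranscriber)` would make the class above available under a config value such as `"sensevoice"`.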

Links
