Feature: Add SenseVoice as STT module — 5x faster, non-autoregressive

Hi! Verbi's modular architecture is perfect for experimenting with different STT backends.

I'd like to suggest **SenseVoice** as a transcription module option:

## Why SenseVoice?

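- **About 5x faster than Whisper-Small**: non-autoregressive, so each utterance is decoded in a single forward pass (figure from the SenseVoice README)
- **234M params**: lightweight enough to run alongside your LLM
- **50+ languages** with auto-detection for the full SenseVoice model; the open `SenseVoiceSmall` checkpoint used below covers Mandarin, Cantonese, English, Japanese, and Korean
- **Emotion detection**: useful for more natural voice assistant responses
- **OpenAI-compatible API**: minimal integration effort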
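
To make the emotion-detection point concrete: as far as I can tell, SenseVoice returns the emotion as an inline tag in the raw text, e.g. `<|en|><|HAPPY|><|Speech|><|withitn|>Nice to meet you.` Here is a minimal sketch of pulling that label out so the response generator can use it. The tag format and the label set are assumptions based on the SenseVoice README, and `parse_sensevoice_output` / `ParsedTranscript` are hypothetical names, not part of FunASR or Verbi.

```python
import re
from typing import NamedTuple, Optional

# Assumed tag format: <|en|>, <|HAPPY|>, <|Speech|>, <|withitn|>, ...
TAG_RE = re.compile(r"<\|([^|]*)\|>")

# Assumed emotion labels; extend this set if the model emits others.
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL", "DISGUSTED", "SURPRISED"}


class ParsedTranscript(NamedTuple):
    text: str
    emotion: Optional[str]


def parse_sensevoice_output(raw: str) -> ParsedTranscript:
    """Split raw SenseVoice text into plain text and the first emotion tag, if any."""
    tags = TAG_RE.findall(raw)
    emotion = next((t for t in tags if t in EMOTIONS), None)
    text = TAG_RE.sub("", raw).strip()
    return ParsedTranscript(text=text, emotion=emotion)
```

Unrecognized tags (language, event type, ITN flag) are simply stripped, so the helper degrades to a plain transcript when no emotion tag is present.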

## Integration (fits your modular design)

```python
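from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess


class SenseVoiceTranscriber:
    def __init__(self):
        # fsmn-vad splits long recordings into segments before recognition.
        self.model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")

    def transcribe(self, audio):
        # `audio` is anything FunASR's generate() accepts, e.g. a path to a WAV/MP3 file.
        result = self.model.generate(input=audio, language="auto", use_itn=True)
        # The raw text carries tags such as <|en|><|NEUTRAL|>; clean them up for display.
        return rich_transcription_postprocess(result[0]["text"])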
```
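
For wiring this in next to the existing backends, one option is a small name-to-factory registry that builds a transcriber only on first use, so the model weights are not loaded unless SenseVoice is actually selected. This is only a sketch: `register_transcriber`, `get_transcriber`, and the `Transcriber` protocol are hypothetical names, and I have not matched them to Verbi's actual transcription module.

```python
from typing import Callable, Dict, Protocol


class Transcriber(Protocol):
    def transcribe(self, audio: str) -> str: ...


# Hypothetical registry: backend name -> zero-argument factory.
_FACTORIES: Dict[str, Callable[[], Transcriber]] = {}
# Cache of already-built backends so each model is loaded at most once.
_INSTANCES: Dict[str, Transcriber] = {}


def register_transcriber(name: str, factory: Callable[[], Transcriber]) -> None:
    """Make a backend selectable by name without constructing it yet."""
    _FACTORIES[name] = factory


def get_transcriber(name: str) -> Transcriber:
    """Return the backend for `name`, building it on first request."""
    if name not in _INSTANCES:
        if name not in _FACTORIES:
            raise ValueError(f"Unknown transcription backend: {name!r}")
        _INSTANCES[name] = _FACTORIES[name]()
    return _INSTANCES[name]
```

With the class above, registration would be `register_transcriber("sensevoice", SenseVoiceTranscriber)`, and the config value that currently picks the STT model could be passed straight to `get_transcriber`.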

## Links
- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8.3K stars)
- FunASR: https://github.com/modelscope/FunASR (16.7K stars)
