Feature: Add SenseVoice for faster local meeting transcription

Hi! Interesting approach: meeting transcription using just an LLM.

For the speech-to-text step, have you considered **SenseVoice**? It could complement or replace the LLM-only approach for initial transcription:

## SenseVoice advantages

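- **5x+ faster than Whisper-Small** (about 15x faster than Whisper-Large, per the project README), thanks to its non-autoregressive architecture
- **234M params**: lightweight, runs on CPU
- **Built-in features**: emotion detection and audio event classification from the model itself, plus speaker diarization via FunASR's cam++ speaker model
- **50+ languages**: supported by the SenseVoice family with automatic language detection; the open SenseVoiceSmall checkpoint covers Mandarin, Cantonese, English, Japanese, and Korean
- **OpenAI-compatible API**: drop-in replacement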

## Quick start

```python
from funasr import AutoModel

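# Load SenseVoiceSmall with voice activity detection and a speaker model.
model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",  # splits long recordings into speech segments
    spk_model="cam++",  # speaker diarization
)
result = model.generate(
    input="meeting.wav",
    language="auto",  # let the model detect the spoken language
    use_itn=True,  # add punctuation and inverse text normalization
)
# result is a list with one dict per input; result[0]["text"] holds the
# transcript with inline tags such as <|en|><|HAPPY|><|Speech|>.
# Speaker labels and timestamps depend on the vad/spk pipeline and FunASR version.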
```
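
The `text` field returned by SenseVoice carries language, emotion, and audio event information as inline `<|...|>` tags rather than as separate fields. Below is a minimal sketch of separating those tags from the spoken text using only the standard library. The sample string and the `split_tags` helper are illustrative assumptions, not part of FunASR; for the official cleanup, FunASR provides `rich_transcription_postprocess` in `funasr.utils.postprocess_utils`.

```python
import re
from typing import Dict, List, Union

# Matches one inline tag such as <|en|> or <|HAPPY|>.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_tags(raw: str) -> Dict[str, Union[List[str], str]]:
    """Separate inline tags from the spoken text (hypothetical helper)."""
    tags = TAG_PATTERN.findall(raw)
    text = TAG_PATTERN.sub("", raw).strip()
    return {"tags": tags, "text": text}


# Assumed example of a tagged transcript, not real model output.
sample = "<|en|><|HAPPY|><|Speech|><|withitn|>Let's start the meeting."
parsed = split_tags(sample)
print(parsed["tags"])
print(parsed["text"])
```

Keeping the tags as a list leaves it to the caller to decide which ones (language, emotion, event) matter for the meeting notes.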

The combination of SenseVoice (fast ASR) and your LLM approach (structuring/summarization) could give the best of both worlds: accurate transcription with intelligent formatting.
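
As a rough sketch of how the two stages could connect, the snippet below turns speaker-labelled ASR segments into a single prompt for the LLM. The `Segment` structure, its field names, and the prompt wording are all hypothetical; they would need to be adapted to whatever the ASR step actually returns and to how this project calls its LLM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str    # hypothetical speaker label, e.g. "spk0"
    start_s: float  # segment start time in seconds
    text: str       # transcribed text for this segment


def build_summary_prompt(segments: List[Segment]) -> str:
    """Format ASR segments as a transcript and wrap them in an instruction."""
    lines = [f"[{seg.start_s:.1f}s] {seg.speaker}: {seg.text}" for seg in segments]
    transcript = "\n".join(lines)
    return (
        "Summarize the meeting below. List decisions and action items.\n\n"
        + transcript
    )


# Assumed example segments, standing in for real ASR output.
segments = [
    Segment("spk0", 0.0, "Let's review the release plan."),
    Segment("spk1", 4.2, "The API freeze is on Friday."),
]
prompt = build_summary_prompt(segments)
print(prompt)
```

With this split, the LLM only sees text, so the ASR model can be swapped without touching the summarization step.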

## Links
- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8.3K stars)
- FunASR: https://github.com/modelscope/FunASR (16.7K stars)
