Feature: Add SenseVoice for faster local meeting transcription #9

@LauraGPT

Hi! Interesting approach — meeting transcription using just an LLM.

For the speech-to-text step, have you considered SenseVoice? It could complement or replace the LLM-only approach for initial transcription:

SenseVoice advantages

  • 5x faster than Whisper — non-autoregressive architecture
  • 234M params — lightweight, runs on CPU
  • Built-in features: speaker diarization (cam++), emotion detection, audio event classification
  • 50+ languages — auto-detects language
  • OpenAI-compatible API — drop-in replacement

Quick start

from funasr import AutoModel

model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    spk_model="cam++",  # speaker diarization
)
result = model.generate(input="meeting.wav")
# result is a list of dicts: text + speaker labels + timestamps + emotion
print(result[0]["text"])
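A note on reading the output: as far as I know, SenseVoice puts language, emotion, and audio-event information into the transcript text as inline tags of the form `<|...|>`. Below is a minimal sketch of separating those tags from the plain text. The sample string and the `split_tags` helper are illustrative assumptions, not part of FunASR, so check them against real model output.

```python
import re

# Matches inline tags such as <|en|> or <|NEUTRAL|> (assumed tag format).
TAG_RE = re.compile(r"<\|([^|]*)\|>")


def split_tags(raw):
    """Return (tags, clean_text) for one raw transcript string.

    Hypothetical helper for illustration; not a FunASR function.
    """
    tags = TAG_RE.findall(raw)          # tag names, in order of appearance
    text = TAG_RE.sub("", raw).strip()  # transcript with the tags removed
    return tags, text


# Made-up sample in the assumed format, not actual model output.
sample = "<|en|><|NEUTRAL|><|Speech|><|withitn|>Let's start the meeting."
tags, text = split_tags(sample)
print(tags)
print(text)
```

FunASR also ships `rich_transcription_postprocess` in `funasr.utils.postprocess_utils` for cleaning up this kind of output, which is probably the better choice if you only need readable text.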

Combining SenseVoice (fast ASR) with your LLM approach (structuring and summarization) could give the best of both worlds: accurate transcription with intelligent formatting.
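To make the hand-off concrete, here is a rough sketch of turning diarized ASR segments into a speaker-labeled transcript and wrapping it in a summarization prompt. The `Segment` fields, the helper names, and the prompt wording are all assumptions for illustration. They would need to be mapped onto whatever the ASR step actually returns and onto your existing LLM call.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized utterance (hypothetical shape, not a FunASR type)."""
    speaker: str   # e.g. "Speaker 1"
    start: float   # start time in seconds
    text: str      # transcribed text


def format_transcript(segments):
    """Render segments as '[MM:SS] Speaker: text' lines."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.speaker}: {seg.text}")
    return "\n".join(lines)


def build_summary_prompt(segments):
    """Wrap the transcript in a prompt for the LLM step (wording is a placeholder)."""
    return (
        "Summarize the following meeting transcript. "
        "List decisions and action items per speaker.\n\n"
        + format_transcript(segments)
    )


segments = [
    Segment("Speaker 1", 0.0, "Let's review the release plan."),
    Segment("Speaker 2", 65.2, "I can take the changelog."),
]
print(build_summary_prompt(segments))
```

With this split, the ASR step owns timing and speaker labels, and the LLM only sees clean text, which should keep the prompt short.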
