Error when recognizing a 12-minute audio file #94

@evenZh

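Laptop with a 2060 GPU. Both paraformer-zh and iic/SenseVoiceSmall recognize the audio without problems.
FunAudioLLM/Fun-ASR-Nano-2512, however, always fails with the error below. Adjusting the configuration doesn't help, and turning it off runs out of GPU memory. -.- I asked an AI, which said that the VAD model splits the audio into several segments, but the ASR model (Nano-2512) returns an empty result for one of them (usually a segment of pure silence). FunASR's merge logic has no empty-result check and indexes element 0 directly, so it blows up. The VAD settings in use: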
vad_model="fsmn-vad",
vad_kwargs={"max_single_segment_time": 60000},

Traceback (most recent call last):
  File "D:\code\workbuddy\asr\funnano.py", line 38, in <module>
    res = model.generate(
          ^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\funasr\auto\auto_model.py", line 329, in generate
    return self.inference_with_vad(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\funasr\auto\auto_model.py", line 558, in inference_with_vad
    t[0] += vadsegments[j][0]
    ~^^^
KeyError: 0
  0%|                                                                                            | 0/1 [02:23<?, ?it/s]
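
For illustration, here is a minimal sketch of the kind of guard the explanation above has in mind. It is not FunASR's actual code: `offset_timestamps` is a hypothetical helper, and the result layout (`{"text": ..., "timestamp": [[start_ms, end_ms], ...]}`) is an assumption. One thing to note: in Python, indexing an empty list raises `IndexError`, while `KeyError: 0` comes from indexing a mapping, so the traceback may point to `t` being a dict-like object rather than an empty list. The sketch therefore skips both empty results and entries that are not `[start, end]` pairs.

```python
from typing import Any, Dict, List, Sequence


def offset_timestamps(
    segment_results: Sequence[Dict[str, Any]],
    vad_segments: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Shift per-segment timestamps onto the full-audio timeline.

    Hypothetical helper, not part of FunASR. Each result is assumed to look
    like {"text": str, "timestamp": [[start_ms, end_ms], ...]}.
    """
    merged: List[List[int]] = []
    for result, vad_segment in zip(segment_results, vad_segments):
        segment_start = vad_segment[0]
        # A missing, None, or empty timestamp list contributes nothing.
        timestamps = result.get("timestamp") or []
        for t in timestamps:
            # Skip anything that is not a [start, end] pair (for example a
            # dict, where t[0] would raise KeyError: 0).
            if not isinstance(t, (list, tuple)) or len(t) < 2:
                continue
            merged.append([t[0] + segment_start, t[1] + segment_start])
    return merged


# Made-up example data: the second segment is silent, the third contains
# one entry with an unexpected shape.
vad_segments = [[500, 1500], [2000, 3000], [5000, 7000]]
segment_results = [
    {"text": "hello", "timestamp": [[0, 400], [450, 900]]},
    {"text": "", "timestamp": []},
    {"text": "hi", "timestamp": [[100, 300], {"start": 0}]},
]
print(offset_timestamps(segment_results, vad_segments))
```

Skipping malformed entries only hides the symptom. Logging which segment produced them would help confirm whether silence is really the trigger.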