Add AM streaming inference #84
I tried it out; the synthesized audio has some "popping" noises. Is there some configuration the vocoder still needs?
@lancelee98 It works fine for me, no popping. Which model version are you using?
@wawaa I'm using a model I fine-tuned myself; it may be that my model has some noise even in non-streaming mode.
@lancelee98 I listened again and mine also has popping.
This PR only covers the streaming refactor of AM model inference; the vocoder also needs a corresponding refactor before the output can match non-streaming quality.
Yes, the vocoder already pads the AM output first, but there is still popping. Could you recommend chunk size and pad settings for the vocoder input?
Set pad to 12 frames or more, and make sure your vocoder uses causal CNNs rather than ordinary CNNs; chunk size actually doesn't matter.
Also, you can test by feeding all of the mel features generated by this script into the vocoder and checking whether the popping remains, to verify that the AM streaming inference part is correct. Please report back the result. The streaming refactor of the vocoder will be uploaded later.
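The causal-CNN advice above is the key point: a causal convolution only looks at current and past frames, so a cached left context makes chunk-by-chunk output bit-identical to full-utterance output, with no boundary discontinuities ("pops"). A minimal NumPy sketch (toy code, not from this repo) illustrating the cache mechanism:

```python
import numpy as np

def causal_conv(x, kernel, cache=None):
    """1-D causal convolution: each output frame depends only on the
    current and past input frames. `cache` holds the last K-1 frames
    of the previous chunk, so chunked output matches full output."""
    K = len(kernel)
    if cache is None:
        cache = np.zeros(K - 1)          # zero left-pad for the first chunk
    padded = np.concatenate([cache, x])
    y = np.array([padded[i:i + K] @ kernel for i in range(len(x))])
    return y, padded[-(K - 1):]          # new cache = last K-1 input frames

kernel = np.array([0.25, 0.5, 0.25])
signal = np.sin(np.linspace(0, 6, 48))

# full-utterance (non-streaming) reference
full, _ = causal_conv(signal, kernel)

# chunk-by-chunk (streaming) with the cache carried across chunks
cache, chunks = None, []
for start in range(0, len(signal), 12):
    y, cache = causal_conv(signal[start:start + 12], kernel, cache)
    chunks.append(y)
streamed = np.concatenate(chunks)

print(np.allclose(full, streamed))  # True: causal conv + cache is chunk-safe
```

With an ordinary (centered) convolution the same chunking scheme would need future frames at each chunk boundary, which is exactly where the audible artifacts come from.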
Thanks for the reminder about causal CNNs. After adjusting this:
Is the pad modification in hifigan.py under models\hifigan? Could you share the specific change? Thanks!
@EricFuma What was the pad modification? How can I contact you?
Add a chunk_forward function to the FsmnEncoderV2 and MemoryBlockV2 modules; it is cache-based and implements chunk-by-chunk streaming inference.
Refactor the forward function of KanTtsSAMBERT: extract the common part into a pre_forward function that serves as a shared pre-module for both forward and forward_chunk, reducing redundant code. chunk_forward implements frame-level streaming inference; the mel length of each inference step can be controlled via the mel_chunk_size parameter.
Add --inference_type and --mel_chunk_size parameters to the infer_sambert.py script: --inference_type selects the AM's inference mode, and --mel_chunk_size specifies the chunk size for streaming inference (requires --inference_type == "streaming").
This is an incremental update: existing training and inference scripts and commands continue to run unchanged. Streaming and non-streaming inference results pass a consistency test, and the code passes the pre-commit checks.
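The forward / forward_chunk split described above can be sketched as follows. This is a toy stand-in (hypothetical ToyAM class, not the repo's SAMBERT API) where a single recurrent state plays the role of the cached FSMN memory; it shows how carrying a cache across chunks of mel_chunk_size frames reproduces the non-streaming result exactly, which is what the consistency test asserts:

```python
import numpy as np

class ToyAM:
    """Toy acoustic model with one recurrent state, standing in for
    SAMBERT's cached FSMN memory. Hypothetical, for illustration only."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha

    def forward(self, feats):
        """Non-streaming: process all frames in one pass."""
        state, mel = 0.0, []
        for f in feats:
            state = self.alpha * state + (1 - self.alpha) * f
            mel.append(state)
        return np.array(mel)

    def forward_chunk(self, feats, cache=0.0):
        """Streaming: process one chunk, return mel frames + updated cache."""
        state, mel = cache, []
        for f in feats:
            state = self.alpha * state + (1 - self.alpha) * f
            mel.append(state)
        return np.array(mel), state

feats = np.random.default_rng(0).standard_normal(50)
am = ToyAM()

non_streaming = am.forward(feats)

# streaming loop: mel_chunk_size frames per step, cache carried forward
mel_chunk_size, cache, chunks = 8, 0.0, []
for i in range(0, len(feats), mel_chunk_size):
    mel, cache = am.forward_chunk(feats[i:i + mel_chunk_size], cache)
    chunks.append(mel)
streaming = np.concatenate(chunks)

print(np.allclose(non_streaming, streaming))  # True: consistency holds
```

Because all state lives in the explicit cache, mel_chunk_size only trades latency against per-step overhead; it does not change the output, matching the claim that streaming and non-streaming results are consistent.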