Add AM streaming inference #84
I tried it out; the synthesized audio has some "popping" noises. Is there some configuration the vocoder still needs?
@lancelee98 It works fine for me, no popping. Which model version are you using?
@wawaa I'm using a model I fine-tuned myself; it may be that my model has some noise even in non-streaming mode.
@lancelee98 I listened again and mine also has popping.
This PR only covers the streaming refactor of AM model inference; the vocoder also needs a corresponding refactor before the output can match non-streaming quality.
Yes, the vocoder already pads the AM output first, but there is still popping. Could you recommend chunk size and pad settings for the vocoder input?
Set pad to 12 frames or more, and make sure your vocoder uses causal CNNs rather than ordinary CNNs; chunk size actually doesn't matter.
Also, you can test by feeding all of the mel features generated by this script into the vocoder and checking whether the popping remains, to verify that the AM streaming inference part is correct. Please report back the result. The streaming refactor of the vocoder will be uploaded later.
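The causal-CNN advice above is the key point: a causal convolution only looks at current and past frames, so a cached left context makes chunk-by-chunk output bit-identical to full-utterance output, with no boundary discontinuities ("pops"). A minimal NumPy sketch (toy code, not from this repo) illustrating the cache mechanism:

```python
import numpy as np

def causal_conv(x, kernel, cache=None):
    """1-D causal convolution: each output frame depends only on the
    current and past input frames. `cache` holds the last K-1 frames
    of the previous chunk, so chunked output matches full output."""
    K = len(kernel)
    if cache is None:
        cache = np.zeros(K - 1)          # zero left-pad for the first chunk
    padded = np.concatenate([cache, x])
    y = np.array([padded[i:i + K] @ kernel for i in range(len(x))])
    return y, padded[-(K - 1):]          # new cache = last K-1 input frames

kernel = np.array([0.25, 0.5, 0.25])
signal = np.sin(np.linspace(0, 6, 48))

# full-utterance (non-streaming) reference
full, _ = causal_conv(signal, kernel)

# chunk-by-chunk (streaming) with the cache carried across chunks
cache, chunks = None, []
for start in range(0, len(signal), 12):
    y, cache = causal_conv(signal[start:start + 12], kernel, cache)
    chunks.append(y)
streamed = np.concatenate(chunks)

print(np.allclose(full, streamed))  # True: causal conv + cache is chunk-safe
```

With an ordinary (centered) convolution the same chunking scheme would need future frames at each chunk boundary, which is exactly where the audible artifacts come from.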
Thanks for the reminder about causal CNNs. After adjusting this:
Is the pad modification in hifigan.py under models\hifigan? Could you share the specific change? Thanks!
@EricFuma What was the pad modification? How can I contact you?
Add a chunk_forward function to the FsmnEncoderV2 and MemoryBlockV2 modules; it is cache-based and implements chunk-by-chunk streaming inference.
Refactor the forward function of KanTtsSAMBERT: extract the common part into a pre_forward function that serves as a shared pre-module for both forward and forward_chunk, reducing redundant code. chunk_forward implements frame-level streaming inference; the mel length of each inference step can be controlled via the mel_chunk_size parameter.
Add --inference_type and --mel_chunk_size parameters to the infer_sambert.py script: --inference_type selects the AM's inference mode, and --mel_chunk_size specifies the chunk size for streaming inference (requires --inference_type == "streaming").
This is an incremental update: existing training and inference scripts and commands continue to run unchanged. Streaming and non-streaming inference results pass a consistency test, and the code passes the pre-commit checks.
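The forward / forward_chunk split described above can be sketched as follows. This is a toy stand-in (hypothetical ToyAM class, not the repo's SAMBERT API) where a single recurrent state plays the role of the cached FSMN memory; it shows how carrying a cache across chunks of mel_chunk_size frames reproduces the non-streaming result exactly, which is what the consistency test asserts:

```python
import numpy as np

class ToyAM:
    """Toy acoustic model with one recurrent state, standing in for
    SAMBERT's cached FSMN memory. Hypothetical, for illustration only."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha

    def forward(self, feats):
        """Non-streaming: process all frames in one pass."""
        state, mel = 0.0, []
        for f in feats:
            state = self.alpha * state + (1 - self.alpha) * f
            mel.append(state)
        return np.array(mel)

    def forward_chunk(self, feats, cache=0.0):
        """Streaming: process one chunk, return mel frames + updated cache."""
        state, mel = cache, []
        for f in feats:
            state = self.alpha * state + (1 - self.alpha) * f
            mel.append(state)
        return np.array(mel), state

feats = np.random.default_rng(0).standard_normal(50)
am = ToyAM()

non_streaming = am.forward(feats)

# streaming loop: mel_chunk_size frames per step, cache carried forward
mel_chunk_size, cache, chunks = 8, 0.0, []
for i in range(0, len(feats), mel_chunk_size):
    mel, cache = am.forward_chunk(feats[i:i + mel_chunk_size], cache)
    chunks.append(mel)
streaming = np.concatenate(chunks)

print(np.allclose(non_streaming, streaming))  # True: consistency holds
```

Because all state lives in the explicit cache, mel_chunk_size only trades latency against per-step overhead; it does not change the output, matching the claim that streaming and non-streaming results are consistent.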