docs(tts): add Qwen3-TTS documentation #28
yuuhikaze wants to merge 7 commits into Open-LLM-VTuber:main
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ove model size bias Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
Adds comprehensive Qwen3-TTS documentation covering local setup, optional dependency installation,
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 4
🧹 Nitpick comments (3)
i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/tts.md (2)
109-110: Consider adjusting the tone for consistency.
The casual phrasing ("a bit of a square peg in a round hole", "don't expect seamless results out of the box") is notably more informal than the rest of the documentation. While the honesty about limitations is valuable, you might consider a more neutral tone that still conveys the caveat without potentially undermining user confidence in the feature.
✍️ Alternative phrasing
-Honestly, `voice_design` is a bit of a square peg in a round hole here. It wasn't really designed with a sentence-by-sentence streaming runtime in mind, so don't expect seamless results out of the box. It's there if you want to experiment, but the design→clone workflow below is how you'd actually use it for anything consistent.
+Note: `voice_design` was not originally designed for sentence-by-sentence streaming runtimes. For production use requiring consistent voice characteristics across a session, the design→clone workflow below is recommended.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/tts.md` around lines 109 - 110, The paragraph referencing "voice_design" uses informal idioms ("a bit of a square peg in a round hole", "don't expect seamless results out of the box"); revise it to a more neutral, consistent tone by replacing those phrases with a concise caveat about limitations and intended use: state that voice_design was not primarily built for sentence-by-sentence streaming, note that results may vary, and recommend the design→clone workflow for consistent production use while keeping experimental encouragement; update the sentence containing "voice_design" to this neutral wording.
120-122: Consider the longevity of the personal repository link.
The tip references a specific personal NixOS configuration. While this provides a helpful real-world example, personal repository links can become stale if the repository is renamed, deleted, or reorganized. Consider whether this example could be generalized or whether the core setup steps could be documented inline.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/tts.md` around lines 120 - 122, The tip in tts.md currently links to a personal NixOS config (the codeberg URL) which may go stale; replace that specific personal repository link with either: (a) a short, generalized summary of the key ComfyUI + Qwen3-TTS NixOS setup steps to include inline in the tip, or (b) a link to an authoritative/maintained resource (ComfyUI docs or a community NixOS module example) and remove the personal URL. Locate the tip block referencing ComfyUI and Qwen3-TTS and update the content to provide stable, reproducible guidance rather than a single-person repo link.
docs/user-guide/backend/tts.md (1)
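109-110: Consider adjusting the tone to keep the documentation consistent.
The phrasing in this passage ("方榫插圆孔", i.e. "a square peg in a round hole"; "开箱即用的效果不要抱太高期望", i.e. "don't expect too much out of the box") is noticeably more casual than the rest of the documentation. Being candid about limitations is valuable, but a more neutral tone would convey the caveat without weakening users' confidence in the feature.
✍️ Alternative phrasing
-坦白说,`voice_design` 放在这里有点像方榫插圆孔。它本来就不是为逐句流式合成的运行时设计的,所以开箱即用的效果不要抱太高期望。如果你想折腾,功能是有的,但要真正用于实际场景,还是下面的设计→克隆工作流更靠谱。
+注意:`voice_design` 并非专为逐句流式合成运行时设计。对于需要在整个会话中保持声音一致性的生产环境使用,推荐采用下述设计→克隆工作流。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/user-guide/backend/tts.md` around lines 109 - 110, the description of voice_design is too casual in tone (e.g. "方榫插圆孔", "开箱即用的效果不要抱太高期望"). Rewrite it in the neutral, professional tone used in the rest of the documentation: keep the factual statement that voice_design was not designed for sentence-by-sentence streaming synthesis, note that it may be limited in that scenario, point to the more reliable alternative (the clone workflow described later in the document), and give a brief recommendation or link so users can follow up. To locate the change, find the paragraph that describes voice_design and replace it with the neutral wording.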
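The per-sentence consistency caveat raised in these comments can be sketched in a few lines. This is a toy model, not the Qwen3-TTS API: `design_voice`, `stream_with_design`, and `stream_with_clone` are hypothetical stand-ins, and the "voice" is just a label. It only illustrates why one design call per sentence can yield a different voice per sentence, while designing once and reusing the result as a clone reference keeps the voice fixed.

```python
from __future__ import annotations

import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter standing in for a sentence-by-sentence streaming runtime."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def design_voice(instruct: str, call_id: int) -> str:
    """Hypothetical stand-in for one voice_design call.

    The returned label depends on the call as well as on the instruction,
    modelling the assumption that each call produces a fresh voice.
    """
    return f"{instruct}#{call_id}"


def stream_with_design(text: str, instruct: str) -> list[tuple[str, str]]:
    # One design call per sentence, so each sentence gets its own voice.
    return [(s, design_voice(instruct, i)) for i, s in enumerate(split_sentences(text))]


def stream_with_clone(text: str, instruct: str) -> list[tuple[str, str]]:
    # Design once, then reuse that single result as the clone reference.
    reference = design_voice(instruct, 0)
    return [(s, reference) for s in split_sentences(text)]


if __name__ == "__main__":
    sample = "Hello there. How are you? Fine!"
    print(stream_with_design(sample, "calm narrator"))
    print(stream_with_clone(sample, "calm narrator"))
```

In this model the design path returns three distinct voice labels for three sentences, and the clone path returns one label shared by all of them.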
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/user-guide/backend/tts.md`:
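- Line 107:
In the description of streaming mode, change the phrase "单独的调用" to the adverbial form "单独地调用", so that the sentence reads "在流式模式下,每句话是一次单独地调用,因此各句之间的声音会不一致。" ("In streaming mode, each sentence is a separate call, so the voice will be inconsistent between sentences."). Use the identifiers
`voice_design`, `generate_audio`, and `instruct` to locate the paragraph, and replace the phrase so that it follows the "adverb + 地 + verb" grammar pattern.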
- Line 12: The sentence "无需单独安装运行时依赖" ("no separate runtime dependencies need to be installed") is misleading. Update the text in
docs/user-guide/backend/tts.md to clarify that the inference code for Qwen3-TTS
is included in the source tree but Python runtime dependencies must be installed
via the optional extras (e.g., `pip install ".[qwen3-tts]"`); replace the phrase
"无需单独安装运行时依赖" with wording like "推理代码已内置于源代码树中,仅需通过可选依赖组安装 Python 依赖即可(例如运行 `pip install ".[qwen3-tts]"`)"
("the inference code is built into the source tree; only the Python dependencies need to be installed via the optional dependency group") so readers understand that the code is included but the
dependencies still need the optional install.
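The distinction drawn here (code is in the source tree, but Python dependencies come from an optional install) can be made concrete with a small pre-flight check. This is a sketch only: `missing_modules` is a hypothetical helper, and the module names in the usage section are placeholders, since the actual contents of the `qwen3-tts` extras group are not listed in this review.

```python
from __future__ import annotations

from importlib.util import find_spec


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import.

    find_spec() returns None for a top-level module that is not installed,
    so nothing is actually imported by this check.
    """
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    # Placeholder names; substitute the modules the optional group provides.
    required = ["json", "example_optional_dependency"]
    missing = missing_modules(required)
    if missing:
        print(f"Missing modules: {missing}")
        print('Install the optional group, e.g. pip install ".[qwen3-tts]"')
```

The check works on top-level names only; dotted names would cause `find_spec` to import parent packages.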
In `@i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/tts.md`:
- Line 11: Update the TTS introduction to clarify that while the Qwen3-TTS
inference code is vendored into the repo, users still must install Python
dependencies via the optional dependency group; replace or reword the sentence
containing "no separate runtime installation is required" (referencing the
Qwen3-TTS paragraph and the Installation section that uses uv pip install
".[qwen3-tts]") to say something like: "The inference code is vendored into the
source tree — only Python dependencies need to be installed via the optional
dependency group (e.g., uv pip install '.[qwen3-tts]')."
- Around line 74-78: Update the comment for the configuration key device: 'cuda'
to state that PyTorch interprets 'cuda' as the first GPU (cuda:0) rather than
performing automatic load-balancing across multiple GPUs; explicitly note that
on multi-GPU machines users must configure DataParallel or
DistributedDataParallel (or other multi-GPU strategies) to utilize more than one
device, and keep the existing examples 'cuda:0'/'cuda:1'/... as the way to
target a specific GPU.
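The device-string behaviour described in the last item can be modelled without a GPU. The helper below is hypothetical and is not part of PyTorch: it only mirrors the documented convention that a bare `'cuda'` selects a single device (index 0 by default, since PyTorch uses the current CUDA device) and that `'cuda:N'` targets a specific card. It does no load balancing, which is the point of the correction.

```python
from __future__ import annotations

import re

_DEVICE_RE = re.compile(r"^(cpu|cuda)(?::(\d+))?$")


def resolve_device(device: str) -> tuple[str, int | None]:
    """Map a device string to (kind, index) under the default-device assumption.

    'cuda' is treated as 'cuda:0'. A real PyTorch process could have changed
    its current device, which this toy model deliberately ignores.
    """
    match = _DEVICE_RE.match(device.strip().lower())
    if match is None:
        raise ValueError(f"Unrecognised device string: {device!r}")
    kind, index = match.group(1), match.group(2)
    if kind == "cpu":
        return ("cpu", None)
    return ("cuda", int(index) if index is not None else 0)


if __name__ == "__main__":
    for name in ("cuda", "cuda:0", "cuda:1", "cpu"):
        print(name, "->", resolve_device(name))
```

Under this model `'cuda'` and `'cuda:0'` resolve to the same single device, so using a second card means naming it explicitly.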
---
Nitpick comments:
In `@docs/user-guide/backend/tts.md`:
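- Around line 109-110: The description of voice_design is too casual in tone
(e.g. "方榫插圆孔", "开箱即用的效果不要抱太高期望"). Rewrite it in the neutral, professional tone used in the rest of the documentation: keep the factual statement that voice_design
was not designed for sentence-by-sentence streaming synthesis, note that it may be limited in that scenario, point to the more reliable alternative (the clone workflow described later in the document), and give a brief recommendation or link so users can follow up. To locate the change, find the paragraph that describes
voice_design and replace it with the neutral wording.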
In `@i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/tts.md`:
- Around line 109-110: The paragraph referencing "voice_design" uses informal
idioms ("a bit of a square peg in a round hole", "don't expect seamless results
out of the box"); revise it to a more neutral, consistent tone by replacing
those phrases with a concise caveat about limitations and intended use: state
that voice_design was not primarily built for sentence-by-sentence streaming,
note that results may vary, and recommend the design→clone workflow for
consistent production use while keeping experimental encouragement; update the
sentence containing "voice_design" to this neutral wording.
- Around line 120-122: The tip in tts.md currently links to a personal NixOS
config (the codeberg URL) which may go stale; replace that specific personal
repository link with either: (a) a short, generalized summary of the key ComfyUI
+ Qwen3-TTS NixOS setup steps to include inline in the tip, or (b) a link to an
authoritative/maintained resource (ComfyUI docs or a community NixOS module
example) and remove the personal URL. Locate the tip block referencing ComfyUI
and Qwen3-TTS and update the content to provide stable, reproducible guidance
rather than a single-person repo link.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c3f3ab1b-0d4c-4b7f-b557-57178fc47174
📒 Files selected for processing (2)
docs/user-guide/backend/tts.md
i18n/en/docusaurus-plugin-content-docs/current/user-guide/backend/tts.md
- Clarify vendored claim: inference code included, Python deps still needed
- Fix ZH grammar: 单独的调用 → 单独地调用
- Neutralise voice_design tone in EN
- Correct device: 'cuda' description (defaults to cuda:0, not auto load-balance)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Note that instruct can optionally style-modify predefined speakers in custom_voice mode, not just voice_design.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Round-2 review follow-up (commit 54cd09d). Fixed in this round:
Already correctly fixed — skipped:
On the If the review suggested adding guidance on DataParallel or DistributedDataParallel for multi-GPU setups: this is intentionally out of scope. Qwen3-TTS is a single-model inference tool, not a training framework. DDP and DataParallel are training-time constructs and are not applicable here. Multi-GPU inference for this use case means pinning to a specific card via |
Addressed all round-2 feedback:
What was documented
Added Qwen3-TTS as a new TTS engine to the backend TTS page (both English and Chinese versions).
Coverage
- `voice_clone`, `voice_design`, and `custom_voice`
- `voice_clone` usage tips
- `voice_design` per-sentence consistency caveat with ComfyUI workflow tip
Pages updated
Related
Related backend PR: Open-LLM-VTuber/Open-LLM-VTuber#378
🤖 Generated with Claude Code
Summary by CodeRabbit
Documentation