Summary of Changes (Gemini Code Assist)
This pull request extends the lightx2v project with a new "bridge" version of the WAN model for text-to-video generation with integrated LTX2 upsampling. It enables a multi-stage changing-resolution process in which the LTX2 model performs the spatial upscaling of clean latents. The changes provide a complete pipeline, from configuration to execution, for using this upsampling technique to produce higher-resolution video output.
Code Review
The pull request introduces a new bridge version of the WAN models, wan2.1_ltx2_bridge, together with its configuration and runner implementation. It adds new files for the runner, the scheduler interface, and the pixel resizer, and updates infer.py and pipeline.py to integrate the new model class. The changes appear to implement the new functionality correctly. However, a significant portion of the newly added docstrings and comments in the Python files is written in Chinese. Since the codebase appears to be predominantly English, this can hinder maintenance and collaboration for non-Chinese speakers; translating these comments and docstrings into English is recommended.
@RUNNER_REGISTER("wan2.1_ltx2_bridge")
class WanLTX2BridgeRunner(WanRunner):
    """Standalone experimental pipeline for WAN + LTX2 upsampling."""

    """Dynamically mix the changing_resolution capability into any WAN scheduler.

    Instead of inheriting from one fixed scheduler, a new class is built at runtime that:
    1. keeps the base sampling behavior of father_scheduler;
    2. overrides prepare_latents / step_post with WanScheduler4ChangingResolution;
    3. reuses father_scheduler's implementation for all other methods.

    This way the same changing_resolution logic can be attached to different parent
    classes, such as the plain WAN scheduler or a feature-caching scheduler, without
    duplicating the code for each parent.
    """
    def __new__(cls, father_scheduler, config):
        class NewClass(WanScheduler4ChangingResolution, father_scheduler):
            def __init__(self, config):
                # Initialize the parent scheduler first, then the changing_resolution settings.
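The runtime mixing described in that docstring can be illustrated in isolation. This is a minimal sketch, not the PR's code: `BaseScheduler`, `CachingScheduler`, `ChangingResolutionMixin`, and `make_changing_resolution_scheduler` are hypothetical stand-ins, the hooks take no arguments, and a plain factory function replaces the `__new__` wrapper for brevity.

```python
class BaseScheduler:
    """Hypothetical stand-in for a plain WAN scheduler."""

    def __init__(self, config):
        self.config = config

    def prepare_latents(self):
        return "base.prepare_latents"

    def step_post(self):
        return "base.step_post"

    def infer_steps(self):
        # Not overridden by the mixin, so it always comes from the parent.
        return self.config.get("infer_steps", 50)


class CachingScheduler(BaseScheduler):
    """Hypothetical stand-in for a feature-caching scheduler."""

    def step_post(self):
        return "caching.step_post"


class ChangingResolutionMixin:
    """Overrides only the two hooks that staged resolution needs."""

    def prepare_latents(self):
        return "mixin.prepare_latents"

    def step_post(self):
        # Delegate the regular denoising update to the next class in the MRO,
        # i.e. whichever father_scheduler was mixed in.
        return "mixin->" + super().step_post()


def make_changing_resolution_scheduler(father_scheduler, config):
    # Build the combined class at runtime; the mixin comes first in the MRO.
    class NewClass(ChangingResolutionMixin, father_scheduler):
        def __init__(self, config):
            # Parent first, then the changing_resolution settings.
            father_scheduler.__init__(self, config)
            self.changing_resolution_steps = config.get("changing_resolution_steps", [])

    return NewClass(config)
```

Because the mixin is listed before the parent, `super().step_post()` inside the mixin resolves to the parent's implementation, so one mixin works unchanged on top of either parent.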
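
class WanScheduler4ChangingResolution:
    """Staged changing-resolution sampling logic for WAN.

    Core idea:
    1. Pre-sample one noise latent for each resolution stage;
    2. On regular steps, reuse the parent class's step_post denoising update;
    3. On a switch step, convert the current x_t back to an approximate x0 and
       interpolate it to the next stage's resolution;
    4. Re-noise it with the next stage's pre-sampled noise and continue diffusion sampling.

    Notes:
    - The order of resolution_rate is the actual resolution path;
    - The code appends one extra latent at the original resolution as the final stage;
    - changing_resolution_steps is 1-based: N means "switch resolution after step N finishes".
    """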
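Steps 3 and 4 of that docstring can be sketched on a single latent channel. The sketch assumes a flow-matching convention (x_t = (1 - sigma) * x0 + sigma * noise, with the model predicting velocity = noise - x0), uses nearest-neighbor x2 upsampling as a stand-in for the interpolation, and re-noises at the same sigma for simplicity; the PR's exact formulas, interpolation mode, and re-noising timestep may differ. All function names are hypothetical.

```python
from typing import List

# Nested lists stand in for one channel of one latent frame: grid[row][col].
Grid = List[List[float]]


def recover_x0(x_t: Grid, velocity: Grid, sigma: float) -> Grid:
    # Assumed convention: x_t = (1 - sigma) * x0 + sigma * noise and
    # velocity = noise - x0, which gives x0 = x_t - sigma * velocity.
    return [[xt - sigma * v for xt, v in zip(row_x, row_v)]
            for row_x, row_v in zip(x_t, velocity)]


def upsample_nearest_x2(grid: Grid) -> Grid:
    # Stand-in for the interpolation step: repeat every value 2x along both axes.
    out: Grid = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def renoise(x0: Grid, noise: Grid, sigma: float) -> Grid:
    # Forward process under the same assumed convention.
    return [[(1 - sigma) * a + sigma * n for a, n in zip(row_a, row_n)]
            for row_a, row_n in zip(x0, noise)]


def switch_resolution(x_t: Grid, velocity: Grid, sigma: float, next_noise: Grid) -> Grid:
    """x_t -> approximate x0 -> upsample -> re-noise with the next stage's noise."""
    x0 = recover_x0(x_t, velocity, sigma)
    return renoise(upsample_nearest_x2(x0), next_noise, sigma)
```

Pre-sampling `next_noise` once per stage, rather than drawing it at the switch, keeps the noise for every stage fixed from the start of sampling.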

    def __init__(self, config):
        # Default strategy: sample at 0.75x resolution first, then switch back to the original resolution.
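The stage bookkeeping from the docstring notes (rates in path order, an appended original-resolution stage, 1-based switch steps) can be sketched as follows. `resolution_rate` and `changing_resolution_steps` are the config names from the diff; the helper functions are hypothetical, and truncating scaled sizes with `int()` is an assumption about rounding.

```python
from typing import List, Tuple


def plan_stage_shapes(latent_h: int, latent_w: int,
                      resolution_rate: List[float]) -> List[Tuple[int, int]]:
    # One stage per rate, in the given order, plus the original resolution last.
    # int() truncation is an assumption; the PR may round or align differently.
    shapes = [(int(latent_h * rate), int(latent_w * rate)) for rate in resolution_rate]
    shapes.append((latent_h, latent_w))
    return shapes


def stage_for_step(step_index: int, changing_resolution_steps: List[int]) -> int:
    # step_index is 0-based, while each switch step N is 1-based and means
    # "switch after step N finishes", so 0-based steps 0..N-1 stay in the old stage.
    return sum(1 for n in changing_resolution_steps if step_index >= n)
```

With the default rate of 0.75 the final switch is a 4/3 upscale, which takes the interpolation path; a rate of 0.5 makes the final switch an exact spatial x2, the case the LTX2 bridge condition checks for.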

        # Take the LTX2 bridge only for a spatial x2 upscale; in all other cases, fall back to the original interpolation logic.
        can_use_bridge = (
            self.clean_latent_resizer is not None
            and target_latent_shape[1] == denoised_sample.shape[1]
            and target_latent_shape[2] == denoised_sample.shape[2] * 2
            and target_latent_shape[3] == denoised_sample.shape[3] * 2
        )
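The bridge condition can be restated as a standalone predicate. This sketch assumes latent shapes are laid out as (C, T, H, W), so index 1 is the latent frame count and indices 2 and 3 are height and width; `can_use_ltx2_bridge` and `choose_resize_path` are hypothetical names, and `has_resizer` stands in for `self.clean_latent_resizer is not None`.

```python
from typing import Sequence


def can_use_ltx2_bridge(current_shape: Sequence[int], target_shape: Sequence[int],
                        has_resizer: bool = True) -> bool:
    # Same number of latent frames, and exactly 2x in both spatial dimensions.
    return (
        has_resizer
        and target_shape[1] == current_shape[1]
        and target_shape[2] == current_shape[2] * 2
        and target_shape[3] == current_shape[3] * 2
    )


def choose_resize_path(current_shape: Sequence[int], target_shape: Sequence[int],
                       has_resizer: bool = True) -> str:
    # Anything other than a clean spatial x2 falls back to plain interpolation.
    if can_use_ltx2_bridge(current_shape, target_shape, has_resizer):
        return "ltx2_bridge"
    return "interpolate"
```

Keeping the check strict means any stage plan that does not double the spatial size still runs unchanged through the original interpolation path.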

        logger.info(
            "Use LTX2 bridge to resize WAN clean latent: "
            f"{tuple(denoised_sample.shape)} -> {tuple(target_latent_shape)}"
        )
        return self.clean_latent_resizer.resize(
            latent=denoised_sample,
            target_latent_shape=target_latent_shape,
            step_index=self.step_index,
            changing_resolution_index=self.changing_resolution_index,
        )

class LTX2PixelBridgeResizer:
    """Training-free x2 upscale of a WAN clean latent, routed through RGB and the LTX2 latent space."""
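One reading of that docstring is a five-stage detour: decode the WAN latent to RGB, encode it with the LTX2 VAE, upscale x2 in LTX2 latent space, decode back to RGB, and re-encode with the WAN VAE at the target shape. The sketch below only tracks (height, width) through hypothetical callables to show the order of operations; the spatial strides of 8 (WAN VAE) and 32 (LTX2 VAE) in the stubs are assumptions, and the PR's actual step order may differ.

```python
from typing import Callable, Tuple

# (height, width) only; channels and frames are omitted to keep the sketch small.
HW = Tuple[int, int]


def pixel_bridge_upscale_x2(
    wan_latent_hw: HW,
    wan_decode: Callable[[HW], HW],
    ltx_encode: Callable[[HW], HW],
    ltx_upsample_x2: Callable[[HW], HW],
    ltx_decode: Callable[[HW], HW],
    wan_encode: Callable[[HW], HW],
) -> HW:
    """Hypothetical composition of the RGB / LTX2-latent detour (shapes only)."""
    rgb = wan_decode(wan_latent_hw)  # WAN latent -> low-res RGB
    ltx = ltx_encode(rgb)            # low-res RGB -> LTX2 latent
    ltx_up = ltx_upsample_x2(ltx)    # x2 spatial upscale in LTX2 latent space
    rgb_up = ltx_decode(ltx_up)      # LTX2 latent -> high-res RGB
    return wan_encode(rgb_up)        # high-res RGB -> WAN latent at the target shape


# Shape-only stubs; the strides 8 (WAN VAE) and 32 (LTX2 VAE) are assumptions.
def wan_decode_stub(hw: HW) -> HW:
    return hw[0] * 8, hw[1] * 8


def ltx_encode_stub(hw: HW) -> HW:
    if hw[0] % 32 or hw[1] % 32:
        raise ValueError("pad RGB to a multiple of 32 before LTX2 encoding")
    return hw[0] // 32, hw[1] // 32


def ltx_upsample_x2_stub(hw: HW) -> HW:
    return hw[0] * 2, hw[1] * 2


def ltx_decode_stub(hw: HW) -> HW:
    return hw[0] * 32, hw[1] * 32


def wan_encode_stub(hw: HW) -> HW:
    return hw[0] // 8, hw[1] // 8
```

The stub `ltx_encode_stub` raises when the RGB size is not a multiple of its stride, which is why a padding step before LTX2 encoding is needed for most WAN resolutions.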

            "LTX2 pixel bridge resize: "
            f"step={step_index}, stage={changing_resolution_index}, "
            f"wan_latent={current_shape} -> {target_latent_shape}, "
            f"target_rgb=({target_pixel_h}, {target_pixel_w})"

        logger.info(
            "Pad low-res RGB for LTX2 VAE: "
            f"({height}, {width}) -> ({padded_h}, {padded_w}), "
            f"pad=(left={left}, right={right}, top={top}, bottom={bottom})"
        )
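The padding logged above can be computed as in the following sketch. It assumes the LTX2 VAE needs height and width to be multiples of 32 and that the padding is split as evenly as possible between the two sides; both are assumptions, as is the idea of cropping the scaled padding away after the x2 upscale. `pad_to_multiple` and `crop_box_after_upscale` are hypothetical helpers, and the padding mode (zero, reflect, replicate) is not modeled.

```python
from typing import Tuple


def pad_to_multiple(height: int, width: int,
                    multiple: int = 32) -> Tuple[int, int, Tuple[int, int, int, int]]:
    # Round each side up to the next multiple (integer ceil division).
    padded_h = -(-height // multiple) * multiple
    padded_w = -(-width // multiple) * multiple
    # Split the padding as evenly as possible; an odd pixel goes to bottom/right.
    top = (padded_h - height) // 2
    bottom = padded_h - height - top
    left = (padded_w - width) // 2
    right = padded_w - width - left
    return padded_h, padded_w, (left, right, top, bottom)


def crop_box_after_upscale(padded_h: int, padded_w: int, pad: Tuple[int, int, int, int],
                           scale: int = 2) -> Tuple[int, int, int, int]:
    # The padding is upscaled along with the content, so scale it before cropping.
    # Returns (y0, y1, x0, x1) slice bounds in the upscaled, padded frame.
    left, right, top, bottom = pad
    return top * scale, (padded_h - bottom) * scale, left * scale, (padded_w - right) * scale
```

For example, a 240x416 frame pads to 256x416 with 8 rows on top and bottom, and after a x2 upscale the crop box recovers exactly a 480x832 region.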
No description provided.