Fix nested tensor noise mismatch in CFGGuider.sample #13318

Open
djdarcy wants to merge 1 commit into Comfy-Org:master from djdarcy:fix/nested-tensor-noise-mismatch

@djdarcy djdarcy commented Apr 7, 2026

Summary

Fixes RuntimeError: The size of tensor a (N) must match the size of tensor b (M) at non-singleton dimension 2 when using LTXAV audio+video workflows with SamplerCustomAdvanced.

Problem

In CFGGuider.sample() (comfy/samplers.py:1008-1010), when latent_image is a NestedTensor (e.g. from LTXVConcatAVLatent combining video + audio latents), the code unconditionally calls noise.unbind():

if latent_image.is_nested:
    latent_image, latent_shapes = comfy.utils.pack_latents(latent_image.unbind())
    noise, _ = comfy.utils.pack_latents(noise.unbind())  # <-- crashes here

When noise is a regular (non-nested) tensor, unbind() operates along dim=0 of that tensor and does not yield the expected 2 components (video + audio); the channel dimension (128) ends up as the leading dimension instead. After pack_latents flattens the result, the shapes no longer match (e.g. [128, 1, 3751] vs [1, 1, 512384]), causing the RuntimeError at model_sampling.py:72 in noise_scaling().
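
The mismatch can be reproduced at the level of shapes alone. The sketch below is a shape-only model, not ComfyUI code. It assumes that pack_latents reshapes each component to (leading dim, 1, -1) and concatenates along the last dimension, which is consistent with the shapes quoted above but is not confirmed here. The component shapes are hypothetical, chosen only so that the totals match the quoted numbers, and the helper names unbind_shapes and pack_shapes are illustrative.

```python
from math import prod

def unbind_shapes(shape):
    """Shapes produced by Tensor.unbind(0) on a regular tensor: dim 0 is removed."""
    return [tuple(shape[1:])] * shape[0]

def pack_shapes(shapes):
    """Assumed packing model: each part becomes (lead, 1, rest), joined on the last dim."""
    leads = {s[0] for s in shapes}
    if len(leads) != 1:
        raise ValueError(f"leading dims differ: {sorted(leads)}")
    return (shapes[0][0], 1, sum(prod(s[1:]) for s in shapes))

# Hypothetical component shapes: 128*11*11*31 = 480128 and 8*252*16 = 32256.
video = (1, 128, 11, 11, 31)
audio = (1, 8, 252, 16)

# Nested latent: unbind() yields the two components unchanged.
packed_latent = pack_shapes([video, audio])

# Regular noise covering only the video: unbind(0) drops the batch dimension.
packed_noise = pack_shapes(unbind_shapes(video))

print(packed_latent)  # (1, 1, 512384)
print(packed_noise)   # (128, 1, 3751)
```

The two packed shapes differ in both the leading and the last dimension, so they cannot be combined elementwise.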

How to reproduce
  1. Use the attached ltx2-example.json or create an LTXAV image-to-video workflow using LTXVConcatAVLatent to combine video and audio latents
  2. Feed the combined latent into SamplerCustomAdvanced
  3. Queue the prompt -- crashes immediately at sampling

This affects any LTXAV workflow where the noise generator produces non-nested noise for a nested latent. No custom nodes required to trigger this.

Fix

Check noise.is_nested before unbinding. If noise is not nested, treat it as the first component (video) and pad the remaining components (audio) with torch.zeros_like(). I believe zero noise for the audio component is semantically correct, since no denoising is applied to the audio padding in the video sampler.
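
As a rough illustration of the effect of this fix, the shape-only sketch below pads a non-nested noise shape with zero-filled components and checks that the packed shapes then agree. It is not the PR's code: the packing model (each component reshaped to (leading dim, 1, -1) and concatenated on the last dimension) is an assumption, the component shapes are hypothetical, and align_noise and pack_shapes are illustrative names.

```python
from math import prod

def pack_shapes(shapes):
    """Assumed packing model: each part becomes (lead, 1, rest), joined on the last dim."""
    return (shapes[0][0], 1, sum(prod(s[1:]) for s in shapes))

def align_noise(latent_shapes, noise_shape):
    """Treat non-nested noise as component 0 and zero-pad the remaining components."""
    parts = [("noise", tuple(noise_shape))]
    for shape in latent_shapes[1:]:
        # Stands in for torch.zeros_like(li_tensors[i]) in the actual change.
        parts.append(("zeros", tuple(shape)))
    return parts

# Hypothetical component shapes for a video + audio latent.
latent_shapes = [(1, 128, 11, 11, 31), (1, 8, 252, 16)]
noise_shape = (1, 128, 11, 11, 31)  # noise generated for the video only

parts = align_noise(latent_shapes, noise_shape)
packed_latent = pack_shapes(latent_shapes)
packed_noise = pack_shapes([shape for _, shape in parts])

print(parts)
print(packed_latent == packed_noise)  # True
```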

Note: the denoise_mask handling a few lines below (lines 1014-1018) already applies this same pattern correctly. It checks denoise_mask.is_nested and pads with torch.ones() for missing components:

if denoise_mask.is_nested:
    denoise_masks = denoise_mask.unbind()
    denoise_masks = denoise_masks[:len(latent_shapes)]
else:
    denoise_masks = [denoise_mask]

for i in range(len(denoise_masks), len(latent_shapes)):
    denoise_masks.append(torch.ones(latent_shapes[i]))

This PR applies the same defensive pattern to noise handling.

When using LTXAV (audio+video) workflows, latent_image is a NestedTensor
but noise may be a regular tensor. Calling unbind() on non-nested noise
splits along dim=0 (channels), producing a shape mismatch at noise_scaling.

Check whether noise is nested before unbinding. If not, pad with zero-noise
for additional components (e.g. audio), which is semantically correct since
those components don't need denoising in the video sampler.

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

The CFGGuider.sample method in comfy/samplers.py was modified to handle cases where latent_image is nested. The change introduces conditional logic for noise unbinding: instead of unconditionally unbinding both latent_image and noise as nested tensors, the code now checks if noise is nested. When noise is not nested but latent_image is, the implementation creates an aligned tensor list by padding with zeros before repacking both inputs using comfy.utils.pack_latents. This handles the asymmetry between nested and non-nested inputs.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically identifies the main fix: handling nested tensor noise mismatch in CFGGuider.sample. |
| Description check | ✅ Passed | The description clearly describes the problem being fixed (nested tensor noise mismatch in CFGGuider.sample), the root cause, reproduction steps, and the solution applied. |


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
comfy/samplers.py (1)

1008-1018: Fix looks correct for the non-nested noise case.

The conditional check for noise.is_nested properly handles the mismatch scenario described in the PR. Using torch.zeros_like() for padding is appropriate since no denoising is applied to the padded audio components, and it mirrors the defensive pattern used for denoise_mask handling below.

One minor observation: when noise.is_nested is True, the code unbinds without checking if n_tensors has the same number of components as li_tensors. The denoise_mask handling (lines 1024-1030) defensively truncates and pads to match latent_shapes. If nested noise with mismatched components is a possible scenario, similar handling could be added here.

💡 Optional: Add defensive handling for nested noise component mismatch
         if latent_image.is_nested:
             li_tensors = latent_image.unbind()
             if noise.is_nested:
                 n_tensors = noise.unbind()
+                n_tensors = list(n_tensors[:len(li_tensors)])  # Truncate if more
+                for i in range(len(n_tensors), len(li_tensors)):
+                    n_tensors.append(torch.zeros_like(li_tensors[i]))  # Pad if fewer
             else:
                 # Noise only covers video -- pad remaining components (audio) with zeros
                 n_tensors = [noise]
                 for i in range(1, len(li_tensors)):
                     n_tensors.append(torch.zeros_like(li_tensors[i]))
             latent_image, latent_shapes = comfy.utils.pack_latents(li_tensors)
             noise, _ = comfy.utils.pack_latents(n_tensors)
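
The truncate-and-pad step in the suggestion above can be sketched as a small standalone helper. This is a generic illustration under stated assumptions, not code from the PR or from ComfyUI: align_to and its parameters are hypothetical names, and plain strings stand in for tensors so the example runs without torch. In the real code the padding factory would be something like torch.zeros_like for noise.

```python
def align_to(components, targets, make_pad):
    """Return a list with exactly len(targets) items.

    Extra components are dropped; missing ones are created by
    calling make_pad on the corresponding target.
    """
    aligned = list(components[:len(targets)])  # truncate if there are more
    for i in range(len(aligned), len(targets)):
        aligned.append(make_pad(targets[i]))   # pad if there are fewer
    return aligned

# Strings stand in for the latent components (hypothetical).
latent_parts = ["video_latent", "audio_latent"]

def zeros_for(target):
    return f"zeros_like({target})"

too_few = align_to(("video_noise",), latent_parts, zeros_for)
too_many = align_to(("video_noise", "audio_noise", "extra"), latent_parts, zeros_for)

print(too_few)   # ['video_noise', 'zeros_like(audio_latent)']
print(too_many)  # ['video_noise', 'audio_noise']
```

Taking the padding factory as a parameter would let the same helper cover both the noise case and the denoise_mask case, which pads with ones instead of zeros.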
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/samplers.py` around lines 1008 - 1018, When noise.is_nested is True,
add defensive handling to ensure n_tensors has the same number of components as
li_tensors before packing: after n_tensors = noise.unbind(), compare
len(n_tensors) to len(li_tensors) (or latent_shapes) and if they differ truncate
extra components or append torch.zeros_like(li_tensors[i]) for missing
components (mirroring the denoise_mask truncation/padding behavior around
denoise_mask handling). Then call comfy.utils.pack_latents(n_tensors) as before
so latent_image/latent_shapes and noise align.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0bd2b158-65d9-4063-9ee1-223aafb1f8bb

📥 Commits

Reviewing files that changed from the base of the PR and between b615af1 and 2beca41.

📒 Files selected for processing (1)
  • comfy/samplers.py
