Fix LTX-2 Inference when `num_videos_per_prompt > 1` and CFG is Enabled #13121

dg845 · 2026-02-11T05:39:28Z

What does this PR do?

This PR fixes LTX-2 inference when `num_videos_per_prompt > 1` and CFG is enabled by duplicating the video and audio position ids for CFG. Before this PR, `num_videos_per_prompt > 1` worked only when CFG was disabled (`guidance_scale=1.0`); with this change it should also work when CFG is enabled. The following example script should work after this PR:

import os

import torch

from diffusers import LTX2ImageToVideoPipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.utils import load_image

device = "cuda"
seed = 42

pipe = LTX2ImageToVideoPipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload(device=device)

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)

prompt = "An astronaut hatches from a fragile egg on the surface of the Moon, the shell cracking and peeling apart in gentle low-gravity motion. Fine lunar dust lifts and drifts outward with each movement, floating in slow arcs before settling back onto the ground. The astronaut pushes free in a deliberate, weightless motion, small fragments of the egg tumbling and spinning through the air. In the background, the deep darkness of space subtly shifts as stars glide with the camera's movement, emphasizing vast depth and scale. The camera performs a smooth, cinematic slow push-in, with natural parallax between the foreground dust, the astronaut, and the distant starfield. Ultra-realistic detail, physically accurate low-gravity motion, cinematic lighting, and a breath-taking, movie-like shot."
negative_prompt = "shaky, glitchy, low quality, worst quality, deformed, distorted, disfigured, motion smear, motion artifacts, fused fingers, bad anatomy, weird hand, ugly, transition, static."
num_videos_per_prompt = 2
generator = torch.Generator(device=device).manual_seed(seed)

frame_rate = 24.0
video, audio = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_videos_per_prompt=num_videos_per_prompt,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=40,
    guidance_scale=4.0,
    generator=generator,
    output_type="np",
    return_dict=False,
)

base_filename = "ltx2_i2v_video.mp4"
root, ext = os.path.splitext(base_filename)
for i in range(num_videos_per_prompt):
    filename = "_".join([root, str(i)]) + ext
    encode_video(
        video[i],
        fps=frame_rate,
        audio=audio[i].float().cpu(),
        audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
        output_path=filename,
    )
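
To illustrate the idea behind the fix, here is a minimal, framework-free sketch of the batch-axis bookkeeping. It is not the code from this PR: the function name `expand_position_ids_for_cfg` and the nested-list stand-in for position ids are hypothetical, and it assumes the common convention that CFG runs the unconditional and conditional branches together by doubling the batch.

```python
def expand_position_ids_for_cfg(position_ids, guidance_scale):
    """Return position ids whose batch length matches the latents fed to the model.

    position_ids: a list with one entry per generated video (the batch axis).
    This is a hypothetical helper for illustration, not the diffusers implementation.
    """
    do_cfg = guidance_scale > 1.0
    if not do_cfg:
        # No CFG: the batch is not doubled, so the ids are used as-is.
        return list(position_ids)
    # Assumed CFG convention: unconditional and conditional inputs share one
    # batch, so every per-sample input is repeated along the batch axis.
    return list(position_ids) + list(position_ids)


num_videos_per_prompt = 2
# Hypothetical stand-in for per-video coordinates (one row per video).
video_position_ids = [[0, 1, 2] for _ in range(num_videos_per_prompt)]

expanded = expand_position_ids_for_cfg(video_position_ids, guidance_scale=4.0)
latent_batch = 2 * num_videos_per_prompt  # CFG doubles the latent batch
assert len(expanded) == latent_batch
```

The same duplication would apply to the audio position ids; in the real pipeline these are tensors rather than lists.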

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul
@yiyixuxu

HuggingFaceDocBuilderDev · 2026-02-11T05:47:56Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul

Thanks! Should we use a guidance scale of > 1 instead in the tests then?

dg845 · 2026-02-12T05:31:15Z

> Should we use a guidance scale of > 1 instead in the tests then?

I think it would be reasonable to test with both guidance_scale == 1.0 and guidance_scale > 1.0 for the LTX-2 pipeline, as both might be commonly used (> 1.0 for Stage 1 full inference, 1.0 for Stage 2 distilled inference). If we were to pick one setting, guidance_scale > 1.0 would probably catch more bugs (such as this one). Another possibility is to test mostly with guidance_scale == 1.0 and add a few tests with guidance_scale > 1.0 to catch common CFG bugs.
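
As a rough sketch of what covering both settings could look like, the snippet below loops over `num_videos_per_prompt` and `guidance_scale` combinations with `unittest` sub-tests. `fake_pipeline` is a hypothetical stand-in for the real pipeline call and the test class name is made up; the actual diffusers test suite may be structured quite differently.

```python
import itertools
import unittest


def fake_pipeline(num_videos_per_prompt, guidance_scale):
    """Hypothetical stand-in for a pipeline call; returns one placeholder per video."""
    return [f"video_{i}" for i in range(num_videos_per_prompt)]


class MultiVideoGuidanceTest(unittest.TestCase):
    def test_output_count(self):
        # Cover both the CFG-off (1.0) and CFG-on (> 1.0) paths for 1 and 2 videos.
        for n, scale in itertools.product([1, 2], [1.0, 4.0]):
            with self.subTest(num_videos_per_prompt=n, guidance_scale=scale):
                videos = fake_pipeline(n, scale)
                self.assertEqual(len(videos), n)
```

With a real pipeline in place of the stub, the `guidance_scale=4.0`, `num_videos_per_prompt=2` case is the one that would have caught this bug.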

dg845 · 2026-02-12T06:35:20Z

Merging as the CI failures should be unrelated.

sayakpaul · 2026-02-12T12:11:52Z

@dg845 would you like to open a PR with those modifications?

Fix LTX-2 inference when num_videos_per_prompt > 1 and CFG is enabled

1890df8

dg845 requested a review from sayakpaul February 11, 2026 05:39

sayakpaul approved these changes Feb 12, 2026

Merge branch 'main' into ltx2-fix-multiple-videos-per-prompt

f74f9a9

dg845 merged commit 985d83c into main Feb 12, 2026
10 of 12 checks passed

dg845 deleted the ltx2-fix-multiple-videos-per-prompt branch February 12, 2026 06:35
