
Fix stable_video_diffusion #13684

Open
hlky wants to merge 1 commit into huggingface:main from hlky:fix-13627

Conversation

Contributor

@hlky hlky commented May 6, 2026

Fix stable_video_diffusion

Fixes #13627

Fixes

Issue 1

Return (frames,) when return_dict=False so Stable Video Diffusion follows the standard single-output tuple contract.
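A minimal sketch of the single-output tuple contract described above; the names here are illustrative, not the exact pipeline code:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class StableVideoDiffusionPipelineOutput:
    frames: List  # decoded video frames


def finish_call(frames, return_dict: bool) -> Union[Tuple, StableVideoDiffusionPipelineOutput]:
    if not return_dict:
        # Standard diffusers contract: a tuple is returned even when
        # there is only a single output.
        return (frames,)
    return StableVideoDiffusionPipelineOutput(frames=frames)


out = finish_call(["frame0"], return_dict=False)
assert isinstance(out, tuple) and out[0] == ["frame0"]
```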

Issue 2

Use the maximum of min_guidance_scale and max_guidance_scale when preparing CFG state, so decreasing guidance schedules still duplicate conditioning batches correctly.
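To illustrate the point (function name and threshold are hypothetical): whether CFG batch duplication is needed should depend on the largest value anywhere in the guidance schedule, which for a decreasing schedule is `min_guidance_scale`, not `max_guidance_scale`:

```python
def needs_cfg(min_guidance_scale: float, max_guidance_scale: float) -> bool:
    # A decreasing schedule (min > max) still requires duplicated
    # conditioning batches if its largest value exceeds 1.
    return max(min_guidance_scale, max_guidance_scale) > 1.0


assert needs_cfg(1.0, 3.0)      # increasing schedule
assert needs_cfg(3.0, 1.0)      # decreasing schedule: missed by max_guidance_scale alone
assert not needs_cfg(1.0, 1.0)  # no CFG needed
```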

Issue 3

Preprocess PIL, NumPy, tensor, and list image inputs through shared processor utilities before CLIP resizing, so tensor inputs are resized consistently with PIL inputs.
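A rough sketch of normalizing all supported input types to one tensor layout before resizing; this is illustrative and does not mirror the actual diffusers helper signatures:

```python
import numpy as np
import torch
from PIL import Image


def to_tensor_batch(image):
    """Normalize PIL / NumPy / tensor / list inputs to a float NCHW tensor."""
    if isinstance(image, Image.Image):
        image = [image]
    if isinstance(image, list) and isinstance(image[0], Image.Image):
        image = np.stack([np.array(i).astype(np.float32) / 255.0 for i in image])
    if isinstance(image, np.ndarray):
        image = torch.from_numpy(image).permute(0, 3, 1, 2)
    if isinstance(image, list) and isinstance(image[0], torch.Tensor):
        image = torch.stack(image)
    # Every input type now reaches the resizing step in the same layout.
    return image


batch = to_tensor_batch(Image.new("RGB", (8, 8)))
assert batch.shape == (1, 3, 8, 8)
```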

Issue 4

Cast custom latents to the denoising dtype and device instead of only moving them to device.
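Sketch of the fix (function name illustrative): user-supplied latents must match both the denoising device and dtype, since `.to(device)` alone leaves a float32 latent in a float16 pipeline:

```python
import torch


def prepare_user_latents(latents: torch.Tensor, device, dtype) -> torch.Tensor:
    # A single .to() call handles device and dtype together.
    return latents.to(device=device, dtype=dtype)


latents = torch.randn(1, 4, 8, 8)  # float32 by default
out = prepare_user_latents(latents, device="cpu", dtype=torch.float16)
assert out.dtype == torch.float16
```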

Issue 5

Validate tuple/list config lengths in UNetSpatioTemporalConditionModel before indexed access.
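A hypothetical validation sketch: checking that tuple/list config entries have the expected length before indexing turns a mid-forward `IndexError` into a clear error at construction time:

```python
def validate_lengths(name: str, value, expected: int):
    if isinstance(value, (tuple, list)) and len(value) != expected:
        raise ValueError(
            f"`{name}` must have {expected} entries to match "
            f"`block_out_channels`, got {len(value)}."
        )


num_blocks = 4
validate_lengths("num_attention_heads", (8, 16, 32, 64), num_blocks)  # ok
try:
    validate_lengths("num_attention_heads", (8, 16), num_blocks)
except ValueError as e:
    assert "must have 4 entries" in str(e)
```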

Additional fixes

Batch and dtype consistency

Repeat image embeddings, VAE image latents, added time IDs, and guidance scale in effective batch order for num_videos_per_prompt.
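"Effective batch order" can be sketched as follows (shapes are illustrative): per-image tensors are repeated with `repeat_interleave` so copies for the same input image stay contiguous and aligned across all conditioning tensors:

```python
import torch

batch_size, num_videos_per_prompt = 2, 3
image_embeddings = torch.arange(batch_size).float().view(batch_size, 1)

# Each image's embedding is repeated contiguously for its videos:
# [img0, img0, img0, img1, img1, img1]
expanded = image_embeddings.repeat_interleave(num_videos_per_prompt, dim=0)
assert expanded.shape[0] == batch_size * num_videos_per_prompt
assert expanded[:3].eq(0).all() and expanded[3:].eq(1).all()
```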

Use the UNet dtype for denoising-path tensors while preserving VAE dtype/upcast handling.
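A minimal sketch of that dtype split, with assumed example dtypes: tensors on the denoising path follow the UNet's dtype, while the latents are cast to the VAE's (possibly upcast) dtype only at decode time:

```python
import torch

unet_dtype = torch.float16  # dtype used throughout the denoising loop
vae_dtype = torch.float32   # VAE may run upcast for numerical stability

latents = torch.randn(1, 4, 8, 8, dtype=unet_dtype)
# Cast to the VAE dtype only when handing latents to the decoder.
decoder_input = latents.to(vae_dtype)
assert decoder_input.dtype == vae_dtype and latents.dtype == unet_dtype
```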

Docs and typing

Update SVD docstrings and type hints for supported image inputs, output_type values, generator-list support, helper returns, and tuple/dataclass output behavior.

Meta issue patterns

Fixed: Pattern 1 batch/conditioning expansion, Pattern 5 dtype/device/config assumptions, Pattern 6 output contract, Pattern 7 validation/runtime alignment, Pattern 10 fast coverage.

Not applicable: Pattern 2 ignored public arguments, Pattern 3 mask handling, Pattern 4 optional dependency/default handling, Pattern 8 copied-code drift, Pattern 9 shared attention/offload infrastructure.

Unskipped tests

test_inference_batch_single_identical

Already passing after removing skip.

test_inference_batch_consistent

It was failing in part due to Issue 3:

```
src\diffusers\pipelines\stable_video_diffusion\pipeline_stable_video_diffusion.py:503: in __call__
    image_embeddings = self._encode_image(image, device, num_videos_per_prompt, self.do_classifier_free_guidance)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
src\diffusers\pipelines\stable_video_diffusion\pipeline_stable_video_diffusion.py:201: in _encode_image
    image = self.video_processor.pil_to_numpy(image)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
src\diffusers\image_processor.py:166: in pil_to_numpy
    images = [np.array(image).astype(np.float32) / 255.0 for image in images]
```

The test builds `batched_input[name] = batch_size * [value]`, so the old `if not isinstance(image, torch.Tensor)` check misclassifies the resulting list, and a `torch.Tensor` ends up in `pil_to_numpy`. Solution: route all inputs through the shared processor classes.

test_float16_inference

Fixed by running `prepare_latents` in `torch.float32` and then casting to the target dtype. This is because `randn_tensor` with `torch.float32` produces a completely different tensor than `randn_tensor` with `torch.float16`, even from the same seed. Recommendation: generate random tensors in float32 and cast afterwards whenever reproducibility is a concern.
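The reproducibility point can be demonstrated with plain `torch.randn` (standing in for `randn_tensor`): a seeded float16 draw is not the seeded float32 draw rounded down, so generating in float32 and casting is what keeps outputs aligned across precisions:

```python
import torch

shape = (1, 4, 8, 8)

g = torch.Generator().manual_seed(0)
noise_fp32 = torch.randn(shape, generator=g, dtype=torch.float32)

g = torch.Generator().manual_seed(0)
noise_fp16 = torch.randn(shape, generator=g, dtype=torch.float16)

# The half-precision draw is a different sample sequence, not a rounding
# of the float32 draw.
assert not torch.equal(noise_fp32.half(), noise_fp16)

# Generating in float32 then casting is deterministic across runs.
g = torch.Generator().manual_seed(0)
assert torch.equal(
    torch.randn(shape, generator=g, dtype=torch.float32).half(),
    noise_fp32.half(),
)
```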

Notes

Slow test expected slice may have changed from prepare_latents change.


Development

Successfully merging this pull request may close these issues.

stable_video_diffusion model/pipeline review
