[wan] fix layerwise upcasting tests on CPU #13039
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
# Upcast the QR orthogonalization operation to FP32
original_motion_dtype = motion_feat.dtype
-    motion_feat = motion_feat.to(weight.dtype)
+    motion_feat = motion_feat.to(torch.float32)
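For context, torch.linalg.qr only supports float32/float64 (and complex) inputs, so when the weights are stored in a low-precision dtype (as with layerwise upcasting) the feature tensor has to be upcast before the orthogonalization and cast back afterwards. A minimal, self-contained sketch of that pattern; the orthogonalize helper below is illustrative only, not the diffusers implementation:

import torch

def orthogonalize(motion_feat: torch.Tensor) -> torch.Tensor:
    # Upcast to FP32 for the QR decomposition, then restore the original dtype.
    original_motion_dtype = motion_feat.dtype
    q, _ = torch.linalg.qr(motion_feat.to(torch.float32))
    return q.to(original_motion_dtype)

feat = torch.randn(8, 16, dtype=torch.bfloat16)
print(orthogonalize(feat).dtype)  # torch.bfloat16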
dg845 left a comment:
Thanks! As far as I can tell, Wan Animate single file / GGUF support doesn't depend on this change (the generated samples look normal), so the change should be fine.
Wan Animate Single File Test Script
import numpy as np
import torch
from diffusers import AutoencoderKLWan, GGUFQuantizationConfig
from diffusers import WanAnimatePipeline, WanAnimateTransformer3DModel
from diffusers.utils import export_to_video, load_image, load_video
LoRA = True
device_gpu = torch.device("cuda:0")
original_model_id = "Wan-AI/Wan2.2-Animate-14B-Diffusers"
single_file_url = "https://huggingface.co/QuantStack/Wan2.2-Animate-14B-GGUF/blob/main/Wan2.2-Animate-14B-Q8_0.gguf"
lora_model_id = "Kijai/WanVideo_comfy"
lora_model_path = "Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors"
print("Loading transformer ....")
transformer = WanAnimateTransformer3DModel.from_single_file(
    single_file_url,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    config=original_model_id,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    offload_device="cpu",
    device=device_gpu
)
print("Transformer loaded successfully ....")
print("Loading pipeline ....")
pipe = WanAnimatePipeline.from_pretrained(
    original_model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
if LoRA:
    pipe.load_lora_weights(
        lora_model_id,
        weight_name=lora_model_path,
        adapter_name="lightning",
        offload_device="cpu",
        device=device_gpu
    )
pipe.enable_model_cpu_offload()
print("Pipeline loaded successfully ....")
# Load the character image
image = load_image(
    "Wan2.2/examples/wan_animate/animate/image.jpeg",
)
# Load pose and face videos (preprocessed from reference video)
# Note: Videos should be preprocessed to extract pose keypoints and face features
# Refer to the Wan-Animate preprocessing documentation for details
pose_video = load_video("Wan2.2/examples/wan_animate/animate/process_results/src_pose.mp4")
face_video = load_video("Wan2.2/examples/wan_animate/animate/process_results/src_face.mp4")
# Calculate optimal dimensions based on VAE constraints
max_area = 1280 * 720
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))
prompt = "People in the video are doing actions."
# Animation mode: Animate the character with the motion from pose/face videos
print("Generating animation ....")
if LoRA:
    output = pipe(
        image=image,
        pose_video=pose_video,
        face_video=face_video,
        prompt=prompt,
        # negative_prompt=negative_prompt,
        height=height,
        width=width,
        segment_frame_length=77,
        guidance_scale=1.0,
        prev_segment_conditioning_frames=1,  # refert_num in original code
        num_inference_steps=4,
        mode="animate",
    ).frames[0]
else:
    output = pipe(
        image=image,
        pose_video=pose_video,
        face_video=face_video,
        prompt=prompt,
        # negative_prompt=negative_prompt,
        height=height,
        width=width,
        segment_frame_length=77,
        guidance_scale=1.0,
        prev_segment_conditioning_frames=1,  # refert_num in original code
        num_inference_steps=20,
        mode="animate",
    ).frames[0]
print("Exporting animation ....")
export_to_video(output, "wan_animate_gguf_lora.mp4", fps=30)

Sample with GGUF + LoRA:
wan_animate_gguf_lora.mp4
Failing tests are unrelated.
What does this PR do?
Fixes https://github.com/huggingface/diffusers/actions/runs/21424431175/job/61690688194?pr=12994#step:7:437
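For reference, a minimal sketch of enabling layerwise upcasting on the Wan Animate transformer, the feature the fixed CPU tests cover; the model id and dtypes below are illustrative assumptions, not taken from the test itself:

import torch
from diffusers import WanAnimateTransformer3DModel

# Load the Wan Animate transformer and enable layerwise casting:
# parameters are kept in a float8 storage dtype and upcast to bfloat16
# layer by layer during the forward pass.
transformer = WanAnimateTransformer3DModel.from_pretrained(
    "Wan-AI/Wan2.2-Animate-14B-Diffusers",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)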