Add DiffSynth blockwise ControlNet support to QwenImageControlNetModel by aryabyte21 · Pull Request #13268 · huggingface/diffusers

aryabyte21 · 2026-03-15T05:17:57Z

Summary

Adds support for the DiffSynth-Studio blockwise ControlNet architecture to the existing QwenImageControlNetModel. This addresses the maintainer's feedback on PR #12317: the new variant is integrated into the existing class rather than implemented as a separate one.

What changed

Added a controlnet_block_type config parameter to QwenImageControlNetModel ("linear", the default, for InstantX; "blockwise" for DiffSynth)
Added a BlockWiseControlBlock module: an RMSNorm + MLP fusion block that normalizes the base hidden states and the control features separately, then fuses them via Linear → GELU → Linear with a zero-initialized output layer
No pipeline changes needed: both variants produce controlnet_block_samples in the same output format
Added conversion script for DiffSynth checkpoints (scripts/convert_diffsynth_blockwise_controlnet_to_diffusers.py)
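A checkpoint conversion script of this kind typically remaps state-dict keys from one naming scheme to another. The plain-Python sketch below illustrates that pattern only. The prefixes in RENAME_RULES are hypothetical placeholders, not the actual DiffSynth or diffusers key names (those are defined by scripts/convert_diffsynth_blockwise_controlnet_to_diffusers.py), and convert_state_dict is a hypothetical helper, not a function from the PR.

```python
from typing import Dict, Mapping, Sequence, Tuple

# HYPOTHETICAL (source prefix, target prefix) pairs, for illustration only.
# The real key mapping lives in the PR's conversion script.
RENAME_RULES: Sequence[Tuple[str, str]] = (
    ("blockwise.", "controlnet_blocks."),
    ("proj_in.", "controlnet_x_embedder."),
)


def convert_state_dict(
    state_dict: Mapping[str, object], rules: Sequence[Tuple[str, str]]
) -> Dict[str, object]:
    """Rename keys by prefix; fail on unmapped or colliding keys."""
    converted: Dict[str, object] = {}
    unmapped = []
    for key, value in state_dict.items():
        # Apply the first rule whose source prefix matches this key.
        for src, dst in rules:
            if key.startswith(src):
                new_key = dst + key[len(src):]
                break
        else:
            unmapped.append(key)
            continue
        if new_key in converted:
            raise ValueError(f"duplicate target key: {new_key}")
        converted[new_key] = value
    if unmapped:
        raise KeyError(f"no rename rule for: {unmapped}")
    return converted


# Toy example: real checkpoints would map key names to tensors.
toy = {"blockwise.0.x_rms.weight": 1, "proj_in.bias": 2}
print(convert_state_dict(toy, RENAME_RULES))
```

Raising on unmapped or colliding keys is a deliberate choice in this sketch: a checkpoint layout the rules do not cover fails loudly instead of being silently half-converted.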

Key design decision

Per @yiyixuxu's review comment on #12317:

"is this the only difference between Diffsynth & instantX controlnet, can we just add the change to the existing one?"

Yes. The only architectural difference is in the ControlNet blocks themselves: the blockwise variant uses BlockWiseControlBlock(x, y), which takes two inputs (base hidden states and control features), instead of a zero-initialized linear projection, zero_module(nn.Linear)(x), which takes one. Everything else (the pipeline, transformer integration, and output format) is identical.

Available checkpoints

Usage

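```python
import torch
from diffusers import QwenImageControlNetModel, QwenImageTransformer2DModel

# Blockwise variant (DiffSynth), loaded from a checkpoint converted with
# scripts/convert_diffsynth_blockwise_controlnet_to_diffusers.py
controlnet = QwenImageControlNetModel.from_pretrained(
    "path/to/converted/blockwise-controlnet",
    torch_dtype=torch.bfloat16,
)

# Or create a blockwise ControlNet from the base Qwen-Image transformer
transformer = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image", subfolder="transformer", torch_dtype=torch.bfloat16
)
controlnet = QwenImageControlNetModel.from_transformer(
    transformer,
    controlnet_block_type="blockwise",
)
```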

Closes #12221

Test plan

Existing test_qwen_controlnet passes (no regression)
Existing test_qwen_controlnet_multicondition passes (no regression)
New test_qwen_blockwise_controlnet passes
New test_qwen_blockwise_controlnet_from_transformer passes
Full inference test with a converted DiffSynth checkpoint (needs GPU)

Extends the existing QwenImageControlNetModel with a `controlnet_block_type` config parameter to support the DiffSynth blockwise ControlNet architecture alongside the existing InstantX linear projection approach. The blockwise variant uses BlockWiseControlBlock modules (RMSNorm + MLP) that fuse base hidden states with control features at each transformer block, instead of the simple zero-initialized linear projections used by InstantX. Closes huggingface#12221

aryabyte21 mentioned this pull request on Mar 15, 2026 (#12221, "[Looking for community contribution] support DiffSynth Controlnet in diffusers", open)

aryabyte21 wants to merge 1 commit into huggingface:main from aryabyte21:feat/diffsynth-blockwise-controlnet
