Add DiffSynth blockwise ControlNet support to QwenImageControlNetModel #13268

Open
aryabyte21 wants to merge 1 commit into huggingface:main from aryabyte21:feat/diffsynth-blockwise-controlnet

@aryabyte21

Summary

Adds support for the DiffSynth-Studio blockwise ControlNet architecture to the existing QwenImageControlNetModel, addressing the maintainer's feedback on PR #12317 by integrating it into the existing class rather than creating a separate one.

What changed

  • Added controlnet_block_type config parameter to QwenImageControlNetModel ("linear" default for InstantX, "blockwise" for DiffSynth)
  • Added BlockWiseControlBlock module: RMSNorm + MLP fusion block that normalizes both base hidden states and control features separately before fusing via Linear → GELU → Linear (zero-initialized output)
  • No pipeline changes needed — both variants produce the same controlnet_block_samples output format
  • Added conversion script for DiffSynth checkpoints (scripts/convert_diffsynth_blockwise_controlnet_to_diffusers.py)
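The fusion block described above can be sketched in plain PyTorch as follows. This is a minimal illustration, not the code in this PR: the class name `BlockWiseControlBlockSketch`, the hand-written `RMSNorm`, the layer names, and the toy dimension are all hypothetical, and summing the two normalized streams before the MLP is an assumption about how they are combined.

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Root-mean-square norm over the last dimension with a learned scale."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class BlockWiseControlBlockSketch(nn.Module):
    """Hypothetical sketch: normalize both inputs, then Linear -> GELU -> Linear."""

    def __init__(self, dim: int):
        super().__init__()
        # Separate norms for base hidden states (x) and control features (y).
        self.x_norm = RMSNorm(dim)
        self.y_norm = RMSNorm(dim)
        self.proj_in = nn.Linear(dim, dim)
        self.act = nn.GELU()
        self.proj_out = nn.Linear(dim, dim)
        # Zero-initialized output layer: the block contributes nothing at init.
        nn.init.zeros_(self.proj_out.weight)
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # ASSUMPTION: the normalized streams are summed before the MLP.
        fused = self.x_norm(x) + self.y_norm(y)
        return self.proj_out(self.act(self.proj_in(fused)))


block = BlockWiseControlBlockSketch(dim=64)
x = torch.randn(2, 16, 64)  # base hidden states: (batch, tokens, dim)
y = torch.randn(2, 16, 64)  # control features, same shape
out = block(x, y)
```

Because the output projection starts at zero, `out` is all zeros for a freshly constructed block, so an untrained blockwise ControlNet leaves the base model's behavior unchanged.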

Key design decision

Per @yiyixuxu's review comment on #12317:

"is this the only difference between Diffsynth & instantX controlnet, can we just add the change to the existing one?"

Yes — the only architectural difference is in the controlnet blocks themselves. The blockwise variant uses BlockWiseControlBlock(x, y) (two inputs: base + control features) instead of zero_module(nn.Linear)(x) (one input). Everything else — the pipeline, transformer integration, and output format — is identical.
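As a rough illustration of that one-input versus two-input difference, the sketch below dispatches on the block type. The helper `run_control_block` and the reduced `TwoInputStandIn` module (norms and MLP omitted, call shape only) are hypothetical, `zero_module` is reimplemented here for self-containment, and which tensor feeds each argument in the actual implementation is an assumption.

```python
import torch
from torch import nn


def zero_module(module: nn.Module) -> nn.Module:
    """Zero every parameter of `module` in place and return it."""
    for param in module.parameters():
        nn.init.zeros_(param)
    return module


class TwoInputStandIn(nn.Module):
    """Hypothetical stand-in that only mirrors the (x, y) call shape."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = zero_module(nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.proj(x + y)


def run_control_block(block, block_type, hidden_states, control_features):
    """Call a controlnet block with the arguments its variant expects."""
    if block_type == "blockwise":
        # Two inputs: base hidden states plus control features.
        return block(hidden_states, control_features)
    # "linear": a single zero-initialized projection of one input.
    return block(hidden_states)


dim = 32
hidden_states = torch.randn(1, 8, dim)
control_features = torch.randn(1, 8, dim)

linear_out = run_control_block(
    zero_module(nn.Linear(dim, dim)), "linear", hidden_states, control_features
)
blockwise_out = run_control_block(
    TwoInputStandIn(dim), "blockwise", hidden_states, control_features
)
```

Both calls return a tensor with the same shape as `hidden_states`, which is why the two variants can share one output format downstream.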

Available checkpoints

Usage

from diffusers import QwenImageControlNetModel

# Blockwise variant (DiffSynth)
controlnet = QwenImageControlNetModel.from_pretrained(
    "path/to/converted/blockwise-controlnet"
)

# Or create from the base transformer
# (`transformer` is an already-loaded base Qwen-Image transformer model)
controlnet = QwenImageControlNetModel.from_transformer(
    transformer,
    controlnet_block_type="blockwise",
)

Closes #12221

Test plan

  • Existing test_qwen_controlnet passes (no regression)
  • Existing test_qwen_controlnet_multicondition passes (no regression)
  • New test_qwen_blockwise_controlnet passes
  • New test_qwen_blockwise_controlnet_from_transformer passes
  • Full GPU inference test with converted DiffSynth checkpoint (needs GPU)

Extends the existing QwenImageControlNetModel with a `controlnet_block_type`
config parameter to support the DiffSynth blockwise ControlNet architecture
alongside the existing InstantX linear projection approach.

The blockwise variant uses BlockWiseControlBlock modules (RMSNorm + MLP)
that fuse base hidden states with control features at each transformer
block, instead of the simple zero-initialized linear projections used
by InstantX.

Closes huggingface#12221

Development

Successfully merging this pull request may close these issues.

[Looking for community contribution] support DiffSynth Controlnet in diffusers