feat: NPU FA support attention mask #751

zhangtao0408 · 2026-01-25T09:01:13Z

What does this PR do?

Fixes: huggingface/diffusers#13016
Related PR: huggingface/diffusers#13017
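
For readers unfamiliar with what an attention mask does inside an attention kernel, here is a minimal pure-Python sketch for a single query row. It is not the code from this PR or from the NPU backend. The function name `masked_attention_weights` and the mask convention (True means the key is attended to, as in PyTorch's `scaled_dot_product_attention` boolean masks) are assumptions made for illustration only.

```python
import math


def masked_attention_weights(scores, keep_mask):
    """Softmax over one query's scores, ignoring keys where keep_mask is False.

    Hypothetical helper for illustration; assumes at least one key is kept.
    """
    # Masked-out keys get -inf so that exp() gives them exactly zero weight.
    masked = [s if keep else float("-inf") for s, keep in zip(scores, keep_mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Three keys, the last one masked out (for example, a padding token).
weights = masked_attention_weights([1.0, 2.0, 3.0], [True, True, False])
print(weights)
```

The masked key receives zero weight and the remaining weights still sum to one, which is the behavior a fused attention kernel has to reproduce when it accepts a mask.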

Test Code

export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MODEL_PATH=/PATH/TO/Qwen-Image-Edit-2509
export LORA_PATH=/PATH/TO/Qwen-Image-Edit-2509/lora

torchrun --nproc_per_node=4 generate.py qwen_image_edit_lightning \
	--warmup 1 \
	--model-path $MODEL_PATH \
	--lora-path $LORA_PATH \
	--height 1024 \
	--width 1024 \
	--steps 4 \
	--parallel ulysses \
	--attn _native_npu \
	--ulysses-anything \
	--parallel-text-encoder \
	--parallel-vae \
	--ulysses-async

Results

Stage	E2E Time
Before PR	3.64s
After PR	3.72s
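
To put the two timings in perspective, the relative change can be worked out as below. This is only arithmetic on the two numbers in the table; the variable names are illustrative, and since the logs show a single measurement for each case, the figure should be read as indicative rather than as a benchmark.

```python
# End-to-end inference times taken from the results table above.
before_s = 3.64
after_s = 3.72

delta_s = after_s - before_s
overhead_pct = 100.0 * delta_s / before_s

summary = f"+{delta_s:.2f}s ({overhead_pct:.1f}%)"
print(summary)  # prints "+0.08s (2.2%)"
```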

Before this PR

INFO 01-23 12:54:04 [base.py:630] ----------------------------------------------------------------------------------------------------
INFO 01-23 12:54:04 [base.py:395] 🤖 Example Init Config Summary:
INFO 01-23 12:54:04 [base.py:418] - Model: 
INFO 01-23 12:54:04 [base.py:418]     - /PATH/TO/Qwen-Image-Edit-2509
INFO 01-23 12:54:04 [base.py:418]     - lightx2v/Qwen-Image-Lightning
INFO 01-23 12:54:04 [base.py:418] - Task Type: IE2I - Image Editing to Image
INFO 01-23 12:54:04 [base.py:418] - Torch Dtype: torch.bfloat16
INFO 01-23 12:54:04 [base.py:418] - LoRA Weights: /PATH/TO/Qwen-Image-Edit-2509-Lightning-4steps-V1.0-bf16.safetensors
[rank2]:[W123 12:54:04.983532073 ProcessGroup.hpp:916] Warning: No backend of type 0 found for Process Group with name undefined. Assuming no hooks are registered. (function hasHooks)
INFO 01-23 12:54:04 [base.py:212] 🤖 Example Input Summary:
INFO 01-23 12:54:04 [base.py:212] - prompt: The magician bear is on the left, the alchemist bear is on the right, facing each other in the central park square.
INFO 01-23 12:54:04 [base.py:212] - negative_prompt:  
INFO 01-23 12:54:04 [base.py:212] - height: 1024
INFO 01-23 12:54:04 [base.py:212] - width: 1024
INFO 01-23 12:54:04 [base.py:212] - true_cfg_scale: 1.0
INFO 01-23 12:54:04 [base.py:212] - num_inference_steps: 4
INFO 01-23 12:54:04 [base.py:212] - image: List Images (2 images)
INFO 01-23 12:54:04 [base.py:212]     - Image 0: (1228x1228)
INFO 01-23 12:54:04 [base.py:212]     - Image 1: (1226x1228)
INFO 01-23 12:54:04 [base.py:212] - generator: device cpu, seed 0
INFO 01-23 12:54:04 [base.py:307] 🤖 Example Output Summary:
INFO 01-23 12:54:04 [base.py:323] - Model: qwen_image_edit_lightning
INFO 01-23 12:54:04 [base.py:323] - Optimization: 1024x1024_C0_Q0_NONE_Ulysses4_TEP_VAEP_ulysses_anything_ulysses_async_native_npu
INFO 01-23 12:54:04 [base.py:323] - Device: Ascend910B3 x 4
INFO 01-23 12:54:04 [base.py:323] - Load Time: 146.50s
INFO 01-23 12:54:04 [base.py:323] - Warmup Time: 26.16s
INFO 01-23 12:54:04 [base.py:323] - Inference Time: 3.64s
INFO 01-23 12:54:05 [base.py:246] Image saved to qwen_image_edit_lightning.1024x1024_C0_Q0_NONE_Ulysses4_TEP_VAEP_ulysses_anything_ulysses_async_native_npu.png
INFO 01-23 12:54:05 [base.py:641] ----------------------------------------------------------------------------------------------------

After this PR

INFO 01-23 13:04:47 [base.py:630] ----------------------------------------------------------------------------------------------------
INFO 01-23 13:04:47 [base.py:395] 🤖 Example Init Config Summary:
INFO 01-23 13:04:47 [base.py:418] - Model: 
INFO 01-23 13:04:47 [base.py:418]     - /PATH/TO/Qwen-Image-Edit-2509
INFO 01-23 13:04:47 [base.py:418]     - lightx2v/Qwen-Image-Lightning
INFO 01-23 13:04:47 [base.py:418] - Task Type: IE2I - Image Editing to Image
INFO 01-23 13:04:47 [base.py:418] - Torch Dtype: torch.bfloat16
INFO 01-23 13:04:47 [base.py:418] - LoRA Weights: /PATH/TO/Qwen-Image-Edit-2509-Lightning-4steps-V1.0-bf16.safetensors
[rank1]:[W123 13:04:47.961069787 ProcessGroup.hpp:916] Warning: No backend of type 0 found for Process Group with name undefined. Assuming no hooks are registered. (function hasHooks)
INFO 01-23 13:04:47 [base.py:212] 🤖 Example Input Summary:
INFO 01-23 13:04:47 [base.py:212] - prompt: The magician bear is on the left, the alchemist bear is on the right, facing each other in the central park square.
INFO 01-23 13:04:47 [base.py:212] - negative_prompt:  
INFO 01-23 13:04:47 [base.py:212] - height: 1024
INFO 01-23 13:04:47 [base.py:212] - width: 1024
INFO 01-23 13:04:47 [base.py:212] - true_cfg_scale: 1.0
INFO 01-23 13:04:47 [base.py:212] - num_inference_steps: 4
INFO 01-23 13:04:47 [base.py:212] - image: List Images (2 images)
INFO 01-23 13:04:47 [base.py:212]     - Image 0: (1228x1228)
INFO 01-23 13:04:47 [base.py:212]     - Image 1: (1226x1228)
INFO 01-23 13:04:47 [base.py:212] - generator: device cpu, seed 0
INFO 01-23 13:04:47 [base.py:307] 🤖 Example Output Summary:
INFO 01-23 13:04:47 [base.py:323] - Model: qwen_image_edit_lightning
INFO 01-23 13:04:47 [base.py:323] - Optimization: 1024x1024_C0_Q0_NONE_Ulysses4_TEP_VAEP_ulysses_anything_ulysses_async_native_npu
INFO 01-23 13:04:47 [base.py:323] - Device: Ascend910B3 x 4
INFO 01-23 13:04:47 [base.py:323] - Load Time: 150.44s
INFO 01-23 13:04:47 [base.py:323] - Warmup Time: 17.23s
INFO 01-23 13:04:47 [base.py:323] - Inference Time: 3.72s
INFO 01-23 13:04:48 [base.py:246] Image saved to qwen_image_edit_lightning.1024x1024_C0_Q0_NONE_Ulysses4_TEP_VAEP_ulysses_anything_ulysses_async_native_npu.png
INFO 01-23 13:04:48 [base.py:641] ----------------------------------------------------------------------------------------------------

DefTruth · 2026-01-25T13:45:51Z

Please fix the pre-commit check.

zhangtao0408 · 2026-01-26T01:51:51Z

The pre-commit check passed:

# pre-commit run --all-files
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[WARNING] repo `https://github.com/pre-commit/pre-commit-hooks` uses deprecated stage names (commit, push) which will be removed in a future version.  Hint: often `pre-commit autoupdate --repo https://github.com/pre-commit/pre-commit-hooks` will fix this.  if it does not -- consider reporting an issue to that repo.
[INFO] Initializing environment for https://github.com/PyCQA/flake8.
[INFO] Initializing environment for https://github.com/PyCQA/pydocstyle.
[INFO] Initializing environment for https://github.com/psf/black.
[INFO] Initializing environment for https://github.com/psf/black:.[jupyter].
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/PyCQA/flake8.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/PyCQA/pydocstyle.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
Check docstring is first.................................................Passed
Check Toml...............................................................Passed
Check Yaml...............................................................Passed
Mixed line ending........................................................Passed
Fix End of Files.........................................................Passed
flake8...................................................................Passed
pydocstyle...............................................................Passed
black-jupyter............................................................Passed

DefTruth

LGTM~ Thanks for working on this!

DefTruth and others added 2 commits January 21, 2026 12:24

fix UAA broken while using joint attn

bb033b3

Feat. NPU FA support attention mask.

f9e2ffd

TaoZhang-Work added 2 commits January 26, 2026 09:28

Clean code

a613055

clean code

ac0f459

DefTruth changed the title ~~Feat. NPU FA support attention mask.~~ feat: NPU FA support attention mask Jan 26, 2026

DefTruth approved these changes Jan 26, 2026

DefTruth merged commit 9baf4b8 into vipshop:main Jan 26, 2026
4 checks passed

3 participants
