
Address ernie-image review findings #13577 #13663

Open

akshan-main wants to merge 6 commits into huggingface:main from akshan-main:ernie-image-review-fixes

Conversation

@akshan-main
Contributor

What does this PR do?

Partial fix for #13577. Addresses items 1, 2, and 5 per @yiyixuxu's scoping.

  • (1) Switch ErnieImageAutoPromptEnhancerStep to ConditionalPipelineBlocks so use_pe=False actually skips the prompt enhancer (AutoPipelineBlocks selected on presence, not truthiness).
  • (2) Align modular VAE BN epsilon to the standard pipeline's hardcoded 1e-5 (matches training; the hub config currently reports 1e-4).
  • (5) Restructure output_type="latent" so it runs maybe_free_model_hooks() and honors return_dict, matching the QwenImage/Flux2 pattern.
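The presence-vs-truthiness distinction behind (1) can be sketched with a toy selector (hypothetical names, not the diffusers API): AutoPipelineBlocks-style dispatch fires whenever the trigger input is present at all, while conditional dispatch can branch on its value.

```python
# Toy sketch (hypothetical names, not diffusers source) of the selection bug.
# Presence-based selection fires whenever the trigger input exists, so
# use_pe=False still routes through the enhancer; value-based (conditional)
# selection honors the flag.
def select_by_presence(inputs: dict) -> str:
    return "enhancer" if "use_pe" in inputs else "base"

def select_by_value(inputs: dict) -> str:
    return "enhancer" if inputs.get("use_pe") else "base"

print(select_by_presence({"use_pe": False}))  # enhancer (the bug)
print(select_by_value({"use_pe": False}))     # base (the fix)
```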

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc (important for complex PRs)?
  • Was this discussed/approved via a GitHub issue? ernie-image model/pipeline review #13577
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@yiyixuxu

@akshan-main
Contributor Author

Comment on lines 54 to 58
Member

Can we also take the opportunity to change the auto classes to the real classes? It gets confusing for some users that pass the text encoder (e.g. for quantization), and it's also kind of annoying to get the warning all the time.

@github-actions github-actions Bot added size/M PR with diff < 200 LOC and removed size/M PR with diff < 200 LOC labels Apr 30, 2026
@akshan-main
Contributor Author

@asomoza switched text_encoder to Mistral3Model and pe to Ministral3ForCausalLM in both the standard and modular pipelines. Left the tokenizers as AutoTokenizer, since Mistral doesn't have a model-specific tokenizer class.

@akshan-main akshan-main requested a review from asomoza April 30, 2026 16:29
Comment on lines +368 to +369
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device)
Contributor

dtype casting to be safe and for consistency with modular

Suggested change
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device)
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device=device, dtype=latents.dtype)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device=device, dtype=latents.dtype)

There could be a TODO regarding vae.config.batch_norm_eps; it should be used in the future if the checkpoint config is changed.
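A minimal sketch of the de-normalization under discussion (illustrative only; the name denormalize_latents is made up, and eps stays hardcoded at 1e-5 until the hub config reports the right value):

```python
import torch

# Illustrative stand-in for the BN de-normalization step: running stats are
# shaped (C,), latents (N, C, H, W); cast to the latents' device *and* dtype
# so bf16/fp16 latents don't get silently promoted.
def denormalize_latents(latents, running_mean, running_var, eps=1e-5):
    # TODO: read eps from vae.config.batch_norm_eps once the checkpoint
    # config is corrected (it currently reports 1e-4; training used 1e-5).
    mean = running_mean.view(1, -1, 1, 1).to(device=latents.device, dtype=latents.dtype)
    std = torch.sqrt(running_var.view(1, -1, 1, 1) + eps).to(device=latents.device, dtype=latents.dtype)
    return latents * std + mean
```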

Contributor Author

Added

Comment on lines +379 to +383
images = (images.clamp(-1, 1) + 1) / 2
images = images.cpu().permute(0, 2, 3, 1).float().numpy()

if output_type == "pil":
images = [Image.fromarray((img * 255).astype("uint8")) for img in images]
Contributor

Can VaeImageProcessor be used here? cc @yiyixuxu: could enforcing VaeImageProcessor be another agent-review rule?

Contributor Author

Switched both standard and modular to VaeImageProcessor.postprocess. Also fixes output_type="pt" in the standard pipeline (was returning numpy).
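For reference, a hand-rolled sketch of what the three output paths look like once unified (an illustrative stand-in, not the actual VaeImageProcessor.postprocess implementation), including the pt path that previously returned numpy:

```python
import torch
from PIL import Image

# Illustrative stand-in for VaeImageProcessor.postprocess: one clamp/rescale,
# then branch per output type. "pt" must stay a torch tensor.
def postprocess(images: torch.Tensor, output_type: str = "pil"):
    images = (images.clamp(-1, 1) + 1) / 2               # [-1, 1] -> [0, 1]
    if output_type == "pt":
        return images                                    # torch tensor, NCHW
    images = images.cpu().permute(0, 2, 3, 1).float().numpy()
    if output_type == "np":
        return images                                    # numpy array, NHWC
    return [Image.fromarray((img * 255).round().astype("uint8")) for img in images]
```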

@akshan-main akshan-main requested a review from hlky April 30, 2026 19:37
@akshan-main
Contributor Author

Hey @hlky! Friendly ping. Could you review it whenever you get a chance? Thanks!

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!

@yiyixuxu
Collaborator

yiyixuxu commented May 6, 2026

Thanks a lot for working on this @akshan-main. Can you:
(1) resolve the conflicts
(2) run a quick sanity check to show the generation is the same as main?

…fixes

# Conflicts:
#	src/diffusers/pipelines/ernie_image/pipeline_ernie_image.py
@github-actions github-actions Bot added size/M PR with diff < 200 LOC and removed size/M PR with diff < 200 LOC labels May 6, 2026
@akshan-main
Contributor Author

akshan-main commented May 6, 2026

[Screenshot: side-by-side parity comparison, main vs PR]

Ran baidu/ERNIE-Image on main and on this PR with the same prompt and seed. Generation is identical.

Tested using:

import os, subprocess, pathlib
import numpy as np
from PIL import Image
import IPython.display as ip
WORK = pathlib.Path("/content/work"); WORK.mkdir(exist_ok=True)

# clone main and the PR branch if missing (the comparison loop below expects both)
if not pathlib.Path("/content/diffusers-main").exists():
    subprocess.run(['git', 'clone', '--depth=1',
                    'https://github.com/huggingface/diffusers.git', '/content/diffusers-main'])
if not pathlib.Path("/content/diffusers-pr").exists():
    subprocess.run(['git', 'clone', '--depth=1', '--branch', 'ernie-image-review-fixes',
                    'https://github.com/akshan-main/diffusers.git', '/content/diffusers-pr'])

SCRIPT = """
import torch
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)

import sys, numpy as np
sys.path.insert(0, %r + '/src')
for m in list(sys.modules):
    if m == 'diffusers' or m.startswith('diffusers.'):
        del sys.modules[m]

from diffusers import ErnieImagePipeline
pipe = ErnieImagePipeline.from_pretrained('baidu/ERNIE-Image', torch_dtype=torch.bfloat16).to('cuda')
pipe.set_progress_bar_config(disable=True)
gen = torch.Generator('cuda').manual_seed(42)
out = pipe(prompt='a photo of an astronaut riding a horse on mars',
           num_inference_steps=20, generator=gen, output_type='np')
np.save(%r, out.images[0])
"""

paths = {}
for repo, key in [('/content/diffusers-main', 'main'), ('/content/diffusers-pr', 'pr')]:
    out_path = f'/content/parity_{key}.npy'
    script = WORK / f'parity_{key}.py'
    script.write_text(SCRIPT % (repo, out_path))
    r = subprocess.run(['python', str(script)], capture_output=True, text=True, env={**os.environ})
    if r.returncode:
        print(f"{key} FAILED:"); print(r.stderr[-800:]); raise SystemExit
    paths[key] = out_path

a, b = np.load(paths['main']), np.load(paths['pr'])
diff = np.abs(a.astype(np.float32) - b.astype(np.float32))
print(f'main vs PR:  max={diff.max():.6f}  mean={diff.mean():.6f}  identical={np.array_equal(a, b)}')

m = Image.fromarray((a * 255).round().astype(np.uint8))
p = Image.fromarray((b * 255).round().astype(np.uint8))
m.save('/content/parity_main.png')
p.save('/content/parity_pr.png')
w, h = m.size
combined = Image.new('RGB', (w * 2 + 10, h), (0, 0, 0))
combined.paste(m, (0, 0))
combined.paste(p, (w + 10, 0))
combined.thumbnail((1600, 800))
print('left = main     right = PR')
ip.display(combined)

@akshan-main akshan-main requested a review from yiyixuxu May 6, 2026 03:57
@akshan-main
Contributor Author

[Screenshot: modular pipeline parity check]

@yiyixuxu
Collaborator

yiyixuxu commented May 6, 2026

@akshan-main
Thanks, do you want to put the Ministral3ForCausalLM/Mistral3Model change into a separate PR so we can merge this one first?
https://github.com/huggingface/diffusers/actions/runs/25416710358/job/74549649787?pr=13663#step:16:87

@akshan-main
Contributor Author

@yiyixuxu yes!

