
Address ernie-image review findings #13577 #13663

Open

akshan-main wants to merge 6 commits into huggingface:main from akshan-main:ernie-image-review-fixes

Conversation

@akshan-main
Contributor

What does this PR do?

Partial fix for #13577. Addresses items 1, 2, and 5 per @yiyixuxu's scoping.

  • (1) Switch ErnieImageAutoPromptEnhancerStep to ConditionalPipelineBlocks so use_pe=False actually skips the prompt enhancer (AutoPipelineBlocks selected on presence, not truthiness).
  • (2) Align modular VAE BN epsilon to the standard pipeline's hardcoded 1e-5 (matches training; the hub config currently reports 1e-4).
  • (5) Restructure output_type="latent" so it runs maybe_free_model_hooks() and honors return_dict, matching the QwenImage/Flux2 pattern.
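The presence-vs-truthiness distinction behind (1) can be sketched with a toy selector (hypothetical names, not the diffusers API): AutoPipelineBlocks-style dispatch fires whenever the trigger input is present at all, while conditional dispatch can branch on its value.

```python
# Toy sketch (hypothetical names, not diffusers source) of the selection bug.
# Presence-based selection fires whenever the trigger input exists, so
# use_pe=False still routes through the enhancer; value-based (conditional)
# selection honors the flag.
def select_by_presence(inputs: dict) -> str:
    return "enhancer" if "use_pe" in inputs else "base"

def select_by_value(inputs: dict) -> str:
    return "enhancer" if inputs.get("use_pe") else "base"

print(select_by_presence({"use_pe": False}))  # enhancer (the bug)
print(select_by_value({"use_pe": False}))     # base (the fix)
```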

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc (important for complex PRs)?
  • Was this discussed/approved via a GitHub issue? ernie-image model/pipeline review #13577
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@yiyixuxu

@akshan-main
Contributor Author

Comment on lines 54 to 58
Member

Can we also take the opportunity to change the auto classes to the real classes? It gets confusing for some users that pass the text encoder (e.g. for quantization), and it's also kind of annoying to get the warning all the time.

@github-actions github-actions Bot added size/M PR with diff < 200 LOC and removed size/M PR with diff < 200 LOC labels Apr 30, 2026
@akshan-main
Contributor Author

@asomoza switched text_encoder to Mistral3Model and pe to Ministral3ForCausalLM in both the standard and modular pipelines. Left the tokenizers as AutoTokenizer, since Mistral doesn't have a model-specific tokenizer class.

@akshan-main akshan-main requested a review from asomoza April 30, 2026 16:29
Comment on lines +368 to +369
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device)
Contributor

dtype casting to be safe and for consistency with modular

Suggested change
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device)
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device=device, dtype=latents.dtype)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device=device, dtype=latents.dtype)

There could be a TODO regarding vae.config.batch_norm_eps; it should be used in the future if the checkpoint config is changed.
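A minimal sketch of the de-normalization under discussion (illustrative only; the name denormalize_latents is made up, and eps stays hardcoded at 1e-5 until the hub config reports the right value):

```python
import torch

# Illustrative stand-in for the BN de-normalization step: running stats are
# shaped (C,), latents (N, C, H, W); cast to the latents' device *and* dtype
# so bf16/fp16 latents don't get silently promoted.
def denormalize_latents(latents, running_mean, running_var, eps=1e-5):
    # TODO: read eps from vae.config.batch_norm_eps once the checkpoint
    # config is corrected (it currently reports 1e-4; training used 1e-5).
    mean = running_mean.view(1, -1, 1, 1).to(device=latents.device, dtype=latents.dtype)
    std = torch.sqrt(running_var.view(1, -1, 1, 1) + eps).to(device=latents.device, dtype=latents.dtype)
    return latents * std + mean
```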

Contributor Author

Added

Comment on lines +379 to +383
images = (images.clamp(-1, 1) + 1) / 2
images = images.cpu().permute(0, 2, 3, 1).float().numpy()

if output_type == "pil":
images = [Image.fromarray((img * 255).astype("uint8")) for img in images]
Contributor

Can VaeImageProcessor be used here? cc @yiyixuxu: could enforcing VaeImageProcessor be another agent-review rule?

Contributor Author

Switched both standard and modular to VaeImageProcessor.postprocess. Also fixes output_type="pt" in the standard pipeline (was returning numpy).
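For reference, a hand-rolled sketch of what the three output paths look like once unified (an illustrative stand-in, not the actual VaeImageProcessor.postprocess implementation), including the pt path that previously returned numpy:

```python
import torch
from PIL import Image

# Illustrative stand-in for VaeImageProcessor.postprocess: one clamp/rescale,
# then branch per output type. "pt" must stay a torch tensor.
def postprocess(images: torch.Tensor, output_type: str = "pil"):
    images = (images.clamp(-1, 1) + 1) / 2               # [-1, 1] -> [0, 1]
    if output_type == "pt":
        return images                                    # torch tensor, NCHW
    images = images.cpu().permute(0, 2, 3, 1).float().numpy()
    if output_type == "np":
        return images                                    # numpy array, NHWC
    return [Image.fromarray((img * 255).round().astype("uint8")) for img in images]
```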

@akshan-main akshan-main requested a review from hlky April 30, 2026 19:37
@akshan-main
Contributor Author

Hey @hlky! Friendly ping. Could you review it whenever you get a chance? Thanks!

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!

@yiyixuxu
Collaborator

yiyixuxu commented May 6, 2026

Thanks a lot for working on this @akshan-main. Can you:
(1) resolve the conflicts
(2) run a quick sanity check to show the generation is the same as main?

…fixes

# Conflicts:
#	src/diffusers/pipelines/ernie_image/pipeline_ernie_image.py
@github-actions github-actions Bot added size/M PR with diff < 200 LOC and removed size/M PR with diff < 200 LOC labels May 6, 2026
@akshan-main
Contributor Author

akshan-main commented May 6, 2026

[Screenshot: side-by-side parity comparison, main vs PR]

Ran baidu/ERNIE-Image on main and on this PR with the same prompt and seed. Generation is identical.

Tested using:

import os, subprocess, pathlib
import numpy as np
from PIL import Image
import IPython.display as ip
WORK = pathlib.Path("/content/work"); WORK.mkdir(exist_ok=True)

# clone main and the PR branch if missing (the comparison loop below expects both)
if not pathlib.Path("/content/diffusers-main").exists():
    subprocess.run(['git', 'clone', '--depth=1',
                    'https://github.com/huggingface/diffusers.git', '/content/diffusers-main'])
if not pathlib.Path("/content/diffusers-pr").exists():
    subprocess.run(['git', 'clone', '--depth=1', '--branch', 'ernie-image-review-fixes',
                    'https://github.com/akshan-main/diffusers.git', '/content/diffusers-pr'])

SCRIPT = """
import torch
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)

import sys, numpy as np
sys.path.insert(0, %r + '/src')
for m in list(sys.modules):
    if m == 'diffusers' or m.startswith('diffusers.'):
        del sys.modules[m]

from diffusers import ErnieImagePipeline
pipe = ErnieImagePipeline.from_pretrained('baidu/ERNIE-Image', torch_dtype=torch.bfloat16).to('cuda')
pipe.set_progress_bar_config(disable=True)
gen = torch.Generator('cuda').manual_seed(42)
out = pipe(prompt='a photo of an astronaut riding a horse on mars',
           num_inference_steps=20, generator=gen, output_type='np')
np.save(%r, out.images[0])
"""

paths = {}
for repo, key in [('/content/diffusers-main', 'main'), ('/content/diffusers-pr', 'pr')]:
    out_path = f'/content/parity_{key}.npy'
    script = WORK / f'parity_{key}.py'
    script.write_text(SCRIPT % (repo, out_path))
    r = subprocess.run(['python', str(script)], capture_output=True, text=True, env={**os.environ})
    if r.returncode:
        print(f"{key} FAILED:"); print(r.stderr[-800:]); raise SystemExit
    paths[key] = out_path

a, b = np.load(paths['main']), np.load(paths['pr'])
diff = np.abs(a.astype(np.float32) - b.astype(np.float32))
print(f'main vs PR:  max={diff.max():.6f}  mean={diff.mean():.6f}  identical={np.array_equal(a, b)}')

m = Image.fromarray((a * 255).round().astype(np.uint8))
p = Image.fromarray((b * 255).round().astype(np.uint8))
m.save('/content/parity_main.png')
p.save('/content/parity_pr.png')
w, h = m.size
combined = Image.new('RGB', (w * 2 + 10, h), (0, 0, 0))
combined.paste(m, (0, 0))
combined.paste(p, (w + 10, 0))
combined.thumbnail((1600, 800))
print('left = main     right = PR')
ip.display(combined)

@akshan-main akshan-main requested a review from yiyixuxu May 6, 2026 03:57
@akshan-main
Contributor Author

[Screenshot: modular pipeline parity check]

@yiyixuxu
Collaborator

yiyixuxu commented May 6, 2026

@akshan-main
Thanks, do you want to put the Ministral3ForCausalLM/Mistral3Model change into a separate PR so we can merge this one first?
https://github.com/huggingface/diffusers/actions/runs/25416710358/job/74549649787?pr=13663#step:16:87

@akshan-main
Contributor Author

@yiyixuxu yes!

