Improve performance of gemma4 MoE inference.#3875

Open
NicoGrande wants to merge 1 commit into main from nicogrande/improve-gemma4-vllm-perf

Conversation

@NicoGrande
Collaborator

Description

This PR introduces a small, inference-specific optimization to reduce overall step time. Specifically, it moves the per-expert scale application from the call method to model initialization, since these parameters are static during inference.
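The idea described above can be sketched as follows. This is a hypothetical, simplified illustration (not the actual MaxText `moe.py` code): the class name `MoELayer`, the `fold_scales` flag, and the tensor shapes are all assumptions made for the example. Since the per-expert scales never change at inference time, they can be baked into the expert weights once at construction, removing a multiply from every forward step.

```python
import numpy as np

class MoELayer:
    """Hypothetical sketch of the optimization: per-expert scales are static
    at inference, so they can be folded into the expert weights once at init
    instead of being applied on every call."""

    def __init__(self, expert_weights, expert_scales, fold_scales=True):
        # expert_weights: (num_experts, d_in, d_out)
        # expert_scales:  (num_experts,)
        self.fold_scales = fold_scales
        if fold_scales:
            # One-time cost at init: bake the static scale into each expert's
            # weight matrix via broadcasting over the last two dims.
            self.w = expert_weights * expert_scales[:, None, None]
            self.scales = None
        else:
            self.w = expert_weights
            self.scales = expert_scales

    def __call__(self, x, expert_idx):
        # x: (batch, d_in); route the batch to a single expert for simplicity.
        out = x @ self.w[expert_idx]
        if not self.fold_scales:
            # Pre-optimization path: scale applied on every forward step.
            out = out * self.scales[expert_idx]
        return out
```

Both paths are numerically equivalent; folding simply moves the scalar multiply out of the per-step hot loop, which is where the reported step-time reduction would come from.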

Tests

Benchmarked with vllm_decode.py before and after this change.

Step time without this PR: 13.59 ms
Step time with this PR: 7.975 ms

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Comment thread on src/maxtext/layers/moe.py (outdated)
@NicoGrande NicoGrande force-pushed the nicogrande/improve-gemma4-vllm-perf branch from 1bea1ad to e3bc236 Compare May 11, 2026 22:28
@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/maxtext/layers/moe.py 0.00% 1 Missing and 2 partials ⚠️


@NicoGrande NicoGrande force-pushed the nicogrande/improve-gemma4-vllm-perf branch from 2f91e1b to 5b0ae2f Compare May 11, 2026 23:58


3 participants