Step3.5 MoE support by meenchen · Pull Request #1063 · NVIDIA/Model-Optimizer

meenchen · 2026-03-17T21:24:15Z

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

copy-pr-bot · 2026-03-17T21:24:19Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-03-17T21:24:23Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ca85a750-bdd5-4f03-aefe-2fe517f7410b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

cjluo-nv · 2026-03-17T22:33:55Z

modelopt/torch/quantization/plugins/huggingface.py

+        x = self.input_quantizer(x)
+        # Select expert weight and quantize it
+        expert_weight = self.weight[expert_id]
+        expert_weight = self.weight_quantizer(expert_weight)


Should we use a per-expert weight quantizer here?
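To illustrate why the question matters, here is a minimal stdlib-only sketch (not ModelOpt code) comparing one shared scale against one scale per expert under symmetric int8 fake-quantization. The helper names `fake_quant` and `max_abs_error` and the toy weights are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration: symmetric int8 fake-quantization with a single
# shared amax versus one amax per expert. This is not the ModelOpt quantizer.

def fake_quant(values, amax, num_bits=8):
    """Quantize then dequantize values symmetrically using the given amax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    scale = amax / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def max_abs_error(values, dequantized):
    """Largest absolute difference between original and dequantized values."""
    return max(abs(v - d) for v, d in zip(values, dequantized))

# Two toy "experts" whose weights differ widely in magnitude (assumed data).
experts = [
    [0.01, -0.02, 0.015, 0.005],  # small-magnitude expert
    [1.0, -2.0, 1.5, 0.5],        # large-magnitude expert
]

# Shared quantizer: one amax computed over all experts.
shared_amax = max(abs(v) for w in experts for v in w)
shared_err = [max_abs_error(w, fake_quant(w, shared_amax)) for w in experts]

# Per-expert quantizers: each expert is scaled by its own amax.
per_expert_err = [
    max_abs_error(w, fake_quant(w, max(abs(v) for v in w))) for w in experts
]

print("shared amax error per expert:    ", shared_err)
print("per-expert amax error per expert:", per_expert_err)
```

In this toy setup the small-magnitude expert is quantized much more coarsely under the shared scale, while the large-magnitude expert is unaffected. Whether that gap matters for the real model depends on how different the experts' weight ranges actually are.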

cjluo-nv · 2026-03-17T23:33:19Z

modelopt/torch/quantization/plugins/huggingface.py

+        # Cast input to match expert weight dtype before linear operation,
+        # then cast output to float32 to match original MoELinear forward behavior.
+        expert = self.experts[expert_id]
+        x = x.to(expert.weight.dtype)


Which one has higher precision, x or the weight?

x is float32.

I think we should cast the weight up instead of casting x down here; otherwise we might lose some accuracy.
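The accuracy concern can be sketched with the standard library alone. The example below assumes the expert weight is stored in a 16-bit format and uses IEEE float16 as a stand-in (the actual weight dtype is not stated in this thread). The helpers `to_float16`, `to_float32`, and `dot` and the toy values are hypothetical, and the accumulation runs in Python's double precision rather than on a real kernel.

```python
import struct

def to_float16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_float32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot(xs, ws):
    """Plain dot product, accumulated in Python's double precision."""
    return sum(x * w for x, w in zip(xs, ws))

# Assumed toy data: float32 activations, weights stored in float16.
x = [to_float32(v) for v in (0.1234567, 1.0004883, -3.1415927, 0.0001234)]
w = [to_float16(v) for v in (0.5, -0.25, 0.125, 2.0)]

# Casting the weight up is lossless: every float16 value is exactly
# representable in float32, so x keeps its full precision.
weight_cast_up = dot(x, [to_float32(v) for v in w])

# Casting the input down rounds each activation to float16 first.
input_cast_down = dot([to_float16(v) for v in x], w)

error = abs(weight_cast_up - input_cast_down)
print("weight cast up: ", weight_cast_up)
print("input cast down:", input_cast_down)
print("abs difference: ", error)
```

The difference comes entirely from rounding the activations, since the upcast weights are unchanged. The trade-off on real hardware is that the upcast path runs the matmul in the wider dtype, which may cost memory and throughput.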

step3p5_moe support

8634500

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

cjluo-nv approved these changes Mar 17, 2026

cjluo-nv reviewed Mar 17, 2026

meenchen added 2 commits March 17, 2026 23:21

separate experts

4cd8893

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

fix dtype mismatch

ccc7ae1

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

cjluo-nv reviewed Mar 17, 2026

meenchen added 2 commits March 17, 2026 23:33

fix gate_proj

7d2b032

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

fix

3ce0335

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

meenchen wants to merge 5 commits into main from weimingc/step-3.5-flash