[Misc][Quantization] Clarify the intent of GGUF `FusedMoE` weight materialization #30310

a4lg · 2025-12-09T06:40:01Z

Purpose

This is a refactoring PR with no functional changes (unless the GGUF file is broken).

The code that materializes `FusedMoE` weight data from GGUF files contains a magic number, and some of its intent is not clear enough.

This commit clarifies some of them:

1. GGUF (currently) requires 3D tensor(s) (i.e. `full_load`) for `FusedMoE` layer weights.
2. `w1` and `w3` are merged per expert, i.e. the dimension right after the expert ID is doubled to store both `w1` and `w3`.
   - The expert ID is the first dimension (as in the code right after the `if is_gguf_weight...` block).
   - Therefore, the size of the second dimension (`final_shape[1]`) should be doubled for `w1` and `w3`; stating this explicitly improves clarity.
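The two rules above can be sketched as a small shape computation. This is a hypothetical illustration, not vLLM's actual code: the function name `materialized_moe_shape` is made up, shard IDs are assumed to be the strings `"w1"`, `"w2"`, `"w3"`, and the loaded shape is assumed to be `(num_experts, rows, cols)` with the expert ID first.

```python
# Hypothetical sketch of the GGUF FusedMoE shape rules; not vLLM's implementation.
def materialized_moe_shape(shard_id: str, loaded_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Return the full FusedMoE parameter shape for one GGUF weight shard.

    Assumes loaded_shape is (num_experts, rows, cols), i.e. the whole
    expert stack is loaded at once (full_load).
    """
    # Rule 1: GGUF FusedMoE weights must arrive as a single 3D tensor.
    assert len(loaded_shape) == 3, f"expected a 3D tensor, got {len(loaded_shape)}D"
    final_shape = list(loaded_shape)
    # Rule 2: dimension 0 is the expert ID, so the next dimension (index 1)
    # is doubled to hold both w1 and w3 for each expert.
    if shard_id in {"w1", "w3"}:
        final_shape[1] *= 2
    return tuple(final_shape)
```

With arbitrary sizes, `materialized_moe_shape("w1", (8, 1408, 2048))` returns `(8, 2816, 2048)`, a `w2` shard keeps its loaded shape, and a 2D input fails the assertion instead of producing a silently wrong parameter.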

... and makes some minor adjustments.

In the process of FusedMoE weight data materialization from GGUF files, there is a magic number and some intents are not clear enough. This commit clarifies some of them:

1. GGUF (currently) requires 3D tensor(s) for FusedMoE layer weights.
2. w1 and w3 are merged per expert, i.e. the next dimension after the expert ID is to be doubled to store both w1 and w3.

... and makes some minor adjustments.

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
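To make "merged per expert" concrete, the sketch below builds a merged `w13` buffer from plain nested lists. It is a hypothetical illustration only: `merge_w13` is not a vLLM function, and placing `w1` in the first half and `w3` in the second half of the doubled dimension is an assumption about the layout.

```python
# Hypothetical illustration of the merged w13 layout; not vLLM code.
def merge_w13(w1, w3):
    """Concatenate w1 and w3 per expert along the dimension after the expert ID.

    w1, w3: nested lists shaped [num_experts][rows][cols].
    Returns a nested list shaped [num_experts][2 * rows][cols].
    Assumes w1 fills the first half and w3 the second half.
    """
    assert len(w1) == len(w3), "w1 and w3 must have the same number of experts"
    merged = []
    for expert_w1, expert_w3 in zip(w1, w3):
        assert len(expert_w1) == len(expert_w3), "row counts must match per expert"
        # Rows [0, rows) hold w1; rows [rows, 2 * rows) hold w3.
        merged.append([list(row) for row in expert_w1] + [list(row) for row in expert_w3])
    return merged
```

For two experts with one row each, `merge_w13([[[1, 2]], [[5, 6]]], [[[3, 4]], [[7, 8]]])` returns `[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]`: the expert index stays first and only the second dimension grows.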

gemini-code-assist

Code Review

This pull request is a refactoring that clarifies the weight materialization process for GGUF FusedMoE layers. The changes introduce an assertion to ensure that GGUF weights for FusedMoE are loaded as 3D tensors, which makes an implicit assumption explicit and improves robustness. Additionally, comments have been added to explain the logic behind handling merged weights (w1 and w3), and a minor style improvement was made by using a set for membership testing. These changes improve the code's clarity and maintainability without altering the core functionality. The implementation is correct and I have no further recommendations.

a4lg requested review from mgoin and pavanimajety as code owners December 9, 2025 06:40

gemini-code-assist bot reviewed Dec 9, 2025
