@a4lg a4lg commented Dec 9, 2025

Purpose

This is a refactoring PR with no functional changes (unless the GGUF file is broken).

The code that materializes FusedMoE weight data from GGUF files contains a magic number, and some of its intent is not clear enough.

This commit clarifies some of them:

  1. GGUF (currently) requires 3D tensor(s) (i.e. full_load) for FusedMoE layer weights.
  2. w1 and w3 are merged per expert, i.e. the dimension right after the expert ID is doubled to store both w1 and w3.
    • The expert ID is the first dimension (as in the code right after the if is_gguf_weight... block).
    • Therefore the size of the second dimension (final_shape[1]) is what gets doubled for w1 and w3; the code now states this explicitly.

... and makes some minor adjustments.
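
As a rough illustration of the two points above, here is a minimal sketch of the shape computation. It is not the actual code touched by this PR: the helper name `materialized_shape` and the parameters `loaded_shape` and `shard_id` are hypothetical, and only `final_shape[1]`, `is_gguf_weight`, w1 and w3 come from the description.

```python
# Hypothetical sketch; names other than final_shape / is_gguf_weight are assumptions.
MERGED_SHARDS = {"w1", "w3"}  # shards stored together per expert


def materialized_shape(loaded_shape, shard_id, is_gguf_weight=True):
    """Return the shape to allocate for a FusedMoE parameter.

    loaded_shape: shape of the tensor read from the file, expected to be
                  (num_experts, rows, cols) for GGUF.
    shard_id:     which expert weight this is, e.g. "w1", "w2" or "w3".
    """
    final_shape = list(loaded_shape)
    if is_gguf_weight:
        # Point 1: GGUF weights for FusedMoE are expected to be 3D.
        assert len(final_shape) == 3, "GGUF FusedMoE weights must be 3D"
    if shard_id in MERGED_SHARDS:
        # Point 2: dimension 0 is the expert ID, so dimension 1 is doubled
        # to hold both w1 and w3 for each expert.
        final_shape[1] *= 2
    return tuple(final_shape)


print(materialized_shape((8, 1024, 4096), "w1"))
print(materialized_shape((8, 4096, 1024), "w2"))
```

With these assumptions, a w1 or w3 tensor of shape (8, 1024, 4096) yields an allocation of (8, 2048, 4096), while a w2 tensor keeps its shape. A 2D input fails the assertion, which corresponds to the "broken GGUF file" case mentioned above.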



The code that materializes FusedMoE weight data from GGUF files contains
a magic number, and some of its intent is not clear enough.

This commit clarifies some of them:

1.  GGUF (currently) requires 3D tensor(s) for FusedMoE layer weights.
2.  w1 and w3 are merged per expert, i.e. the dimension right after
    the expert ID is doubled to store both w1 and w3.

... and makes some minor adjustments.

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a refactoring that clarifies the weight materialization process for GGUF FusedMoE layers. The changes introduce an assertion to ensure that GGUF weights for FusedMoE are loaded as 3D tensors, which makes an implicit assumption explicit and improves robustness. Additionally, comments have been added to explain the logic behind handling merged weights (w1 and w3), and a minor style improvement was made by using a set for membership testing. These changes improve the code's clarity and maintainability without altering the core functionality. The implementation is correct and I have no further recommendations.
