Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
    x = self.input_quantizer(x)
    # Select expert weight and quantize it
    expert_weight = self.weight[expert_id]
    expert_weight = self.weight_quantizer(expert_weight)
Should we use per_expert weight quantizer?
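To make the trade-off behind this question concrete, here is a minimal pure-Python sketch. It assumes symmetric absmax int8 fake quantization and made-up weight values; `absmax_scale`, `fake_quantize`, and `max_error` are hypothetical helpers, not the quantizer used in this PR.

```python
# Hypothetical illustration: shared vs. per-expert absmax scales.
# This is NOT the PR's weight_quantizer; it is a toy int8 fake quantizer.

def absmax_scale(values, qmax=127):
    """Symmetric scale that maps the largest magnitude onto qmax."""
    return max(abs(v) for v in values) / qmax

def fake_quantize(values, scale, qmax=127):
    """Quantize to integer steps, clamp to [-qmax, qmax], then dequantize."""
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def max_error(original, quantized):
    """Largest absolute difference between original and quantized values."""
    return max(abs(a - b) for a, b in zip(original, quantized))

# Made-up weights: expert 0 has a much smaller range than expert 1.
experts = [
    [0.01, -0.02, 0.015],
    [1.0, -2.0, 1.5],
]

# One scale shared across all experts (a single weight quantizer).
shared_scale = absmax_scale([v for w in experts for v in w])

# One scale per expert (a per-expert weight quantizer).
per_expert_scales = [absmax_scale(w) for w in experts]

# Compare the error on the small-range expert under both schemes.
shared_err = max_error(experts[0], fake_quantize(experts[0], shared_scale))
per_expert_err = max_error(
    experts[0], fake_quantize(experts[0], per_expert_scales[0])
)

print(f"shared scale error:     {shared_err:.6f}")
print(f"per-expert scale error: {per_expert_err:.6f}")
```

With a shared scale, the expert with the widest range sets the step size for every expert, so small-range experts lose resolution. Whether that matters here depends on how the real quantizer is calibrated, which this sketch does not model.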
    # Cast input to match expert weight dtype before linear operation,
    # then cast output to float32 to match original MoELinear forward behavior.
    expert = self.experts[expert_id]
    x = x.to(expert.weight.dtype)
Which one has higher precision: x or weight?
I think we need to cast the weight up here instead. Casting the input down might lose some accuracy.
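A small stdlib-only sketch of the concern, assuming (hypothetically) that the input is float32 and the expert weight is float16. The actual dtypes in this PR are not stated in the thread, and `to_fp16` / `to_fp32` are illustrative helpers that emulate rounding with `struct` rather than real tensor casts.

```python
import struct

def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]

def to_fp32(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

# Assumed setup: float32 activation, float16 weight (made-up values).
reference_x = 0.1234567
x = to_fp32(reference_x)
w = to_fp16(0.5)

# Reference product computed in Python's double precision.
exact = reference_x * w

# Option A (as in the diff): cast the input down to the weight dtype.
y_down = to_fp16(x) * w

# Option B (suggested above): cast the weight up to the input dtype.
# Every float16 value is exactly representable in float32, so this is lossless.
y_up = x * to_fp32(w)

print("weight survives up-cast unchanged:", to_fp32(w) == w)
print("input changes when cast down:     ", to_fp16(x) != x)
print("error casting input down:", abs(y_down - exact))
print("error casting weight up: ", abs(y_up - exact))
```

Under that assumption, up-casting the weight keeps every bit of both operands, while down-casting the input discards mantissa bits before the multiply. If the weight is actually the higher-precision tensor, the existing cast is already the lossless direction.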
What does this PR do?
Type of change: ?
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
CONTRIBUTING.md: ✅ / ❌ / N/A
Additional Information