Signed-off-by: n1ck-guo <heng.guo@intel.com>
wenhuach21 reviewed on Jan 23, 2026
wenhuach21 reviewed on Jan 23, 2026
Pull request overview
This PR adds support for a new quantization scheme, W4A16_MIXED, which applies mixed-precision quantization to different model components. Expert layers use 4-bit weights and regular layers use 8-bit weights. The layers kept at 16-bit are the attention layers in standard models, or all non-expert layers in multimodal language models (MLLMs).
Changes:
- Added `W4A16_MIXED` scheme registration in the `PRESET_SCHEMES` dictionary
- Implemented special handling logic for `W4A16_MIXED` in the `_handle_special_schemes` function, with layer-specific bit precision assignment
- Updated output format support lists to include the `W4A16_MIXED` scheme
- Added comprehensive test coverage for both MoE models and MLLM models
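The layer-specific assignment described in the overview can be sketched roughly as follows. This is a minimal illustration, not the code in `_handle_special_schemes`: the function name `assign_mixed_bits`, the substring markers used to spot expert and attention layers, and the example layer names are all assumptions made for this sketch.

```python
from typing import Dict, Iterable

# Hypothetical name markers (assumption): real models may name modules differently.
EXPERT_MARKERS = ("experts.",)
ATTENTION_MARKERS = ("self_attn", "attention")


def assign_mixed_bits(layer_names: Iterable[str], is_mllm: bool = False) -> Dict[str, int]:
    """Map each layer name to a weight bit width, following the rule in the PR overview.

    Hypothetical helper for illustration only; it is not part of auto_round.
    """
    bits: Dict[str, int] = {}
    for name in layer_names:
        if any(marker in name for marker in EXPERT_MARKERS):
            bits[name] = 4   # expert layers: 4-bit weights
        elif is_mllm:
            bits[name] = 16  # MLLM: every non-expert layer stays at 16-bit
        elif any(marker in name for marker in ATTENTION_MARKERS):
            bits[name] = 16  # standard model: attention layers stay at 16-bit
        else:
            bits[name] = 8   # remaining regular layers: 8-bit weights
    return bits


if __name__ == "__main__":
    # Example layer names are made up to resemble a typical MoE decoder block.
    names = [
        "model.layers.0.self_attn.q_proj",
        "model.layers.0.mlp.experts.0.gate_proj",
        "model.layers.0.mlp.gate",
    ]
    print(assign_mixed_bits(names))
    print(assign_mixed_bits(names, is_mllm=True))
```

The expert check comes first so that expert layers get 4 bits in both the standard and the MLLM case. Only the treatment of non-expert layers differs between the two.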
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| auto_round/schemes.py | Added W4A16_MIXED to PRESET_SCHEMES and implemented special scheme handling with layer-specific quantization logic |
| auto_round/compressors/base.py | Refactored initialization order and updated _handle_special_schemes call to pass required parameters |
| auto_round/formats.py | Added W4A16_MIXED to supported schemes lists for AutoGPTQ and AutoRound output formats |
| test/test_cpu/schemes/test_scheme.py | Added two new test cases for W4A16_MIXED scheme covering both MoE and MLLM models |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
wenhuach21 reviewed on Jan 26, 2026
wenhuach21 approved these changes on Jan 26, 2026
lvliang-intel pushed a commit that referenced this pull request on Feb 2, 2026
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Description
Add support for the W4A16_MIXED scheme.
Type of Change
Related Issues
Fixes #
Relates to #
Changes Made
Testing
Checklist
Additional Context