Introduces QuantizationConfig for fine-grained quantization control #21896
Conversation
Summary of Changes (@gemini-code-assist): This pull request significantly enhances the quantization capabilities within Keras by introducing a flexible quantization configuration API.
Code Review
This pull request introduces a QuantizationConfig to provide a more flexible and customizable quantization API. This is a significant improvement, allowing users to specify their own quantizers for weights and activations, and enabling features like weight-only quantization. The changes are well-implemented across various layers including Dense, EinsumDense, Embedding, and ReversibleEmbedding, as well as the model-level quantize method. The new QuantizationConfig class is well-designed with serialization support, and the accompanying tests are comprehensive. I have a couple of suggestions for minor code improvements to reduce redundancy and enhance clarity.
Codecov Report — Coverage Diff:
## master #21896 +/- ##
==========================================
+ Coverage 76.30% 76.33% +0.03%
==========================================
Files 580 581 +1
Lines 60029 60186 +157
Branches 9432 9461 +29
==========================================
+ Hits 45803 45944 +141
- Misses 11750 11759 +9
- Partials 2476 2483 +7
Force-pushed from 2ae1e37 to a3668d5.
/gemini review
Code Review
This pull request introduces QuantizationConfig to provide more structured, fine-grained control over quantization settings. The changes are well implemented across the affected layers, and the new configuration class is well designed. I've found a couple of minor issues related to an unused parameter and an outdated docstring that should be addressed.
Force-pushed from 3a31239 to 6917701.
Description

This PR introduces `keras.quantizers.QuantizationConfig`, a configuration class that allows users to customize the quantization behavior for weights and activations during Post-Training Quantization (PTQ).

Previously, calling `model.quantize(mode="int8")` applied a fixed quantization strategy (typically `AbsMaxQuantizer` for both weights and activations). With this change, users can instantiate a configuration object (e.g. `Int8QuantizationConfig`) to define specific quantizers for weights and activations, or disable activation quantization entirely for weight-only quantization.

Key Changes
QuantizationConfig Class
- Added `QuantizationConfig` in `keras.src.quantizers.quantization_config.py`, including `Int8QuantizationConfig`, `Int4QuantizationConfig`, and `Float8QuantizationConfig`.
- Each config holds `weight_quantizer` and `activation_quantizer` instances with serialization support (`get_config` / `from_config`); see the sketch below.
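A minimal sketch of constructing and round-tripping a config, assuming the new classes are exposed under `keras.quantizers` and accept `weight_quantizer` / `activation_quantizer` keyword arguments as described above (the exact constructor signature is not shown in this description):

```python
from keras import quantizers

# Constructor keyword names are assumed from the attributes this PR describes
# (`weight_quantizer`, `activation_quantizer`).
config = quantizers.Int8QuantizationConfig(
    weight_quantizer=quantizers.AbsMaxQuantizer(axis=0),
    activation_quantizer=quantizers.AbsMaxQuantizer(axis=-1),
)

# Serialization round trip via get_config / from_config.
restored = quantizers.Int8QuantizationConfig.from_config(config.get_config())
```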
Model & Layer Updates
- Updated `Model.quantize()` to accept an optional `QuantizationConfig` object, allowing custom configurations beyond the default string modes (see the sketch after this list).
- Updated `Dense`, `EinsumDense`, `Embedding`, and `ReversibleEmbedding`: their `quantized_build` and `quantize` methods now accept and parse the provided `config`.
- Supported `activation_quantizer=None`, enabling weight-only quantization.
- Updated `grad_fn` in `Dense` and `EinsumDense` to correctly handle cases where inputs are not quantized (i.e., when `inputs_scale` is effectively 1.0).
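A minimal sketch of the model-level entry point. The keyword name `config` and its interaction with the string mode are assumptions here; the PR diff defines the exact signature.

```python
import keras
from keras import layers, quantizers

# Build a small model so quantization has weights to act on.
model = keras.Sequential([layers.Dense(16, activation="relu"), layers.Dense(1)])
model.build(input_shape=(None, 8))

# Existing behavior: a fixed strategy selected by the string mode.
model.quantize(mode="int8")

# Behavior sketched per this PR: pass a QuantizationConfig for fine-grained
# control (keyword name `config` and default-constructed config are assumptions).
model2 = keras.Sequential([layers.Dense(16, activation="relu"), layers.Dense(1)])
model2.build(input_shape=(None, 8))
model2.quantize(mode="int8", config=quantizers.Int8QuantizationConfig())
```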
Quantizer Updates
- Updated `AbsMaxQuantizer.__call__` to accept `to_numpy=True`, supporting direct weight manipulation during the quantization step (see the sketch after this list).
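A minimal sketch of the new flag, assuming `AbsMaxQuantizer.__call__` keeps its existing convention of returning the quantized values together with a scale (that return convention and the `axis` argument are pre-existing Keras behavior, not introduced here):

```python
import numpy as np
from keras import quantizers

weights = np.random.uniform(-1.0, 1.0, size=(8, 16)).astype("float32")

quantizer = quantizers.AbsMaxQuantizer(axis=0)
# `to_numpy=True` is the new flag from this PR; it is assumed to return NumPy
# arrays directly so quantized weights can be assigned back to the layer.
quantized_weights, scale = quantizer(weights, to_numpy=True)
```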
Usage Examples
- Default: applies the default `AbsMaxQuantizer` to both weights and activations.
- Weight-only: disable activation quantization by setting the activation quantizer to `None`.
- Custom: customize the value range or other parameters for specific quantizers.

Sketches of each case follow below.
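These sketches rely on the same assumptions as earlier (config constructor keywords are assumed; `value_range` is a pre-existing `AbsMaxQuantizer` argument). Any of the resulting configs would then be passed to `Model.quantize()` as shown above.

```python
from keras import quantizers

# 1. Default: AbsMaxQuantizer for both weights and activations
#    (constructor defaults assumed).
default_config = quantizers.Int8QuantizationConfig()

# 2. Weight-only: disable activation quantization entirely.
weight_only_config = quantizers.Int8QuantizationConfig(
    weight_quantizer=quantizers.AbsMaxQuantizer(axis=0),
    activation_quantizer=None,
)

# 3. Custom: tweak quantizer parameters such as the value range.
custom_config = quantizers.Int8QuantizationConfig(
    weight_quantizer=quantizers.AbsMaxQuantizer(axis=0, value_range=(-31, 31)),
    activation_quantizer=quantizers.AbsMaxQuantizer(axis=-1),
)
```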
Tests
- Added `QuantizationConfigTest` for serialization/deserialization.
- Updated layer tests (`DenseTest`, `EinsumDenseTest`, `EmbeddingTest`, `ReversibleEmbeddingTest`) to verify behavior with `QuantizationConfig`, specifically testing weight-only scenarios.
- Updated `GPTQTest` to ensure config validation logic remains correct.