
Fix activation scale inf issue for const_weight and const_scale#2448

Open
qgao007 wants to merge 2 commits into master from qg/fix_const_weight_scale_misassignment

Conversation

@qgao007
Contributor

@qgao007 qgao007 commented Apr 15, 2026

Type of Change

bug fix

Description

Many activation scales become inf when only one sample is used for calibration. Such a layer should be skipped when it is marked _not_quantized, but it actually still stores values due to constant weight initialization.
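As a small illustration of how the inf scale arises (the scaling convention and constants here are hypothetical, not the repository's actual code): with a single all-zero calibration sample, the observed amax is 0, and a max-based inverse scale divides by zero.

```python
import numpy as np

# One all-zero calibration sample: the observed amax is 0.
sample = np.zeros((1, 16), dtype=np.float32)
amax = np.max(np.abs(sample))              # 0.0

# A common max-based convention: inverse scale = fp8_max / amax.
with np.errstate(divide="ignore"):
    inv_scale = np.float32(448.0) / amax   # inf when amax == 0

print(np.isinf(inv_scale))  # True
```

Any layer that then multiplies by this inverse scale produces inf/NaN activations, which matches the gibberish output described above.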

Expected Behavior & Potential Risk

Resolved the gibberish-output issue with constant weights. No foreseeable risk.

How has this PR been tested?

It has been tested using the example_old.py file.

Dependency Change?

No dependency change.

Signed-off-by: Gao, Qun <qun.gao@intel.com>
@qgao007 qgao007 requested review from anko-intel and Copilot April 15, 2026 22:02
@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).
8 pipeline(s) were filtered out due to trigger conditions.

@qgao007 qgao007 requested a review from chensuyue April 15, 2026 22:02
@azure-pipelines

Azure Pipelines:
Successfully started running 1 pipeline(s).
8 pipeline(s) were filtered out due to trigger conditions.

Contributor

Copilot AI left a comment


Pull request overview

Fixes issues when static quantization calibration produces inf activation scales (e.g., too little calibration data) in combination with const_scale / const_weight, aiming to avoid corrupting layer state and producing incorrect outputs.

Changes:

  • Add early-return branches in post_quantization_cleanup() to skip cleanup/const-conversion when a layer is not quantized.
  • Attempt to remove input_observer from tracked sublayers when quantization is skipped.
  • Reset _const_variables when quantization is skipped.


Comment on lines +246 to +257
if not self._is_quantized:
    # Clean up observer only if it exists
    if hasattr(self, "input_observer"):
        if hasattr(self, "_layers") and self.input_observer in self._layers:
            self._layers.remove(self.input_observer)
    # Set call to pass-through/original
    if hasattr(self, "call"):
        # pass through
        pass
    self._const_variables = []
    self._tracker.lock()
    return

Copilot AI Apr 15, 2026


The new if not self._is_quantized: ... return short-circuits post_quantization_cleanup() in the normal successful quantization path because _is_quantized is only set to True at the end of this method. This prevents switching self.call to call_symmetric/call_asymmetric and prevents converting const vars, so static quantization will effectively never activate (and deserialized models will still have call() pointing at input_observer, which may not exist). Consider removing this early-return and instead gating the skip-path on a separate flag set by convert() when calibration fails (e.g., _skip_quantization=True), where you also explicitly set call to a pass-through implementation and delete/clear input_observer consistently.
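The reviewer's suggested alternative could look roughly like this. A minimal sketch only: `_skip_quantization`, `convert()`, and the pass-through/quantized call methods are hypothetical names standing in for the repository's actual API.

```python
import math

class QuantizedLayerSketch:
    """Sketch of the suggested skip path: a dedicated flag set on
    calibration failure, instead of gating cleanup on _is_quantized."""

    def __init__(self):
        self._is_quantized = False
        self._skip_quantization = False   # set by convert() when calibration fails
        self._const_variables = ["w_const"]
        self.input_observer = object()
        self._layers = [self.input_observer]
        self.call = self._call_observer

    def convert(self, scale):
        if math.isinf(scale):
            self._skip_quantization = True    # inf scale: calibration failed
        # on success, conversion proceeds and cleanup runs normally

    def post_quantization_cleanup(self):
        if self._skip_quantization:
            # Explicit skip path: restore pass-through, drop observer, clear consts.
            self.call = self._call_passthrough
            self._layers.remove(self.input_observer)
            self._const_variables = []
            return
        self.call = self._call_quantized      # normal activation
        self._is_quantized = True

    def _call_observer(self, x):
        return x   # stand-in for the observer-instrumented forward

    def _call_passthrough(self, x):
        return x

    def _call_quantized(self, x):
        return x   # stand-in for call_fp8 / call_int8

layer = QuantizedLayerSketch()
layer.convert(float("inf"))
layer.post_quantization_cleanup()
print(layer.call(3), layer._const_variables)  # 3 []
```

With this shape, the normal success path still switches `call` and sets `_is_quantized`, while the failure path is taken only when `convert()` saw an inf scale.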

Comment on lines +462 to +471
if not self._is_quantized:
    if hasattr(self, "input_observer"):
        if hasattr(self, "_layers") and self.input_observer in self._layers:
            self._layers.remove(self.input_observer)
    # Set call to pass-through/original
    if hasattr(self, "call"):
        pass
    self._const_variables = []
    self._tracker.lock()
    return

Copilot AI Apr 15, 2026


Same issue here: post_quantization_cleanup() now returns immediately whenever _is_quantized is False, but _is_quantized is only set to True at the end of this method (and convert() does not set it on success). This means the cleanup never switches self.call to call_fp8/call_int8 and will break both the static quantization workflow and prepare_deserialized_quantized_model() (which doesn't create input_observer, so leaving call() pointing at the observer path will raise at runtime). Instead of checking _is_quantized here, use a dedicated “skip quantization” flag set when calibration fails, and only take the early-return in that case after restoring the original call path.
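The ordering problem both comments describe can be reduced to a few lines (a hypothetical reduction, not the repository's actual code): the guard reads a flag that only the guarded code would ever set.

```python
class OrderingBug:
    """Why gating cleanup on _is_quantized short-circuits the normal path:
    the flag only becomes True at the END of cleanup itself."""

    def __init__(self):
        self._is_quantized = False
        self.call = "observer_path"

    def post_quantization_cleanup(self):
        if not self._is_quantized:  # always True on the first call...
            return                  # ...so the switch below is unreachable
        self.call = "call_fp8"
        self._is_quantized = True

layer = OrderingBug()
layer.post_quantization_cleanup()
print(layer.call)  # still "observer_path": quantization never activates
```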

Comment on lines +246 to +257
if not self._is_quantized:
    # Clean up observer only if it exists
    if hasattr(self, "input_observer"):
        if hasattr(self, "_layers") and self.input_observer in self._layers:
            self._layers.remove(self.input_observer)
    # Set call to pass-through/original
    if hasattr(self, "call"):
        # pass through
        pass
    self._const_variables = []
    self._tracker.lock()
    return

Copilot AI Apr 15, 2026


This change is meant to handle the “activation scale is inf (e.g., single-sample calibration) + const_scale/const_weight” scenario, but there is no regression test covering that failure mode. Since the repo already has JAX quantization pytest coverage (e.g., test/jax/test_save_load.py), please add a unit/integration test that calibrates with a 1-sample dataset that triggers inf scale and asserts the layer/model still runs correctly (no observer-dependent call path, no const var conversion when quantization is skipped).
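A regression test along the lines suggested could be sketched as follows. The calibration convention and layer forward here are toy stand-ins (hypothetical, not the repository's real quantization API), chosen only to reproduce the failure mode: one sample with amax 0 yields an inf scale, and the layer must still run correctly.

```python
import numpy as np

def calibrate_inv_scale(samples):
    """Toy calibration (hypothetical convention): inverse scale = 448 / amax."""
    amax = max(float(np.max(np.abs(s))) for s in samples)
    with np.errstate(divide="ignore"):
        return np.float32(448.0) / np.float32(amax)

def layer_forward(x, inv_scale):
    """Toy forward: fall back to pass-through when calibration produced inf."""
    if not np.isfinite(inv_scale):
        return x                                   # quantization skipped
    return np.round(x * inv_scale) / inv_scale     # fake quantize-dequantize

def test_single_sample_inf_scale_falls_back():
    samples = [np.zeros((1, 8), dtype=np.float32)]  # 1 sample, amax == 0
    inv_scale = calibrate_inv_scale(samples)
    assert np.isinf(inv_scale)
    x = np.ones((1, 8), dtype=np.float32)
    y = layer_forward(x, inv_scale)
    assert np.all(np.isfinite(y))                   # no inf/NaN in the output
    assert np.allclose(y, x)                        # pass-through, not corrupted

test_single_sample_inf_scale_falls_back()
```

A real test would drive the repository's own calibration/convert path instead of these stand-ins, e.g. alongside the existing coverage in test/jax/test_save_load.py.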

Contributor

@anko-intel anko-intel left a comment


The added code just disables the const_weight and const_scale feature.

        pass
    self._const_variables = []
    self._tracker.lock()
    return
Contributor


The early return just disables converting weight and scale to const (attributes) at line 264 and later.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants