Commit a04917e

NXP backend: Add QAT documentation
1 parent faa5903 commit a04917e

2 files changed

Lines changed: 119 additions & 0 deletions

docs/source/backends/nxp/nxp-quantization.md

Lines changed: 117 additions & 0 deletions
@@ -103,3 +103,120 @@ quantized_graph_module = calibrate_and_quantize(
```

See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html) for more information.

### Quantization Aware Training

The NeutronQuantizer supports two modes of quantization: *Post-Training Quantization (PTQ)* and *Quantization Aware Training (QAT)*.
PTQ uses a calibration phase to tune quantization parameters on an already trained model in order to obtain a model with integer weights.
While this optimization reduces model size, it introduces quantization noise and can degrade the model's accuracy.
Compared to PTQ, QAT enables the model to adapt its weights to the introduced quantization noise.
In QAT, the calibration step is replaced by training, which optimizes the quantization parameters and the model weights at the same time.

See the [Quantization Aware Training blog post](https://pytorch.org/blog/quantization-aware-training/) for an introduction to the QAT method.
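
To make the notion of quantization noise concrete, here is a minimal, NXP-agnostic sketch of the int8 quantize-dequantize round trip; the scale and zero point are arbitrary values chosen purely for illustration.

```python
import torch

# Fake-quantize a weight tensor: round onto an int8 grid, then map back.
# scale and zero_point are illustrative assumptions, not tuned values.
w = torch.randn(8)
scale, zero_point = 0.02, 0
w_q = torch.clamp(torch.round(w / scale) + zero_point, -128, 127)
w_dq = (w_q - zero_point) * scale  # what the network "sees" after quantization
print((w - w_dq).abs().max())      # the quantization noise QAT adapts to
```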

To use QAT with the Neutron backend, enable the `is_qat` parameter:

```python
from executorch.backends.nxp.backend.neutron_target_spec import NeutronTargetSpec
from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer

target_spec = NeutronTargetSpec(target="imxrt700")
neutron_quantizer = NeutronQuantizer(neutron_target_spec=target_spec, is_qat=True)
```

The rest of the quantization pipeline works similarly to the PTQ workflow.
The most significant change is that the calibration step is replaced by training.

<div class="admonition tip">
Note: QAT uses the <code>prepare_qat_pt2e</code> prepare function instead of <code>prepare_pt2e</code>.
</div>

```python
import torch
from torch.utils.data import DataLoader
import torchvision.models as models
import torchvision.datasets as datasets
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer
from executorch.backends.nxp.backend.neutron_target_spec import NeutronTargetSpec
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_qat_pt2e
from torchao.quantization.pt2e import (
    move_exported_model_to_eval,
    move_exported_model_to_train,
    disable_observer,
)

model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()

neutron_target_spec = NeutronTargetSpec(target="imxrt700")
quantizer = NeutronQuantizer(neutron_target_spec, is_qat=True)  # (1)

sample_inputs = (torch.randn(1, 3, 224, 224),)
training_ep = torch.export.export(model, sample_inputs).module()  # (2)

## Steps different from PTQ (3–6)
prepared_model = prepare_qat_pt2e(training_ep, quantizer)  # (3) !!! Different prepare function
prepared_model = move_exported_model_to_train(prepared_model)  # (4)

# ---------------- Training phase (5) ----------------
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(prepared_model.parameters(), lr=1e-2, momentum=0.9)

train_data = datasets.ImageNet("./", split="train", transform=...)
train_loader = DataLoader(train_data, batch_size=5)

num_epochs = 10  # illustrative value; tune for your model and dataset

# Training replaces calibration in QAT
for epoch in range(num_epochs):
    for imgs, labels in train_loader:
        optimizer.zero_grad()
        outputs = prepared_model(imgs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    # It is recommended to disable quantization parameter
    # updates after a few epochs of training.
    if epoch >= num_epochs / 3:
        prepared_model.apply(disable_observer)
# --------------- End of training phase ---------------

prepared_model = move_exported_model_to_eval(prepared_model)  # (6)
quantized_model = convert_pt2e(prepared_model)  # (7)

# Optional step - fixes biasless convolutions (see Known limitations of QAT).
# QuantizeFusedConvBnBiasAtenPass comes from the NXP backend's ATen passes;
# its import is omitted here.
quantized_model = QuantizeFusedConvBnBiasAtenPass(
    default_zero_bias=True
)(quantized_model).graph_module

...
```

Checklist for moving from PTQ to QAT (sketched side by side after the list):
- Set `is_qat=True` in `NeutronQuantizer`
- Use `prepare_qat_pt2e` instead of `prepare_pt2e`
- Call `move_exported_model_to_train()` before training
- Train the model instead of calibrating
- Call `move_exported_model_to_eval()` after training
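
For orientation, here is a minimal side-by-side sketch of the two flows. It assumes `exported_model`, `quantizer`, the data loaders, `optimizer`, and `criterion` are already defined as in the full example above, and it runs a single epoch for brevity.

```python
import torch
from torchao.quantization.pt2e.quantize_pt2e import (
    convert_pt2e,
    prepare_pt2e,
    prepare_qat_pt2e,
)
from torchao.quantization.pt2e import (
    move_exported_model_to_eval,
    move_exported_model_to_train,
)

# PTQ: forward passes only, no gradients
prepared = prepare_pt2e(exported_model, quantizer)
with torch.no_grad():
    for imgs, _ in calibration_loader:
        prepared(imgs)
quantized = convert_pt2e(prepared)

# QAT: gradient-based fine-tuning of weights and quantization parameters
prepared = prepare_qat_pt2e(exported_model, quantizer)
prepared = move_exported_model_to_train(prepared)
for imgs, labels in train_loader:
    optimizer.zero_grad()
    criterion(prepared(imgs), labels).backward()
    optimizer.step()
prepared = move_exported_model_to_eval(prepared)
quantized = convert_pt2e(prepared)
```
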
#### Known limitations of QAT

In the current ExecuTorch/TorchAO implementation, there is an issue when quantizing biasless convolutions during QAT.
The pipeline produces a non-quantized empty bias, which causes the Neutron Converter to fail.
To mitigate this issue, apply the `QuantizeFusedConvBnBiasAtenPass` after quantization:

```python
...

# training

prepared_model = move_exported_model_to_eval(prepared_model)  # (6)
quantized_model = convert_pt2e(prepared_model)  # (7)

quantized_model = QuantizeFusedConvBnBiasAtenPass(
    default_zero_bias=True
)(quantized_model).graph_module

...
```
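
Once the model is quantized, a typical next step is lowering it to an ExecuTorch program. The following is a hypothetical continuation under the standard PT2E lowering flow; the exact NXP Neutron partitioner and compile-spec setup is covered in the main NXP backend guide and is left as a placeholder here.

```python
import torch
from executorch.exir import to_edge_transform_and_lower

# Re-export the converted model and lower it; the partitioner list is a
# placeholder for the NXP Neutron partitioner described in the backend guide.
exported = torch.export.export(quantized_model, sample_inputs)
executorch_program = to_edge_transform_and_lower(
    exported,
    partitioner=[...],  # configure per the NXP backend guide
).to_executorch()
```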

docs/source/quantization-overview.md

Lines changed: 2 additions & 0 deletions
@@ -25,12 +25,14 @@ These quantizers usually support configs that allow users to specify quantizatio
* Precision (e.g., 8-bit or 4-bit)
* Quantization type (e.g., dynamic, static, or weight-only quantization)
* Granularity (e.g., per-tensor, per-channel)
* Post-Training Quantization vs. Quantization Aware Training

Not all quantization options are supported by all backends. Consult backend-specific guides for supported quantization modes and configuration, and how to initialize the backend-specific PT2E quantizer:

* [XNNPACK quantization](backends/xnnpack/xnnpack-quantization.md)
* [CoreML quantization](backends/coreml/coreml-quantization.md)
* [QNN quantization](backends-qualcomm.md#step-2-optional-quantize-your-model)
* [NXP quantization](backends/nxp/nxp-quantization.md)