@@ -103,3 +103,120 @@ quantized_graph_module = calibrate_and_quantize(
```

See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html) for more information.

### Quantization Aware Training

The NeutronQuantizer supports two modes of quantization: *Post‑Training Quantization (PTQ)* and *Quantization Aware Training (QAT)*.
PTQ uses a calibration phase to tune the quantization parameters of an already‑trained model and produce a model with integer weights.
While this optimization reduces model size, it introduces quantization noise and can degrade the model's accuracy.
QAT, in contrast, allows the model to adapt its weights to the introduced quantization noise:
instead of calibration, it runs training to optimize the quantization parameters and the model weights at the same time.

See the [Quantization Aware Training blog post](https://pytorch.org/blog/quantization-aware-training/) for an introduction to the QAT method.
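
As a rough, standalone illustration of that noise (this sketch is not part of the Neutron or TorchAO APIs), a fake-quantize operation rounds a float tensor onto an integer grid and immediately maps it back to float; QAT inserts operations of this kind into the forward pass so training happens under the same noise that real integer quantization will introduce:

```python
import torch

def fake_quantize(x: torch.Tensor, scale: float, zero_point: int,
                  qmin: int = -128, qmax: int = 127) -> torch.Tensor:
    # Quantize onto the int8 grid, then immediately dequantize back to float.
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

weights = torch.randn(8)
noisy_weights = fake_quantize(weights, scale=0.02, zero_point=0)
# The difference is the quantization noise that QAT trains the model to tolerate.
print((weights - noisy_weights).abs().max())
```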

To use QAT with the Neutron backend, set the `is_qat` parameter to `True`:

```python
from executorch.backends.nxp.quantizer.neutron_quantizer import (
    NeutronQuantizer,
    NeutronTargetSpec,
)

target_spec = NeutronTargetSpec(target="imxrt700")
neutron_quantizer = NeutronQuantizer(neutron_target_spec=target_spec, is_qat=True)
```

The rest of the quantization pipeline works similarly to the PTQ workflow.
The most significant change is that the calibration step is replaced by training.

<div class="admonition tip">
Note: QAT uses the <code>prepare_qat_pt2e</code> prepare function instead of <code>prepare_pt2e</code>.
</div>
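
For comparison, here is a minimal sketch of how the two prepare functions slot into the flow; `exported_module`, `quantizer`, and `calibration_loader` below are assumed placeholders rather than names from this tutorial:

```python
from torchao.quantization.pt2e.quantize_pt2e import prepare_pt2e, prepare_qat_pt2e

# PTQ: observers collect statistics while representative samples run through the model.
prepared_model = prepare_pt2e(exported_module, quantizer)
for sample_input, _ in calibration_loader:
    prepared_model(sample_input)

# QAT: fake-quantize modules are inserted instead, and the calibration loop above
# is replaced by a training loop (see the full example below).
prepared_model = prepare_qat_pt2e(exported_module, quantizer)
```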

```python
import torch
from torch.utils.data import DataLoader
import torchvision.models as models
import torchvision.datasets as datasets
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer
from executorch.backends.nxp.backend.neutron_target_spec import NeutronTargetSpec
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_qat_pt2e
from torchao.quantization.pt2e import (
    move_exported_model_to_eval,
    move_exported_model_to_train,
    disable_observer,
)

model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()

neutron_target_spec = NeutronTargetSpec(target="imxrt700")
quantizer = NeutronQuantizer(neutron_target_spec, is_qat=True)  # (1)

sample_inputs = (torch.randn(1, 3, 224, 224),)
training_ep = torch.export.export(model, sample_inputs).module()  # (2)

# Steps that differ from PTQ (3–6)
prepared_model = prepare_qat_pt2e(training_ep, quantizer)  # (3) !!! Different prepare function
prepared_model = move_exported_model_to_train(prepared_model)  # (4)

# ---------------- Training phase (5) ----------------
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(prepared_model.parameters(), lr=1e-2, momentum=0.9)

train_data = datasets.ImageNet("./", split="train", transform=...)  # supply the appropriate preprocessing transforms
train_loader = DataLoader(train_data, batch_size=5)

num_epochs = 10  # example value; tune for your dataset

# Training replaces calibration in QAT
for epoch in range(num_epochs):
    for imgs, labels in train_loader:
        optimizer.zero_grad()
        outputs = prepared_model(imgs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    # It is recommended to freeze the quantization parameters
    # (disable the observers) after a few epochs of training.
    if epoch >= num_epochs / 3:
        prepared_model.apply(disable_observer)
# --------------- End of training phase ---------------

prepared_model = move_exported_model_to_eval(prepared_model)  # (6)
quantized_model = convert_pt2e(prepared_model)  # (7)

# Optional step - fixes biasless convolution (see Known Limitations of QAT)
quantized_model = QuantizeFusedConvBnBiasAtenPass(
    default_zero_bias=True
)(quantized_model).graph_module

...
```

Checklist for moving from PTQ to QAT:
- Set `is_qat=True` in `NeutronQuantizer`
- Use `prepare_qat_pt2e` instead of `prepare_pt2e`
- Call `move_exported_model_to_train()` before training
- Train the model instead of calibrating
- Call `move_exported_model_to_eval()` after training

#### Known Limitations of QAT

In the current ExecuTorch/TorchAO implementation, there is an issue when quantizing biasless convolutions during QAT.
The pipeline produces a non‑quantized empty bias, which causes the Neutron Converter to fail.
To mitigate this issue, apply the `QuantizeFusedConvBnBiasAtenPass` after quantization:

```python
...

# training

prepared_model = move_exported_model_to_eval(prepared_model)  # (6)
quantized_model = convert_pt2e(prepared_model)  # (7)

quantized_model = QuantizeFusedConvBnBiasAtenPass(
    default_zero_bias=True
)(quantized_model).graph_module

...
```