Qualcomm AI Engine Direct - Debugger Convergence Phase 2: Migrating to official numeric discrepancy evaluator (pytorch#18834)
### Summary
Debugger Convergence Stage 2.
Stage 1 (Merged): pytorch#17804
Stage 2: This PR
Stage 3: Adding SKILL.md for debugger
Changes made on QNN backend
- Removed the custom comparator logic and reused the dev tools `NumericalComparatorBase`.
- Using ETRecord to retrieve the `edge_after_transform/forward` reference
graph.
- Reuse the online `edge_after_transform/forward` graph instead of the
one that goes through serialize and deserialize since serialize will not
save `quant attributes`. Reference:
https://github.com/pytorch/executorch/blob/d31d4be15c176045ce3bae2c76a50c891fa5973a/exir/serde/serialize.py#L141
- Changed the UT's expected number of events, since multi-output nodes are not
supported in dev tools.
- Verified that the graph's I/O ordering works properly.
Changes made on dev tools
-
https://github.com/pytorch/executorch/blob/411ede26bd8abfe723ec34e5a6e729f8c60cfee2/devtools/inspector/_inspector.py#L1150
is changed because it was hardcoded to use the `edge before passes` graph.
Added a parameter and made sure it remains backward compatible.
- Added `debug_handle` to the pandas `DataFrame`, since it helps
users map `DataFrame` rows back to the original graph.
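As a rough illustration of why a `debug_handle` column helps, the sketch below joins result rows back to graph nodes by handle. The row layout, the node names, and the helper `map_rows_to_nodes` are hypothetical stand-ins, not the Inspector's actual schema; plain dicts are used in place of a pandas `DataFrame` and an FX graph so the example stays self-contained.

```python
from typing import Dict, List, Optional, Tuple

Handle = Tuple[int, ...]


def map_rows_to_nodes(
    rows: List[dict], node_handles: Dict[str, Handle]
) -> List[dict]:
    """Attach the matching node name (or None) to every result row."""
    # Invert the node -> handle mapping so rows can be looked up by handle.
    handle_to_node: Dict[Handle, str] = {
        handle: name for name, handle in node_handles.items()
    }
    mapped = []
    for row in rows:
        node: Optional[str] = handle_to_node.get(tuple(row["debug_handle"]))
        mapped.append({**row, "node": node})
    return mapped


# Hypothetical graph nodes and their debug handles.
node_handles = {"conv2d": (1,), "relu": (2,), "linear": (3,)}

# Hypothetical per-op comparison rows, as they might appear in a results table.
rows = [
    {"debug_handle": (2,), "cosine_similarity": 0.999},
    {"debug_handle": (3,), "cosine_similarity": 0.874},
    {"debug_handle": (9,), "cosine_similarity": 0.500},  # no matching node
]

mapped = map_rows_to_nodes(rows, node_handles)
for row in mapped:
    print(row["node"], row["cosine_similarity"])
```

Rows whose handle has no counterpart in the graph are kept with `node` set to `None`, so they remain visible rather than being silently dropped.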
### Test plan
Passing E2E test:
- `python backends/qualcomm/tests/test_qnn_delegate.py
TestUtilsScript.test_intermediate_debugger --device ${DEVICE} --model
SM8750 --build_folder build-android --executorch_root . --artifact_dir
./test_debugger --image_dataset ../datasets/imagenet-mini/val/`
Passing the following UT:
- `python backends/qualcomm/tests/test_qnn_delegate.py -k
TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_simple_model
--model SM8750 --device ${DEVICE} --build_folder build-android`
- `python backends/qualcomm/tests/test_qnn_delegate.py -k
TestQNNQuantizedUtils.test_qnn_backend_dump_intermediate_outputs_topk
--model SM8750 --device ${DEVICE} --build_folder build-android`
- `python backends/qualcomm/tests/test_qnn_delegate.py -k
TestQNNFloatingPointUtils.test_qnn_backend_dump_intermediate_outputs_topk
--model SM8750 --device ${DEVICE} --build_folder build-android`
- `python backends/qualcomm/tests/test_qnn_delegate.py -k
TestQNNFloatingPointUtils.test_qnn_backend_dump_intermediate_outputs_simple_model
--model SM8750 --device ${DEVICE} --build_folder build-android`
- Under `devtools/inspector/_inspector_utils.py`, skip delegate call
event since it holds all `debug_handles` and will mess up the op event
`debug handle`.
1. Follow the [tutorial](https://pytorch.org/executorch/main/getting-started-setup) to set up ExecuTorch.
2. Follow the [tutorial](https://pytorch.org/executorch/stable/build-run-qualcomm-ai-engine-direct-backend.html) to build Qualcomm AI Engine Direct Backend.
-
### 2. Enable Flag
+
## Instructions
-
When executing the script, please add the flag `--dump_intermediate_outputs`. This tells QNN to dump all intermediate tensors during execution.
+
### 1. Initialize debugger and build binary
+
+
Create a `QNNIntermediateDebugger` with a sample input and pass it to `build_executorch_binary`. The `--dump_intermediate_outputs` flag tells QNN to dump all intermediate tensors during execution.
-
### 3. Add debugger to the example script
-
Initialize a `QNNIntermediateDebugger`. Please pass initialized `QNNIntermediateDebugger` and the `args.dump_intermediate_outputs` to `build_executorch_binary` method as well.
-
#### Example:
```python
from executorch.backends.qualcomm.export_utils import build_executorch_binary
-
from executorch.backends.qualcomm.debugger.qnn_intermediate_debugger import QNNIntermediateDebugger
+
from executorch.backends.qualcomm.debugger.qnn_intermediate_debugger import (
It is perfectly fine for users to pass the desired amount of datasets to `build_executorch_binary`, which helps achieve better quantization results. However, after `build_executorch_binary` is called, we need to ensure that we only perform one inference during execution. Please ensure that CPU and QNN is using the same input during execution; otherwise, the debugging results might not be accurate.
+
After `build_executorch_binary()`, the debugger holds:
+
- `edge_ep`: edge `ExportedProgram` for CPU golden inference.
+
- `etrecord_file_path`: path to the generated ETRecord.
+
+
### 2. Execute on device
+
+
Ensure `dump_intermediate_outputs` is enabled in your `QnnConfig` (or pass `--dump_intermediate_outputs` via CLI). Only run **one inference** for debugging — multiple executions are not supported.
-
### 5: Pull and process the results.
-
After QNN execution with the runner, if the previous steps are done correctly, we should be able to get two files: `etdump.etdp` and `debug_output.bin`.
-
The following example pulls the files back and calls a callback function to process the results. In this callback function, we create the `Inspector`. Then we perform CPU inference to get CPU intermediate results. Now, we have both QNN and CPU intermediate results, we can start generating results to compare the accuracy. Taking the following example, we should be able to get `debug_graph.svg` as an output in the current directory.
-
#### Example:
```python
-
from executorch.backends.qualcomm.debugger.qnn_intermediate_debugger import OutputFormat
+
from executorch.examples.qualcomm.utils import SimpleADB
After execution, pull `etdump.etdp` and `debug_output.bin` from the device. Use `setup_inspector()` to create the `Inspector`, then create comparators and generate results.
+
+
Before comparing per-layer outputs, it is highly recommended to verify that the edge program's final output aligns with the original `nn.Module`. The debugger uses the edge program as the CPU golden reference, so if the edge graph itself has diverged (e.g., due to weights quantization or pass transformations), per-layer comparisons against it may be misleading.
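The final-output check described above can be sketched as follows. This is a minimal, framework-free illustration: the outputs are plain Python lists, and `outputs_align` and its `0.99` threshold are hypothetical choices for the example, not part of the debugger API. In practice the two vectors would be the flattened outputs of the original `nn.Module` and of the edge program for the same input.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equally sized flat vectors."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Two all-zero outputs are treated as identical; otherwise as unrelated.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def outputs_align(
    reference: Sequence[float], candidate: Sequence[float], threshold: float = 0.99
) -> bool:
    """Return True when the candidate output is close enough to the reference."""
    return cosine_similarity(reference, candidate) >= threshold


# Hypothetical flattened final outputs for one shared input.
module_output = [0.10, 0.70, 0.20]  # original nn.Module
edge_output = [0.11, 0.69, 0.20]    # edge program (CPU golden reference)

aligned = outputs_align(module_output, edge_output)
print("edge program aligned with nn.Module:", aligned)
```

If this check fails, the per-layer scores against the edge program say little about QNN itself, because the reference has already drifted from the original model.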
+
+
```python
+
from executorch.backends.qualcomm.debugger.qcom_numerical_comparator_sample import (
The above example sets output formats as SVG and evaluation metrics using Cosine Similarity. Based on different needs, users can choose other output formats as shown in the `OutputFormat` class under [qnn_intermediate_debugger](./qnn_intermediate_debugger.py)
+
## Comparators
+
+
Create comparators via the `create_comparator()` factory, which automatically injects the `edge_ep`. A couple sample comparators are provided under [qcom_numerical_comparator_sample.py](./qcom_numerical_comparator_sample.py):
+
```python
-
class OutputFormat(IntEnum):
-
    SVG_GRAPHS = 0
-
    CSV_FILES = 1
-
    DUMP_RAW = 2
+
cos = qnn_intermediate_debugger.create_comparator(QcomCosineSimilarityComparator, threshold=0.9)
For evaluation metrics, if users would like to implement their own metrics, we have provided the option to implement [MetricEvaluatorBase](./metrics_evaluator.py). The following shows how to define custom metrics.
+
### Custom comparators
+
+
Users can also define their own comparator by implementing a derived class from [QcomNumericalComparatorBase](./qcom_numerical_comparator_base.py). Inside the derived class, users will need to implement `metric_name()`, `is_valid_score()`, and `element_compare()`. The base class handles QNN-specific preprocessing (dequantization, layout conversion) internally — `preprocessing` cannot be overridden.
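To make the shape of such a derived class concrete, here is a minimal sketch. `ComparatorSketchBase` is a hypothetical stand-in written for this example so the code runs without ExecuTorch; the real `QcomNumericalComparatorBase` operates on tensors, and its constructor and method signatures may differ. Only the three method names come from the text above.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class ComparatorSketchBase(ABC):
    """Hypothetical stand-in for QcomNumericalComparatorBase (signatures assumed)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    @abstractmethod
    def metric_name(self) -> str:
        """Short name used to label the metric in reports."""

    @abstractmethod
    def is_valid_score(self, score: float) -> bool:
        """Decide whether a score counts as a pass."""

    @abstractmethod
    def element_compare(self, a: Sequence[float], b: Sequence[float]) -> float:
        """Compare one pair of intermediate outputs and return a score."""


class MaxAbsErrorComparator(ComparatorSketchBase):
    """Scores a layer by its largest element-wise absolute difference."""

    def metric_name(self) -> str:
        return "max_abs_error"

    def is_valid_score(self, score: float) -> bool:
        # Lower is better for an error metric, so pass at or below the threshold.
        return score <= self.threshold

    def element_compare(self, a: Sequence[float], b: Sequence[float]) -> float:
        if len(a) != len(b):
            raise ValueError("outputs must have the same number of elements")
        return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


comparator = MaxAbsErrorComparator(threshold=0.05)
score = comparator.element_compare([1.0, 2.0, 3.0], [1.0, 2.02, 2.99])
passed = comparator.is_valid_score(score)
print(comparator.metric_name(), score, passed)
```

Note that the pass direction depends on the metric: a similarity score passes above its threshold, while an error score like this one passes below it, which is why `is_valid_score()` is left to the derived class.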
We have provided an inception_v3 demo script to help users better understand how to apply the debugger to their scripts. Please refer to [qnn_intermediate_debugger_demo.py](../../../examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py) for the example script.
An Inception_V3 demo script is provided at [qnn_intermediate_debugger_demo.py](../../../examples/qualcomm/util_scripts/qnn_intermediate_debugger_demo.py).
-
Before running the example script, please ensure that dataset is downloaded. Example dataset can be retrieved [here](https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000).
+
Before running, ensure the dataset is downloaded. An example dataset can be retrieved [here](https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000).
1. The current debugger only supports performing one execution. Multiple executions may cause unknown behavior and are not recommended.
-
2. Please ignore this if you are using `qnn_executor_runner`. If you have decided to write your own runner, please follow the [tutorial](https://pytorch.org/executorch/stable/etdump.html) on how to implement etdump into your own runner.
-
3. The current debugger does not support graph with partitions. (WIP)
-
4. The current debugger does not support LLM models. (WIP)
+
## Limitations
+
1. Only one execution per debug session — multiple executions may cause unknown behavior.
+
2. If you have decided to write your own runner (instead of `qnn_executor_runner`), follow the [tutorial](https://pytorch.org/executorch/stable/etdump.html) on how to implement etdump.
+
3. Does not support graphs with partitions (partial delegation).