Fix QNN AOT runtime and lowering issues#674
Conversation
📝 WalkthroughWalkthroughThis PR strengthens the QNN backend's robustness across allocation, execution, tensor tracking, and operator lowering. It adds comprehensive error diagnostics to memory operations, refines tensor binding semantics, introduces symbol-aware tensor naming for identity consistency, makes quantization offset configuration dynamic, validates AOT graph operations, and improves operator pattern implementations while fixing correctness issues in VisionRoPE initialization and Android stack traces. ChangesQNN Backend Robustness
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
372-374: ⚡ Quick winInclude
add_node_statusin the fatal error message.The failure path exits correctly, but dropping the status code makes root-cause triage harder.
🔧 Suggested change
- MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNN AOT failed to add node {} (op type {}) to graph.", qnn_op->name_, - qnn_op->op_name_); + MLLM_ERROR_EXIT(ExitCode::kCoreError, + "QNN AOT failed to add node {} (op type {}) to graph. status={}", + qnn_op->name_, qnn_op->op_name_, static_cast<int>(add_node_status));As per coding guidelines, "Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp` around lines 372 - 374, Update the fatal error message to include the numeric status value so failures are triageable: when add_node_status != mllm::qnn::MODEL_NO_ERROR, modify the MLLM_ERROR_EXIT call in QnnWrappersAPI.cpp to append the add_node_status (or its stringified form) to the log text for the node (qnn_op->name_) and op type (qnn_op->op_name_), ensuring the status is included in the formatted message passed to MLLM_ERROR_EXIT.mllm/backends/qnn/QNNBackend.cpp (1)
679-682: ⚡ Quick winAdd a regression test for repeated execute-time rebinding.
This behavior change is important enough to pin with a test that runs the same retrieved graph multiple times with different runtime buffers and verifies writes land in the latest bound outputs.
As per coding guidelines, "Suggest adding unit tests for untested complex logic or edge cases."
Also applies to: 699-702
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/QNNBackend.cpp` around lines 679 - 682, Add a regression unit test that exercises repeated execute-time rebinding of retrieved AOT graphs: create a test that retrieves a graph via the QNNBackend retrieval path, then calls its execute/executeRetrievedGraph equivalent multiple times while repeatedly calling wrapper->__setDataContainer(runtime_input) with different runtime buffers, and after each execute verify that writes appear in the most recently bound output buffer; put the test near other backend tests for QNN (targeting the same behavior referenced around wrapper->__setDataContainer) to ensure future changes to rebind-on-every-execution are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp`:
- Around line 372-374: Update the fatal error message to include the numeric
status value so failures are triageable: when add_node_status !=
mllm::qnn::MODEL_NO_ERROR, modify the MLLM_ERROR_EXIT call in QnnWrappersAPI.cpp
to append the add_node_status (or its stringified form) to the log text for the
node (qnn_op->name_) and op type (qnn_op->op_name_), ensuring the status is
included in the formatted message passed to MLLM_ERROR_EXIT.
In `@mllm/backends/qnn/QNNBackend.cpp`:
- Around line 679-682: Add a regression unit test that exercises repeated
execute-time rebinding of retrieved AOT graphs: create a test that retrieves a
graph via the QNNBackend retrieval path, then calls its
execute/executeRetrievedGraph equivalent multiple times while repeatedly calling
wrapper->__setDataContainer(runtime_input) with different runtime buffers, and
after each execute verify that writes appear in the most recently bound output
buffer; put the test near other backend tests for QNN (targeting the same
behavior referenced around wrapper->__setDataContainer) to ensure future changes
to rebind-on-every-execution are caught.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 37076747-8e6d-40bb-8e3c-c4faeef33330
📒 Files selected for processing (10)
mllm/backends/qnn/QNNAllocator.cppmllm/backends/qnn/QNNBackend.cppmllm/backends/qnn/QNNModel.cppmllm/backends/qnn/QNNUtils.hppmllm/backends/qnn/aot/QnnWrappersAPI.cppmllm/backends/qnn/aot/visitor/Matmul.cppmllm/backends/qnn/aot/visitor/Repeat.cppmllm/backends/qnn/aot/visitor/View.cppmllm/core/aops/VisionRoPEOp.cppmllm/mllm.hpp
Context / Follow-up PlanThis PR is the first step toward adding full QNN AOT inference support for Qwen2-VL 2B. The full work is intentionally split into smaller PRs to keep review scope manageable:
These fixes are kept model-agnostic because they also improve QNN AOT correctness for other models. In local testing, these changes are part of a Qwen2-VL 2B full-QNN path where ViT, LLM prefill, and decode all run through QNN AOT on SM8650. |
Please check Guidelines for Contributing.
Summary
This PR fixes several QNN AOT runtime and lowering issues:
MLLM_QNN_DUMP_IO.kSiLUtokVisionRoPE.Validation
git diff --cached --checkcmake --build build-qnn-aot --target mllm-qwen2vl-aot-c -j2Summary by CodeRabbit
Bug Fixes
New Features
Improvements