Skip to content

Fix QNN AOT runtime and lowering issues#674

Open
twlddd wants to merge 1 commit into
UbiquitousLearning:mainfrom
twlddd:qnn-aot-backend-fixes-pr1
Open

Fix QNN AOT runtime and lowering issues#674
twlddd wants to merge 1 commit into
UbiquitousLearning:mainfrom
twlddd:qnn-aot-backend-fixes-pr1

Conversation

@twlddd
Copy link
Copy Markdown

@twlddd twlddd commented May 18, 2026

Please check Guidelines for Contributing.

Summary

This PR fixes several QNN AOT runtime and lowering issues:

  • Rebind AOT graph input/output tensors on each graph execution.
  • Add output count validation and clearer QNN memory registration diagnostics.
  • Add optional QNN graph IO dump via MLLM_QNN_DUMP_IO.
  • Fix QNN AOT Repeat lowering to match mllm repeat_interleave semantics.
  • Handle no-op View lowering and report shape-changing alias views.
  • Preserve MatMul transpose options when lowering to QNN.
  • Support UInt16 symmetric quantization offset in QNN tensor wrappers.
  • Fix VisionRoPEOp registration from kSiLU to kVisionRoPE.
  • Guard stack trace writes against invalid snprintf lengths.

Validation

  • git diff --cached --check
  • cmake --build build-qnn-aot --target mllm-qwen2vl-aot-c -j2

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced error diagnostics for tensor operations with detailed logging.
    • Added output tensor count validation for improved robustness.
    • Fixed operation initialization and Android stack trace printing.
    • Improved error handling for graph operations.
  • New Features

    • Added optional debug logging for tensor metadata via environment variable.
  • Improvements

    • Optimized tensor rebinding behavior during execution.
    • Enhanced operation implementations for better accuracy.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

📝 Walkthrough

Walkthrough

This PR strengthens the QNN backend's robustness across allocation, execution, tensor tracking, and operator lowering. It adds comprehensive error diagnostics to memory operations, refines tensor binding semantics, introduces symbol-aware tensor naming for identity consistency, makes quantization offset configuration dynamic, validates AOT graph operations, and improves operator pattern implementations while fixing correctness issues in VisionRoPE initialization and Android stack traces.

Changes

QNN Backend Robustness

Layer / File(s) Summary
Enhanced error diagnostics in memory allocation
mllm/backends/qnn/QNNAllocator.cpp
Headers for I/O and string formatting are added. Ownership checks, memory FD validation, and memRegister result handling replace bare assertions with detailed stderr output and structured logging including tensor name, pointer, dtype, rank, and computed dimensions.
Improved QNN execution and wrapper binding
mllm/backends/qnn/QNNBackend.cpp, mllm/backends/qnn/QNNUtils.hpp
Output tensor count validation is added to mirror input checks. Runtime tensors are unconditionally rebound to input and output wrappers on each execution cycle. QNNTensorWrapper's __setDataContainer removes pre-assignment nil assertion and unconditionally assigns the container based on tensor nil state.
Tensor metadata logging and symbol-aware naming
mllm/backends/qnn/QNNModel.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp
QNNModel conditionally logs graph input/output tensor metadata (name, dtype, dims) when MLLM_QNN_DUMP_IO environment variable is set. QnnWrappersAPI introduces qnnTensorNameFromIR helper to prefer IR symbol attributes over tensor name(), enabling consistent tensor identity and caching behavior across AOT compilation.
Dynamic quantization offset selection
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
SymPerTensor quantization path changes from hardcoded UInt8 assertion and fixed -128 offset to dynamic offset selection: UInt8/UInt16 types map to negative offsets; Int8/Int16 map to zero. Unsupported types trigger error exit with descriptive message.
AOT graph operation error validation
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
addNode result is now validated against MODEL_NO_ERROR; on failure triggers MLLM_ERROR_EXIT with node and operation identifiers instead of silently ignoring errors.
Operator pattern lowering improvements
mllm/backends/qnn/aot/visitor/Matmul.cpp, mllm/backends/qnn/aot/visitor/Repeat.cpp, mllm/backends/qnn/aot/visitor/View.cpp
Matmul casts AOp to mllm::aops::MatMulOp and populates transpose_in0/transpose_in1 parameters. Repeat replaces single-pass Tile with reshape→tile→reshape sequence for correct repeat_interleave semantics. View skips Reshape op when input/output tensor names match and shapes are identical, logging errors on shape mismatches with shared names.
Miscellaneous correctness fixes
mllm/core/aops/VisionRoPEOp.cpp, mllm/mllm.hpp
VisionRoPEOp now correctly initializes with OpTypes::kVisionRoPE instead of kSiLU. Android stack trace printing clamps snprintf return value to prevent negative or oversized lengths in buffer writes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • UbiquitousLearning/mllm#578: Modifies QNNAllocator.cpp registration path at the same code location with complementary error handling refactoring.
  • UbiquitousLearning/mllm#596: Changes QNN AOT quantization logic in QnnWrappersAPI.cpp for blockwise scale bitwidth alongside this PR's per-tensor offset configuration.
  • UbiquitousLearning/mllm#606: Reworks input/output wrapper binding and allocation flow in QNNBackend::graphExecute with overlapping execution path changes.

Suggested reviewers

  • liang1232018
  • chenghuaWang
  • oreomaker

Poem

🐰 QNN's strength now shines with checks so clear,
Symbol-named tensors hold their names so dear,
Operators rewrite with patterns refined,
The stack traces safe, no bounds left behind!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main focus of the changeset—fixing multiple QNN AOT runtime and lowering issues across several files.
Description check ✅ Passed The description provides a comprehensive summary of the PR objectives with detailed bullet points, validation steps, and references to contributing guidelines, exceeding the template requirements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

372-374: ⚡ Quick win

Include add_node_status in the fatal error message.

The failure path exits correctly, but dropping the status code makes root-cause triage harder.

🔧 Suggested change
-    MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNN AOT failed to add node {} (op type {}) to graph.", qnn_op->name_,
-                    qnn_op->op_name_);
+    MLLM_ERROR_EXIT(ExitCode::kCoreError,
+                    "QNN AOT failed to add node {} (op type {}) to graph. status={}",
+                    qnn_op->name_, qnn_op->op_name_, static_cast<int>(add_node_status));

As per coding guidelines, "Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp` around lines 372 - 374, Update the
fatal error message to include the numeric status value so failures are
triageable: when add_node_status != mllm::qnn::MODEL_NO_ERROR, modify the
MLLM_ERROR_EXIT call in QnnWrappersAPI.cpp to append the add_node_status (or its
stringified form) to the log text for the node (qnn_op->name_) and op type
(qnn_op->op_name_), ensuring the status is included in the formatted message
passed to MLLM_ERROR_EXIT.
mllm/backends/qnn/QNNBackend.cpp (1)

679-682: ⚡ Quick win

Add a regression test for repeated execute-time rebinding.

This behavior change is important enough to pin with a test that runs the same retrieved graph multiple times with different runtime buffers and verifies writes land in the latest bound outputs.

As per coding guidelines, "Suggest adding unit tests for untested complex logic or edge cases."

Also applies to: 699-702

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/QNNBackend.cpp` around lines 679 - 682, Add a regression
unit test that exercises repeated execute-time rebinding of retrieved AOT
graphs: create a test that retrieves a graph via the QNNBackend retrieval path,
then calls its execute/executeRetrievedGraph equivalent multiple times while
repeatedly calling wrapper->__setDataContainer(runtime_input) with different
runtime buffers, and after each execute verify that writes appear in the most
recently bound output buffer; put the test near other backend tests for QNN
(targeting the same behavior referenced around wrapper->__setDataContainer) to
ensure future changes to rebind-on-every-execution are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp`:
- Around line 372-374: Update the fatal error message to include the numeric
status value so failures are triageable: when add_node_status !=
mllm::qnn::MODEL_NO_ERROR, modify the MLLM_ERROR_EXIT call in QnnWrappersAPI.cpp
to append the add_node_status (or its stringified form) to the log text for the
node (qnn_op->name_) and op type (qnn_op->op_name_), ensuring the status is
included in the formatted message passed to MLLM_ERROR_EXIT.

In `@mllm/backends/qnn/QNNBackend.cpp`:
- Around line 679-682: Add a regression unit test that exercises repeated
execute-time rebinding of retrieved AOT graphs: create a test that retrieves a
graph via the QNNBackend retrieval path, then calls its
execute/executeRetrievedGraph equivalent multiple times while repeatedly calling
wrapper->__setDataContainer(runtime_input) with different runtime buffers, and
after each execute verify that writes appear in the most recently bound output
buffer; put the test near other backend tests for QNN (targeting the same
behavior referenced around wrapper->__setDataContainer) to ensure future changes
to rebind-on-every-execution are caught.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 37076747-8e6d-40bb-8e3c-c4faeef33330

📥 Commits

Reviewing files that changed from the base of the PR and between 729ca4c and bf90dcd.

📒 Files selected for processing (10)
  • mllm/backends/qnn/QNNAllocator.cpp
  • mllm/backends/qnn/QNNBackend.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/View.cpp
  • mllm/core/aops/VisionRoPEOp.cpp
  • mllm/mllm.hpp

@twlddd
Copy link
Copy Markdown
Author

twlddd commented May 18, 2026

Context / Follow-up Plan

This PR is the first step toward adding full QNN AOT inference support for Qwen2-VL 2B.

The full work is intentionally split into smaller PRs to keep review scope manageable:

  1. This PR: shared QNN AOT runtime/lowering correctness fixes.
  2. Follow-up PR: generic QNN AOT support for visual graph lowering, including simple graph lowering and LayerNorm/GELU visitors.
  3. Follow-up PR: Qwen2-VL 2B full-QNN example, including visual bucket graphs, masked visual padding, interactive runner, and deployment docs.

These fixes are kept model-agnostic because they also improve QNN AOT correctness for other models.

In local testing, these changes are part of a Qwen2-VL 2B full-QNN path where ViT, LLM prefill, and decode all run through QNN AOT on SM8650.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant