Fix QNN AOT runtime and lowering issues by twlddd · Pull Request #674 · UbiquitousLearning/mllm

twlddd · 2026-05-18T06:37:38Z

Please check Guidelines for Contributing.

Summary

This PR fixes several QNN AOT runtime and lowering issues:

Rebind AOT graph input/output tensors on each graph execution.
Add output count validation and clearer QNN memory registration diagnostics.
Add optional QNN graph IO dump via MLLM_QNN_DUMP_IO.
Fix QNN AOT Repeat lowering to match mllm repeat_interleave semantics.
Handle no-op View lowering and report shape-changing alias views.
Preserve MatMul transpose options when lowering to QNN.
Support UInt16 symmetric quantization offset in QNN tensor wrappers.
Fix VisionRoPEOp registration from kSiLU to kVisionRoPE.
Guard stack trace writes against invalid snprintf lengths.

Validation

git diff --cached --check
cmake --build build-qnn-aot --target mllm-qwen2vl-aot-c -j2

Summary by CodeRabbit

Bug Fixes
- Enhanced error diagnostics for tensor operations with detailed logging.
- Added output tensor count validation for improved robustness.
- Fixed operation initialization and Android stack trace printing.
- Improved error handling for graph operations.
New Features
- Added optional debug logging for tensor metadata via environment variable.
Improvements
- Optimized tensor rebinding behavior during execution.
- Enhanced operation implementations for better accuracy.

coderabbitai · 2026-05-18T06:37:52Z

📝 Walkthrough

Walkthrough

This PR strengthens the QNN backend's robustness across allocation, execution, tensor tracking, and operator lowering. It adds comprehensive error diagnostics to memory operations, refines tensor binding semantics, introduces symbol-aware tensor naming for identity consistency, makes quantization offset configuration dynamic, validates AOT graph operations, and improves operator pattern implementations while fixing correctness issues in VisionRoPE initialization and Android stack traces.

Changes

QNN Backend Robustness

Layer / File(s)	Summary
Enhanced error diagnostics in memory allocation `mllm/backends/qnn/QNNAllocator.cpp`	Headers for I/O and string formatting are added. Ownership checks, memory FD validation, and memRegister result handling replace bare assertions with detailed stderr output and structured logging including tensor name, pointer, dtype, rank, and computed dimensions.
Improved QNN execution and wrapper binding `mllm/backends/qnn/QNNBackend.cpp`, `mllm/backends/qnn/QNNUtils.hpp`	Output tensor count validation is added to mirror input checks. Runtime tensors are unconditionally rebound to input and output wrappers on each execution cycle. QNNTensorWrapper's __setDataContainer removes pre-assignment nil assertion and unconditionally assigns the container based on tensor nil state.
Tensor metadata logging and symbol-aware naming `mllm/backends/qnn/QNNModel.cpp`, `mllm/backends/qnn/aot/QnnWrappersAPI.cpp`	QNNModel conditionally logs graph input/output tensor metadata (name, dtype, dims) when MLLM_QNN_DUMP_IO environment variable is set. QnnWrappersAPI introduces qnnTensorNameFromIR helper to prefer IR symbol attributes over tensor name(), enabling consistent tensor identity and caching behavior across AOT compilation.
Dynamic quantization offset selection `mllm/backends/qnn/aot/QnnWrappersAPI.cpp`	SymPerTensor quantization path changes from hardcoded UInt8 assertion and fixed -128 offset to dynamic offset selection: UInt8/UInt16 types map to negative offsets; Int8/Int16 map to zero. Unsupported types trigger error exit with descriptive message.
AOT graph operation error validation `mllm/backends/qnn/aot/QnnWrappersAPI.cpp`	addNode result is now validated against MODEL_NO_ERROR; on failure triggers MLLM_ERROR_EXIT with node and operation identifiers instead of silently ignoring errors.
Operator pattern lowering improvements `mllm/backends/qnn/aot/visitor/Matmul.cpp`, `mllm/backends/qnn/aot/visitor/Repeat.cpp`, `mllm/backends/qnn/aot/visitor/View.cpp`	Matmul casts AOp to mllm::aops::MatMulOp and populates transpose_in0/transpose_in1 parameters. Repeat replaces single-pass Tile with reshape→tile→reshape sequence for correct repeat_interleave semantics. View skips Reshape op when input/output tensor names match and shapes are identical, logging errors on shape mismatches with shared names.
Miscellaneous correctness fixes `mllm/core/aops/VisionRoPEOp.cpp`, `mllm/mllm.hpp`	VisionRoPEOp now correctly initializes with OpTypes::kVisionRoPE instead of kSiLU. Android stack trace printing clamps snprintf return value to prevent negative or oversized lengths in buffer writes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

UbiquitousLearning/mllm#578: Modifies QNNAllocator.cpp registration path at the same code location with complementary error handling refactoring.
UbiquitousLearning/mllm#596: Changes QNN AOT quantization logic in QnnWrappersAPI.cpp for blockwise scale bitwidth alongside this PR's per-tensor offset configuration.
UbiquitousLearning/mllm#606: Reworks input/output wrapper binding and allocation flow in QNNBackend::graphExecute with overlapping execution path changes.

Suggested reviewers

liang1232018
chenghuaWang
oreomaker

Poem

🐰 QNN's strength now shines with checks so clear,
Symbol-named tensors hold their names so dear,
Operators rewrite with patterns refined,
The stack traces safe, no bounds left behind!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 4.55% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main focus of the changeset—fixing multiple QNN AOT runtime and lowering issues across several files.
Description check	✅ Passed	The description provides a comprehensive summary of the PR objectives with detailed bullet points, validation steps, and references to contributing guidelines, exceeding the template requirements.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (2)

mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
372-374: ⚡ Quick win

Include add_node_status in the fatal error message.

The failure path exits correctly, but dropping the status code makes root-cause triage harder.
🔧 Suggested change
-    MLLM_ERROR_EXIT(ExitCode::kCoreError, "QNN AOT failed to add node {} (op type {}) to graph.", qnn_op->name_,
-                    qnn_op->op_name_);
+    MLLM_ERROR_EXIT(ExitCode::kCoreError,
+                    "QNN AOT failed to add node {} (op type {}) to graph. status={}",
+                    qnn_op->name_, qnn_op->op_name_, static_cast<int>(add_node_status));
As per coding guidelines, "Add appropriate logging (e.g., debug, info, warning, error) for significant events and errors, avoiding sensitive data exposure."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp` around lines 372 - 374, Update the
fatal error message to include the numeric status value so failures are
triageable: when add_node_status != mllm::qnn::MODEL_NO_ERROR, modify the
MLLM_ERROR_EXIT call in QnnWrappersAPI.cpp to append the add_node_status (or its
stringified form) to the log text for the node (qnn_op->name_) and op type
(qnn_op->op_name_), ensuring the status is included in the formatted message
passed to MLLM_ERROR_EXIT.
mllm/backends/qnn/QNNBackend.cpp (1)
679-682: ⚡ Quick win

Add a regression test for repeated execute-time rebinding.

This behavior change is important enough to pin with a test that runs the same retrieved graph multiple times with different runtime buffers and verifies writes land in the latest bound outputs.

As per coding guidelines, "Suggest adding unit tests for untested complex logic or edge cases."

Also applies to: 699-702
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/QNNBackend.cpp` around lines 679 - 682, Add a regression
unit test that exercises repeated execute-time rebinding of retrieved AOT
graphs: create a test that retrieves a graph via the QNNBackend retrieval path,
then calls its execute/executeRetrievedGraph equivalent multiple times while
repeatedly calling wrapper->__setDataContainer(runtime_input) with different
runtime buffers, and after each execute verify that writes appear in the most
recently bound output buffer; put the test near other backend tests for QNN
(targeting the same behavior referenced around wrapper->__setDataContainer) to
ensure future changes to rebind-on-every-execution are caught.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp`:
- Around line 372-374: Update the fatal error message to include the numeric
status value so failures are triageable: when add_node_status !=
mllm::qnn::MODEL_NO_ERROR, modify the MLLM_ERROR_EXIT call in QnnWrappersAPI.cpp
to append the add_node_status (or its stringified form) to the log text for the
node (qnn_op->name_) and op type (qnn_op->op_name_), ensuring the status is
included in the formatted message passed to MLLM_ERROR_EXIT.

In `@mllm/backends/qnn/QNNBackend.cpp`:
- Around line 679-682: Add a regression unit test that exercises repeated
execute-time rebinding of retrieved AOT graphs: create a test that retrieves a
graph via the QNNBackend retrieval path, then calls its
execute/executeRetrievedGraph equivalent multiple times while repeatedly calling
wrapper->__setDataContainer(runtime_input) with different runtime buffers, and
after each execute verify that writes appear in the most recently bound output
buffer; put the test near other backend tests for QNN (targeting the same
behavior referenced around wrapper->__setDataContainer) to ensure future changes
to rebind-on-every-execution are caught.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 37076747-8e6d-40bb-8e3c-c4faeef33330

📥 Commits

Reviewing files that changed from the base of the PR and between 729ca4c and bf90dcd.

📒 Files selected for processing (10)

mllm/backends/qnn/QNNAllocator.cpp
mllm/backends/qnn/QNNBackend.cpp
mllm/backends/qnn/QNNModel.cpp
mllm/backends/qnn/QNNUtils.hpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/visitor/Matmul.cpp
mllm/backends/qnn/aot/visitor/Repeat.cpp
mllm/backends/qnn/aot/visitor/View.cpp
mllm/core/aops/VisionRoPEOp.cpp
mllm/mllm.hpp

twlddd · 2026-05-18T06:43:53Z

Context / Follow-up Plan

This PR is the first step toward adding full QNN AOT inference support for Qwen2-VL 2B.

The full work is intentionally split into smaller PRs to keep review scope manageable:

This PR: shared QNN AOT runtime/lowering correctness fixes.
Follow-up PR: generic QNN AOT support for visual graph lowering, including simple graph lowering and LayerNorm/GELU visitors.
Follow-up PR: Qwen2-VL 2B full-QNN example, including visual bucket graphs, masked visual padding, interactive runner, and deployment docs.

These fixes are kept model-agnostic because they also improve QNN AOT correctness for other models.

In local testing, these changes are part of a Qwen2-VL 2B full-QNN path where ViT, LLM prefill, and decode all run through QNN AOT on SM8650.

Fix QNN AOT runtime and lowering issues

bf90dcd

twlddd requested review from chenghuaWang, liang1232018 and oreomaker as code owners May 18, 2026 06:37

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

twlddd mentioned this pull request May 18, 2026

Add QNN AOT support for visual graph lowering #675

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix QNN AOT runtime and lowering issues#674

Fix QNN AOT runtime and lowering issues#674
twlddd wants to merge 1 commit into
UbiquitousLearning:mainfrom
twlddd:qnn-aot-backend-fixes-pr1

twlddd commented May 18, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 18, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

twlddd commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

twlddd commented May 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

twlddd commented May 18, 2026

Context / Follow-up Plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

twlddd commented May 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 18, 2026 •

edited

Loading