Add QNN AOT support for visual graph lowering by twlddd · Pull Request #675 · UbiquitousLearning/mllm

twlddd · 2026-05-18T06:51:41Z

Please check Guidelines for Contributing.

Summary

Depends on #674.

This PR adds generic QNN AOT support needed for lowering visual encoder graphs:

Add an explicit simple graph lowering pipeline for non-LLM graph roots.
Add QNN AOT visitors for GELU and LayerNorm.
Add quant recipe support for GELU and LayerNorm.
Add PTQ handling for raw LayerNorm weights.
Support raw Linear weights only when allow_raw_float_linear=true is explicitly set in the quant recipe config.
Add Conv2D bias raw quant recipe handling.

This is model-agnostic infrastructure. A follow-up PR will add the Qwen2-VL 2B full-QNN example and runtime scripts.

Validation

git diff --cached --check
cmake --build build-qnn-aot --target mllm-qwen2vl-aot-c -j2

Summary by CodeRabbit

New Features
- Added QNN AOT visitor support for LayerNorm and GELU operations with quantization.
- Introduced simplified QNN graph lowering pipeline factory.
Bug Fixes
- Fixed VisionRoPE operation type initialization.
- Fixed Android stack trace output buffer handling.
- Added output tensor count validation in graph execution.
Improvements
- Enhanced diagnostic logging for tensor operations and quantization metadata.
- Extended quantization recipe support for LayerNorm and GELU patterns.
- Improved tensor naming and quantization handling in QNN AOT compilation.

coderabbitai · 2026-05-18T06:51:48Z

📝 Walkthrough

Walkthrough

This PR expands QNN AOT lowering infrastructure by adding enhanced diagnostic output to memory allocation, introducing tensor naming improvements and quantization type handling, launching a simple-graph lowering path, implementing new quantization recipe patterns for LayerNorm and GELU, extending PTQ weight solving, and providing new QNN operator visitors and refinements to existing ones.

Changes

QNN AOT Lowering & Quantization Enhancement

Layer / File(s)	Summary
Enhanced QNN allocator diagnostics and wrapper binding `mllm/backends/qnn/QNNAllocator.cpp`, `mllm/backends/qnn/QNNBackend.cpp`, `mllm/backends/qnn/QNNModel.cpp`, `mllm/backends/qnn/QNNUtils.hpp`	Memory allocation failures log detailed tensor/pointer/dimension context; wrapper data containers are unconditionally rebound on every execution run; IO metadata is conditionally logged via `MLLM_QNN_DUMP_IO` environment variable.
Tensor naming and QNN node registration improvements `mllm/backends/qnn/aot/QnnWrappersAPI.cpp`	Tensor names prefer IR `symbol` attribute; symmetric per-tensor quantization offset selection expanded across target integer types; node insertion failures trigger explicit error and exit.
QNN AOT pipeline factory and simple graph lowering support `mllm/backends/qnn/aot/passes/AOTPipeline.cpp`, `mllm/backends/qnn/aot/passes/AOTPipeline.hpp`, `mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp`, `mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp`	New `createQnnAOTSimpleLoweringPipeline` factory provides deterministic pass sequencing; `LLM2QnnLoweringPass` constructor accepts optional `simple_qnn_graph_name` and adds `runSimpleGraph` implementation for recursive CallGraphOp lowering with pattern-driven rewrites.
Quantization recipe patterns for LayerNorm and GELU `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`, `mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp`	New patterns conditionally assign raw quant specs to LayerNorm weight/bias and GELU inputs/outputs; Linear pattern adds raw-float support and bias weight quantization; elementwise patterns generate fresh quant specs for second inputs.
PTQ weight solving expansion for LayerNorm and multiple quantization types `mllm/backends/qnn/aot/passes/PTQPass.cpp`	`solveLinearWeight` iterates all annotated weights and supports both `kRaw` and `kLPBQ` specs; new `solveLayerNormWeights` helper solves LayerNorm weight specs; recursive weight solving recognizes `LayerNormOp` nodes.
New QNN AOT visitor patterns for GELU and LayerNorm `mllm/backends/qnn/aot/visitor/GELU.cpp`, `mllm/backends/qnn/aot/visitor/GELU.hpp`, `mllm/backends/qnn/aot/visitor/LayerNorm.cpp`, `mllm/backends/qnn/aot/visitor/LayerNorm.hpp`	New `QnnAOTGELUPattern` and `QnnAOTLayerNormPattern` implementations match linalg operators with `using_qnn` attribute and rewrite them to QNN AOT nodes with proper context/graph tensor capturing and parameter configuration.
QNN AOT visitor updates for Conv2D, MatMul, Repeat, and View `mllm/backends/qnn/aot/visitor/Conv2D.cpp`, `mllm/backends/qnn/aot/visitor/Matmul.cpp`, `mllm/backends/qnn/aot/visitor/Repeat.cpp`, `mllm/backends/qnn/aot/visitor/View.cpp`	Conv2D ensures bias tensor has `quant_recipe` attribute; MatMul validates and extracts transpose options; Repeat implements reshape-tile-reshape for repeat_interleave semantics; View detects tensor aliasing and handles shape-change errors.
Bug fixes and type corrections `mllm/core/aops/VisionRoPEOp.cpp`, `mllm/mllm.hpp`	`VisionRoPEOp` constructor corrects op type from `kSiLU` to `kVisionRoPE`; Android stack-trace output clamps frame-length to prevent negative or oversized `safe_write` arguments.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

UbiquitousLearning/mllm#593: Both PRs modify quantization/weight-solving logic in PTQPass.cpp for expanded annotation-based weight processing.
UbiquitousLearning/mllm#606: Both PRs modify graphExecute wrapper binding and input/output preparation in QNNBackend.cpp.
UbiquitousLearning/mllm#578: Both PRs enhance QNN allocator diagnostics and wrapper internals in QNNAllocator.cpp and QNNUtils.hpp.

Suggested reviewers

liang1232018
oreomaker
chenghuaWang

Poem

🐰 Whiskers twitch with QNN delight,
Tensors named and lowered right,
Recipes quantized with care,
GELU and LayerNorm everywhere,
Diagnostics bloom—no more bare asserts in sight! 📊✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: adding QNN AOT support for visual graph lowering, which is the primary objective of this PR.
Description check	✅ Passed	The PR description provides a clear summary section that outlines all major changes (simple graph pipeline, GELU/LayerNorm visitors, quant recipe support, PTQ handling, linear weights support, Conv2D bias handling), includes validation steps, and notes this is infrastructure for visual encoder graphs. However, it lacks detailed explanation of individual changes and does not follow the template structure with explicit sections for What, Why, and How.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (1)

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)
291-291: ⚡ Quick win

Fix: this IRWriter::WALK_CONTINUE is valid (no compile break)
mllm/compile/ir/Node.hpp defines IRWriter::enum WalkResult { WALK_CONTINUE = 0, ... } as an unscoped enum, so ir::IRWriter::WALK_CONTINUE is a valid qualifier; the concern about WalkResult-scoping causing a compile break doesn’t apply. For consistency with nearby uses, consider switching this site to ir::IRWriter::WalkResult::WALK_CONTINUE.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` at line 291, The return
uses the unscoped enum constant ir::IRWriter::WALK_CONTINUE which is valid but
inconsistent with nearby uses; update the return expression in
LLM2QnnLoweringPass.cpp to use the scoped enum form
ir::IRWriter::WalkResult::WALK_CONTINUE (or the equivalent scoped WalkResult
qualifier used elsewhere) so the code matches surrounding style and improves
readability while leaving behavior unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp`:
- Around line 267-270: runSimpleGraph is checking the wrong QNN marker attribute
("using_qnn") and thus can skip ops that the main lowering path (run) marks with
"use_qnn"; update the attribute check in runSimpleGraph to use the same key as
run — replace the linalg_op->getAttr("using_qnn") check with
linalg_op->getAttr("use_qnn") (the change affects the block that logs MLLM_WARN
and returns WALK_BREAK) so both paths consistently recognize QNN ops.
- Around line 255-258: Guard access to CallGraphOp's symbol attribute before
dereferencing: in the block where call_op is obtained (call_op =
some_op->cast_<ir::graph::CallGraphOp>()), check that call_op->getSymbolAttr()
is non-null (and optionally that getSymbolAttr()->str() is non-empty) before
passing it to getCtx()->lookupSymbolTable; if the attribute is null, call
MLLM_ERROR_EXIT (same style as existing error handling) with a clear message
that the CallGraphOp has no symbol attribute so you avoid crashing during
lookup.
- Around line 246-261: The recursive lambda lower_subgraph_inline lacks cycle
protection; add a visited set (e.g., std::unordered_set<std::string> visited or
std::unordered_set<ir::graph::SubGraphOp::ptr_t>) captured by reference into the
lambda, and at the start of lower_subgraph_inline check if the subgraph (by
pointer or by call_op->getSymbolAttr()->str()) is already in visited and return
immediately if so; otherwise insert it before walking the region and skip
recursing into called subgraphs that are present in visited to prevent infinite
recursion when handling ir::graph::CallGraphOp, ensuring the check happens
before calling lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>()).

In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`:
- Around line 1003-1013: When real_linear_op indicates bias is true but the
symbol lookup for the bias register (constructed via
linear_ir->getAOp()->getName() + ".bias") fails, do not silently skip — return a
failure so quant annotations stay consistent. Replace the current conditional
that ignores a missing bias_reg_tensor_ir with a check that returns an error
(use the project's error macro/return convention, e.g., MLLM_RETURN_FALSE_IF_NOT
or equivalent) when bias_reg_tensor_ir is null, and only proceed to validate
types and attach quant_recipe
(writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>,
annotation_attr->annotation_.weights.insert) if the lookup succeeded.

In `@mllm/backends/qnn/aot/passes/PTQPass.cpp`:
- Around line 40-69: The NYI error message inside the LPBQ branch is misleading:
update the NYI call (inside the loop over annotation.weights where variables
name, weight_spec and mllm_op->getName() are used) to state that the weight
parameter name must be "weight" (include the offending name and the operator
name for context) rather than saying “only supports Linear weight”; keep the
existing guard (if (name != "weight")) and change only the message text to
clarify the restriction applies to the parameter name.

In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp`:
- Around line 49-63: The code dereferences the result of
writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight") and
".bias" without null/emptiness checks; update the LayerNorm visitor to first
validate that lookupSymbolTable(...) returns a non-null pointer and that
->outputs() is non-empty before calling front() and cast_<...>(), and if the
symbol is missing or outputs() is empty return/propagate a controlled error (or
log and abort the pass) instead of dereferencing; apply the same validation to
both the weight and bias paths (referencing getContext(), lookupSymbolTable,
base_op->getName() + ".weight"/".bias", outputs(), front(),
cast_<ir::tensor::TensorValue>(), qnn_op_node->emplaceInput, and
env->captureQnnAOTNodeTensor) so the pass fails fast and safely.
- Around line 68-72: The code writes axes_data[0] =
input->tensor_.shape().size() - 1 which underflows for rank-0 tensors; fix by
guarding the rank before subtracting: read size_t rank =
input->tensor_.shape().size(); if rank == 0 either set the axis to 0
(axes_data[0] = 0) or return/log an error for invalid input depending on
LayerNorm's contract, otherwise set axes_data[0] = static_cast<uint32_t>(rank -
1); keep the existing axes_param/axes_data and qnn_op_node->emplaceParamTensor
use but ensure you validate rank first to avoid underflow.

---

Nitpick comments:
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp`:
- Line 291: The return uses the unscoped enum constant
ir::IRWriter::WALK_CONTINUE which is valid but inconsistent with nearby uses;
update the return expression in LLM2QnnLoweringPass.cpp to use the scoped enum
form ir::IRWriter::WalkResult::WALK_CONTINUE (or the equivalent scoped
WalkResult qualifier used elsewhere) so the code matches surrounding style and
improves readability while leaving behavior unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ccddc83-593c-4f92-977a-6b8d125f926f

📥 Commits

Reviewing files that changed from the base of the PR and between 729ca4c and 7436ea3.

📒 Files selected for processing (22)

mllm/backends/qnn/QNNAllocator.cpp
mllm/backends/qnn/QNNBackend.cpp
mllm/backends/qnn/QNNModel.cpp
mllm/backends/qnn/QNNUtils.hpp
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
mllm/backends/qnn/aot/passes/AOTPipeline.cpp
mllm/backends/qnn/aot/passes/AOTPipeline.hpp
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
mllm/backends/qnn/aot/passes/PTQPass.cpp
mllm/backends/qnn/aot/visitor/Conv2D.cpp
mllm/backends/qnn/aot/visitor/GELU.cpp
mllm/backends/qnn/aot/visitor/GELU.hpp
mllm/backends/qnn/aot/visitor/LayerNorm.cpp
mllm/backends/qnn/aot/visitor/LayerNorm.hpp
mllm/backends/qnn/aot/visitor/Matmul.cpp
mllm/backends/qnn/aot/visitor/Repeat.cpp
mllm/backends/qnn/aot/visitor/View.cpp
mllm/core/aops/VisionRoPEOp.cpp
mllm/mllm.hpp

coderabbitai · 2026-05-21T04:38:10Z

+  std::function<void(ir::graph::SubGraphOp::ptr_t)> lower_subgraph_inline;
+  lower_subgraph_inline = [&](ir::graph::SubGraphOp::ptr_t subgraph) {
+    auto region = subgraph->getTopRegion();
+    if (!region) { return; }
+
+    auto subgraph_writer = ir::IRWriter(getCtx(), region);
+    subgraph_writer.walk<ir::Op>(
+        [&](ir::IRWriter& this_tough_writer, const ir::Op::ptr_t& some_op) -> ir::IRWriter::WalkResult {
+          if (some_op->isa_<ir::graph::CallGraphOp>()) {
+            auto call_op = some_op->cast_<ir::graph::CallGraphOp>();
+            auto called = getCtx()->lookupSymbolTable(call_op->getSymbolAttr()->str());
+            if (!called || !called->isa_<ir::graph::SubGraphOp>()) {
+              MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", call_op->getSymbolAttr()->str());
+            }
+            lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>());
+            return ir::IRWriter::WalkResult::WALK_CONTINUE;


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add cycle protection in recursive subgraph lowering.

lower_subgraph_inline recursively follows CallGraphOp edges but has no visited-set guard. Cyclic callgraphs will recurse indefinitely and can overflow the stack.

Suggested fix

+ std::unordered_set<std::string> visiting; std::function<void(ir::graph::SubGraphOp::ptr_t)> lower_subgraph_inline; lower_subgraph_inline = [&](ir::graph::SubGraphOp::ptr_t subgraph) { + auto symbol = subgraph->getSymbolAttr(); + const std::string current = symbol ? symbol->str() : ""; + if (!current.empty()) { + if (visiting.count(current)) { + MLLM_ERROR_EXIT(ExitCode::kCoreError, "Detected cyclic subgraph call at {}", current); + } + visiting.insert(current); + } + auto region = subgraph->getTopRegion(); - if (!region) { return; } + if (!region) { + if (!current.empty()) { visiting.erase(current); } + return; + } ... - lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>()); + lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>()); return ir::IRWriter::WalkResult::WALK_CONTINUE; } ... - }); + }); + if (!current.empty()) { visiting.erase(current); } };

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` around lines 246 - 261, The recursive lambda lower_subgraph_inline lacks cycle protection; add a visited set (e.g., std::unordered_set<std::string> visited or std::unordered_set<ir::graph::SubGraphOp::ptr_t>) captured by reference into the lambda, and at the start of lower_subgraph_inline check if the subgraph (by pointer or by call_op->getSymbolAttr()->str()) is already in visited and return immediately if so; otherwise insert it before walking the region and skip recursing into called subgraphs that are present in visited to prevent infinite recursion when handling ir::graph::CallGraphOp, ensuring the check happens before calling lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>()).

coderabbitai · 2026-05-21T04:38:10Z

+            auto call_op = some_op->cast_<ir::graph::CallGraphOp>();
+            auto called = getCtx()->lookupSymbolTable(call_op->getSymbolAttr()->str());
+            if (!called || !called->isa_<ir::graph::SubGraphOp>()) {
+              MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", call_op->getSymbolAttr()->str());


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard CallGraphOp symbol access before dereference.

Line 256 dereferences call_op->getSymbolAttr() without a null check. A malformed IR node would crash the pass instead of returning a controlled failure.

Suggested fix

if (some_op->isa_<ir::graph::CallGraphOp>()) { auto call_op = some_op->cast_<ir::graph::CallGraphOp>(); - auto called = getCtx()->lookupSymbolTable(call_op->getSymbolAttr()->str()); + auto callee_symbol = call_op->getSymbolAttr(); + if (!callee_symbol) { + MLLM_ERROR_EXIT(ExitCode::kCoreError, "CallGraphOp missing symbol attr"); + } + auto called = getCtx()->lookupSymbolTable(callee_symbol->str()); if (!called || !called->isa_<ir::graph::SubGraphOp>()) { - MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", call_op->getSymbolAttr()->str()); + MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", callee_symbol->str()); }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` around lines 255 - 258, Guard access to CallGraphOp's symbol attribute before dereferencing: in the block where call_op is obtained (call_op = some_op->cast_<ir::graph::CallGraphOp>()), check that call_op->getSymbolAttr() is non-null (and optionally that getSymbolAttr()->str() is non-empty) before passing it to getCtx()->lookupSymbolTable; if the attribute is null, call MLLM_ERROR_EXIT (same style as existing error handling) with a clear message that the CallGraphOp has no symbol attribute so you avoid crashing during lookup.

coderabbitai · 2026-05-21T04:38:10Z

+          if (!linalg_op->getAttr("using_qnn")) {
+            MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name);
+            return ir::IRWriter::WalkResult::WALK_BREAK;
+          }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the same QNN marker attribute key as the main lowering path.

runSimpleGraph checks using_qnn, but run checks use_qnn (Line 177). This mismatch can skip valid QNN ops in simple-graph mode.

Suggested fix

- if (!linalg_op->getAttr("using_qnn")) { + if (!linalg_op->getAttr("use_qnn")) { MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name); return ir::IRWriter::WalkResult::WALK_BREAK; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (!linalg_op->getAttr("using_qnn")) {

MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name);

return ir::IRWriter::WalkResult::WALK_BREAK;

}

if (!linalg_op->getAttr("use_qnn")) {

MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name);

return ir::IRWriter::WalkResult::WALK_BREAK;

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` around lines 267 - 270, runSimpleGraph is checking the wrong QNN marker attribute ("using_qnn") and thus can skip ops that the main lowering path (run) marks with "use_qnn"; update the attribute check in runSimpleGraph to use the same key as run — replace the linalg_op->getAttr("using_qnn") check with linalg_op->getAttr("use_qnn") (the change affects the block that logs MLLM_WARN and returns WALK_BREAK) so both paths consistently recognize QNN ops.

coderabbitai · 2026-05-21T04:38:10Z

+    if (real_linear_op && real_linear_op->options().bias) {
+      auto bias_name = linear_ir->getAOp()->getName() + ".bias";
+      auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
+      if (bias_reg_tensor_ir) {
+        MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
+        MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
+        auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
+        auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
+        bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
+        annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
+      }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not silently ignore missing Linear bias tensors when bias=true.

This branch skips bias quant annotation if symbol lookup fails, which can leave quant_recipe inconsistent with op options and fail later in less actionable places.

Suggested fix

- if (real_linear_op && real_linear_op->options().bias) { + if (real_linear_op && real_linear_op->options().bias) { auto bias_name = linear_ir->getAOp()->getName() + ".bias"; auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name); - if (bias_reg_tensor_ir) { - MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>()); - MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>()); - auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>(); - auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype()); - bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec)); - annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec}); - } + MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir); + MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>()); + MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>()); + auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>(); + auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype()); + bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec)); + annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec}); }

As per coding guidelines, "Ensure functions that can fail return appropriate error codes or raise exceptions" and "Validate inputs for public APIs and critical internal functions."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (real_linear_op && real_linear_op->options().bias) {

auto bias_name = linear_ir->getAOp()->getName() + ".bias";

auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);

if (bias_reg_tensor_ir) {

MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());

MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());

auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();

auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());

bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));

annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});

}

if (real_linear_op && real_linear_op->options().bias) {

auto bias_name = linear_ir->getAOp()->getName() + ".bias";

auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);

MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir);

MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());

MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());

auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();

auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());

bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));

annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp` around lines 1003 - 1013, When real_linear_op indicates bias is true but the symbol lookup for the bias register (constructed via linear_ir->getAOp()->getName() + ".bias") fails, do not silently skip — return a failure so quant annotations stay consistent. Replace the current conditional that ignores a missing bias_reg_tensor_ir with a check that returns an error (use the project's error macro/return convention, e.g., MLLM_RETURN_FALSE_IF_NOT or equivalent) when bias_reg_tensor_ir is null, and only proceed to validate types and attach quant_recipe (writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>, annotation_attr->annotation_.weights.insert) if the lookup succeeded.

coderabbitai · 2026-05-21T04:38:10Z

+  for (const auto& [name, weight_spec] : annotation.weights) {
+    if (weight_spec->solved) continue;

-      this_spec->scale_level_0_int = scale1;
-      this_spec->scale_level_1_fp = scale2;
-
-      weight_spec->solved = true;
-      break;
-    }
-    default: {
-      NYI("quant recipe type not support");
+    switch (weight_spec->type) {
+      case ir::linalg::QuantizationSpecType::kRaw: {
+        weight_spec->solved = true;
+        break;
+      }
+      case ir::linalg::QuantizationSpecType::kLPBQ: {
+        if (name != "weight") { NYI("LPBQ quant recipe only supports Linear weight, got '{}'.", name); }
+        auto this_spec = std::static_pointer_cast<ir::linalg::QuantizationSpecLPBQ>(weight_spec);
+        auto scale1 = pf->pull(mllm_op->getName() + ".scale1");  // using uint8 to store uint4
+        auto scale2 = pf->pull(mllm_op->getName() + ".scale2");
+        auto weight = pf->pull(mllm_op->getName() + ".weight");
+
+        // FIXME weight maybe error, Check qnn eats int8 or uint8. Here weight using int8 to store int4.
+        checkTypeLimits<int8_t>(weight, 0, 15);   // Int4
+        checkTypeLimits<uint8_t>(scale1, 0, 16);  // UInt4
+
+        this_spec->scale_level_0_int = scale1;
+        this_spec->scale_level_1_fp = scale2;
+
+        weight_spec->solved = true;
+        break;
+      }
+      default: {
+        NYI("quant recipe type not support");
+      }
    }
  }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify error message for LPBQ weight name restriction.

Line 49's error message states "LPBQ quant recipe only supports Linear weight" but this function is also invoked for Conv2D operators (line 173). The restriction is actually on the weight parameter name (must be "weight"), not the operator type. The current message may mislead developers debugging Conv2D quantization issues or attempting to apply LPBQ to Conv2D bias weights.

📝 Proposed fix to clarify the error message

- if (name != "weight") { NYI("LPBQ quant recipe only supports Linear weight, got '{}'.", name); } + if (name != "weight") { NYI("LPBQ quant recipe only supports weight parameter named 'weight', got '{}'.", name); }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/passes/PTQPass.cpp` around lines 40 - 69, The NYI error message inside the LPBQ branch is misleading: update the NYI call (inside the loop over annotation.weights where variables name, weight_spec and mllm_op->getName() are used) to state that the weight parameter name must be "weight" (include the offending name and the operator name for context) rather than saying “only supports Linear weight”; keep the existing guard (if (name != "weight")) and change only the message text to clarify the restriction applies to the parameter name.

coderabbitai · 2026-05-21T04:38:10Z

+    auto weight = writer.getContext()
+                      ->lookupSymbolTable(base_op->getName() + ".weight")
+                      ->outputs()
+                      .front()
+                      ->cast_<ir::tensor::TensorValue>();
+    qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true));
+  }
+
+  if (real_layer_norm_op->options().bias) {
+    auto bias = writer.getContext()
+                    ->lookupSymbolTable(base_op->getName() + ".bias")
+                    ->outputs()
+                    .front()
+                    ->cast_<ir::tensor::TensorValue>();
+    qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true));


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on missing LayerNorm weight/bias symbols before dereferencing.

Both weight and bias paths dereference lookupSymbolTable(...) results immediately; if a symbol is absent, this can crash the pass instead of returning a controlled failure.

Suggested fix

if (real_layer_norm_op->options().elementwise_affine) { - auto weight = writer.getContext() - ->lookupSymbolTable(base_op->getName() + ".weight") - ->outputs() - .front() - ->cast_<ir::tensor::TensorValue>(); + auto weight_sym = writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight"); + MLLM_RETURN_FALSE_IF_NOT(weight_sym); + MLLM_RETURN_FALSE_IF_NOT(weight_sym->outputs().front()->isa_<ir::tensor::TensorValue>()); + auto weight = weight_sym->outputs().front()->cast_<ir::tensor::TensorValue>(); qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true)); } if (real_layer_norm_op->options().bias) { - auto bias = writer.getContext() - ->lookupSymbolTable(base_op->getName() + ".bias") - ->outputs() - .front() - ->cast_<ir::tensor::TensorValue>(); + auto bias_sym = writer.getContext()->lookupSymbolTable(base_op->getName() + ".bias"); + MLLM_RETURN_FALSE_IF_NOT(bias_sym); + MLLM_RETURN_FALSE_IF_NOT(bias_sym->outputs().front()->isa_<ir::tensor::TensorValue>()); + auto bias = bias_sym->outputs().front()->cast_<ir::tensor::TensorValue>(); qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true)); }

As per coding guidelines, "Validate inputs for public APIs and critical internal functions."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 49 - 63, The code dereferences the result of writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight") and ".bias" without null/emptiness checks; update the LayerNorm visitor to first validate that lookupSymbolTable(...) returns a non-null pointer and that ->outputs() is non-empty before calling front() and cast_<...>(), and if the symbol is missing or outputs() is empty return/propagate a controlled error (or log and abort the pass) instead of dereferencing; apply the same validation to both the weight and bias paths (referencing getContext(), lookupSymbolTable, base_op->getName() + ".weight"/".bias", outputs(), front(), cast_<ir::tensor::TensorValue>(), qnn_op_node->emplaceInput, and env->captureQnnAOTNodeTensor) so the pass fails fast and safely.

coderabbitai · 2026-05-21T04:38:10Z

+  std::vector<uint32_t> axes_dims = {1};
+  auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
+  auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
+  axes_data[0] = input->tensor_.shape().size() - 1;
+  qnn_op_node->emplaceParamTensor(axes_param);


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard rank before deriving the last-axis index.

shape().size() - 1 underflows for rank-0 tensors and writes an invalid axes value.

Suggested fix

std::vector<uint32_t> axes_dims = {1}; auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims); auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc()); - axes_data[0] = input->tensor_.shape().size() - 1; + const auto rank = input->tensor_.shape().size(); + MLLM_RETURN_FALSE_IF_NOT(rank > 0); + axes_data[0] = static_cast<uint32_t>(rank - 1); qnn_op_node->emplaceParamTensor(axes_param);

As per coding guidelines, "Validate inputs for public APIs and critical internal functions."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

std::vector<uint32_t> axes_dims = {1};

auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);

auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());

axes_data[0] = input->tensor_.shape().size() - 1;

qnn_op_node->emplaceParamTensor(axes_param);

std::vector<uint32_t> axes_dims = {1};

auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);

auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());

const auto rank = input->tensor_.shape().size();

MLLM_RETURN_FALSE_IF_NOT(rank > 0);

axes_data[0] = static_cast<uint32_t>(rank - 1);

qnn_op_node->emplaceParamTensor(axes_param);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 68 - 72, The code writes axes_data[0] = input->tensor_.shape().size() - 1 which underflows for rank-0 tensors; fix by guarding the rank before subtracting: read size_t rank = input->tensor_.shape().size(); if rank == 0 either set the axis to 0 (axes_data[0] = 0) or return/log an error for invalid input depending on LayerNorm's contract, otherwise set axes_data[0] = static_cast<uint32_t>(rank - 1); keep the existing axes_param/axes_data and qnn_op_node->emplaceParamTensor use but ensure you validate rank first to avoid underflow.

twlddd added 2 commits May 18, 2026 14:32

Fix QNN AOT runtime and lowering issues

bf90dcd

Add QNN AOT support for visual graph lowering

7436ea3

twlddd mentioned this pull request May 18, 2026

Add Qwen2-VL full QNN AOT example #676

Open

twlddd marked this pull request as ready for review May 21, 2026 04:30

twlddd requested review from chenghuaWang, liang1232018 and oreomaker as code owners May 21, 2026 04:30

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add QNN AOT support for visual graph lowering#675

Add QNN AOT support for visual graph lowering#675
twlddd wants to merge 2 commits into
UbiquitousLearning:mainfrom
twlddd:qnn-aot-visual-graph-support-pr2

twlddd commented May 18, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 18, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 21, 2026

Uh oh!

coderabbitai Bot May 21, 2026

Uh oh!

coderabbitai Bot May 21, 2026

Uh oh!

coderabbitai Bot May 21, 2026

Uh oh!

coderabbitai Bot May 21, 2026

Uh oh!

coderabbitai Bot May 21, 2026

Uh oh!

coderabbitai Bot May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

twlddd commented May 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

twlddd commented May 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 18, 2026 •

edited

Loading