Skip to content

Add QNN AOT support for visual graph lowering#675

Open
twlddd wants to merge 2 commits into
UbiquitousLearning:mainfrom
twlddd:qnn-aot-visual-graph-support-pr2
Open

Add QNN AOT support for visual graph lowering#675
twlddd wants to merge 2 commits into
UbiquitousLearning:mainfrom
twlddd:qnn-aot-visual-graph-support-pr2

Conversation

@twlddd
Copy link
Copy Markdown

@twlddd twlddd commented May 18, 2026

Please check Guidelines for Contributing.

Summary

Depends on #674.

This PR adds generic QNN AOT support needed for lowering visual encoder graphs:

  • Add an explicit simple graph lowering pipeline for non-LLM graph roots.
  • Add QNN AOT visitors for GELU and LayerNorm.
  • Add quant recipe support for GELU and LayerNorm.
  • Add PTQ handling for raw LayerNorm weights.
  • Support raw Linear weights only when allow_raw_float_linear=true is explicitly set in the quant recipe config.
  • Add Conv2D bias raw quant recipe handling.

This is model-agnostic infrastructure. A follow-up PR will add the Qwen2-VL 2B full-QNN example and runtime scripts.

Validation

  • git diff --cached --check
  • cmake --build build-qnn-aot --target mllm-qwen2vl-aot-c -j2

Summary by CodeRabbit

  • New Features

    • Added QNN AOT visitor support for LayerNorm and GELU operations with quantization.
    • Introduced simplified QNN graph lowering pipeline factory.
  • Bug Fixes

    • Fixed VisionRoPE operation type initialization.
    • Fixed Android stack trace output buffer handling.
    • Added output tensor count validation in graph execution.
  • Improvements

    • Enhanced diagnostic logging for tensor operations and quantization metadata.
    • Extended quantization recipe support for LayerNorm and GELU patterns.
    • Improved tensor naming and quantization handling in QNN AOT compilation.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

📝 Walkthrough

Walkthrough

This PR expands QNN AOT lowering infrastructure by adding enhanced diagnostic output to memory allocation, introducing tensor naming improvements and quantization type handling, launching a simple-graph lowering path, implementing new quantization recipe patterns for LayerNorm and GELU, extending PTQ weight solving, and providing new QNN operator visitors and refinements to existing ones.

Changes

QNN AOT Lowering & Quantization Enhancement

Layer / File(s) Summary
Enhanced QNN allocator diagnostics and wrapper binding
mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/QNNBackend.cpp, mllm/backends/qnn/QNNModel.cpp, mllm/backends/qnn/QNNUtils.hpp
Memory allocation failures log detailed tensor/pointer/dimension context; wrapper data containers are unconditionally rebound on every execution run; IO metadata is conditionally logged via MLLM_QNN_DUMP_IO environment variable.
Tensor naming and QNN node registration improvements
mllm/backends/qnn/aot/QnnWrappersAPI.cpp
Tensor names prefer IR symbol attribute; symmetric per-tensor quantization offset selection expanded across target integer types; node insertion failures trigger explicit error and exit.
QNN AOT pipeline factory and simple graph lowering support
mllm/backends/qnn/aot/passes/AOTPipeline.cpp, mllm/backends/qnn/aot/passes/AOTPipeline.hpp, mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp, mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
New createQnnAOTSimpleLoweringPipeline factory provides deterministic pass sequencing; LLM2QnnLoweringPass constructor accepts optional simple_qnn_graph_name and adds runSimpleGraph implementation for recursive CallGraphOp lowering with pattern-driven rewrites.
Quantization recipe patterns for LayerNorm and GELU
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp, mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
New patterns conditionally assign raw quant specs to LayerNorm weight/bias and GELU inputs/outputs; Linear pattern adds raw-float support and bias weight quantization; elementwise patterns generate fresh quant specs for second inputs.
PTQ weight solving expansion for LayerNorm and multiple quantization types
mllm/backends/qnn/aot/passes/PTQPass.cpp
solveLinearWeight iterates all annotated weights and supports both kRaw and kLPBQ specs; new solveLayerNormWeights helper solves LayerNorm weight specs; recursive weight solving recognizes LayerNormOp nodes.
New QNN AOT visitor patterns for GELU and LayerNorm
mllm/backends/qnn/aot/visitor/GELU.cpp, mllm/backends/qnn/aot/visitor/GELU.hpp, mllm/backends/qnn/aot/visitor/LayerNorm.cpp, mllm/backends/qnn/aot/visitor/LayerNorm.hpp
New QnnAOTGELUPattern and QnnAOTLayerNormPattern implementations match linalg operators with using_qnn attribute and rewrite them to QNN AOT nodes with proper context/graph tensor capturing and parameter configuration.
QNN AOT visitor updates for Conv2D, MatMul, Repeat, and View
mllm/backends/qnn/aot/visitor/Conv2D.cpp, mllm/backends/qnn/aot/visitor/Matmul.cpp, mllm/backends/qnn/aot/visitor/Repeat.cpp, mllm/backends/qnn/aot/visitor/View.cpp
Conv2D ensures bias tensor has quant_recipe attribute; MatMul validates and extracts transpose options; Repeat implements reshape-tile-reshape for repeat_interleave semantics; View detects tensor aliasing and handles shape-change errors.
Bug fixes and type corrections
mllm/core/aops/VisionRoPEOp.cpp, mllm/mllm.hpp
VisionRoPEOp constructor corrects op type from kSiLU to kVisionRoPE; Android stack-trace output clamps frame-length to prevent negative or oversized safe_write arguments.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • liang1232018
  • oreomaker
  • chenghuaWang

Poem

🐰 Whiskers twitch with QNN delight,
Tensors named and lowered right,
Recipes quantized with care,
GELU and LayerNorm everywhere,
Diagnostics bloom—no more bare asserts in sight! 📊✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding QNN AOT support for visual graph lowering, which is the primary objective of this PR.
Description check ✅ Passed The PR description provides a clear summary section that outlines all major changes (simple graph pipeline, GELU/LayerNorm visitors, quant recipe support, PTQ handling, linear weights support, Conv2D bias handling), includes validation steps, and notes this is infrastructure for visual encoder graphs. However, it lacks detailed explanation of individual changes and does not follow the template structure with explicit sections for What, Why, and How.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@twlddd twlddd marked this pull request as ready for review May 21, 2026 04:30
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

291-291: ⚡ Quick win

Fix: this IRWriter::WALK_CONTINUE is valid (no compile break)
mllm/compile/ir/Node.hpp defines IRWriter::enum WalkResult { WALK_CONTINUE = 0, ... } as an unscoped enum, so ir::IRWriter::WALK_CONTINUE is a valid qualifier; the concern about WalkResult-scoping causing a compile break doesn’t apply. For consistency with nearby uses, consider switching this site to ir::IRWriter::WalkResult::WALK_CONTINUE.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` at line 291, The return
uses the unscoped enum constant ir::IRWriter::WALK_CONTINUE which is valid but
inconsistent with nearby uses; update the return expression in
LLM2QnnLoweringPass.cpp to use the scoped enum form
ir::IRWriter::WalkResult::WALK_CONTINUE (or the equivalent scoped WalkResult
qualifier used elsewhere) so the code matches surrounding style and improves
readability while leaving behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp`:
- Around line 267-270: runSimpleGraph is checking the wrong QNN marker attribute
("using_qnn") and thus can skip ops that the main lowering path (run) marks with
"use_qnn"; update the attribute check in runSimpleGraph to use the same key as
run — replace the linalg_op->getAttr("using_qnn") check with
linalg_op->getAttr("use_qnn") (the change affects the block that logs MLLM_WARN
and returns WALK_BREAK) so both paths consistently recognize QNN ops.
- Around line 255-258: Guard access to CallGraphOp's symbol attribute before
dereferencing: in the block where call_op is obtained (call_op =
some_op->cast_<ir::graph::CallGraphOp>()), check that call_op->getSymbolAttr()
is non-null (and optionally that getSymbolAttr()->str() is non-empty) before
passing it to getCtx()->lookupSymbolTable; if the attribute is null, call
MLLM_ERROR_EXIT (same style as existing error handling) with a clear message
that the CallGraphOp has no symbol attribute so you avoid crashing during
lookup.
- Around line 246-261: The recursive lambda lower_subgraph_inline lacks cycle
protection; add a visited set (e.g., std::unordered_set<std::string> visited or
std::unordered_set<ir::graph::SubGraphOp::ptr_t>) captured by reference into the
lambda, and at the start of lower_subgraph_inline check if the subgraph (by
pointer or by call_op->getSymbolAttr()->str()) is already in visited and return
immediately if so; otherwise insert it before walking the region and skip
recursing into called subgraphs that are present in visited to prevent infinite
recursion when handling ir::graph::CallGraphOp, ensuring the check happens
before calling lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>()).

In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`:
- Around line 1003-1013: When real_linear_op indicates bias is true but the
symbol lookup for the bias register (constructed via
linear_ir->getAOp()->getName() + ".bias") fails, do not silently skip — return a
failure so quant annotations stay consistent. Replace the current conditional
that ignores a missing bias_reg_tensor_ir with a check that returns an error
(use the project's error macro/return convention, e.g., MLLM_RETURN_FALSE_IF_NOT
or equivalent) when bias_reg_tensor_ir is null, and only proceed to validate
types and attach quant_recipe
(writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>,
annotation_attr->annotation_.weights.insert) if the lookup succeeded.

In `@mllm/backends/qnn/aot/passes/PTQPass.cpp`:
- Around line 40-69: The NYI error message inside the LPBQ branch is misleading:
update the NYI call (inside the loop over annotation.weights where variables
name, weight_spec and mllm_op->getName() are used) to state that the weight
parameter name must be "weight" (include the offending name and the operator
name for context) rather than saying “only supports Linear weight”; keep the
existing guard (if (name != "weight")) and change only the message text to
clarify the restriction applies to the parameter name.

In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp`:
- Around line 49-63: The code dereferences the result of
writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight") and
".bias" without null/emptiness checks; update the LayerNorm visitor to first
validate that lookupSymbolTable(...) returns a non-null pointer and that
->outputs() is non-empty before calling front() and cast_<...>(), and if the
symbol is missing or outputs() is empty return/propagate a controlled error (or
log and abort the pass) instead of dereferencing; apply the same validation to
both the weight and bias paths (referencing getContext(), lookupSymbolTable,
base_op->getName() + ".weight"/".bias", outputs(), front(),
cast_<ir::tensor::TensorValue>(), qnn_op_node->emplaceInput, and
env->captureQnnAOTNodeTensor) so the pass fails fast and safely.
- Around line 68-72: The code writes axes_data[0] =
input->tensor_.shape().size() - 1 which underflows for rank-0 tensors; fix by
guarding the rank before subtracting: read size_t rank =
input->tensor_.shape().size(); if rank == 0 either set the axis to 0
(axes_data[0] = 0) or return/log an error for invalid input depending on
LayerNorm's contract, otherwise set axes_data[0] = static_cast<uint32_t>(rank -
1); keep the existing axes_param/axes_data and qnn_op_node->emplaceParamTensor
use but ensure you validate rank first to avoid underflow.

---

Nitpick comments:
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp`:
- Line 291: The return uses the unscoped enum constant
ir::IRWriter::WALK_CONTINUE which is valid but inconsistent with nearby uses;
update the return expression in LLM2QnnLoweringPass.cpp to use the scoped enum
form ir::IRWriter::WalkResult::WALK_CONTINUE (or the equivalent scoped
WalkResult qualifier used elsewhere) so the code matches surrounding style and
improves readability while leaving behavior unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3ccddc83-593c-4f92-977a-6b8d125f926f

📥 Commits

Reviewing files that changed from the base of the PR and between 729ca4c and 7436ea3.

📒 Files selected for processing (22)
  • mllm/backends/qnn/QNNAllocator.cpp
  • mllm/backends/qnn/QNNBackend.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/PTQPass.cpp
  • mllm/backends/qnn/aot/visitor/Conv2D.cpp
  • mllm/backends/qnn/aot/visitor/GELU.cpp
  • mllm/backends/qnn/aot/visitor/GELU.hpp
  • mllm/backends/qnn/aot/visitor/LayerNorm.cpp
  • mllm/backends/qnn/aot/visitor/LayerNorm.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/View.cpp
  • mllm/core/aops/VisionRoPEOp.cpp
  • mllm/mllm.hpp

Comment on lines +246 to +261
std::function<void(ir::graph::SubGraphOp::ptr_t)> lower_subgraph_inline;
lower_subgraph_inline = [&](ir::graph::SubGraphOp::ptr_t subgraph) {
auto region = subgraph->getTopRegion();
if (!region) { return; }

auto subgraph_writer = ir::IRWriter(getCtx(), region);
subgraph_writer.walk<ir::Op>(
[&](ir::IRWriter& this_tough_writer, const ir::Op::ptr_t& some_op) -> ir::IRWriter::WalkResult {
if (some_op->isa_<ir::graph::CallGraphOp>()) {
auto call_op = some_op->cast_<ir::graph::CallGraphOp>();
auto called = getCtx()->lookupSymbolTable(call_op->getSymbolAttr()->str());
if (!called || !called->isa_<ir::graph::SubGraphOp>()) {
MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", call_op->getSymbolAttr()->str());
}
lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>());
return ir::IRWriter::WalkResult::WALK_CONTINUE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add cycle protection in recursive subgraph lowering.

lower_subgraph_inline recursively follows CallGraphOp edges but has no visited-set guard. Cyclic callgraphs will recurse indefinitely and can overflow the stack.

Suggested fix
+  std::unordered_set<std::string> visiting;
   std::function<void(ir::graph::SubGraphOp::ptr_t)> lower_subgraph_inline;
   lower_subgraph_inline = [&](ir::graph::SubGraphOp::ptr_t subgraph) {
+    auto symbol = subgraph->getSymbolAttr();
+    const std::string current = symbol ? symbol->str() : "";
+    if (!current.empty()) {
+      if (visiting.count(current)) {
+        MLLM_ERROR_EXIT(ExitCode::kCoreError, "Detected cyclic subgraph call at {}", current);
+      }
+      visiting.insert(current);
+    }
+
     auto region = subgraph->getTopRegion();
-    if (!region) { return; }
+    if (!region) {
+      if (!current.empty()) { visiting.erase(current); }
+      return;
+    }
...
-            lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>());
+            lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>());
             return ir::IRWriter::WalkResult::WALK_CONTINUE;
           }
...
-        });
+        });
+    if (!current.empty()) { visiting.erase(current); }
   };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` around lines 246 - 261,
The recursive lambda lower_subgraph_inline lacks cycle protection; add a visited
set (e.g., std::unordered_set<std::string> visited or
std::unordered_set<ir::graph::SubGraphOp::ptr_t>) captured by reference into the
lambda, and at the start of lower_subgraph_inline check if the subgraph (by
pointer or by call_op->getSymbolAttr()->str()) is already in visited and return
immediately if so; otherwise insert it before walking the region and skip
recursing into called subgraphs that are present in visited to prevent infinite
recursion when handling ir::graph::CallGraphOp, ensuring the check happens
before calling lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>()).

Comment on lines +255 to +258
auto call_op = some_op->cast_<ir::graph::CallGraphOp>();
auto called = getCtx()->lookupSymbolTable(call_op->getSymbolAttr()->str());
if (!called || !called->isa_<ir::graph::SubGraphOp>()) {
MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", call_op->getSymbolAttr()->str());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard CallGraphOp symbol access before dereference.

Line 256 dereferences call_op->getSymbolAttr() without a null check. A malformed IR node would crash the pass instead of returning a controlled failure.

Suggested fix
           if (some_op->isa_<ir::graph::CallGraphOp>()) {
             auto call_op = some_op->cast_<ir::graph::CallGraphOp>();
-            auto called = getCtx()->lookupSymbolTable(call_op->getSymbolAttr()->str());
+            auto callee_symbol = call_op->getSymbolAttr();
+            if (!callee_symbol) {
+              MLLM_ERROR_EXIT(ExitCode::kCoreError, "CallGraphOp missing symbol attr");
+            }
+            auto called = getCtx()->lookupSymbolTable(callee_symbol->str());
             if (!called || !called->isa_<ir::graph::SubGraphOp>()) {
-              MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", call_op->getSymbolAttr()->str());
+              MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", callee_symbol->str());
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` around lines 255 - 258,
Guard access to CallGraphOp's symbol attribute before dereferencing: in the
block where call_op is obtained (call_op =
some_op->cast_<ir::graph::CallGraphOp>()), check that call_op->getSymbolAttr()
is non-null (and optionally that getSymbolAttr()->str() is non-empty) before
passing it to getCtx()->lookupSymbolTable; if the attribute is null, call
MLLM_ERROR_EXIT (same style as existing error handling) with a clear message
that the CallGraphOp has no symbol attribute so you avoid crashing during
lookup.

Comment on lines +267 to +270
if (!linalg_op->getAttr("using_qnn")) {
MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name);
return ir::IRWriter::WalkResult::WALK_BREAK;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the same QNN marker attribute key as the main lowering path.

runSimpleGraph checks using_qnn, but run checks use_qnn (Line 177). This mismatch can skip valid QNN ops in simple-graph mode.

Suggested fix
-          if (!linalg_op->getAttr("using_qnn")) {
+          if (!linalg_op->getAttr("use_qnn")) {
             MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name);
             return ir::IRWriter::WalkResult::WALK_BREAK;
           }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!linalg_op->getAttr("using_qnn")) {
MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name);
return ir::IRWriter::WalkResult::WALK_BREAK;
}
if (!linalg_op->getAttr("use_qnn")) {
MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name);
return ir::IRWriter::WalkResult::WALK_BREAK;
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` around lines 267 - 270,
runSimpleGraph is checking the wrong QNN marker attribute ("using_qnn") and thus
can skip ops that the main lowering path (run) marks with "use_qnn"; update the
attribute check in runSimpleGraph to use the same key as run — replace the
linalg_op->getAttr("using_qnn") check with linalg_op->getAttr("use_qnn") (the
change affects the block that logs MLLM_WARN and returns WALK_BREAK) so both
paths consistently recognize QNN ops.

Comment on lines +1003 to +1013
if (real_linear_op && real_linear_op->options().bias) {
auto bias_name = linear_ir->getAOp()->getName() + ".bias";
auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
if (bias_reg_tensor_ir) {
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not silently ignore missing Linear bias tensors when bias=true.

This branch skips bias quant annotation if symbol lookup fails, which can leave quant_recipe inconsistent with op options and fail later in less actionable places.

Suggested fix
-    if (real_linear_op && real_linear_op->options().bias) {
+    if (real_linear_op && real_linear_op->options().bias) {
       auto bias_name = linear_ir->getAOp()->getName() + ".bias";
       auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
-      if (bias_reg_tensor_ir) {
-        MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
-        MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
-        auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
-        auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
-        bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
-        annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
-      }
+      MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir);
+      MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
+      MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
+      auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
+      auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
+      bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
+      annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
     }

As per coding guidelines, "Ensure functions that can fail return appropriate error codes or raise exceptions" and "Validate inputs for public APIs and critical internal functions."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (real_linear_op && real_linear_op->options().bias) {
auto bias_name = linear_ir->getAOp()->getName() + ".bias";
auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
if (bias_reg_tensor_ir) {
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
}
if (real_linear_op && real_linear_op->options().bias) {
auto bias_name = linear_ir->getAOp()->getName() + ".bias";
auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir);
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp` around lines 1003 -
1013, When real_linear_op indicates bias is true but the symbol lookup for the
bias register (constructed via linear_ir->getAOp()->getName() + ".bias") fails,
do not silently skip — return a failure so quant annotations stay consistent.
Replace the current conditional that ignores a missing bias_reg_tensor_ir with a
check that returns an error (use the project's error macro/return convention,
e.g., MLLM_RETURN_FALSE_IF_NOT or equivalent) when bias_reg_tensor_ir is null,
and only proceed to validate types and attach quant_recipe
(writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>,
annotation_attr->annotation_.weights.insert) if the lookup succeeded.

Comment on lines +40 to 69
for (const auto& [name, weight_spec] : annotation.weights) {
if (weight_spec->solved) continue;

this_spec->scale_level_0_int = scale1;
this_spec->scale_level_1_fp = scale2;

weight_spec->solved = true;
break;
}
default: {
NYI("quant recipe type not support");
switch (weight_spec->type) {
case ir::linalg::QuantizationSpecType::kRaw: {
weight_spec->solved = true;
break;
}
case ir::linalg::QuantizationSpecType::kLPBQ: {
if (name != "weight") { NYI("LPBQ quant recipe only supports Linear weight, got '{}'.", name); }
auto this_spec = std::static_pointer_cast<ir::linalg::QuantizationSpecLPBQ>(weight_spec);
auto scale1 = pf->pull(mllm_op->getName() + ".scale1"); // using uint8 to store uint4
auto scale2 = pf->pull(mllm_op->getName() + ".scale2");
auto weight = pf->pull(mllm_op->getName() + ".weight");

// FIXME weight maybe error, Check qnn eats int8 or uint8. Here weight using int8 to store int4.
checkTypeLimits<int8_t>(weight, 0, 15); // Int4
checkTypeLimits<uint8_t>(scale1, 0, 16); // UInt4

this_spec->scale_level_0_int = scale1;
this_spec->scale_level_1_fp = scale2;

weight_spec->solved = true;
break;
}
default: {
NYI("quant recipe type not support");
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify error message for LPBQ weight name restriction.

Line 49's error message states "LPBQ quant recipe only supports Linear weight" but this function is also invoked for Conv2D operators (line 173). The restriction is actually on the weight parameter name (must be "weight"), not the operator type. The current message may mislead developers debugging Conv2D quantization issues or attempting to apply LPBQ to Conv2D bias weights.

📝 Proposed fix to clarify the error message
-      if (name != "weight") { NYI("LPBQ quant recipe only supports Linear weight, got '{}'.", name); }
+      if (name != "weight") { NYI("LPBQ quant recipe only supports weight parameter named 'weight', got '{}'.", name); }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/PTQPass.cpp` around lines 40 - 69, The NYI error
message inside the LPBQ branch is misleading: update the NYI call (inside the
loop over annotation.weights where variables name, weight_spec and
mllm_op->getName() are used) to state that the weight parameter name must be
"weight" (include the offending name and the operator name for context) rather
than saying “only supports Linear weight”; keep the existing guard (if (name !=
"weight")) and change only the message text to clarify the restriction applies
to the parameter name.

Comment on lines +49 to +63
auto weight = writer.getContext()
->lookupSymbolTable(base_op->getName() + ".weight")
->outputs()
.front()
->cast_<ir::tensor::TensorValue>();
qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true));
}

if (real_layer_norm_op->options().bias) {
auto bias = writer.getContext()
->lookupSymbolTable(base_op->getName() + ".bias")
->outputs()
.front()
->cast_<ir::tensor::TensorValue>();
qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on missing LayerNorm weight/bias symbols before dereferencing.

Both weight and bias paths dereference lookupSymbolTable(...) results immediately; if a symbol is absent, this can crash the pass instead of returning a controlled failure.

Suggested fix
   if (real_layer_norm_op->options().elementwise_affine) {
-    auto weight = writer.getContext()
-                      ->lookupSymbolTable(base_op->getName() + ".weight")
-                      ->outputs()
-                      .front()
-                      ->cast_<ir::tensor::TensorValue>();
+    auto weight_sym = writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight");
+    MLLM_RETURN_FALSE_IF_NOT(weight_sym);
+    MLLM_RETURN_FALSE_IF_NOT(weight_sym->outputs().front()->isa_<ir::tensor::TensorValue>());
+    auto weight = weight_sym->outputs().front()->cast_<ir::tensor::TensorValue>();
     qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true));
   }

   if (real_layer_norm_op->options().bias) {
-    auto bias = writer.getContext()
-                    ->lookupSymbolTable(base_op->getName() + ".bias")
-                    ->outputs()
-                    .front()
-                    ->cast_<ir::tensor::TensorValue>();
+    auto bias_sym = writer.getContext()->lookupSymbolTable(base_op->getName() + ".bias");
+    MLLM_RETURN_FALSE_IF_NOT(bias_sym);
+    MLLM_RETURN_FALSE_IF_NOT(bias_sym->outputs().front()->isa_<ir::tensor::TensorValue>());
+    auto bias = bias_sym->outputs().front()->cast_<ir::tensor::TensorValue>();
     qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true));
   }

As per coding guidelines, "Validate inputs for public APIs and critical internal functions."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 49 - 63, The code
dereferences the result of
writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight") and
".bias" without null/emptiness checks; update the LayerNorm visitor to first
validate that lookupSymbolTable(...) returns a non-null pointer and that
->outputs() is non-empty before calling front() and cast_<...>(), and if the
symbol is missing or outputs() is empty return/propagate a controlled error (or
log and abort the pass) instead of dereferencing; apply the same validation to
both the weight and bias paths (referencing getContext(), lookupSymbolTable,
base_op->getName() + ".weight"/".bias", outputs(), front(),
cast_<ir::tensor::TensorValue>(), qnn_op_node->emplaceInput, and
env->captureQnnAOTNodeTensor) so the pass fails fast and safely.

Comment on lines +68 to +72
std::vector<uint32_t> axes_dims = {1};
auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
axes_data[0] = input->tensor_.shape().size() - 1;
qnn_op_node->emplaceParamTensor(axes_param);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard rank before deriving the last-axis index.

shape().size() - 1 underflows for rank-0 tensors and writes an invalid axes value.

Suggested fix
   std::vector<uint32_t> axes_dims = {1};
   auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
   auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
-  axes_data[0] = input->tensor_.shape().size() - 1;
+  const auto rank = input->tensor_.shape().size();
+  MLLM_RETURN_FALSE_IF_NOT(rank > 0);
+  axes_data[0] = static_cast<uint32_t>(rank - 1);
   qnn_op_node->emplaceParamTensor(axes_param);

As per coding guidelines, "Validate inputs for public APIs and critical internal functions."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::vector<uint32_t> axes_dims = {1};
auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
axes_data[0] = input->tensor_.shape().size() - 1;
qnn_op_node->emplaceParamTensor(axes_param);
std::vector<uint32_t> axes_dims = {1};
auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
const auto rank = input->tensor_.shape().size();
MLLM_RETURN_FALSE_IF_NOT(rank > 0);
axes_data[0] = static_cast<uint32_t>(rank - 1);
qnn_op_node->emplaceParamTensor(axes_param);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 68 - 72, The code
writes axes_data[0] = input->tensor_.shape().size() - 1 which underflows for
rank-0 tensors; fix by guarding the rank before subtracting: read size_t rank =
input->tensor_.shape().size(); if rank == 0 either set the axis to 0
(axes_data[0] = 0) or return/log an error for invalid input depending on
LayerNorm's contract, otherwise set axes_data[0] = static_cast<uint32_t>(rank -
1); keep the existing axes_param/axes_data and qnn_op_node->emplaceParamTensor
use but ensure you validate rank first to avoid underflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant