Add QNN AOT support for visual graph lowering#675
Conversation
📝 WalkthroughWalkthroughThis PR expands QNN AOT lowering infrastructure by adding enhanced diagnostic output to memory allocation, introducing tensor naming improvements and quantization type handling, launching a simple-graph lowering path, implementing new quantization recipe patterns for LayerNorm and GELU, extending PTQ weight solving, and providing new QNN operator visitors and refinements to existing ones. ChangesQNN AOT Lowering & Quantization Enhancement
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)
291-291: ⚡ Quick winFix: this
IRWriter::WALK_CONTINUEis valid (no compile break)
mllm/compile/ir/Node.hppdefinesIRWriter::enum WalkResult { WALK_CONTINUE = 0, ... }as an unscoped enum, soir::IRWriter::WALK_CONTINUEis a valid qualifier; the concern aboutWalkResult-scoping causing a compile break doesn’t apply. For consistency with nearby uses, consider switching this site toir::IRWriter::WalkResult::WALK_CONTINUE.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` at line 291, The return uses the unscoped enum constant ir::IRWriter::WALK_CONTINUE which is valid but inconsistent with nearby uses; update the return expression in LLM2QnnLoweringPass.cpp to use the scoped enum form ir::IRWriter::WalkResult::WALK_CONTINUE (or the equivalent scoped WalkResult qualifier used elsewhere) so the code matches surrounding style and improves readability while leaving behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp`:
- Around line 267-270: runSimpleGraph is checking the wrong QNN marker attribute
("using_qnn") and thus can skip ops that the main lowering path (run) marks with
"use_qnn"; update the attribute check in runSimpleGraph to use the same key as
run — replace the linalg_op->getAttr("using_qnn") check with
linalg_op->getAttr("use_qnn") (the change affects the block that logs MLLM_WARN
and returns WALK_BREAK) so both paths consistently recognize QNN ops.
- Around line 255-258: Guard access to CallGraphOp's symbol attribute before
dereferencing: in the block where call_op is obtained (call_op =
some_op->cast_<ir::graph::CallGraphOp>()), check that call_op->getSymbolAttr()
is non-null (and optionally that getSymbolAttr()->str() is non-empty) before
passing it to getCtx()->lookupSymbolTable; if the attribute is null, call
MLLM_ERROR_EXIT (same style as existing error handling) with a clear message
that the CallGraphOp has no symbol attribute so you avoid crashing during
lookup.
- Around line 246-261: The recursive lambda lower_subgraph_inline lacks cycle
protection; add a visited set (e.g., std::unordered_set<std::string> visited or
std::unordered_set<ir::graph::SubGraphOp::ptr_t>) captured by reference into the
lambda, and at the start of lower_subgraph_inline check if the subgraph (by
pointer or by call_op->getSymbolAttr()->str()) is already in visited and return
immediately if so; otherwise insert it before walking the region and skip
recursing into called subgraphs that are present in visited to prevent infinite
recursion when handling ir::graph::CallGraphOp, ensuring the check happens
before calling lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>()).
In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`:
- Around line 1003-1013: When real_linear_op indicates bias is true but the
symbol lookup for the bias register (constructed via
linear_ir->getAOp()->getName() + ".bias") fails, do not silently skip — return a
failure so quant annotations stay consistent. Replace the current conditional
that ignores a missing bias_reg_tensor_ir with a check that returns an error
(use the project's error macro/return convention, e.g., MLLM_RETURN_FALSE_IF_NOT
or equivalent) when bias_reg_tensor_ir is null, and only proceed to validate
types and attach quant_recipe
(writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>,
annotation_attr->annotation_.weights.insert) if the lookup succeeded.
In `@mllm/backends/qnn/aot/passes/PTQPass.cpp`:
- Around line 40-69: The NYI error message inside the LPBQ branch is misleading:
update the NYI call (inside the loop over annotation.weights where variables
name, weight_spec and mllm_op->getName() are used) to state that the weight
parameter name must be "weight" (include the offending name and the operator
name for context) rather than saying “only supports Linear weight”; keep the
existing guard (if (name != "weight")) and change only the message text to
clarify the restriction applies to the parameter name.
In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp`:
- Around line 49-63: The code dereferences the result of
writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight") and
".bias" without null/emptiness checks; update the LayerNorm visitor to first
validate that lookupSymbolTable(...) returns a non-null pointer and that
->outputs() is non-empty before calling front() and cast_<...>(), and if the
symbol is missing or outputs() is empty return/propagate a controlled error (or
log and abort the pass) instead of dereferencing; apply the same validation to
both the weight and bias paths (referencing getContext(), lookupSymbolTable,
base_op->getName() + ".weight"/".bias", outputs(), front(),
cast_<ir::tensor::TensorValue>(), qnn_op_node->emplaceInput, and
env->captureQnnAOTNodeTensor) so the pass fails fast and safely.
- Around line 68-72: The code writes axes_data[0] =
input->tensor_.shape().size() - 1 which underflows for rank-0 tensors; fix by
guarding the rank before subtracting: read size_t rank =
input->tensor_.shape().size(); if rank == 0 either set the axis to 0
(axes_data[0] = 0) or return/log an error for invalid input depending on
LayerNorm's contract, otherwise set axes_data[0] = static_cast<uint32_t>(rank -
1); keep the existing axes_param/axes_data and qnn_op_node->emplaceParamTensor
use but ensure you validate rank first to avoid underflow.
---
Nitpick comments:
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp`:
- Line 291: The return uses the unscoped enum constant
ir::IRWriter::WALK_CONTINUE which is valid but inconsistent with nearby uses;
update the return expression in LLM2QnnLoweringPass.cpp to use the scoped enum
form ir::IRWriter::WalkResult::WALK_CONTINUE (or the equivalent scoped
WalkResult qualifier used elsewhere) so the code matches surrounding style and
improves readability while leaving behavior unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3ccddc83-593c-4f92-977a-6b8d125f926f
📒 Files selected for processing (22)
mllm/backends/qnn/QNNAllocator.cppmllm/backends/qnn/QNNBackend.cppmllm/backends/qnn/QNNModel.cppmllm/backends/qnn/QNNUtils.hppmllm/backends/qnn/aot/QnnWrappersAPI.cppmllm/backends/qnn/aot/passes/AOTPipeline.cppmllm/backends/qnn/aot/passes/AOTPipeline.hppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hppmllm/backends/qnn/aot/passes/LLMQuantRecipePass.cppmllm/backends/qnn/aot/passes/LLMQuantRecipePass.hppmllm/backends/qnn/aot/passes/PTQPass.cppmllm/backends/qnn/aot/visitor/Conv2D.cppmllm/backends/qnn/aot/visitor/GELU.cppmllm/backends/qnn/aot/visitor/GELU.hppmllm/backends/qnn/aot/visitor/LayerNorm.cppmllm/backends/qnn/aot/visitor/LayerNorm.hppmllm/backends/qnn/aot/visitor/Matmul.cppmllm/backends/qnn/aot/visitor/Repeat.cppmllm/backends/qnn/aot/visitor/View.cppmllm/core/aops/VisionRoPEOp.cppmllm/mllm.hpp
| std::function<void(ir::graph::SubGraphOp::ptr_t)> lower_subgraph_inline; | ||
| lower_subgraph_inline = [&](ir::graph::SubGraphOp::ptr_t subgraph) { | ||
| auto region = subgraph->getTopRegion(); | ||
| if (!region) { return; } | ||
|
|
||
| auto subgraph_writer = ir::IRWriter(getCtx(), region); | ||
| subgraph_writer.walk<ir::Op>( | ||
| [&](ir::IRWriter& this_tough_writer, const ir::Op::ptr_t& some_op) -> ir::IRWriter::WalkResult { | ||
| if (some_op->isa_<ir::graph::CallGraphOp>()) { | ||
| auto call_op = some_op->cast_<ir::graph::CallGraphOp>(); | ||
| auto called = getCtx()->lookupSymbolTable(call_op->getSymbolAttr()->str()); | ||
| if (!called || !called->isa_<ir::graph::SubGraphOp>()) { | ||
| MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", call_op->getSymbolAttr()->str()); | ||
| } | ||
| lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>()); | ||
| return ir::IRWriter::WalkResult::WALK_CONTINUE; |
There was a problem hiding this comment.
Add cycle protection in recursive subgraph lowering.
lower_subgraph_inline recursively follows CallGraphOp edges but has no visited-set guard. Cyclic callgraphs will recurse indefinitely and can overflow the stack.
Suggested fix
+ std::unordered_set<std::string> visiting;
std::function<void(ir::graph::SubGraphOp::ptr_t)> lower_subgraph_inline;
lower_subgraph_inline = [&](ir::graph::SubGraphOp::ptr_t subgraph) {
+ auto symbol = subgraph->getSymbolAttr();
+ const std::string current = symbol ? symbol->str() : "";
+ if (!current.empty()) {
+ if (visiting.count(current)) {
+ MLLM_ERROR_EXIT(ExitCode::kCoreError, "Detected cyclic subgraph call at {}", current);
+ }
+ visiting.insert(current);
+ }
+
auto region = subgraph->getTopRegion();
- if (!region) { return; }
+ if (!region) {
+ if (!current.empty()) { visiting.erase(current); }
+ return;
+ }
...
- lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>());
+ lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>());
return ir::IRWriter::WalkResult::WALK_CONTINUE;
}
...
- });
+ });
+ if (!current.empty()) { visiting.erase(current); }
};🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` around lines 246 - 261,
The recursive lambda lower_subgraph_inline lacks cycle protection; add a visited
set (e.g., std::unordered_set<std::string> visited or
std::unordered_set<ir::graph::SubGraphOp::ptr_t>) captured by reference into the
lambda, and at the start of lower_subgraph_inline check if the subgraph (by
pointer or by call_op->getSymbolAttr()->str()) is already in visited and return
immediately if so; otherwise insert it before walking the region and skip
recursing into called subgraphs that are present in visited to prevent infinite
recursion when handling ir::graph::CallGraphOp, ensuring the check happens
before calling lower_subgraph_inline(called->cast_<ir::graph::SubGraphOp>()).
| auto call_op = some_op->cast_<ir::graph::CallGraphOp>(); | ||
| auto called = getCtx()->lookupSymbolTable(call_op->getSymbolAttr()->str()); | ||
| if (!called || !called->isa_<ir::graph::SubGraphOp>()) { | ||
| MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", call_op->getSymbolAttr()->str()); |
There was a problem hiding this comment.
Guard CallGraphOp symbol access before dereference.
Line 256 dereferences call_op->getSymbolAttr() without a null check. A malformed IR node would crash the pass instead of returning a controlled failure.
Suggested fix
if (some_op->isa_<ir::graph::CallGraphOp>()) {
auto call_op = some_op->cast_<ir::graph::CallGraphOp>();
- auto called = getCtx()->lookupSymbolTable(call_op->getSymbolAttr()->str());
+ auto callee_symbol = call_op->getSymbolAttr();
+ if (!callee_symbol) {
+ MLLM_ERROR_EXIT(ExitCode::kCoreError, "CallGraphOp missing symbol attr");
+ }
+ auto called = getCtx()->lookupSymbolTable(callee_symbol->str());
if (!called || !called->isa_<ir::graph::SubGraphOp>()) {
- MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", call_op->getSymbolAttr()->str());
+ MLLM_ERROR_EXIT(ExitCode::kCoreError, "Cannot find called subgraph {}", callee_symbol->str());
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` around lines 255 - 258,
Guard access to CallGraphOp's symbol attribute before dereferencing: in the
block where call_op is obtained (call_op =
some_op->cast_<ir::graph::CallGraphOp>()), check that call_op->getSymbolAttr()
is non-null (and optionally that getSymbolAttr()->str() is non-empty) before
passing it to getCtx()->lookupSymbolTable; if the attribute is null, call
MLLM_ERROR_EXIT (same style as existing error handling) with a clear message
that the CallGraphOp has no symbol attribute so you avoid crashing during
lookup.
| if (!linalg_op->getAttr("using_qnn")) { | ||
| MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name); | ||
| return ir::IRWriter::WalkResult::WALK_BREAK; | ||
| } |
There was a problem hiding this comment.
Use the same QNN marker attribute key as the main lowering path.
runSimpleGraph checks using_qnn, but run checks use_qnn (Line 177). This mismatch can skip valid QNN ops in simple-graph mode.
Suggested fix
- if (!linalg_op->getAttr("using_qnn")) {
+ if (!linalg_op->getAttr("use_qnn")) {
MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name);
return ir::IRWriter::WalkResult::WALK_BREAK;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (!linalg_op->getAttr("using_qnn")) { | |
| MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name); | |
| return ir::IRWriter::WalkResult::WALK_BREAK; | |
| } | |
| if (!linalg_op->getAttr("use_qnn")) { | |
| MLLM_WARN("Found non-qnn op: {} in graph: {}", linalg_op->getAOp()->getName(), qnn_graph_name); | |
| return ir::IRWriter::WalkResult::WALK_BREAK; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` around lines 267 - 270,
runSimpleGraph is checking the wrong QNN marker attribute ("using_qnn") and thus
can skip ops that the main lowering path (run) marks with "use_qnn"; update the
attribute check in runSimpleGraph to use the same key as run — replace the
linalg_op->getAttr("using_qnn") check with linalg_op->getAttr("use_qnn") (the
change affects the block that logs MLLM_WARN and returns WALK_BREAK) so both
paths consistently recognize QNN ops.
| if (real_linear_op && real_linear_op->options().bias) { | ||
| auto bias_name = linear_ir->getAOp()->getName() + ".bias"; | ||
| auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name); | ||
| if (bias_reg_tensor_ir) { | ||
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>()); | ||
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>()); | ||
| auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>(); | ||
| auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype()); | ||
| bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec)); | ||
| annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec}); | ||
| } |
There was a problem hiding this comment.
Do not silently ignore missing Linear bias tensors when bias=true.
This branch skips bias quant annotation if symbol lookup fails, which can leave quant_recipe inconsistent with op options and fail later in less actionable places.
Suggested fix
- if (real_linear_op && real_linear_op->options().bias) {
+ if (real_linear_op && real_linear_op->options().bias) {
auto bias_name = linear_ir->getAOp()->getName() + ".bias";
auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
- if (bias_reg_tensor_ir) {
- MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
- MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
- auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
- auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
- bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
- annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
- }
+ MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir);
+ MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
+ MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
+ auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
+ auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
+ bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
+ annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
}As per coding guidelines, "Ensure functions that can fail return appropriate error codes or raise exceptions" and "Validate inputs for public APIs and critical internal functions."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (real_linear_op && real_linear_op->options().bias) { | |
| auto bias_name = linear_ir->getAOp()->getName() + ".bias"; | |
| auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name); | |
| if (bias_reg_tensor_ir) { | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>()); | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>()); | |
| auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>(); | |
| auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype()); | |
| bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec)); | |
| annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec}); | |
| } | |
| if (real_linear_op && real_linear_op->options().bias) { | |
| auto bias_name = linear_ir->getAOp()->getName() + ".bias"; | |
| auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name); | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir); | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>()); | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>()); | |
| auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>(); | |
| auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype()); | |
| bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec)); | |
| annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec}); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp` around lines 1003 -
1013, When real_linear_op indicates bias is true but the symbol lookup for the
bias register (constructed via linear_ir->getAOp()->getName() + ".bias") fails,
do not silently skip — return a failure so quant annotations stay consistent.
Replace the current conditional that ignores a missing bias_reg_tensor_ir with a
check that returns an error (use the project's error macro/return convention,
e.g., MLLM_RETURN_FALSE_IF_NOT or equivalent) when bias_reg_tensor_ir is null,
and only proceed to validate types and attach quant_recipe
(writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>,
annotation_attr->annotation_.weights.insert) if the lookup succeeded.
| for (const auto& [name, weight_spec] : annotation.weights) { | ||
| if (weight_spec->solved) continue; | ||
|
|
||
| this_spec->scale_level_0_int = scale1; | ||
| this_spec->scale_level_1_fp = scale2; | ||
|
|
||
| weight_spec->solved = true; | ||
| break; | ||
| } | ||
| default: { | ||
| NYI("quant recipe type not support"); | ||
| switch (weight_spec->type) { | ||
| case ir::linalg::QuantizationSpecType::kRaw: { | ||
| weight_spec->solved = true; | ||
| break; | ||
| } | ||
| case ir::linalg::QuantizationSpecType::kLPBQ: { | ||
| if (name != "weight") { NYI("LPBQ quant recipe only supports Linear weight, got '{}'.", name); } | ||
| auto this_spec = std::static_pointer_cast<ir::linalg::QuantizationSpecLPBQ>(weight_spec); | ||
| auto scale1 = pf->pull(mllm_op->getName() + ".scale1"); // using uint8 to store uint4 | ||
| auto scale2 = pf->pull(mllm_op->getName() + ".scale2"); | ||
| auto weight = pf->pull(mllm_op->getName() + ".weight"); | ||
|
|
||
| // FIXME weight maybe error, Check qnn eats int8 or uint8. Here weight using int8 to store int4. | ||
| checkTypeLimits<int8_t>(weight, 0, 15); // Int4 | ||
| checkTypeLimits<uint8_t>(scale1, 0, 16); // UInt4 | ||
|
|
||
| this_spec->scale_level_0_int = scale1; | ||
| this_spec->scale_level_1_fp = scale2; | ||
|
|
||
| weight_spec->solved = true; | ||
| break; | ||
| } | ||
| default: { | ||
| NYI("quant recipe type not support"); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Clarify error message for LPBQ weight name restriction.
Line 49's error message states "LPBQ quant recipe only supports Linear weight" but this function is also invoked for Conv2D operators (line 173). The restriction is actually on the weight parameter name (must be "weight"), not the operator type. The current message may mislead developers debugging Conv2D quantization issues or attempting to apply LPBQ to Conv2D bias weights.
📝 Proposed fix to clarify the error message
- if (name != "weight") { NYI("LPBQ quant recipe only supports Linear weight, got '{}'.", name); }
+ if (name != "weight") { NYI("LPBQ quant recipe only supports weight parameter named 'weight', got '{}'.", name); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/passes/PTQPass.cpp` around lines 40 - 69, The NYI error
message inside the LPBQ branch is misleading: update the NYI call (inside the
loop over annotation.weights where variables name, weight_spec and
mllm_op->getName() are used) to state that the weight parameter name must be
"weight" (include the offending name and the operator name for context) rather
than saying “only supports Linear weight”; keep the existing guard (if (name !=
"weight")) and change only the message text to clarify the restriction applies
to the parameter name.
| auto weight = writer.getContext() | ||
| ->lookupSymbolTable(base_op->getName() + ".weight") | ||
| ->outputs() | ||
| .front() | ||
| ->cast_<ir::tensor::TensorValue>(); | ||
| qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true)); | ||
| } | ||
|
|
||
| if (real_layer_norm_op->options().bias) { | ||
| auto bias = writer.getContext() | ||
| ->lookupSymbolTable(base_op->getName() + ".bias") | ||
| ->outputs() | ||
| .front() | ||
| ->cast_<ir::tensor::TensorValue>(); | ||
| qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true)); |
There was a problem hiding this comment.
Fail fast on missing LayerNorm weight/bias symbols before dereferencing.
Both weight and bias paths dereference lookupSymbolTable(...) results immediately; if a symbol is absent, this can crash the pass instead of returning a controlled failure.
Suggested fix
if (real_layer_norm_op->options().elementwise_affine) {
- auto weight = writer.getContext()
- ->lookupSymbolTable(base_op->getName() + ".weight")
- ->outputs()
- .front()
- ->cast_<ir::tensor::TensorValue>();
+ auto weight_sym = writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight");
+ MLLM_RETURN_FALSE_IF_NOT(weight_sym);
+ MLLM_RETURN_FALSE_IF_NOT(weight_sym->outputs().front()->isa_<ir::tensor::TensorValue>());
+ auto weight = weight_sym->outputs().front()->cast_<ir::tensor::TensorValue>();
qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true));
}
if (real_layer_norm_op->options().bias) {
- auto bias = writer.getContext()
- ->lookupSymbolTable(base_op->getName() + ".bias")
- ->outputs()
- .front()
- ->cast_<ir::tensor::TensorValue>();
+ auto bias_sym = writer.getContext()->lookupSymbolTable(base_op->getName() + ".bias");
+ MLLM_RETURN_FALSE_IF_NOT(bias_sym);
+ MLLM_RETURN_FALSE_IF_NOT(bias_sym->outputs().front()->isa_<ir::tensor::TensorValue>());
+ auto bias = bias_sym->outputs().front()->cast_<ir::tensor::TensorValue>();
qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true));
}As per coding guidelines, "Validate inputs for public APIs and critical internal functions."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 49 - 63, The code
dereferences the result of
writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight") and
".bias" without null/emptiness checks; update the LayerNorm visitor to first
validate that lookupSymbolTable(...) returns a non-null pointer and that
->outputs() is non-empty before calling front() and cast_<...>(), and if the
symbol is missing or outputs() is empty return/propagate a controlled error (or
log and abort the pass) instead of dereferencing; apply the same validation to
both the weight and bias paths (referencing getContext(), lookupSymbolTable,
base_op->getName() + ".weight"/".bias", outputs(), front(),
cast_<ir::tensor::TensorValue>(), qnn_op_node->emplaceInput, and
env->captureQnnAOTNodeTensor) so the pass fails fast and safely.
| std::vector<uint32_t> axes_dims = {1}; | ||
| auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims); | ||
| auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc()); | ||
| axes_data[0] = input->tensor_.shape().size() - 1; | ||
| qnn_op_node->emplaceParamTensor(axes_param); |
There was a problem hiding this comment.
Guard rank before deriving the last-axis index.
shape().size() - 1 underflows for rank-0 tensors and writes an invalid axes value.
Suggested fix
std::vector<uint32_t> axes_dims = {1};
auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
- axes_data[0] = input->tensor_.shape().size() - 1;
+ const auto rank = input->tensor_.shape().size();
+ MLLM_RETURN_FALSE_IF_NOT(rank > 0);
+ axes_data[0] = static_cast<uint32_t>(rank - 1);
qnn_op_node->emplaceParamTensor(axes_param);As per coding guidelines, "Validate inputs for public APIs and critical internal functions."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| std::vector<uint32_t> axes_dims = {1}; | |
| auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims); | |
| auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc()); | |
| axes_data[0] = input->tensor_.shape().size() - 1; | |
| qnn_op_node->emplaceParamTensor(axes_param); | |
| std::vector<uint32_t> axes_dims = {1}; | |
| auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims); | |
| auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc()); | |
| const auto rank = input->tensor_.shape().size(); | |
| MLLM_RETURN_FALSE_IF_NOT(rank > 0); | |
| axes_data[0] = static_cast<uint32_t>(rank - 1); | |
| qnn_op_node->emplaceParamTensor(axes_param); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 68 - 72, The code
writes axes_data[0] = input->tensor_.shape().size() - 1 which underflows for
rank-0 tensors; fix by guarding the rank before subtracting: read size_t rank =
input->tensor_.shape().size(); if rank == 0 either set the axis to 0
(axes_data[0] = 0) or return/log an error for invalid input depending on
LayerNorm's contract, otherwise set axes_data[0] = static_cast<uint32_t>(rank -
1); keep the existing axes_param/axes_data and qnn_op_node->emplaceParamTensor
use but ensure you validate rank first to avoid underflow.
Please check Guidelines for Contributing.
Summary
Depends on #674.
This PR adds generic QNN AOT support needed for lowering visual encoder graphs:
GELUandLayerNorm.GELUandLayerNorm.allow_raw_float_linear=trueis explicitly set in the quant recipe config.This is model-agnostic infrastructure. A follow-up PR will add the Qwen2-VL 2B full-QNN example and runtime scripts.
Validation
git diff --cached --checkcmake --build build-qnn-aot --target mllm-qwen2vl-aot-c -j2Summary by CodeRabbit
New Features
Bug Fixes
Improvements