Skip to content

Add Qwen2-VL full QNN AOT example#676

Open
twlddd wants to merge 7 commits into
UbiquitousLearning:mainfrom
twlddd:qwen2vl-full-qnn-example-pr3
Open

Add Qwen2-VL full QNN AOT example#676
twlddd wants to merge 7 commits into
UbiquitousLearning:mainfrom
twlddd:qwen2vl-full-qnn-example-pr3

Conversation

@twlddd
Copy link
Copy Markdown

@twlddd twlddd commented May 18, 2026

Please check Guidelines for Contributing.

Draft / stacked PR. Depends on #675.

Summary

This PR adds a Qwen2-VL 2B full-QNN AOT example.

It includes:

  • Qwen2-VL QNN AOT context compiler.
  • Android QNN AOT runner for image-text inference.
  • Interactive terminal runner script.
  • Visual encoder QNN AOT path.
  • Visual bucket selection and masked padding support.
  • Qwen2-VL QNN config examples.
  • Basic fixed-set evaluation scripts.
  • Qwen2-VL model-side helpers for bucketed image preprocessing and traceable visual execution.

The compiled context binaries and model weights are intentionally not included.

Validation

  • bash -n scripts/qwen2vl_qnn/common.sh scripts/qwen2vl_qnn/push_eval_assets.sh scripts/qwen2vl_qnn/run_qnn_eval_fixed_set.sh scripts/qwen2vl_qnn/run_qnn_interactive.sh
  • cmake --build build-qnn-aot --target mllm-qwen2vl-aot-c mllm-qwen2vl-visual-aot-diag -j2

Notes

This PR is part of the full-QNN Qwen2-VL work:

  1. Shared QNN AOT runtime/lowering fixes.
  2. Generic QNN AOT visual graph lowering support.
  3. This PR: Qwen2-VL full-QNN example and runner.
    Update: the Qwen2-VL 2B example baseline is now the 2026-05-21 full-QNN AOT baseline, using W4A16 LLM weights, FP16 visual encoder weights, and a 5-bucket visual context.

Summary by CodeRabbit

  • New Features

    • Qwen2‑VL 2B visual model support for Qualcomm QNN with ahead‑of‑time compilation and device runners
    • Interactive inference now emits a performance summary; new diagnostics for visual processing and padding
    • New scripts to run, evaluate, and push assets to remote devices for automated on‑device testing
  • Documentation

    • Added end‑to‑end Qwen2‑VL QNN AOT workflow docs and updated supported models table
  • Chores

    • Ignore top‑level logs/ directory in repository configuration

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d20e73b5-d19b-4cc2-b38f-49acffc49180

📥 Commits

Reviewing files that changed from the base of the PR and between d3da4c5 and 4a420c5.

📒 Files selected for processing (1)
  • README.md
✅ Files skipped from review due to trivial changes (1)
  • README.md

📝 Walkthrough

Walkthrough

This PR adds full Qwen2VL QNN AOT support: model configs and traceable modules, visual AOT rewrites and bundle segmentation, quant recipe and PTQ extensions (LayerNorm/GELU/Linear), QNN runtime robustness fixes, visual runners/diagnostics, deployment scripts, and build/documentation updates.

Changes

Qwen2VL QNN AOT End-to-End Implementation

Layer / File(s) Summary
Model configuration and traceable implementation
examples/qwen2vl/config_2B_qnn_lpbq.json, examples/qwen2vl/qnn_aot_cfg_*.json, examples/qwen2vl_qnn_aot/modeling_qwen2vl_qnn_aot.hpp, mllm/models/qwen2vl/modeling_qwen2vl_traceable.hpp, mllm/models/qwen2vl/image_preprocessor_qwen2vl.hpp
Qwen2VL model configuration files and a new traceable AOT modeling header introduce QDQ-aware modules (Qwen2VLMLP, Qwen2VLAttention, Qwen2VLDecoder, Qwen2VLText, Qwen2VLForCausalLM), bias quant helpers, per-head debug outputs, RoPE handling, and an image preprocessor overload for explicit grid sizing.
Visual tower AOT building blocks and utilities
examples/qwen2vl_qnn_aot/visual_aot_helpers.hpp
Adds AOT-rewriteable vision modules (patch embed linear, QuickGELU MLP, masked attention with Vision RoPE, vision block and transformer wrapper), plus visual-bundle segmentation, graph naming, and weight-reshape helpers.
Visual tower compilation pipeline
examples/qwen2vl_qnn_aot/compile_visual.cpp, examples/qwen2vl_qnn_aot/compile.cpp
New CLIs to trace and optionally lower/compile visual and LLM graphs into QNN AOT contexts, supporting bundle layouts, QDQ overrides, and saving compiled contexts.
Quantization recipe patterns (LayerNorm, GELU, Linear updates)
mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp, mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp, mllm/backends/qnn/aot/visitor/GELU.hpp, mllm/backends/qnn/aot/visitor/GELU.cpp, mllm/backends/qnn/aot/visitor/LayerNorm.hpp, mllm/backends/qnn/aot/visitor/LayerNorm.cpp
Introduces LayerNorm and GELU quant recipe patterns, extends Linear handling (allow_raw_float_linear, bias quantization), and updates elementwise constant quant handling.
LLM2QnnLoweringPass simple graph support
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp, mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
Pass now accepts an optional simple QNN graph name, registers new patterns (GELU/LayerNorm), and implements runSimpleGraph to lower non-model/simple graphs (e.g., visual segments).
AOT pipeline creation and utilities
mllm/backends/qnn/aot/passes/AOTPipeline.hpp, mllm/backends/qnn/aot/passes/AOTPipeline.cpp, mllm/backends/qnn/aot/QnnWrappersAPI.cpp, mllm/backends/qnn/aot/passes/PTQPass.cpp, mllm/backends/qnn/aot/visitor/Conv2D.cpp
Adds createQnnAOTSimpleLoweringPipeline, IR symbol-aware tensor naming, SymPerTensor offset selection by dtype, PTQ linear/LN solving updates, and Conv2D bias quant attribute initialization.
QNN allocator, backend, and wrapper runtime fixes
mllm/backends/qnn/QNNAllocator.cpp, mllm/backends/qnn/QNNBackend.cpp, mllm/backends/qnn/QNNUtils.hpp, mllm/backends/qnn/QNNModel.cpp
Defensive diagnostics and ownership checks in QNNAllocator, rpcmem/memRegister error handling, graphExecute output-count validation, unconditional rebinding of wrapper data containers for repeated AOT runs, simplified __setDataContainer logic, and optional QNN I/O logging.
Visual graph execution runners and diagnostics
examples/qwen2vl_qnn_aot/visual_aot_run.cpp, examples/qwen2vl_qnn_aot/visual_padding_diag.cpp, examples/qwen2vl_qnn_aot/visual_padding_ab_diag.cpp
Runners and diagnostics for executing visual QNN AOT bundles, timing per stage, and comparing QNN outputs to reference models (cosine, L2, norm ratios, NaN/Inf counts).
Tensor operation implementations and improvements
mllm/backends/cpu/ops/VisionRoPEOp.cpp, mllm/core/aops/ElewiseOps.cpp, mllm/core/aops/VisionRoPEOp.cpp, mllm/mllm.hpp
Implements VisionRoPE scalar x86 paths for float16/float32, adds float16 constant support in elementwise tracing, corrects VisionRoPE op type, and bounds-checks Android stack-trace snprintf lengths.
QNN AOT visitor patterns for linear ops
mllm/backends/qnn/aot/visitor/Matmul.cpp, mllm/backends/qnn/aot/visitor/Repeat.cpp, mllm/backends/qnn/aot/visitor/View.cpp
MatMul now emits transpose scalar params, Repeat implements reshape→tile→reshape for repeat_interleave semantics, and View treats same-name no-op aliasing vs unsupported shape-change cases.
ViT timing instrumentation and performance summary
mllm/models/ARGeneration.cpp, mllm/models/qwen2vl/modeling_qwen2vl.hpp, mllm/models/qwen2vl/tokenization_qwen2vl.hpp
Adds ViT custom event timing, subtracts ViT time from prefill to compute "LLM prefill time" in perfSummary (seconds), refines average decode time reporting, and makes tokenizer image preprocessing accept explicit grid dims.
Remote evaluation and deployment scripts
scripts/qwen2vl_qnn/common.sh, scripts/qwen2vl_qnn/push_eval_assets.sh, scripts/qwen2vl_qnn/run_qnn_eval_fixed_set.sh, scripts/qwen2vl_qnn/run_qnn_interactive.sh
Adds shared Bash utilities (ADB resolution, quoting, timestamp, case reading), asset-push helper, fixed-set evaluation orchestrator with per-case logging and summarization, and interactive remote-run script.
Build configuration and example structure
.gitignore, examples/CMakeLists.txt, examples/qwen2vl_qnn_aot/CMakeLists.txt, examples/qwen2vl_qnn_aot/README.md
Registers the qwen2vl_qnn_aot example, adds CMake targets for AOT compilers and runners, includes a comprehensive README, and ignores top-level /logs/.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • liang1232018
  • oreomaker
  • chenghuaWang

"I'm a rabbit in the code parade,
Hopping through bundles that humans made.
I trace the patches, time the run,
Count quant bits under the sun —
Joyful hops for builds well-laid!"

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

@chenghuaWang
Copy link
Copy Markdown
Collaborator

@twlddd Thanks for your PR! Could you provide documentation on how to run VLM model? This would help us test and reproduce the results.

@twlddd
Copy link
Copy Markdown
Author

twlddd commented May 21, 2026

@twlddd 感谢您的 PR!能否提供 VLM 模型运行文档?这将有助于我们测试和复现结果。

Thanks for the suggestion. I have added a “Run Qwen2-VL VLM on Android” section to examples/qwen2vl_qnn_aot/README.md.

The new documentation includes:

  • required runtime artifacts
  • the ModelScope download link for the prebuilt .mllm weight file and QNN context .bin
  • expected local artifact layout
  • Android device directory layout
  • adb push commands
  • interactive VLM inference command
  • fixed-set evaluation command
  • notes about required mllm/QNN runtime libraries

The large runtime artifacts are not included in this PR. They can be downloaded here:

https://www.modelscope.cn/models/twlddd/Qwen2-VL-2B-Instruct-Full-QNN-AOT-for-mllm/summary

I will keep the PR focused on the source code and reproducible run instructions, while hosting the large model/context artifacts externally.

@twlddd twlddd marked this pull request as ready for review May 21, 2026 04:31
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (8)
mllm/models/ARGeneration.cpp (1)

290-343: ⚡ Quick win

Consider consistent units across all timing metrics.

The main performance metrics now display in seconds (lines 311-343), but the custom events section (lines 348-359, specifically line 356) still reports in microseconds. This creates a unit inconsistency in the output. Since "ViT" is itself a custom event but is handled separately in the main summary, consider converting custom events to seconds as well for consistency.

♻️ Suggested improvement for unit consistency
       fmt::print(fg(fmt::color::white), "{:<20}: ", name);
-      fmt::print(fg(fmt::color::yellow), "{:>10.2f} μs\n", (double)duration);
+      fmt::print(fg(fmt::color::yellow), "{:>10.3f} s\n", (double)duration / 1000000.0);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/models/ARGeneration.cpp` around lines 290 - 343, The perf summary mixes
seconds and microseconds: custom_event_time_ entries (e.g., the "ViT" handling
using vit_duration) are printed in microseconds while the main summary
(prefill_seconds, decode_seconds, total_seconds, etc.) uses seconds; convert all
custom event durations to seconds before printing to keep units consistent.
Locate the block using custom_event_time_ and vit_duration (and where custom
events are iterated/printed) in ARGeneration.cpp and change the duration
computation to divide microseconds by 1e6 (or cast to seconds using
std::chrono::duration_cast to seconds) and update the printed label/format to
show "s" and match the same fmt formatting (e.g., {:>10.3f} s) used for
prefill_seconds/decode_seconds so all timings use seconds consistently.
mllm/models/qwen2vl/image_preprocessor_qwen2vl.hpp (1)

71-78: ⚡ Quick win

Document the new grid-override overload contract.

This new public API should state expected units (grid_h/grid_w are patch-grid units), constraints (must be divisible by merge_size_), and thrown error behavior.

As per coding guidelines, "Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/models/qwen2vl/image_preprocessor_qwen2vl.hpp` around lines 71 - 78, Add
a short doc comment above the inline operator()(const std::string& image_path,
int32_t grid_h, int32_t grid_w) explaining that grid_h and grid_w are specified
in patch-grid units (not pixels), that the function rescales the image to grid_h
* patch_size_ by grid_w * patch_size_, that grid_h and grid_w must be positive
and divisible by merge_size_ (otherwise MLLM_ERROR_EXIT with ExitCode::kIOError
is raised), and document the return value as a pair<Tensor, Tensor> produced by
Image::open(image_path) -> processResizedImage(...) and any exceptions/exit
behavior; reference merge_size_, patch_size_, Image::open, and
processResizedImage in the comment so callers understand constraints and side
effects.
examples/qwen2vl_qnn_aot/compile_visual.cpp (1)

37-193: ⚖️ Poor tradeoff

Code duplication with visual_aot_helpers.hpp.

The classes PatchEmbedLinear, VisionMlpPrimitiveQuickGELU, Qwen2VLVisionBlockAOTRewrite, and Qwen2VisionTransformerPretrainedModelAOTRewrite are duplicated here and in visual_aot_helpers.hpp. Consider reusing the helper header to reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/qwen2vl_qnn_aot/compile_visual.cpp` around lines 37 - 193, The file
duplicates implementations already present in visual_aot_helpers.hpp; remove the
local definitions of PatchEmbedLinear, VisionMlpPrimitiveQuickGELU,
Qwen2VLVisionBlockAOTRewrite, and
Qwen2VisionTransformerPretrainedModelAOTRewrite and instead reuse the helpers by
including visual_aot_helpers.hpp and using the existing classes/constructors
(keep references to symbols like PatchEmbedLinear, VisionMlpPrimitiveQuickGELU,
Qwen2VLVisionBlockAOTRewrite, Qwen2VisionTransformerPretrainedModelAOTRewrite,
patch_embed_, patch_merger_, blocks_ and their reg<...> registrations); update
any constructor calls or reg<...> invocations to match the helper API and delete
the duplicated code to avoid maintenance drift.
examples/qwen2vl_qnn_aot/visual_aot_run.cpp (1)

382-498: 💤 Low value

Consider extracting the bundle execution logic into a helper function.

The four bundle layout branches (6x8, tail4, early2, block1) share very similar patterns: compute graph_limit, then conditionally run each graph in sequence. This could be consolidated using a data-driven approach with a list of graph names per layout, similar to graphNamesForLayout() in visual_padding_ab_diag.cpp.

However, the current explicit approach is also readable and the example code is functional.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/qwen2vl_qnn_aot/visual_aot_run.cpp` around lines 382 - 498, The four
branches that handle bundle_layout.get() ("6x8", "tail4", "early2", "block1")
duplicate the pattern of computing graph_limit, reserving results, and pushing
sequential runVisualGraph(...) calls; refactor by extracting the execution logic
into a helper (e.g., buildAndRunBundle or runBundleGraphs) that takes the layout
string or a vector<string> of graph names (like graphNamesForLayout()), the
max_graphs option, and the common parameters (img_qnn, sin_qnn, cos_qnn,
hidden_a/hidden_b, visual_embeddings, compare_segments), computes graph_limit,
iterates the graph name list and calls runVisualGraph in sequence while pushing
to results, and then replace each branch with a call to this helper
(special-case "block1" graph list generation to produce the 32 indexed names and
visual_merger at the end).
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)

19-26: ⚡ Quick win

Add brief API comments for the new simple-graph lowering interfaces.

Please document expected parameters (root_graph_name, qnn_graph_name), return semantics, and failure conditions for these public methods/functions.

As per coding guidelines, "Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors."

Also applies to: 38-38

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp` around lines 19 - 26,
Add concise API comments to the LLM2QnnLoweringPass class documenting the
purpose and behavior of its public methods: describe what LLM2QnnLoweringPass
represents, then for run(const ir::node_ptr_t& op) state what kind of node it
expects, the meaning of the returned uint8_t (e.g., status codes for
success/failure), and under what conditions it fails; for runSimpleGraph(const
ir::node_ptr_t& op, const std::string& root_graph_name, const std::string&
qnn_graph_name) document the expected semantics of root_graph_name and
qnn_graph_name (what graphs they identify), what the method does (lowers a
simple graph into a QNN graph), the return value meanings (success, partial
success, error codes) and any error/failure conditions (missing nodes, name
collisions, invalid graph shapes), and note any side effects (modifies IR,
creates new graph named qnn_graph_name) so callers know expected behavior.
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)

291-291: ⚡ Quick win

Use ir::IRWriter::WalkResult::WALK_CONTINUE for consistency.

mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp uses ir::IRWriter::WALK_CONTINUE only at line 291, while surrounding/other code uses ir::IRWriter::WalkResult::WALK_CONTINUE. Both forms work with the unscoped enum WalkResult in mllm/compile/ir/Node.hpp, but keep it consistent.

Suggested fix
-          return ir::IRWriter::WALK_CONTINUE;
+          return ir::IRWriter::WalkResult::WALK_CONTINUE;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` at line 291, Replace
the unscoped enum value usage in LLM2QnnLoweringPass by returning the scoped
enum constant; specifically change the return in the visitor/visitor-like
routine in LLM2QnnLoweringPass.cpp from ir::IRWriter::WALK_CONTINUE to
ir::IRWriter::WalkResult::WALK_CONTINUE (keep the rest of the function and
surrounding code intact) so the file matches the surrounding usage pattern of
ir::IRWriter::WalkResult.
mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)

153-172: 💤 Low value

Consider using named constants for quantization offsets.

The offset values (-128, -32768, 0) are standard symmetric quantization offsets for converting unsigned to signed representation. While correct, named constants would improve readability and maintainability.

💡 Example constants
// Near top of file or in a header
constexpr int32_t kSymmetricOffsetUInt8 = -128;      // 2^7
constexpr int32_t kSymmetricOffsetUInt16 = -32768;   // 2^15
constexpr int32_t kSymmetricOffsetSigned = 0;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp` around lines 153 - 172, Replace
magic numbers used for quantization offsets with named constexprs and use them
in the switch handling cfg->quant_to_type: define constants (e.g.,
kSymmetricOffsetUInt8 = -128, kSymmetricOffsetUInt16 = -32768,
kSymmetricOffsetSigned = 0) in the file scope or appropriate header, then update
the switch cases for kUInt8, kUInt16, kInt8/kInt16 to assign those constants to
offset (leave MLLM_ERROR_EXIT unchanged for default). Also add a short comment
near the constants explaining they represent symmetric quantization offsets for
unsigned-to-signed conversion.
mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)

41-57: 💤 Low value

Consider adding documentation for public API function.

The new createQnnAOTSimpleLoweringPipeline function is a public API but lacks a docstring explaining its purpose, parameters (especially the qnn_graph_name parameter), and how it differs from createQnnAOTLoweringPipeline. As per coding guidelines, public APIs should have clear docstrings explaining purpose, parameters, returns, and errors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/AOTPipeline.cpp` around lines 41 - 57, Add a
docstring above the public function createQnnAOTSimpleLoweringPipeline
describing its purpose (builds a simple AOT lowering pass pipeline for QNN),
each parameter (env: QnnAOTEnv* used to configure AOTCompileContext,
config_path: path to config, pf: ParameterFile::ptr_t for params,
qnn_graph_name: name of the target QNN graph and how it alters lowering
behavior), the return value (vector of ir::Pass::ptr_t containing the sequence
of passes), and how it differs from createQnnAOTLoweringPipeline (e.g. simpler
pass sequence or intended use-case). Also mention any error/edge conditions
(null env or pf expectations) and where the docstring should be placed
(immediately above createQnnAOTSimpleLoweringPipeline).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/qwen2vl_qnn_aot/compile.cpp`:
- Around line 696-700: The code calls MLLM_ERROR_EXIT which aborts the process,
so the following Argparse::printHelp() and return -1 are dead/unreachable; fix
by moving the help/return before calling MLLM_ERROR_EXIT or remove the
unreachable statements: either call Argparse::printHelp() and return -1 when
required arguments are missing, or call MLLM_ERROR_EXIT (and only it) to
terminate; update the conditional around model_path, model_cfg_path,
qnn_aot_cfg_files, and output_context_path accordingly (refer to MLLM_ERROR_EXIT
and Argparse::printHelp()).

In `@examples/qwen2vl_qnn_aot/modeling_qwen2vl_qnn_aot.hpp`:
- Around line 502-504: The lm_head_ is only registered when
cfg.tie_word_embeddings is true but later always accessed, causing a runtime
crash; fix by either registering lm_head_ unconditionally or guarding its usage.
Specifically, in the constructor ensure lm_head_ is always created (register a
Conv2D "lm_head" when cfg.tie_word_embeddings is false as well) or change the
later access sites to check cfg.tie_word_embeddings (or lm_head_ non-null)
before using lm_head_ in the forward/usage code; update the initialization and
any code that assumes lm_head_ exists (references to lm_head_ and
cfg.tie_word_embeddings) so the creation and consumption lifecycles match.
- Around line 149-156: The bias quantization loop in makeQuantizedBias uses
scale_value (from scale.item<mllm_fp32_t>()) as a divisor without validation
which can cause divide-by-zero or invalid results; update makeQuantizedBias to
validate scale_value (and optionally check for NaN/negative) before the loop—if
scale_value <= 0 or not finite, either throw/log a clear error or replace it
with a small positive epsilon; then proceed to compute quantized =
std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value) + zero_point_value and
clamp into quant_bias.ptr<mllm_uint16_t>()[i] as before, keeping the rest of the
logic unchanged.

In `@examples/qwen2vl_qnn_aot/visual_aot_helpers.hpp`:
- Around line 265-266: The code calling std::stoi on substrings for grid_h and
grid_w (using item and sep to split) can throw std::invalid_argument or
std::out_of_range for malformed or oversized numeric text; wrap the two
std::stoi calls (the ones assigning grid_h and grid_w) in a try-catch, validate
the substrings before conversion (e.g., non-empty and contain only optional +/-
and digits) or catch std::invalid_argument and std::out_of_range and
return/propagate a clear error (or set a default/failure path) so the caller
doesn’t crash; update any surrounding parsing function to handle the failure
case accordingly and reference the item, sep, grid_h, and grid_w symbols when
applying the change.

In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`:
- Around line 1002-1013: If a LinearOp reports options().bias but the ".bias"
symbol is missing, fail immediately instead of silently skipping: after
computing bias_name and looking up bias_reg_tensor_ir via
writer.getContext()->lookupSymbolTable(bias_name), if bias_reg_tensor_ir is null
return/fail (use the project's error macro like MLLM_RETURN_FALSE or equivalent)
with a clear message referencing bias_name; keep the existing checks
(MLLM_RETURN_FALSE_IF_NOT for RegisterOp and TensorValue) and the subsequent
creation of bias_quant_spec and annotation_attr unchanged.

In `@mllm/backends/qnn/aot/passes/PTQPass.cpp`:
- Around line 56-57: The UInt4 validation in PTQPass.cpp is off-by-one: update
the checkTypeLimits call for scale1 (currently checking range 0..16) to use the
correct maximum 15 for a 4-bit unsigned value; locate the call
checkTypeLimits<uint8_t>(scale1, 0, 16) and change the upper bound to 15 so the
UInt4 range is validated as [0,15] (keep the Int4
checkTypeLimits<int8_t>(weight, 0, 15) as-is).

In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp`:
- Around line 68-72: The current code assumes input->tensor_.shape().size() > 0
and writes axes_data[0] = input->tensor_.shape().size() - 1 which underflows for
rank-0 tensors; update the LayerNorm axis calculation to guard the rank: compute
size_t rank = input->tensor_.shape().size(); uint32_t axis = rank == 0 ? 0u :
static_cast<uint32_t>(rank - 1); then write axes_data[0] = axis (keeping
axes_dims, axes_param, axes_data, base_op->getName(), and
qnn_op_node->emplaceParamTensor unchanged) so no underflow occurs for rank-0
inputs.
- Around line 48-63: The code dereferences symbols returned by
writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight"/".bias")
and then calls outputs().front()->cast_<ir::tensor::TensorValue>() without
checks; add null/type guards to avoid crashes: verify the lookupSymbolTable
result is non-null, that its outputs() is non-empty, and that the front() value
can be safely cast to ir::tensor::TensorValue before calling
env->captureQnnAOTNodeTensor and qnn_op_node->emplaceInput; on failure, handle
gracefully (return/skip/log/throw) with a clear message including the symbol
name and base_op->getName() to aid debugging.

In `@mllm/models/qwen2vl/tokenization_qwen2vl.hpp`:
- Around line 246-248: The condition that decides whether to pass explicit grid
dimensions to image_preprocessor_ is using ||, allowing invalid single-dimension
overrides; change the guard to require both image_grid_h and image_grid_w to be
> 0 (use &&) so image_preprocessor_(message.img_file_path, image_grid_h,
image_grid_w) is only called when both dimensions are valid, otherwise call
image_preprocessor_(message.img_file_path) to use defaults; update the
conditional around image_preprocessor_ where image_grid_h and image_grid_w are
used.

In `@scripts/qwen2vl_qnn/push_eval_assets.sh`:
- Around line 28-40: Validate and sanitize image_file before using it in path
joins: reject or normalize values that start with '/' or contain '..' or path
separators that escape ASSET_SRC, control/null bytes, or other unexpected
characters; ensure local_image is constructed only from ASSET_SRC + safe
basename(image_file) (or canonicalize and verify it remains under ASSET_SRC) and
ensure the remote path uses only a safe filename (not a full path) before
calling "${ADB_BIN}" push; also keep the pushed[...] check and error handling
but perform this validation immediately after reading image_file to prevent
absolute/traversal attacks.

In `@scripts/qwen2vl_qnn/run_qnn_eval_fixed_set.sh`:
- Around line 74-77: Sanitize the untrusted case_id before composing log_file to
prevent path traversal or invalid filenames: in the while read loop sanitize the
case_id variable used when creating log_file (and remote_image if reused) by
removing or replacing path separators and traversal tokens and allowing only a
safe whitelist (e.g., alphanumerics, dot, underscore, hyphen), ensure the
sanitized value is non-empty (fallback to a safe default like "case_<n>"), then
use that sanitized_case_id when building
log_file="${OUT_DIR}/${sanitized_case_id}.log"; reference case_id, log_file,
OUT_DIR and the while IFS='|' read -r case_id ... loop when making the change.

---

Nitpick comments:
In `@examples/qwen2vl_qnn_aot/compile_visual.cpp`:
- Around line 37-193: The file duplicates implementations already present in
visual_aot_helpers.hpp; remove the local definitions of PatchEmbedLinear,
VisionMlpPrimitiveQuickGELU, Qwen2VLVisionBlockAOTRewrite, and
Qwen2VisionTransformerPretrainedModelAOTRewrite and instead reuse the helpers by
including visual_aot_helpers.hpp and using the existing classes/constructors
(keep references to symbols like PatchEmbedLinear, VisionMlpPrimitiveQuickGELU,
Qwen2VLVisionBlockAOTRewrite, Qwen2VisionTransformerPretrainedModelAOTRewrite,
patch_embed_, patch_merger_, blocks_ and their reg<...> registrations); update
any constructor calls or reg<...> invocations to match the helper API and delete
the duplicated code to avoid maintenance drift.

In `@examples/qwen2vl_qnn_aot/visual_aot_run.cpp`:
- Around line 382-498: The four branches that handle bundle_layout.get() ("6x8",
"tail4", "early2", "block1") duplicate the pattern of computing graph_limit,
reserving results, and pushing sequential runVisualGraph(...) calls; refactor by
extracting the execution logic into a helper (e.g., buildAndRunBundle or
runBundleGraphs) that takes the layout string or a vector<string> of graph names
(like graphNamesForLayout()), the max_graphs option, and the common parameters
(img_qnn, sin_qnn, cos_qnn, hidden_a/hidden_b, visual_embeddings,
compare_segments), computes graph_limit, iterates the graph name list and calls
runVisualGraph in sequence while pushing to results, and then replace each
branch with a call to this helper (special-case "block1" graph list generation
to produce the 32 indexed names and visual_merger at the end).

In `@mllm/backends/qnn/aot/passes/AOTPipeline.cpp`:
- Around line 41-57: Add a docstring above the public function
createQnnAOTSimpleLoweringPipeline describing its purpose (builds a simple AOT
lowering pass pipeline for QNN), each parameter (env: QnnAOTEnv* used to
configure AOTCompileContext, config_path: path to config, pf:
ParameterFile::ptr_t for params, qnn_graph_name: name of the target QNN graph
and how it alters lowering behavior), the return value (vector of
ir::Pass::ptr_t containing the sequence of passes), and how it differs from
createQnnAOTLoweringPipeline (e.g. simpler pass sequence or intended use-case).
Also mention any error/edge conditions (null env or pf expectations) and where
the docstring should be placed (immediately above
createQnnAOTSimpleLoweringPipeline).

In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp`:
- Line 291: Replace the unscoped enum value usage in LLM2QnnLoweringPass by
returning the scoped enum constant; specifically change the return in the
visitor/visitor-like routine in LLM2QnnLoweringPass.cpp from
ir::IRWriter::WALK_CONTINUE to ir::IRWriter::WalkResult::WALK_CONTINUE (keep the
rest of the function and surrounding code intact) so the file matches the
surrounding usage pattern of ir::IRWriter::WalkResult.

In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp`:
- Around line 19-26: Add concise API comments to the LLM2QnnLoweringPass class
documenting the purpose and behavior of its public methods: describe what
LLM2QnnLoweringPass represents, then for run(const ir::node_ptr_t& op) state
what kind of node it expects, the meaning of the returned uint8_t (e.g., status
codes for success/failure), and under what conditions it fails; for
runSimpleGraph(const ir::node_ptr_t& op, const std::string& root_graph_name,
const std::string& qnn_graph_name) document the expected semantics of
root_graph_name and qnn_graph_name (what graphs they identify), what the method
does (lowers a simple graph into a QNN graph), the return value meanings
(success, partial success, error codes) and any error/failure conditions
(missing nodes, name collisions, invalid graph shapes), and note any side
effects (modifies IR, creates new graph named qnn_graph_name) so callers know
expected behavior.

In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp`:
- Around line 153-172: Replace magic numbers used for quantization offsets with
named constexprs and use them in the switch handling cfg->quant_to_type: define
constants (e.g., kSymmetricOffsetUInt8 = -128, kSymmetricOffsetUInt16 = -32768,
kSymmetricOffsetSigned = 0) in the file scope or appropriate header, then update
the switch cases for kUInt8, kUInt16, kInt8/kInt16 to assign those constants to
offset (leave MLLM_ERROR_EXIT unchanged for default). Also add a short comment
near the constants explaining they represent symmetric quantization offsets for
unsigned-to-signed conversion.

In `@mllm/models/ARGeneration.cpp`:
- Around line 290-343: The perf summary mixes seconds and microseconds:
custom_event_time_ entries (e.g., the "ViT" handling using vit_duration) are
printed in microseconds while the main summary (prefill_seconds, decode_seconds,
total_seconds, etc.) uses seconds; convert all custom event durations to seconds
before printing to keep units consistent. Locate the block using
custom_event_time_ and vit_duration (and where custom events are
iterated/printed) in ARGeneration.cpp and change the duration computation to
divide microseconds by 1e6 (or cast to seconds using std::chrono::duration_cast
to seconds) and update the printed label/format to show "s" and match the same
fmt formatting (e.g., {:>10.3f} s) used for prefill_seconds/decode_seconds so
all timings use seconds consistently.

In `@mllm/models/qwen2vl/image_preprocessor_qwen2vl.hpp`:
- Around line 71-78: Add a short doc comment above the inline operator()(const
std::string& image_path, int32_t grid_h, int32_t grid_w) explaining that grid_h
and grid_w are specified in patch-grid units (not pixels), that the function
rescales the image to grid_h * patch_size_ by grid_w * patch_size_, that grid_h
and grid_w must be positive and divisible by merge_size_ (otherwise
MLLM_ERROR_EXIT with ExitCode::kIOError is raised), and document the return
value as a pair<Tensor, Tensor> produced by Image::open(image_path) ->
processResizedImage(...) and any exceptions/exit behavior; reference
merge_size_, patch_size_, Image::open, and processResizedImage in the comment so
callers understand constraints and side effects.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a35a1f92-c40a-4455-84aa-5dd0feaef317

📥 Commits

Reviewing files that changed from the base of the PR and between 729ca4c and d3da4c5.

⛔ Files ignored due to path filters (2)
  • scripts/qwen2vl_qnn/qwen2vl_eval_cases.tsv is excluded by !**/*.tsv
  • scripts/qwen2vl_qnn/qwen2vl_eval_cases_5bucket.tsv is excluded by !**/*.tsv
📒 Files selected for processing (51)
  • .gitignore
  • examples/CMakeLists.txt
  • examples/qwen2vl/config_2B_qnn_lpbq.json
  • examples/qwen2vl/main.cpp
  • examples/qwen2vl/qnn_aot_cfg_2B.json
  • examples/qwen2vl/qnn_aot_cfg_2B_lpbq_vprojg16_unsignedpd.json
  • examples/qwen2vl/qnn_aot_cfg_2B_visual.json
  • examples/qwen2vl/qnn_aot_cfg_2B_visual_fp16.json
  • examples/qwen2vl_qnn_aot/CMakeLists.txt
  • examples/qwen2vl_qnn_aot/README.md
  • examples/qwen2vl_qnn_aot/aot_run.cpp
  • examples/qwen2vl_qnn_aot/compile.cpp
  • examples/qwen2vl_qnn_aot/compile_visual.cpp
  • examples/qwen2vl_qnn_aot/modeling_qwen2vl_qnn_aot.hpp
  • examples/qwen2vl_qnn_aot/visual_aot_helpers.hpp
  • examples/qwen2vl_qnn_aot/visual_aot_run.cpp
  • examples/qwen2vl_qnn_aot/visual_padding_ab_diag.cpp
  • examples/qwen2vl_qnn_aot/visual_padding_diag.cpp
  • mllm/backends/cpu/ops/VisionRoPEOp.cpp
  • mllm/backends/qnn/QNNAllocator.cpp
  • mllm/backends/qnn/QNNBackend.cpp
  • mllm/backends/qnn/QNNModel.cpp
  • mllm/backends/qnn/QNNUtils.hpp
  • mllm/backends/qnn/aot/QnnWrappersAPI.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.cpp
  • mllm/backends/qnn/aot/passes/AOTPipeline.hpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp
  • mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp
  • mllm/backends/qnn/aot/passes/LLMQuantRecipePass.hpp
  • mllm/backends/qnn/aot/passes/PTQPass.cpp
  • mllm/backends/qnn/aot/visitor/Conv2D.cpp
  • mllm/backends/qnn/aot/visitor/GELU.cpp
  • mllm/backends/qnn/aot/visitor/GELU.hpp
  • mllm/backends/qnn/aot/visitor/LayerNorm.cpp
  • mllm/backends/qnn/aot/visitor/LayerNorm.hpp
  • mllm/backends/qnn/aot/visitor/Matmul.cpp
  • mllm/backends/qnn/aot/visitor/Repeat.cpp
  • mllm/backends/qnn/aot/visitor/View.cpp
  • mllm/core/aops/ElewiseOps.cpp
  • mllm/core/aops/VisionRoPEOp.cpp
  • mllm/mllm.hpp
  • mllm/models/ARGeneration.cpp
  • mllm/models/qwen2vl/image_preprocessor_qwen2vl.hpp
  • mllm/models/qwen2vl/modeling_qwen2vl.hpp
  • mllm/models/qwen2vl/modeling_qwen2vl_traceable.hpp
  • mllm/models/qwen2vl/tokenization_qwen2vl.hpp
  • scripts/qwen2vl_qnn/common.sh
  • scripts/qwen2vl_qnn/push_eval_assets.sh
  • scripts/qwen2vl_qnn/run_qnn_eval_fixed_set.sh
  • scripts/qwen2vl_qnn/run_qnn_interactive.sh

Comment on lines +696 to +700
if (!model_path.isSet() || !model_cfg_path.isSet() || !qnn_aot_cfg_files.isSet() || !output_context_path.isSet()) {
MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Missing required argument.");
Argparse::printHelp();
return -1;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dead code after MLLM_ERROR_EXIT.

MLLM_ERROR_EXIT terminates execution, making the subsequent Argparse::printHelp() and return -1 unreachable.

🧹 Suggested fix
   if (!model_path.isSet() || !model_cfg_path.isSet() || !qnn_aot_cfg_files.isSet() || !output_context_path.isSet()) {
+    Argparse::printHelp();
     MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Missing required argument.");
-    Argparse::printHelp();
-    return -1;
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!model_path.isSet() || !model_cfg_path.isSet() || !qnn_aot_cfg_files.isSet() || !output_context_path.isSet()) {
MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Missing required argument.");
Argparse::printHelp();
return -1;
}
if (!model_path.isSet() || !model_cfg_path.isSet() || !qnn_aot_cfg_files.isSet() || !output_context_path.isSet()) {
Argparse::printHelp();
MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Missing required argument.");
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/qwen2vl_qnn_aot/compile.cpp` around lines 696 - 700, The code calls
MLLM_ERROR_EXIT which aborts the process, so the following Argparse::printHelp()
and return -1 are dead/unreachable; fix by moving the help/return before calling
MLLM_ERROR_EXIT or remove the unreachable statements: either call
Argparse::printHelp() and return -1 when required arguments are missing, or call
MLLM_ERROR_EXIT (and only it) to terminate; update the conditional around
model_path, model_cfg_path, qnn_aot_cfg_files, and output_context_path
accordingly (refer to MLLM_ERROR_EXIT and Argparse::printHelp()).

Comment on lines +149 to +156
const auto scale_value = scale.item<mllm_fp32_t>();
const auto zero_point_value = zero_point.item<mllm_int32_t>();
auto quant_bias = Tensor::empty({1, 1, 1, channels}, kUInt16).alloc();

for (int32_t i = 0; i < channels; ++i) {
const auto quantized =
static_cast<int32_t>(std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value)) + zero_point_value;
quant_bias.ptr<mllm_uint16_t>()[i] = static_cast<mllm_uint16_t>(std::clamp(quantized, 0, 65535));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate quant scale before bias quantization.

makeQuantizedBias divides by scale_value without guarding zero/negative values. A bad exported scale can produce invalid output or undefined behavior in this hot path.

Suggested fix
   const auto scale_value = scale.item<mllm_fp32_t>();
   const auto zero_point_value = zero_point.item<mllm_int32_t>();
+  if (!(scale_value > 0.f)) {
+    MLLM_ERROR_EXIT(ExitCode::kCoreError, "Invalid quant scale {} for {}", scale_value, scale_name);
+  }
   auto quant_bias = Tensor::empty({1, 1, 1, channels}, kUInt16).alloc();

As per coding guidelines, "Validate inputs for public APIs and critical internal functions."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const auto scale_value = scale.item<mllm_fp32_t>();
const auto zero_point_value = zero_point.item<mllm_int32_t>();
auto quant_bias = Tensor::empty({1, 1, 1, channels}, kUInt16).alloc();
for (int32_t i = 0; i < channels; ++i) {
const auto quantized =
static_cast<int32_t>(std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value)) + zero_point_value;
quant_bias.ptr<mllm_uint16_t>()[i] = static_cast<mllm_uint16_t>(std::clamp(quantized, 0, 65535));
const auto scale_value = scale.item<mllm_fp32_t>();
const auto zero_point_value = zero_point.item<mllm_int32_t>();
if (!(scale_value > 0.f)) {
MLLM_ERROR_EXIT(ExitCode::kCoreError, "Invalid quant scale {} for {}", scale_value, scale_name);
}
auto quant_bias = Tensor::empty({1, 1, 1, channels}, kUInt16).alloc();
for (int32_t i = 0; i < channels; ++i) {
const auto quantized =
static_cast<int32_t>(std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value)) + zero_point_value;
quant_bias.ptr<mllm_uint16_t>()[i] = static_cast<mllm_uint16_t>(std::clamp(quantized, 0, 65535));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/qwen2vl_qnn_aot/modeling_qwen2vl_qnn_aot.hpp` around lines 149 -
156, The bias quantization loop in makeQuantizedBias uses scale_value (from
scale.item<mllm_fp32_t>()) as a divisor without validation which can cause
divide-by-zero or invalid results; update makeQuantizedBias to validate
scale_value (and optionally check for NaN/negative) before the loop—if
scale_value <= 0 or not finite, either throw/log a clear error or replace it
with a small positive epsilon; then proceed to compute quantized =
std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value) + zero_point_value and
clamp into quant_bias.ptr<mllm_uint16_t>()[i] as before, keeping the rest of the
logic unchanged.

Comment on lines +502 to +504
if (cfg.tie_word_embeddings) {
lm_head_ = reg<nn::Conv2D>("lm_head", cfg.hidden_size, cfg.vocab_size, CONV2D_PROPERTY_NO_BIAS);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix lm_head_ lifecycle mismatch.

On Line 502–504, lm_head_ is conditionally registered, but on Line 539–540 it is always used. This can fail at runtime when cfg.tie_word_embeddings == false.

Suggested fix
-    if (cfg.tie_word_embeddings) {
-      lm_head_ = reg<nn::Conv2D>("lm_head", cfg.hidden_size, cfg.vocab_size, CONV2D_PROPERTY_NO_BIAS);
-    }
+    lm_head_ = reg<nn::Conv2D>("lm_head", cfg.hidden_size, cfg.vocab_size, CONV2D_PROPERTY_NO_BIAS);

Also applies to: 539-540

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/qwen2vl_qnn_aot/modeling_qwen2vl_qnn_aot.hpp` around lines 502 -
504, The lm_head_ is only registered when cfg.tie_word_embeddings is true but
later always accessed, causing a runtime crash; fix by either registering
lm_head_ unconditionally or guarding its usage. Specifically, in the constructor
ensure lm_head_ is always created (register a Conv2D "lm_head" when
cfg.tie_word_embeddings is false as well) or change the later access sites to
check cfg.tie_word_embeddings (or lm_head_ non-null) before using lm_head_ in
the forward/usage code; update the initialization and any code that assumes
lm_head_ exists (references to lm_head_ and cfg.tie_word_embeddings) so the
creation and consumption lifecycles match.

Comment on lines +265 to +266
const int32_t grid_h = std::stoi(item.substr(0, sep));
const int32_t grid_w = std::stoi(item.substr(sep + 1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Uncaught std::stoi exceptions on malformed input.

std::stoi throws std::invalid_argument if no conversion can be performed and std::out_of_range if the value is too large. While the code validates the 'x'/'X' separator presence, it doesn't guard against non-numeric content around the separator.

🛡️ Suggested fix with exception handling
+    int32_t grid_h = 0;
+    int32_t grid_w = 0;
+    try {
+      grid_h = std::stoi(item.substr(0, sep));
+      grid_w = std::stoi(item.substr(sep + 1));
+    } catch (const std::exception& e) {
+      MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Invalid visual bucket grid '{}': {}", item, e.what());
+    }
-    const int32_t grid_h = std::stoi(item.substr(0, sep));
-    const int32_t grid_w = std::stoi(item.substr(sep + 1));
     if (grid_h <= 0 || grid_w <= 0 || grid_h % 2 != 0 || grid_w % 2 != 0) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/qwen2vl_qnn_aot/visual_aot_helpers.hpp` around lines 265 - 266, The
code calling std::stoi on substrings for grid_h and grid_w (using item and sep
to split) can throw std::invalid_argument or std::out_of_range for malformed or
oversized numeric text; wrap the two std::stoi calls (the ones assigning grid_h
and grid_w) in a try-catch, validate the substrings before conversion (e.g.,
non-empty and contain only optional +/- and digits) or catch
std::invalid_argument and std::out_of_range and return/propagate a clear error
(or set a default/failure path) so the caller doesn’t crash; update any
surrounding parsing function to handle the failure case accordingly and
reference the item, sep, grid_h, and grid_w symbols when applying the change.

Comment on lines +1002 to +1013
auto real_linear_op = dynamic_cast<mllm::aops::LinearOp*>(linear_ir->getAOp());
if (real_linear_op && real_linear_op->options().bias) {
auto bias_name = linear_ir->getAOp()->getName() + ".bias";
auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
if (bias_reg_tensor_ir) {
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when LinearOp declares bias but the .bias tensor is missing.

When options().bias is true, silently skipping a missing bias symbol can leave quant metadata inconsistent for this op. This should hard-fail.

Suggested fix
   auto real_linear_op = dynamic_cast<mllm::aops::LinearOp*>(linear_ir->getAOp());
   if (real_linear_op && real_linear_op->options().bias) {
     auto bias_name = linear_ir->getAOp()->getName() + ".bias";
     auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
-    if (bias_reg_tensor_ir) {
-      MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
-      MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
-      auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
-      auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
-      bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
-      annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
-    }
+    MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir);
+    MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
+    MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
+    auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
+    auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
+    bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
+    annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto real_linear_op = dynamic_cast<mllm::aops::LinearOp*>(linear_ir->getAOp());
if (real_linear_op && real_linear_op->options().bias) {
auto bias_name = linear_ir->getAOp()->getName() + ".bias";
auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
if (bias_reg_tensor_ir) {
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
}
auto real_linear_op = dynamic_cast<mllm::aops::LinearOp*>(linear_ir->getAOp());
if (real_linear_op && real_linear_op->options().bias) {
auto bias_name = linear_ir->getAOp()->getName() + ".bias";
auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir);
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp` around lines 1002 -
1013, If a LinearOp reports options().bias but the ".bias" symbol is missing,
fail immediately instead of silently skipping: after computing bias_name and
looking up bias_reg_tensor_ir via
writer.getContext()->lookupSymbolTable(bias_name), if bias_reg_tensor_ir is null
return/fail (use the project's error macro like MLLM_RETURN_FALSE or equivalent)
with a clear message referencing bias_name; keep the existing checks
(MLLM_RETURN_FALSE_IF_NOT for RegisterOp and TensorValue) and the subsequent
creation of bias_quant_spec and annotation_attr unchanged.

Comment on lines +48 to +63
if (real_layer_norm_op->options().elementwise_affine) {
auto weight = writer.getContext()
->lookupSymbolTable(base_op->getName() + ".weight")
->outputs()
.front()
->cast_<ir::tensor::TensorValue>();
qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true));
}

if (real_layer_norm_op->options().bias) {
auto bias = writer.getContext()
->lookupSymbolTable(base_op->getName() + ".bias")
->outputs()
.front()
->cast_<ir::tensor::TensorValue>();
qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add null/type guards before dereferencing LayerNorm weight/bias symbols.

The current chained dereference assumes symbol lookup always succeeds. If a required symbol is missing or malformed, this path can crash.

Suggested fix
   if (real_layer_norm_op->options().elementwise_affine) {
-    auto weight = writer.getContext()
-                      ->lookupSymbolTable(base_op->getName() + ".weight")
-                      ->outputs()
-                      .front()
-                      ->cast_<ir::tensor::TensorValue>();
+    auto weight_ir = writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight");
+    MLLM_RETURN_FALSE_IF_NOT(weight_ir);
+    MLLM_RETURN_FALSE_IF_NOT(weight_ir->isa_<ir::tensor::RegisterOp>());
+    MLLM_RETURN_FALSE_IF_NOT(weight_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
+    auto weight = weight_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
     qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true));
   }

   if (real_layer_norm_op->options().bias) {
-    auto bias = writer.getContext()
-                    ->lookupSymbolTable(base_op->getName() + ".bias")
-                    ->outputs()
-                    .front()
-                    ->cast_<ir::tensor::TensorValue>();
+    auto bias_ir = writer.getContext()->lookupSymbolTable(base_op->getName() + ".bias");
+    MLLM_RETURN_FALSE_IF_NOT(bias_ir);
+    MLLM_RETURN_FALSE_IF_NOT(bias_ir->isa_<ir::tensor::RegisterOp>());
+    MLLM_RETURN_FALSE_IF_NOT(bias_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
+    auto bias = bias_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
     qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true));
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 48 - 63, The code
dereferences symbols returned by
writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight"/".bias")
and then calls outputs().front()->cast_<ir::tensor::TensorValue>() without
checks; add null/type guards to avoid crashes: verify the lookupSymbolTable
result is non-null, that its outputs() is non-empty, and that the front() value
can be safely cast to ir::tensor::TensorValue before calling
env->captureQnnAOTNodeTensor and qnn_op_node->emplaceInput; on failure, handle
gracefully (return/skip/log/throw) with a clear message including the symbol
name and base_op->getName() to aid debugging.

Comment on lines +68 to +72
std::vector<uint32_t> axes_dims = {1};
auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
axes_data[0] = input->tensor_.shape().size() - 1;
qnn_op_node->emplaceParamTensor(axes_param);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard rank before computing the LayerNorm axis.

shape().size() - 1 underflows for rank-0 tensors and writes an invalid axis.

Suggested fix
   std::vector<uint32_t> axes_dims = {1};
   auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
   auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
-  axes_data[0] = input->tensor_.shape().size() - 1;
+  MLLM_RETURN_FALSE_IF_NOT(!input->tensor_.shape().empty());
+  axes_data[0] = static_cast<uint32_t>(input->tensor_.shape().size() - 1);
   qnn_op_node->emplaceParamTensor(axes_param);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::vector<uint32_t> axes_dims = {1};
auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
axes_data[0] = input->tensor_.shape().size() - 1;
qnn_op_node->emplaceParamTensor(axes_param);
std::vector<uint32_t> axes_dims = {1};
auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
MLLM_RETURN_FALSE_IF_NOT(!input->tensor_.shape().empty());
axes_data[0] = static_cast<uint32_t>(input->tensor_.shape().size() - 1);
qnn_op_node->emplaceParamTensor(axes_param);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 68 - 72, The
current code assumes input->tensor_.shape().size() > 0 and writes axes_data[0] =
input->tensor_.shape().size() - 1 which underflows for rank-0 tensors; update
the LayerNorm axis calculation to guard the rank: compute size_t rank =
input->tensor_.shape().size(); uint32_t axis = rank == 0 ? 0u :
static_cast<uint32_t>(rank - 1); then write axes_data[0] = axis (keeping
axes_dims, axes_param, axes_data, base_op->getName(), and
qnn_op_node->emplaceParamTensor unchanged) so no underflow occurs for rank-0
inputs.

Comment on lines +246 to +248
auto [img, grid_thw] = image_grid_h > 0 || image_grid_w > 0
? image_preprocessor_(message.img_file_path, image_grid_h, image_grid_w)
: image_preprocessor_(message.img_file_path);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent grid dimension validation.

The condition uses || but both dimensions should be valid when overriding grid sizes. If only one dimension is positive (e.g., image_grid_h=10, image_grid_w=-1), the preprocessor receives an invalid image_grid_w=-1.

🛡️ Proposed fix
-    auto [img, grid_thw] = image_grid_h > 0 || image_grid_w > 0
+    auto [img, grid_thw] = (image_grid_h > 0 && image_grid_w > 0)
                                 ? image_preprocessor_(message.img_file_path, image_grid_h, image_grid_w)
                                 : image_preprocessor_(message.img_file_path);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mllm/models/qwen2vl/tokenization_qwen2vl.hpp` around lines 246 - 248, The
condition that decides whether to pass explicit grid dimensions to
image_preprocessor_ is using ||, allowing invalid single-dimension overrides;
change the guard to require both image_grid_h and image_grid_w to be > 0 (use
&&) so image_preprocessor_(message.img_file_path, image_grid_h, image_grid_w) is
only called when both dimensions are valid, otherwise call
image_preprocessor_(message.img_file_path) to use defaults; update the
conditional around image_preprocessor_ where image_grid_h and image_grid_w are
used.

Comment on lines +28 to +40
while IFS='|' read -r case_id image_file prompt; do
[[ -n "${case_id}" ]] || continue
[[ -n "${image_file}" ]] || continue
if [[ -n "${pushed[${image_file}]:-}" ]]; then
continue
fi
local_image="${ASSET_SRC}/${image_file}"
if [[ ! -f "${local_image}" ]]; then
echo "Missing local image for case ${case_id}: ${local_image}" >&2
exit 1
fi
echo "Pushing ${image_file}"
"${ADB_BIN}" push "${local_image}" "${REMOTE_IMAGE_DIR}/${image_file}" </dev/null
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate image_file before using it in path joins.

Line 34 and Line 40 trust image_file directly. A crafted value (absolute path or traversal) can read unintended local files and push to unintended remote locations.

Suggested fix
 while IFS='|' read -r case_id image_file prompt; do
   [[ -n "${case_id}" ]] || continue
   [[ -n "${image_file}" ]] || continue
+  case "${image_file}" in
+    /*|../*|*/../*|..|*"/.."|*"\\"*)
+      echo "Invalid image_file for case ${case_id}: ${image_file}" >&2
+      exit 1
+      ;;
+  esac
   if [[ -n "${pushed[${image_file}]:-}" ]]; then
     continue
   fi
As per coding guidelines, "Validate inputs for public APIs and critical internal functions."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while IFS='|' read -r case_id image_file prompt; do
[[ -n "${case_id}" ]] || continue
[[ -n "${image_file}" ]] || continue
if [[ -n "${pushed[${image_file}]:-}" ]]; then
continue
fi
local_image="${ASSET_SRC}/${image_file}"
if [[ ! -f "${local_image}" ]]; then
echo "Missing local image for case ${case_id}: ${local_image}" >&2
exit 1
fi
echo "Pushing ${image_file}"
"${ADB_BIN}" push "${local_image}" "${REMOTE_IMAGE_DIR}/${image_file}" </dev/null
while IFS='|' read -r case_id image_file prompt; do
[[ -n "${case_id}" ]] || continue
[[ -n "${image_file}" ]] || continue
case "${image_file}" in
/*|../*|*/../*|..|*"/.."|*"\\"*)
echo "Invalid image_file for case ${case_id}: ${image_file}" >&2
exit 1
;;
esac
if [[ -n "${pushed[${image_file}]:-}" ]]; then
continue
fi
local_image="${ASSET_SRC}/${image_file}"
if [[ ! -f "${local_image}" ]]; then
echo "Missing local image for case ${case_id}: ${local_image}" >&2
exit 1
fi
echo "Pushing ${image_file}"
"${ADB_BIN}" push "${local_image}" "${REMOTE_IMAGE_DIR}/${image_file}" </dev/null
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 28-28: prompt appears unused. Verify use (or export if used externally).

(SC2034)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/qwen2vl_qnn/push_eval_assets.sh` around lines 28 - 40, Validate and
sanitize image_file before using it in path joins: reject or normalize values
that start with '/' or contain '..' or path separators that escape ASSET_SRC,
control/null bytes, or other unexpected characters; ensure local_image is
constructed only from ASSET_SRC + safe basename(image_file) (or canonicalize and
verify it remains under ASSET_SRC) and ensure the remote path uses only a safe
filename (not a full path) before calling "${ADB_BIN}" push; also keep the
pushed[...] check and error handling but perform this validation immediately
after reading image_file to prevent absolute/traversal attacks.

Comment on lines +74 to +77
while IFS='|' read -r case_id image_file prompt; do
remote_image="${REMOTE_IMAGE_DIR}/${image_file}"
log_file="${OUT_DIR}/${case_id}.log"
visual_args=""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sanitize case_id before building the log file path.

Line 76 uses untrusted case data as a filename. case_id values with separators/traversal tokens can escape OUT_DIR or produce invalid paths.

Suggested fix
 while IFS='|' read -r case_id image_file prompt; do
+  [[ -n "${case_id}" ]] || continue
+  [[ -n "${image_file}" ]] || continue
+  safe_case_id="${case_id//[^A-Za-z0-9_.-]/_}"
+  [[ -n "${safe_case_id}" ]] || safe_case_id="case"
   remote_image="${REMOTE_IMAGE_DIR}/${image_file}"
-  log_file="${OUT_DIR}/${case_id}.log"
+  log_file="${OUT_DIR}/${safe_case_id}.log"
As per coding guidelines, "Validate inputs for public APIs and critical internal functions."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while IFS='|' read -r case_id image_file prompt; do
remote_image="${REMOTE_IMAGE_DIR}/${image_file}"
log_file="${OUT_DIR}/${case_id}.log"
visual_args=""
while IFS='|' read -r case_id image_file prompt; do
[[ -n "${case_id}" ]] || continue
[[ -n "${image_file}" ]] || continue
safe_case_id="${case_id//[^A-Za-z0-9_.-]/_}"
[[ -n "${safe_case_id}" ]] || safe_case_id="case"
remote_image="${REMOTE_IMAGE_DIR}/${image_file}"
log_file="${OUT_DIR}/${safe_case_id}.log"
visual_args=""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/qwen2vl_qnn/run_qnn_eval_fixed_set.sh` around lines 74 - 77, Sanitize
the untrusted case_id before composing log_file to prevent path traversal or
invalid filenames: in the while read loop sanitize the case_id variable used
when creating log_file (and remote_image if reused) by removing or replacing
path separators and traversal tokens and allowing only a safe whitelist (e.g.,
alphanumerics, dot, underscore, hyphen), ensure the sanitized value is non-empty
(fallback to a safe default like "case_<n>"), then use that sanitized_case_id
when building log_file="${OUT_DIR}/${sanitized_case_id}.log"; reference case_id,
log_file, OUT_DIR and the while IFS='|' read -r case_id ... loop when making the
change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants