Add Qwen2-VL full QNN AOT example#676
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughThis PR adds full Qwen2VL QNN AOT support: model configs and traceable modules, visual AOT rewrites and bundle segmentation, quant recipe and PTQ extensions (LayerNorm/GELU/Linear), QNN runtime robustness fixes, visual runners/diagnostics, deployment scripts, and build/documentation updates. ChangesQwen2VL QNN AOT End-to-End Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
|
@twlddd Thanks for your PR! Could you provide documentation on how to run VLM model? This would help us test and reproduce the results. |
Thanks for the suggestion. I have added a “Run Qwen2-VL VLM on Android” section to The new documentation includes:
The large runtime artifacts are not included in this PR. They can be downloaded here: https://www.modelscope.cn/models/twlddd/Qwen2-VL-2B-Instruct-Full-QNN-AOT-for-mllm/summary I will keep the PR focused on the source code and reproducible run instructions, while hosting the large model/context artifacts externally. |
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (8)
mllm/models/ARGeneration.cpp (1)
290-343: ⚡ Quick winConsider consistent units across all timing metrics.
The main performance metrics now display in seconds (lines 311-343), but the custom events section (lines 348-359, specifically line 356) still reports in microseconds. This creates a unit inconsistency in the output. Since "ViT" is itself a custom event but is handled separately in the main summary, consider converting custom events to seconds as well for consistency.
♻️ Suggested improvement for unit consistency
fmt::print(fg(fmt::color::white), "{:<20}: ", name); - fmt::print(fg(fmt::color::yellow), "{:>10.2f} μs\n", (double)duration); + fmt::print(fg(fmt::color::yellow), "{:>10.3f} s\n", (double)duration / 1000000.0);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/models/ARGeneration.cpp` around lines 290 - 343, The perf summary mixes seconds and microseconds: custom_event_time_ entries (e.g., the "ViT" handling using vit_duration) are printed in microseconds while the main summary (prefill_seconds, decode_seconds, total_seconds, etc.) uses seconds; convert all custom event durations to seconds before printing to keep units consistent. Locate the block using custom_event_time_ and vit_duration (and where custom events are iterated/printed) in ARGeneration.cpp and change the duration computation to divide microseconds by 1e6 (or cast to seconds using std::chrono::duration_cast to seconds) and update the printed label/format to show "s" and match the same fmt formatting (e.g., {:>10.3f} s) used for prefill_seconds/decode_seconds so all timings use seconds consistently.mllm/models/qwen2vl/image_preprocessor_qwen2vl.hpp (1)
71-78: ⚡ Quick winDocument the new grid-override overload contract.
This new public API should state expected units (
grid_h/grid_ware patch-grid units), constraints (must be divisible bymerge_size_), and thrown error behavior.As per coding guidelines, "Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/models/qwen2vl/image_preprocessor_qwen2vl.hpp` around lines 71 - 78, Add a short doc comment above the inline operator()(const std::string& image_path, int32_t grid_h, int32_t grid_w) explaining that grid_h and grid_w are specified in patch-grid units (not pixels), that the function rescales the image to grid_h * patch_size_ by grid_w * patch_size_, that grid_h and grid_w must be positive and divisible by merge_size_ (otherwise MLLM_ERROR_EXIT with ExitCode::kIOError is raised), and document the return value as a pair<Tensor, Tensor> produced by Image::open(image_path) -> processResizedImage(...) and any exceptions/exit behavior; reference merge_size_, patch_size_, Image::open, and processResizedImage in the comment so callers understand constraints and side effects.examples/qwen2vl_qnn_aot/compile_visual.cpp (1)
37-193: ⚖️ Poor tradeoffCode duplication with
visual_aot_helpers.hpp.The classes
PatchEmbedLinear,VisionMlpPrimitiveQuickGELU,Qwen2VLVisionBlockAOTRewrite, andQwen2VisionTransformerPretrainedModelAOTRewriteare duplicated here and invisual_aot_helpers.hpp. Consider reusing the helper header to reduce maintenance burden.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/qwen2vl_qnn_aot/compile_visual.cpp` around lines 37 - 193, The file duplicates implementations already present in visual_aot_helpers.hpp; remove the local definitions of PatchEmbedLinear, VisionMlpPrimitiveQuickGELU, Qwen2VLVisionBlockAOTRewrite, and Qwen2VisionTransformerPretrainedModelAOTRewrite and instead reuse the helpers by including visual_aot_helpers.hpp and using the existing classes/constructors (keep references to symbols like PatchEmbedLinear, VisionMlpPrimitiveQuickGELU, Qwen2VLVisionBlockAOTRewrite, Qwen2VisionTransformerPretrainedModelAOTRewrite, patch_embed_, patch_merger_, blocks_ and their reg<...> registrations); update any constructor calls or reg<...> invocations to match the helper API and delete the duplicated code to avoid maintenance drift.examples/qwen2vl_qnn_aot/visual_aot_run.cpp (1)
382-498: 💤 Low valueConsider extracting the bundle execution logic into a helper function.
The four bundle layout branches (
6x8,tail4,early2,block1) share very similar patterns: compute graph_limit, then conditionally run each graph in sequence. This could be consolidated using a data-driven approach with a list of graph names per layout, similar tographNamesForLayout()invisual_padding_ab_diag.cpp.However, the current explicit approach is also readable and the example code is functional.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/qwen2vl_qnn_aot/visual_aot_run.cpp` around lines 382 - 498, The four branches that handle bundle_layout.get() ("6x8", "tail4", "early2", "block1") duplicate the pattern of computing graph_limit, reserving results, and pushing sequential runVisualGraph(...) calls; refactor by extracting the execution logic into a helper (e.g., buildAndRunBundle or runBundleGraphs) that takes the layout string or a vector<string> of graph names (like graphNamesForLayout()), the max_graphs option, and the common parameters (img_qnn, sin_qnn, cos_qnn, hidden_a/hidden_b, visual_embeddings, compare_segments), computes graph_limit, iterates the graph name list and calls runVisualGraph in sequence while pushing to results, and then replace each branch with a call to this helper (special-case "block1" graph list generation to produce the 32 indexed names and visual_merger at the end).mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp (1)
19-26: ⚡ Quick winAdd brief API comments for the new simple-graph lowering interfaces.
Please document expected parameters (
root_graph_name,qnn_graph_name), return semantics, and failure conditions for these public methods/functions.As per coding guidelines, "Ensure public APIs, classes, and functions have clear docstrings or comments explaining purpose, parameters, returns, and errors."
Also applies to: 38-38
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp` around lines 19 - 26, Add concise API comments to the LLM2QnnLoweringPass class documenting the purpose and behavior of its public methods: describe what LLM2QnnLoweringPass represents, then for run(const ir::node_ptr_t& op) state what kind of node it expects, the meaning of the returned uint8_t (e.g., status codes for success/failure), and under what conditions it fails; for runSimpleGraph(const ir::node_ptr_t& op, const std::string& root_graph_name, const std::string& qnn_graph_name) document the expected semantics of root_graph_name and qnn_graph_name (what graphs they identify), what the method does (lowers a simple graph into a QNN graph), the return value meanings (success, partial success, error codes) and any error/failure conditions (missing nodes, name collisions, invalid graph shapes), and note any side effects (modifies IR, creates new graph named qnn_graph_name) so callers know expected behavior.mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp (1)
291-291: ⚡ Quick winUse
ir::IRWriter::WalkResult::WALK_CONTINUEfor consistency.
mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cppusesir::IRWriter::WALK_CONTINUEonly at line 291, while surrounding/other code usesir::IRWriter::WalkResult::WALK_CONTINUE. Both forms work with the unscopedenum WalkResultinmllm/compile/ir/Node.hpp, but keep it consistent.Suggested fix
- return ir::IRWriter::WALK_CONTINUE; + return ir::IRWriter::WalkResult::WALK_CONTINUE;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp` at line 291, Replace the unscoped enum value usage in LLM2QnnLoweringPass by returning the scoped enum constant; specifically change the return in the visitor/visitor-like routine in LLM2QnnLoweringPass.cpp from ir::IRWriter::WALK_CONTINUE to ir::IRWriter::WalkResult::WALK_CONTINUE (keep the rest of the function and surrounding code intact) so the file matches the surrounding usage pattern of ir::IRWriter::WalkResult.mllm/backends/qnn/aot/QnnWrappersAPI.cpp (1)
153-172: 💤 Low valueConsider using named constants for quantization offsets.
The offset values (-128, -32768, 0) are standard symmetric quantization offsets for converting unsigned to signed representation. While correct, named constants would improve readability and maintainability.
💡 Example constants
// Near top of file or in a header constexpr int32_t kSymmetricOffsetUInt8 = -128; // 2^7 constexpr int32_t kSymmetricOffsetUInt16 = -32768; // 2^15 constexpr int32_t kSymmetricOffsetSigned = 0;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp` around lines 153 - 172, Replace magic numbers used for quantization offsets with named constexprs and use them in the switch handling cfg->quant_to_type: define constants (e.g., kSymmetricOffsetUInt8 = -128, kSymmetricOffsetUInt16 = -32768, kSymmetricOffsetSigned = 0) in the file scope or appropriate header, then update the switch cases for kUInt8, kUInt16, kInt8/kInt16 to assign those constants to offset (leave MLLM_ERROR_EXIT unchanged for default). Also add a short comment near the constants explaining they represent symmetric quantization offsets for unsigned-to-signed conversion.mllm/backends/qnn/aot/passes/AOTPipeline.cpp (1)
41-57: 💤 Low valueConsider adding documentation for public API function.
The new
createQnnAOTSimpleLoweringPipelinefunction is a public API but lacks a docstring explaining its purpose, parameters (especially theqnn_graph_nameparameter), and how it differs fromcreateQnnAOTLoweringPipeline. As per coding guidelines, public APIs should have clear docstrings explaining purpose, parameters, returns, and errors.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@mllm/backends/qnn/aot/passes/AOTPipeline.cpp` around lines 41 - 57, Add a docstring above the public function createQnnAOTSimpleLoweringPipeline describing its purpose (builds a simple AOT lowering pass pipeline for QNN), each parameter (env: QnnAOTEnv* used to configure AOTCompileContext, config_path: path to config, pf: ParameterFile::ptr_t for params, qnn_graph_name: name of the target QNN graph and how it alters lowering behavior), the return value (vector of ir::Pass::ptr_t containing the sequence of passes), and how it differs from createQnnAOTLoweringPipeline (e.g. simpler pass sequence or intended use-case). Also mention any error/edge conditions (null env or pf expectations) and where the docstring should be placed (immediately above createQnnAOTSimpleLoweringPipeline).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/qwen2vl_qnn_aot/compile.cpp`:
- Around line 696-700: The code calls MLLM_ERROR_EXIT which aborts the process,
so the following Argparse::printHelp() and return -1 are dead/unreachable; fix
by moving the help/return before calling MLLM_ERROR_EXIT or remove the
unreachable statements: either call Argparse::printHelp() and return -1 when
required arguments are missing, or call MLLM_ERROR_EXIT (and only it) to
terminate; update the conditional around model_path, model_cfg_path,
qnn_aot_cfg_files, and output_context_path accordingly (refer to MLLM_ERROR_EXIT
and Argparse::printHelp()).
In `@examples/qwen2vl_qnn_aot/modeling_qwen2vl_qnn_aot.hpp`:
- Around line 502-504: The lm_head_ is only registered when
cfg.tie_word_embeddings is true but later always accessed, causing a runtime
crash; fix by either registering lm_head_ unconditionally or guarding its usage.
Specifically, in the constructor ensure lm_head_ is always created (register a
Conv2D "lm_head" when cfg.tie_word_embeddings is false as well) or change the
later access sites to check cfg.tie_word_embeddings (or lm_head_ non-null)
before using lm_head_ in the forward/usage code; update the initialization and
any code that assumes lm_head_ exists (references to lm_head_ and
cfg.tie_word_embeddings) so the creation and consumption lifecycles match.
- Around line 149-156: The bias quantization loop in makeQuantizedBias uses
scale_value (from scale.item<mllm_fp32_t>()) as a divisor without validation
which can cause divide-by-zero or invalid results; update makeQuantizedBias to
validate scale_value (and optionally check for NaN/negative) before the loop—if
scale_value <= 0 or not finite, either throw/log a clear error or replace it
with a small positive epsilon; then proceed to compute quantized =
std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value) + zero_point_value and
clamp into quant_bias.ptr<mllm_uint16_t>()[i] as before, keeping the rest of the
logic unchanged.
In `@examples/qwen2vl_qnn_aot/visual_aot_helpers.hpp`:
- Around line 265-266: The code calling std::stoi on substrings for grid_h and
grid_w (using item and sep to split) can throw std::invalid_argument or
std::out_of_range for malformed or oversized numeric text; wrap the two
std::stoi calls (the ones assigning grid_h and grid_w) in a try-catch, validate
the substrings before conversion (e.g., non-empty and contain only optional +/-
and digits) or catch std::invalid_argument and std::out_of_range and
return/propagate a clear error (or set a default/failure path) so the caller
doesn’t crash; update any surrounding parsing function to handle the failure
case accordingly and reference the item, sep, grid_h, and grid_w symbols when
applying the change.
In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp`:
- Around line 1002-1013: If a LinearOp reports options().bias but the ".bias"
symbol is missing, fail immediately instead of silently skipping: after
computing bias_name and looking up bias_reg_tensor_ir via
writer.getContext()->lookupSymbolTable(bias_name), if bias_reg_tensor_ir is null
return/fail (use the project's error macro like MLLM_RETURN_FALSE or equivalent)
with a clear message referencing bias_name; keep the existing checks
(MLLM_RETURN_FALSE_IF_NOT for RegisterOp and TensorValue) and the subsequent
creation of bias_quant_spec and annotation_attr unchanged.
In `@mllm/backends/qnn/aot/passes/PTQPass.cpp`:
- Around line 56-57: The UInt4 validation in PTQPass.cpp is off-by-one: update
the checkTypeLimits call for scale1 (currently checking range 0..16) to use the
correct maximum 15 for a 4-bit unsigned value; locate the call
checkTypeLimits<uint8_t>(scale1, 0, 16) and change the upper bound to 15 so the
UInt4 range is validated as [0,15] (keep the Int4
checkTypeLimits<int8_t>(weight, 0, 15) as-is).
In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp`:
- Around line 68-72: The current code assumes input->tensor_.shape().size() > 0
and writes axes_data[0] = input->tensor_.shape().size() - 1 which underflows for
rank-0 tensors; update the LayerNorm axis calculation to guard the rank: compute
size_t rank = input->tensor_.shape().size(); uint32_t axis = rank == 0 ? 0u :
static_cast<uint32_t>(rank - 1); then write axes_data[0] = axis (keeping
axes_dims, axes_param, axes_data, base_op->getName(), and
qnn_op_node->emplaceParamTensor unchanged) so no underflow occurs for rank-0
inputs.
- Around line 48-63: The code dereferences symbols returned by
writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight"/".bias")
and then calls outputs().front()->cast_<ir::tensor::TensorValue>() without
checks; add null/type guards to avoid crashes: verify the lookupSymbolTable
result is non-null, that its outputs() is non-empty, and that the front() value
can be safely cast to ir::tensor::TensorValue before calling
env->captureQnnAOTNodeTensor and qnn_op_node->emplaceInput; on failure, handle
gracefully (return/skip/log/throw) with a clear message including the symbol
name and base_op->getName() to aid debugging.
In `@mllm/models/qwen2vl/tokenization_qwen2vl.hpp`:
- Around line 246-248: The condition that decides whether to pass explicit grid
dimensions to image_preprocessor_ is using ||, allowing invalid single-dimension
overrides; change the guard to require both image_grid_h and image_grid_w to be
> 0 (use &&) so image_preprocessor_(message.img_file_path, image_grid_h,
image_grid_w) is only called when both dimensions are valid, otherwise call
image_preprocessor_(message.img_file_path) to use defaults; update the
conditional around image_preprocessor_ where image_grid_h and image_grid_w are
used.
In `@scripts/qwen2vl_qnn/push_eval_assets.sh`:
- Around line 28-40: Validate and sanitize image_file before using it in path
joins: reject or normalize values that start with '/' or contain '..' or path
separators that escape ASSET_SRC, control/null bytes, or other unexpected
characters; ensure local_image is constructed only from ASSET_SRC + safe
basename(image_file) (or canonicalize and verify it remains under ASSET_SRC) and
ensure the remote path uses only a safe filename (not a full path) before
calling "${ADB_BIN}" push; also keep the pushed[...] check and error handling
but perform this validation immediately after reading image_file to prevent
absolute/traversal attacks.
In `@scripts/qwen2vl_qnn/run_qnn_eval_fixed_set.sh`:
- Around line 74-77: Sanitize the untrusted case_id before composing log_file to
prevent path traversal or invalid filenames: in the while read loop sanitize the
case_id variable used when creating log_file (and remote_image if reused) by
removing or replacing path separators and traversal tokens and allowing only a
safe whitelist (e.g., alphanumerics, dot, underscore, hyphen), ensure the
sanitized value is non-empty (fallback to a safe default like "case_<n>"), then
use that sanitized_case_id when building
log_file="${OUT_DIR}/${sanitized_case_id}.log"; reference case_id, log_file,
OUT_DIR and the while IFS='|' read -r case_id ... loop when making the change.
---
Nitpick comments:
In `@examples/qwen2vl_qnn_aot/compile_visual.cpp`:
- Around line 37-193: The file duplicates implementations already present in
visual_aot_helpers.hpp; remove the local definitions of PatchEmbedLinear,
VisionMlpPrimitiveQuickGELU, Qwen2VLVisionBlockAOTRewrite, and
Qwen2VisionTransformerPretrainedModelAOTRewrite and instead reuse the helpers by
including visual_aot_helpers.hpp and using the existing classes/constructors
(keep references to symbols like PatchEmbedLinear, VisionMlpPrimitiveQuickGELU,
Qwen2VLVisionBlockAOTRewrite, Qwen2VisionTransformerPretrainedModelAOTRewrite,
patch_embed_, patch_merger_, blocks_ and their reg<...> registrations); update
any constructor calls or reg<...> invocations to match the helper API and delete
the duplicated code to avoid maintenance drift.
In `@examples/qwen2vl_qnn_aot/visual_aot_run.cpp`:
- Around line 382-498: The four branches that handle bundle_layout.get() ("6x8",
"tail4", "early2", "block1") duplicate the pattern of computing graph_limit,
reserving results, and pushing sequential runVisualGraph(...) calls; refactor by
extracting the execution logic into a helper (e.g., buildAndRunBundle or
runBundleGraphs) that takes the layout string or a vector<string> of graph names
(like graphNamesForLayout()), the max_graphs option, and the common parameters
(img_qnn, sin_qnn, cos_qnn, hidden_a/hidden_b, visual_embeddings,
compare_segments), computes graph_limit, iterates the graph name list and calls
runVisualGraph in sequence while pushing to results, and then replace each
branch with a call to this helper (special-case "block1" graph list generation
to produce the 32 indexed names and visual_merger at the end).
In `@mllm/backends/qnn/aot/passes/AOTPipeline.cpp`:
- Around line 41-57: Add a docstring above the public function
createQnnAOTSimpleLoweringPipeline describing its purpose (builds a simple AOT
lowering pass pipeline for QNN), each parameter (env: QnnAOTEnv* used to
configure AOTCompileContext, config_path: path to config, pf:
ParameterFile::ptr_t for params, qnn_graph_name: name of the target QNN graph
and how it alters lowering behavior), the return value (vector of
ir::Pass::ptr_t containing the sequence of passes), and how it differs from
createQnnAOTLoweringPipeline (e.g. simpler pass sequence or intended use-case).
Also mention any error/edge conditions (null env or pf expectations) and where
the docstring should be placed (immediately above
createQnnAOTSimpleLoweringPipeline).
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cpp`:
- Line 291: Replace the unscoped enum value usage in LLM2QnnLoweringPass by
returning the scoped enum constant; specifically change the return in the
visitor/visitor-like routine in LLM2QnnLoweringPass.cpp from
ir::IRWriter::WALK_CONTINUE to ir::IRWriter::WalkResult::WALK_CONTINUE (keep the
rest of the function and surrounding code intact) so the file matches the
surrounding usage pattern of ir::IRWriter::WalkResult.
In `@mllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hpp`:
- Around line 19-26: Add concise API comments to the LLM2QnnLoweringPass class
documenting the purpose and behavior of its public methods: describe what
LLM2QnnLoweringPass represents, then for run(const ir::node_ptr_t& op) state
what kind of node it expects, the meaning of the returned uint8_t (e.g., status
codes for success/failure), and under what conditions it fails; for
runSimpleGraph(const ir::node_ptr_t& op, const std::string& root_graph_name,
const std::string& qnn_graph_name) document the expected semantics of
root_graph_name and qnn_graph_name (what graphs they identify), what the method
does (lowers a simple graph into a QNN graph), the return value meanings
(success, partial success, error codes) and any error/failure conditions
(missing nodes, name collisions, invalid graph shapes), and note any side
effects (modifies IR, creates new graph named qnn_graph_name) so callers know
expected behavior.
In `@mllm/backends/qnn/aot/QnnWrappersAPI.cpp`:
- Around line 153-172: Replace magic numbers used for quantization offsets with
named constexprs and use them in the switch handling cfg->quant_to_type: define
constants (e.g., kSymmetricOffsetUInt8 = -128, kSymmetricOffsetUInt16 = -32768,
kSymmetricOffsetSigned = 0) in the file scope or appropriate header, then update
the switch cases for kUInt8, kUInt16, kInt8/kInt16 to assign those constants to
offset (leave MLLM_ERROR_EXIT unchanged for default). Also add a short comment
near the constants explaining they represent symmetric quantization offsets for
unsigned-to-signed conversion.
In `@mllm/models/ARGeneration.cpp`:
- Around line 290-343: The perf summary mixes seconds and microseconds:
custom_event_time_ entries (e.g., the "ViT" handling using vit_duration) are
printed in microseconds while the main summary (prefill_seconds, decode_seconds,
total_seconds, etc.) uses seconds; convert all custom event durations to seconds
before printing to keep units consistent. Locate the block using
custom_event_time_ and vit_duration (and where custom events are
iterated/printed) in ARGeneration.cpp and change the duration computation to
divide microseconds by 1e6 (or cast to seconds using std::chrono::duration_cast
to seconds) and update the printed label/format to show "s" and match the same
fmt formatting (e.g., {:>10.3f} s) used for prefill_seconds/decode_seconds so
all timings use seconds consistently.
In `@mllm/models/qwen2vl/image_preprocessor_qwen2vl.hpp`:
- Around line 71-78: Add a short doc comment above the inline operator()(const
std::string& image_path, int32_t grid_h, int32_t grid_w) explaining that grid_h
and grid_w are specified in patch-grid units (not pixels), that the function
rescales the image to grid_h * patch_size_ by grid_w * patch_size_, that grid_h
and grid_w must be positive and divisible by merge_size_ (otherwise
MLLM_ERROR_EXIT with ExitCode::kIOError is raised), and document the return
value as a pair<Tensor, Tensor> produced by Image::open(image_path) ->
processResizedImage(...) and any exceptions/exit behavior; reference
merge_size_, patch_size_, Image::open, and processResizedImage in the comment so
callers understand constraints and side effects.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a35a1f92-c40a-4455-84aa-5dd0feaef317
⛔ Files ignored due to path filters (2)
scripts/qwen2vl_qnn/qwen2vl_eval_cases.tsvis excluded by!**/*.tsvscripts/qwen2vl_qnn/qwen2vl_eval_cases_5bucket.tsvis excluded by!**/*.tsv
📒 Files selected for processing (51)
.gitignoreexamples/CMakeLists.txtexamples/qwen2vl/config_2B_qnn_lpbq.jsonexamples/qwen2vl/main.cppexamples/qwen2vl/qnn_aot_cfg_2B.jsonexamples/qwen2vl/qnn_aot_cfg_2B_lpbq_vprojg16_unsignedpd.jsonexamples/qwen2vl/qnn_aot_cfg_2B_visual.jsonexamples/qwen2vl/qnn_aot_cfg_2B_visual_fp16.jsonexamples/qwen2vl_qnn_aot/CMakeLists.txtexamples/qwen2vl_qnn_aot/README.mdexamples/qwen2vl_qnn_aot/aot_run.cppexamples/qwen2vl_qnn_aot/compile.cppexamples/qwen2vl_qnn_aot/compile_visual.cppexamples/qwen2vl_qnn_aot/modeling_qwen2vl_qnn_aot.hppexamples/qwen2vl_qnn_aot/visual_aot_helpers.hppexamples/qwen2vl_qnn_aot/visual_aot_run.cppexamples/qwen2vl_qnn_aot/visual_padding_ab_diag.cppexamples/qwen2vl_qnn_aot/visual_padding_diag.cppmllm/backends/cpu/ops/VisionRoPEOp.cppmllm/backends/qnn/QNNAllocator.cppmllm/backends/qnn/QNNBackend.cppmllm/backends/qnn/QNNModel.cppmllm/backends/qnn/QNNUtils.hppmllm/backends/qnn/aot/QnnWrappersAPI.cppmllm/backends/qnn/aot/passes/AOTPipeline.cppmllm/backends/qnn/aot/passes/AOTPipeline.hppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.cppmllm/backends/qnn/aot/passes/LLM2QnnLoweringPass.hppmllm/backends/qnn/aot/passes/LLMQuantRecipePass.cppmllm/backends/qnn/aot/passes/LLMQuantRecipePass.hppmllm/backends/qnn/aot/passes/PTQPass.cppmllm/backends/qnn/aot/visitor/Conv2D.cppmllm/backends/qnn/aot/visitor/GELU.cppmllm/backends/qnn/aot/visitor/GELU.hppmllm/backends/qnn/aot/visitor/LayerNorm.cppmllm/backends/qnn/aot/visitor/LayerNorm.hppmllm/backends/qnn/aot/visitor/Matmul.cppmllm/backends/qnn/aot/visitor/Repeat.cppmllm/backends/qnn/aot/visitor/View.cppmllm/core/aops/ElewiseOps.cppmllm/core/aops/VisionRoPEOp.cppmllm/mllm.hppmllm/models/ARGeneration.cppmllm/models/qwen2vl/image_preprocessor_qwen2vl.hppmllm/models/qwen2vl/modeling_qwen2vl.hppmllm/models/qwen2vl/modeling_qwen2vl_traceable.hppmllm/models/qwen2vl/tokenization_qwen2vl.hppscripts/qwen2vl_qnn/common.shscripts/qwen2vl_qnn/push_eval_assets.shscripts/qwen2vl_qnn/run_qnn_eval_fixed_set.shscripts/qwen2vl_qnn/run_qnn_interactive.sh
| if (!model_path.isSet() || !model_cfg_path.isSet() || !qnn_aot_cfg_files.isSet() || !output_context_path.isSet()) { | ||
| MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Missing required argument."); | ||
| Argparse::printHelp(); | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
Dead code after MLLM_ERROR_EXIT.
MLLM_ERROR_EXIT terminates execution, making the subsequent Argparse::printHelp() and return -1 unreachable.
🧹 Suggested fix
if (!model_path.isSet() || !model_cfg_path.isSet() || !qnn_aot_cfg_files.isSet() || !output_context_path.isSet()) {
+ Argparse::printHelp();
MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Missing required argument.");
- Argparse::printHelp();
- return -1;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (!model_path.isSet() || !model_cfg_path.isSet() || !qnn_aot_cfg_files.isSet() || !output_context_path.isSet()) { | |
| MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Missing required argument."); | |
| Argparse::printHelp(); | |
| return -1; | |
| } | |
| if (!model_path.isSet() || !model_cfg_path.isSet() || !qnn_aot_cfg_files.isSet() || !output_context_path.isSet()) { | |
| Argparse::printHelp(); | |
| MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Missing required argument."); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/qwen2vl_qnn_aot/compile.cpp` around lines 696 - 700, The code calls
MLLM_ERROR_EXIT which aborts the process, so the following Argparse::printHelp()
and return -1 are dead/unreachable; fix by moving the help/return before calling
MLLM_ERROR_EXIT or remove the unreachable statements: either call
Argparse::printHelp() and return -1 when required arguments are missing, or call
MLLM_ERROR_EXIT (and only it) to terminate; update the conditional around
model_path, model_cfg_path, qnn_aot_cfg_files, and output_context_path
accordingly (refer to MLLM_ERROR_EXIT and Argparse::printHelp()).
| const auto scale_value = scale.item<mllm_fp32_t>(); | ||
| const auto zero_point_value = zero_point.item<mllm_int32_t>(); | ||
| auto quant_bias = Tensor::empty({1, 1, 1, channels}, kUInt16).alloc(); | ||
|
|
||
| for (int32_t i = 0; i < channels; ++i) { | ||
| const auto quantized = | ||
| static_cast<int32_t>(std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value)) + zero_point_value; | ||
| quant_bias.ptr<mllm_uint16_t>()[i] = static_cast<mllm_uint16_t>(std::clamp(quantized, 0, 65535)); |
There was a problem hiding this comment.
Validate quant scale before bias quantization.
makeQuantizedBias divides by scale_value without guarding zero/negative values. A bad exported scale can produce invalid output or undefined behavior in this hot path.
Suggested fix
const auto scale_value = scale.item<mllm_fp32_t>();
const auto zero_point_value = zero_point.item<mllm_int32_t>();
+ if (!(scale_value > 0.f)) {
+ MLLM_ERROR_EXIT(ExitCode::kCoreError, "Invalid quant scale {} for {}", scale_value, scale_name);
+ }
auto quant_bias = Tensor::empty({1, 1, 1, channels}, kUInt16).alloc();As per coding guidelines, "Validate inputs for public APIs and critical internal functions."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const auto scale_value = scale.item<mllm_fp32_t>(); | |
| const auto zero_point_value = zero_point.item<mllm_int32_t>(); | |
| auto quant_bias = Tensor::empty({1, 1, 1, channels}, kUInt16).alloc(); | |
| for (int32_t i = 0; i < channels; ++i) { | |
| const auto quantized = | |
| static_cast<int32_t>(std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value)) + zero_point_value; | |
| quant_bias.ptr<mllm_uint16_t>()[i] = static_cast<mllm_uint16_t>(std::clamp(quantized, 0, 65535)); | |
| const auto scale_value = scale.item<mllm_fp32_t>(); | |
| const auto zero_point_value = zero_point.item<mllm_int32_t>(); | |
| if (!(scale_value > 0.f)) { | |
| MLLM_ERROR_EXIT(ExitCode::kCoreError, "Invalid quant scale {} for {}", scale_value, scale_name); | |
| } | |
| auto quant_bias = Tensor::empty({1, 1, 1, channels}, kUInt16).alloc(); | |
| for (int32_t i = 0; i < channels; ++i) { | |
| const auto quantized = | |
| static_cast<int32_t>(std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value)) + zero_point_value; | |
| quant_bias.ptr<mllm_uint16_t>()[i] = static_cast<mllm_uint16_t>(std::clamp(quantized, 0, 65535)); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/qwen2vl_qnn_aot/modeling_qwen2vl_qnn_aot.hpp` around lines 149 -
156, The bias quantization loop in makeQuantizedBias uses scale_value (from
scale.item<mllm_fp32_t>()) as a divisor without validation which can cause
divide-by-zero or invalid results; update makeQuantizedBias to validate
scale_value (and optionally check for NaN/negative) before the loop—if
scale_value <= 0 or not finite, either throw/log a clear error or replace it
with a small positive epsilon; then proceed to compute quantized =
std::lround(bias.ptr<mllm_fp32_t>()[i] / scale_value) + zero_point_value and
clamp into quant_bias.ptr<mllm_uint16_t>()[i] as before, keeping the rest of the
logic unchanged.
| if (cfg.tie_word_embeddings) { | ||
| lm_head_ = reg<nn::Conv2D>("lm_head", cfg.hidden_size, cfg.vocab_size, CONV2D_PROPERTY_NO_BIAS); | ||
| } |
There was a problem hiding this comment.
Fix lm_head_ lifecycle mismatch.
On Line 502–504, lm_head_ is conditionally registered, but on Line 539–540 it is always used. This can fail at runtime when cfg.tie_word_embeddings == false.
Suggested fix
- if (cfg.tie_word_embeddings) {
- lm_head_ = reg<nn::Conv2D>("lm_head", cfg.hidden_size, cfg.vocab_size, CONV2D_PROPERTY_NO_BIAS);
- }
+ lm_head_ = reg<nn::Conv2D>("lm_head", cfg.hidden_size, cfg.vocab_size, CONV2D_PROPERTY_NO_BIAS);Also applies to: 539-540
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/qwen2vl_qnn_aot/modeling_qwen2vl_qnn_aot.hpp` around lines 502 -
504, The lm_head_ is only registered when cfg.tie_word_embeddings is true but
later always accessed, causing a runtime crash; fix by either registering
lm_head_ unconditionally or guarding its usage. Specifically, in the constructor
ensure lm_head_ is always created (register a Conv2D "lm_head" when
cfg.tie_word_embeddings is false as well) or change the later access sites to
check cfg.tie_word_embeddings (or lm_head_ non-null) before using lm_head_ in
the forward/usage code; update the initialization and any code that assumes
lm_head_ exists (references to lm_head_ and cfg.tie_word_embeddings) so the
creation and consumption lifecycles match.
| const int32_t grid_h = std::stoi(item.substr(0, sep)); | ||
| const int32_t grid_w = std::stoi(item.substr(sep + 1)); |
There was a problem hiding this comment.
Uncaught std::stoi exceptions on malformed input.
std::stoi throws std::invalid_argument if no conversion can be performed and std::out_of_range if the value is too large. While the code validates the 'x'/'X' separator presence, it doesn't guard against non-numeric content around the separator.
🛡️ Suggested fix with exception handling
+ int32_t grid_h = 0;
+ int32_t grid_w = 0;
+ try {
+ grid_h = std::stoi(item.substr(0, sep));
+ grid_w = std::stoi(item.substr(sep + 1));
+ } catch (const std::exception& e) {
+ MLLM_ERROR_EXIT(mllm::ExitCode::kCoreError, "Invalid visual bucket grid '{}': {}", item, e.what());
+ }
- const int32_t grid_h = std::stoi(item.substr(0, sep));
- const int32_t grid_w = std::stoi(item.substr(sep + 1));
if (grid_h <= 0 || grid_w <= 0 || grid_h % 2 != 0 || grid_w % 2 != 0) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/qwen2vl_qnn_aot/visual_aot_helpers.hpp` around lines 265 - 266, The
code calling std::stoi on substrings for grid_h and grid_w (using item and sep
to split) can throw std::invalid_argument or std::out_of_range for malformed or
oversized numeric text; wrap the two std::stoi calls (the ones assigning grid_h
and grid_w) in a try-catch, validate the substrings before conversion (e.g.,
non-empty and contain only optional +/- and digits) or catch
std::invalid_argument and std::out_of_range and return/propagate a clear error
(or set a default/failure path) so the caller doesn’t crash; update any
surrounding parsing function to handle the failure case accordingly and
reference the item, sep, grid_h, and grid_w symbols when applying the change.
| auto real_linear_op = dynamic_cast<mllm::aops::LinearOp*>(linear_ir->getAOp()); | ||
| if (real_linear_op && real_linear_op->options().bias) { | ||
| auto bias_name = linear_ir->getAOp()->getName() + ".bias"; | ||
| auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name); | ||
| if (bias_reg_tensor_ir) { | ||
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>()); | ||
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>()); | ||
| auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>(); | ||
| auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype()); | ||
| bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec)); | ||
| annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec}); | ||
| } |
There was a problem hiding this comment.
Fail fast when LinearOp declares bias but the .bias tensor is missing.
When options().bias is true, silently skipping a missing bias symbol can leave quant metadata inconsistent for this op. This should hard-fail.
Suggested fix
auto real_linear_op = dynamic_cast<mllm::aops::LinearOp*>(linear_ir->getAOp());
if (real_linear_op && real_linear_op->options().bias) {
auto bias_name = linear_ir->getAOp()->getName() + ".bias";
auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name);
- if (bias_reg_tensor_ir) {
- MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
- MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
- auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
- auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
- bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
- annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
- }
+ MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir);
+ MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>());
+ MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
+ auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
+ auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype());
+ bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec));
+ annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec});
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auto real_linear_op = dynamic_cast<mllm::aops::LinearOp*>(linear_ir->getAOp()); | |
| if (real_linear_op && real_linear_op->options().bias) { | |
| auto bias_name = linear_ir->getAOp()->getName() + ".bias"; | |
| auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name); | |
| if (bias_reg_tensor_ir) { | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>()); | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>()); | |
| auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>(); | |
| auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype()); | |
| bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec)); | |
| annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec}); | |
| } | |
| auto real_linear_op = dynamic_cast<mllm::aops::LinearOp*>(linear_ir->getAOp()); | |
| if (real_linear_op && real_linear_op->options().bias) { | |
| auto bias_name = linear_ir->getAOp()->getName() + ".bias"; | |
| auto bias_reg_tensor_ir = writer.getContext()->lookupSymbolTable(bias_name); | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir); | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->isa_<ir::tensor::RegisterOp>()); | |
| MLLM_RETURN_FALSE_IF_NOT(bias_reg_tensor_ir->outputs().front()->isa_<ir::tensor::TensorValue>()); | |
| auto bias_tensor = bias_reg_tensor_ir->outputs().front()->cast_<ir::tensor::TensorValue>(); | |
| auto bias_quant_spec = ir::linalg::QuantizationSpecRaw::create(bias_tensor->tensor_.dtype()); | |
| bias_tensor->setAttr("quant_recipe", writer.create<ir::linalg::LinalgIRQuantizatonSpecAttr>(bias_quant_spec)); | |
| annotation_attr->annotation_.weights.insert({"bias", bias_quant_spec}); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/passes/LLMQuantRecipePass.cpp` around lines 1002 -
1013, If a LinearOp reports options().bias but the ".bias" symbol is missing,
fail immediately instead of silently skipping: after computing bias_name and
looking up bias_reg_tensor_ir via
writer.getContext()->lookupSymbolTable(bias_name), if bias_reg_tensor_ir is null
return/fail (use the project's error macro like MLLM_RETURN_FALSE or equivalent)
with a clear message referencing bias_name; keep the existing checks
(MLLM_RETURN_FALSE_IF_NOT for RegisterOp and TensorValue) and the subsequent
creation of bias_quant_spec and annotation_attr unchanged.
| if (real_layer_norm_op->options().elementwise_affine) { | ||
| auto weight = writer.getContext() | ||
| ->lookupSymbolTable(base_op->getName() + ".weight") | ||
| ->outputs() | ||
| .front() | ||
| ->cast_<ir::tensor::TensorValue>(); | ||
| qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true)); | ||
| } | ||
|
|
||
| if (real_layer_norm_op->options().bias) { | ||
| auto bias = writer.getContext() | ||
| ->lookupSymbolTable(base_op->getName() + ".bias") | ||
| ->outputs() | ||
| .front() | ||
| ->cast_<ir::tensor::TensorValue>(); | ||
| qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true)); |
There was a problem hiding this comment.
Add null/type guards before dereferencing LayerNorm weight/bias symbols.
The current chained dereference assumes symbol lookup always succeeds. If a required symbol is missing or malformed, this path can crash.
Suggested fix
if (real_layer_norm_op->options().elementwise_affine) {
- auto weight = writer.getContext()
- ->lookupSymbolTable(base_op->getName() + ".weight")
- ->outputs()
- .front()
- ->cast_<ir::tensor::TensorValue>();
+ auto weight_ir = writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight");
+ MLLM_RETURN_FALSE_IF_NOT(weight_ir);
+ MLLM_RETURN_FALSE_IF_NOT(weight_ir->isa_<ir::tensor::RegisterOp>());
+ MLLM_RETURN_FALSE_IF_NOT(weight_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
+ auto weight = weight_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, weight, true));
}
if (real_layer_norm_op->options().bias) {
- auto bias = writer.getContext()
- ->lookupSymbolTable(base_op->getName() + ".bias")
- ->outputs()
- .front()
- ->cast_<ir::tensor::TensorValue>();
+ auto bias_ir = writer.getContext()->lookupSymbolTable(base_op->getName() + ".bias");
+ MLLM_RETURN_FALSE_IF_NOT(bias_ir);
+ MLLM_RETURN_FALSE_IF_NOT(bias_ir->isa_<ir::tensor::RegisterOp>());
+ MLLM_RETURN_FALSE_IF_NOT(bias_ir->outputs().front()->isa_<ir::tensor::TensorValue>());
+ auto bias = bias_ir->outputs().front()->cast_<ir::tensor::TensorValue>();
qnn_op_node->emplaceInput(env->captureQnnAOTNodeTensor(qnn_context_name, qnn_graph_name, bias, true));
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 48 - 63, The code
dereferences symbols returned by
writer.getContext()->lookupSymbolTable(base_op->getName() + ".weight"/".bias")
and then calls outputs().front()->cast_<ir::tensor::TensorValue>() without
checks; add null/type guards to avoid crashes: verify the lookupSymbolTable
result is non-null, that its outputs() is non-empty, and that the front() value
can be safely cast to ir::tensor::TensorValue before calling
env->captureQnnAOTNodeTensor and qnn_op_node->emplaceInput; on failure, handle
gracefully (return/skip/log/throw) with a clear message including the symbol
name and base_op->getName() to aid debugging.
| std::vector<uint32_t> axes_dims = {1}; | ||
| auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims); | ||
| auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc()); | ||
| axes_data[0] = input->tensor_.shape().size() - 1; | ||
| qnn_op_node->emplaceParamTensor(axes_param); |
There was a problem hiding this comment.
Guard rank before computing the LayerNorm axis.
shape().size() - 1 underflows for rank-0 tensors and writes an invalid axis.
Suggested fix
std::vector<uint32_t> axes_dims = {1};
auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims);
auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc());
- axes_data[0] = input->tensor_.shape().size() - 1;
+ MLLM_RETURN_FALSE_IF_NOT(!input->tensor_.shape().empty());
+ axes_data[0] = static_cast<uint32_t>(input->tensor_.shape().size() - 1);
qnn_op_node->emplaceParamTensor(axes_param);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| std::vector<uint32_t> axes_dims = {1}; | |
| auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims); | |
| auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc()); | |
| axes_data[0] = input->tensor_.shape().size() - 1; | |
| qnn_op_node->emplaceParamTensor(axes_param); | |
| std::vector<uint32_t> axes_dims = {1}; | |
| auto axes_param = QNNParamTensorWrapper::create("axes", base_op->getName() + "_axes", QNN_DATATYPE_UINT_32, axes_dims); | |
| auto* axes_data = reinterpret_cast<uint32_t*>(axes_param->alloc()); | |
| MLLM_RETURN_FALSE_IF_NOT(!input->tensor_.shape().empty()); | |
| axes_data[0] = static_cast<uint32_t>(input->tensor_.shape().size() - 1); | |
| qnn_op_node->emplaceParamTensor(axes_param); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/backends/qnn/aot/visitor/LayerNorm.cpp` around lines 68 - 72, The
current code assumes input->tensor_.shape().size() > 0 and writes axes_data[0] =
input->tensor_.shape().size() - 1 which underflows for rank-0 tensors; update
the LayerNorm axis calculation to guard the rank: compute size_t rank =
input->tensor_.shape().size(); uint32_t axis = rank == 0 ? 0u :
static_cast<uint32_t>(rank - 1); then write axes_data[0] = axis (keeping
axes_dims, axes_param, axes_data, base_op->getName(), and
qnn_op_node->emplaceParamTensor unchanged) so no underflow occurs for rank-0
inputs.
| auto [img, grid_thw] = image_grid_h > 0 || image_grid_w > 0 | ||
| ? image_preprocessor_(message.img_file_path, image_grid_h, image_grid_w) | ||
| : image_preprocessor_(message.img_file_path); |
There was a problem hiding this comment.
Inconsistent grid dimension validation.
The condition uses || but both dimensions should be valid when overriding grid sizes. If only one dimension is positive (e.g., image_grid_h=10, image_grid_w=-1), the preprocessor receives an invalid image_grid_w=-1.
🛡️ Proposed fix
- auto [img, grid_thw] = image_grid_h > 0 || image_grid_w > 0
+ auto [img, grid_thw] = (image_grid_h > 0 && image_grid_w > 0)
? image_preprocessor_(message.img_file_path, image_grid_h, image_grid_w)
: image_preprocessor_(message.img_file_path);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@mllm/models/qwen2vl/tokenization_qwen2vl.hpp` around lines 246 - 248, The
condition that decides whether to pass explicit grid dimensions to
image_preprocessor_ is using ||, allowing invalid single-dimension overrides;
change the guard to require both image_grid_h and image_grid_w to be > 0 (use
&&) so image_preprocessor_(message.img_file_path, image_grid_h, image_grid_w) is
only called when both dimensions are valid, otherwise call
image_preprocessor_(message.img_file_path) to use defaults; update the
conditional around image_preprocessor_ where image_grid_h and image_grid_w are
used.
| while IFS='|' read -r case_id image_file prompt; do | ||
| [[ -n "${case_id}" ]] || continue | ||
| [[ -n "${image_file}" ]] || continue | ||
| if [[ -n "${pushed[${image_file}]:-}" ]]; then | ||
| continue | ||
| fi | ||
| local_image="${ASSET_SRC}/${image_file}" | ||
| if [[ ! -f "${local_image}" ]]; then | ||
| echo "Missing local image for case ${case_id}: ${local_image}" >&2 | ||
| exit 1 | ||
| fi | ||
| echo "Pushing ${image_file}" | ||
| "${ADB_BIN}" push "${local_image}" "${REMOTE_IMAGE_DIR}/${image_file}" </dev/null |
There was a problem hiding this comment.
Validate image_file before using it in path joins.
Line 34 and Line 40 trust image_file directly. A crafted value (absolute path or traversal) can read unintended local files and push to unintended remote locations.
Suggested fix
while IFS='|' read -r case_id image_file prompt; do
[[ -n "${case_id}" ]] || continue
[[ -n "${image_file}" ]] || continue
+ case "${image_file}" in
+ /*|../*|*/../*|..|*"/.."|*"\\"*)
+ echo "Invalid image_file for case ${case_id}: ${image_file}" >&2
+ exit 1
+ ;;
+ esac
if [[ -n "${pushed[${image_file}]:-}" ]]; then
continue
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| while IFS='|' read -r case_id image_file prompt; do | |
| [[ -n "${case_id}" ]] || continue | |
| [[ -n "${image_file}" ]] || continue | |
| if [[ -n "${pushed[${image_file}]:-}" ]]; then | |
| continue | |
| fi | |
| local_image="${ASSET_SRC}/${image_file}" | |
| if [[ ! -f "${local_image}" ]]; then | |
| echo "Missing local image for case ${case_id}: ${local_image}" >&2 | |
| exit 1 | |
| fi | |
| echo "Pushing ${image_file}" | |
| "${ADB_BIN}" push "${local_image}" "${REMOTE_IMAGE_DIR}/${image_file}" </dev/null | |
| while IFS='|' read -r case_id image_file prompt; do | |
| [[ -n "${case_id}" ]] || continue | |
| [[ -n "${image_file}" ]] || continue | |
| case "${image_file}" in | |
| /*|../*|*/../*|..|*"/.."|*"\\"*) | |
| echo "Invalid image_file for case ${case_id}: ${image_file}" >&2 | |
| exit 1 | |
| ;; | |
| esac | |
| if [[ -n "${pushed[${image_file}]:-}" ]]; then | |
| continue | |
| fi | |
| local_image="${ASSET_SRC}/${image_file}" | |
| if [[ ! -f "${local_image}" ]]; then | |
| echo "Missing local image for case ${case_id}: ${local_image}" >&2 | |
| exit 1 | |
| fi | |
| echo "Pushing ${image_file}" | |
| "${ADB_BIN}" push "${local_image}" "${REMOTE_IMAGE_DIR}/${image_file}" </dev/null |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 28-28: prompt appears unused. Verify use (or export if used externally).
(SC2034)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/qwen2vl_qnn/push_eval_assets.sh` around lines 28 - 40, Validate and
sanitize image_file before using it in path joins: reject or normalize values
that start with '/' or contain '..' or path separators that escape ASSET_SRC,
control/null bytes, or other unexpected characters; ensure local_image is
constructed only from ASSET_SRC + safe basename(image_file) (or canonicalize and
verify it remains under ASSET_SRC) and ensure the remote path uses only a safe
filename (not a full path) before calling "${ADB_BIN}" push; also keep the
pushed[...] check and error handling but perform this validation immediately
after reading image_file to prevent absolute/traversal attacks.
| while IFS='|' read -r case_id image_file prompt; do | ||
| remote_image="${REMOTE_IMAGE_DIR}/${image_file}" | ||
| log_file="${OUT_DIR}/${case_id}.log" | ||
| visual_args="" |
There was a problem hiding this comment.
Sanitize case_id before building the log file path.
Line 76 uses untrusted case data as a filename. case_id values with separators/traversal tokens can escape OUT_DIR or produce invalid paths.
Suggested fix
while IFS='|' read -r case_id image_file prompt; do
+ [[ -n "${case_id}" ]] || continue
+ [[ -n "${image_file}" ]] || continue
+ safe_case_id="${case_id//[^A-Za-z0-9_.-]/_}"
+ [[ -n "${safe_case_id}" ]] || safe_case_id="case"
remote_image="${REMOTE_IMAGE_DIR}/${image_file}"
- log_file="${OUT_DIR}/${case_id}.log"
+ log_file="${OUT_DIR}/${safe_case_id}.log"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| while IFS='|' read -r case_id image_file prompt; do | |
| remote_image="${REMOTE_IMAGE_DIR}/${image_file}" | |
| log_file="${OUT_DIR}/${case_id}.log" | |
| visual_args="" | |
| while IFS='|' read -r case_id image_file prompt; do | |
| [[ -n "${case_id}" ]] || continue | |
| [[ -n "${image_file}" ]] || continue | |
| safe_case_id="${case_id//[^A-Za-z0-9_.-]/_}" | |
| [[ -n "${safe_case_id}" ]] || safe_case_id="case" | |
| remote_image="${REMOTE_IMAGE_DIR}/${image_file}" | |
| log_file="${OUT_DIR}/${safe_case_id}.log" | |
| visual_args="" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/qwen2vl_qnn/run_qnn_eval_fixed_set.sh` around lines 74 - 77, Sanitize
the untrusted case_id before composing log_file to prevent path traversal or
invalid filenames: in the while read loop sanitize the case_id variable used
when creating log_file (and remote_image if reused) by removing or replacing
path separators and traversal tokens and allowing only a safe whitelist (e.g.,
alphanumerics, dot, underscore, hyphen), ensure the sanitized value is non-empty
(fallback to a safe default like "case_<n>"), then use that sanitized_case_id
when building log_file="${OUT_DIR}/${sanitized_case_id}.log"; reference case_id,
log_file, OUT_DIR and the while IFS='|' read -r case_id ... loop when making the
change.
Please check Guidelines for Contributing.
Draft / stacked PR. Depends on #675.
Summary
This PR adds a Qwen2-VL 2B full-QNN AOT example.
It includes:
The compiled context binaries and model weights are intentionally not included.
Validation
bash -n scripts/qwen2vl_qnn/common.sh scripts/qwen2vl_qnn/push_eval_assets.sh scripts/qwen2vl_qnn/run_qnn_eval_fixed_set.sh scripts/qwen2vl_qnn/run_qnn_interactive.shcmake --build build-qnn-aot --target mllm-qwen2vl-aot-c mllm-qwen2vl-visual-aot-diag -j2Notes
This PR is part of the full-QNN Qwen2-VL work:
Update: the Qwen2-VL 2B example baseline is now the 2026-05-21 full-QNN AOT baseline, using W4A16 LLM weights, FP16 visual encoder weights, and a 5-bucket visual context.
Summary by CodeRabbit
New Features
Documentation
Chores