
[QNN] Hope to know how to support v68 arch soc with QNN? #18280

@ecccccsgo

Description

Hello,

I am working with some older platforms such as the SA8295, which has HTP architecture v68. In this repo and the AI Hub repo, when I use the w4a16 recipe to quantize models like SmolVLM/Qwen, it fails with:

[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.linear.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[ERROR] [Qnn ExecuTorch]:  <E> [4294967295] has incorrect Value 68, expected >= 73.

[ERROR] [Qnn ExecuTorch]:  <E> QnnBackend_validateOpConfig failed 3110

[ERROR] [Qnn ExecuTorch]:  <E> Failed to validate op aten_native_layer_norm_default_24 with error 0xc26

But some papers, such as AutoNeural, use W8A16 for the ViT and W4A16 for the language model (page 10, Table 2).

So it seems the v68 arch does support this type of op. I hope you can give me some information about how I can use w4a16 on an LLM/LVM to support larger models. :)
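For context on what the recipe names mean, here is a minimal pure-Python sketch of the arithmetic behind a "w4a16" scheme: weights are mapped to signed 4-bit integers with a per-tensor scale, while activations keep 16-bit precision. This is only an illustration of the numerics, not the ExecuTorch/QNN quantizer API; all function names here are hypothetical.

```python
def quantize_symmetric(values, num_bits):
    """Symmetric quantization: map floats to signed num_bits integers
    using a single per-tensor scale (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for 4-bit
    qmin = -(2 ** (num_bits - 1))             # e.g. -8 for 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [min(max(round(v / scale), qmin), qmax) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.9, -0.35, 0.05, -0.7]
q4, s4 = quantize_symmetric(weights, num_bits=4)     # "w4": 4-bit weights
q16, s16 = quantize_symmetric(weights, num_bits=16)  # "a16"-width reference

# The 4-bit round-trip loses more precision than the 16-bit one.
err4 = max(abs(a - b) for a, b in zip(weights, dequantize(q4, s4)))
err16 = max(abs(a - b) for a, b in zip(weights, dequantize(q16, s16)))
print(q4, err4, err16)
```

The backend validation error above is orthogonal to this arithmetic: it is the HTP op validator rejecting the generated op config on an architecture older than v73, regardless of how the tensors were quantized.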

cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin

Metadata

Labels

module: qnn — Issues related to Qualcomm's QNN delegate and code under backends/qualcomm
partner: qualcomm — For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm
