
Support weight sharing in QNN GPU #2325

Draft
vjatoth-qti wants to merge 2 commits into microsoft:main from CodeLinaro:dev/vjatoth-qti/qnn-gpu-weight-sharing

Conversation

@vjatoth-qti

Describe your changes

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running `lintrunner -a`
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

Comment thread olive/passes/onnx/static_llm.py Outdated
decoder_config_extra=decoder_config_extra,
composite_components=None,
)
return handler

Check warning

Code scanning / lintrunner

RUFF/RET504

Unnecessary assignment to `handler` before `return` statement. See https://docs.astral.sh/ruff/rules/unnecessary-assign
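The RET504 fix flagged here is mechanical: return the expression directly instead of binding it to a name first. A minimal sketch of the before/after pattern (function and variable names hypothetical, not from the PR):

```python
# Before: Ruff RET504 flags the assignment, because `handler` is
# bound only to be returned on the very next line.
def make_handler_before(config):
    handler = {"config": config}
    return handler

# After: return the expression directly; behavior is unchanged.
def make_handler_after(config):
    return {"config": config}
```

Both variants produce identical results; the rewrite only removes the redundant intermediate name.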
Comment thread olive/passes/onnx/static_llm.py Fixed
Comment thread olive/passes/onnx/static_llm.py
@vjatoth-qti force-pushed the dev/vjatoth-qti/qnn-gpu-weight-sharing branch from 0c0236a to 24f1374 on March 1, 2026 09:11
"List of context lengths to generate static models for QNN GPU. "
"If None or empty, falls back to single 'context_length'."
),
),
@vjatoth-qti, would the intended use of this in a recipe look like this?

        "st": {
            "type": "StaticLLM",
            "batch_size": 1,
            "context_lengths": [1, 64]
        }

Do we ever expect len(context_lengths) > 2? If not, it seems like we could follow the NPU strategy for StaticLLM, where "context_length": x always implies a hybrid AR1 + ARx model, and we wouldn't need a new "context_lengths" key.


resolved

Comment thread olive/passes/onnx/context_binary.py Outdated
if str(device).lower() == "gpu":
provider_options["backend_path"] = "libQnnGpu.so" if platform.system() == "Linux" else "QnnGpu.dll"
if share_ep_contexts:
provider_options["enable_gpu_weight_sharing"] = "1"
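The snippet above builds the QNN execution-provider options dict. Pulled out as a standalone function, the logic looks roughly like this (a sketch only: `enable_gpu_weight_sharing` comes from this PR and, per the review comment below, may not actually be a supported provider option upstream):

```python
import platform


def build_qnn_gpu_provider_options(device: str, share_ep_contexts: bool) -> dict:
    """Sketch of the diff's option-building logic; not the PR's exact code."""
    provider_options = {}
    if str(device).lower() == "gpu":
        # Pick the QNN GPU backend library for the current OS.
        provider_options["backend_path"] = (
            "libQnnGpu.so" if platform.system() == "Linux" else "QnnGpu.dll"
        )
        if share_ep_contexts:
            # Option name taken from this PR; possibly unsupported upstream.
            provider_options["enable_gpu_weight_sharing"] = "1"
    return provider_options
```

The resulting dict would be passed as the QNNExecutionProvider's options when creating an inference session; non-GPU devices fall through and get no GPU-specific options.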

Is enable_gpu_weight_sharing a supported provider option? The corresponding PR to onnxruntime-qnn (onnxruntime/onnxruntime-qnn#67) does not add it as one.

@jambayk
Contributor

jambayk commented Apr 9, 2026

Changing to draft since there has been no update on this PR since February.

@jambayk marked this pull request as draft on April 9, 2026 16:45
@unnim-qti force-pushed the dev/vjatoth-qti/qnn-gpu-weight-sharing branch from 24f1374 to b4fc7bb on April 12, 2026 22:14
