
Support weight sharing in QNN GPU #2325

Draft
vjatoth-qti wants to merge 2 commits into microsoft:main from CodeLinaro:dev/vjatoth-qti/qnn-gpu-weight-sharing

Conversation

@vjatoth-qti

Describe your changes

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running `lintrunner -a`
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

Comment thread olive/passes/onnx/static_llm.py Outdated
decoder_config_extra=decoder_config_extra,
composite_components=None,
)
return handler

Check warning

Code scanning / lintrunner

RUFF/RET504

Unnecessary assignment to `handler` before `return` statement. See https://docs.astral.sh/ruff/rules/unnecessary-assign
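The RET504 fix flagged here is mechanical: return the expression directly instead of binding it to a name first. A minimal sketch of the before/after pattern (function and variable names hypothetical, not from the PR):

```python
# Before: Ruff RET504 flags the assignment, because `handler` is
# bound only to be returned on the very next line.
def make_handler_before(config):
    handler = {"config": config}
    return handler

# After: return the expression directly; behavior is unchanged.
def make_handler_after(config):
    return {"config": config}
```

Both variants produce identical results; the rewrite only removes the redundant intermediate name.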
Comment thread olive/passes/onnx/static_llm.py Fixed
Comment thread olive/passes/onnx/static_llm.py
@vjatoth-qti force-pushed the dev/vjatoth-qti/qnn-gpu-weight-sharing branch from 0c0236a to 24f1374 on March 1, 2026 09:11
"List of context lengths to generate static models for QNN GPU. "
"If None or empty, falls back to single 'context_length'."
),
),
@vjatoth-qti, would the intended use of this in a recipe look like this?

        "st": {
            "type": "StaticLLM",
            "batch_size": 1,
            "context_lengths": [1, 64]
        }

Do we ever expect len(context_lengths) > 2? If not, it seems like we could follow the NPU strategy for StaticLLM, where "context_length": x always implies a hybrid AR1 + ARx model, and we wouldn't need a new "context_lengths" key.


resolved

Comment thread olive/passes/onnx/context_binary.py Outdated
if str(device).lower() == "gpu":
provider_options["backend_path"] = "libQnnGpu.so" if platform.system() == "Linux" else "QnnGpu.dll"
if share_ep_contexts:
provider_options["enable_gpu_weight_sharing"] = "1"
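The snippet above builds the QNN execution-provider options dict. Pulled out as a standalone function, the logic looks roughly like this (a sketch only: `enable_gpu_weight_sharing` comes from this PR and, per the review comment below, may not actually be a supported provider option upstream):

```python
import platform


def build_qnn_gpu_provider_options(device: str, share_ep_contexts: bool) -> dict:
    """Sketch of the diff's option-building logic; not the PR's exact code."""
    provider_options = {}
    if str(device).lower() == "gpu":
        # Pick the QNN GPU backend library for the current OS.
        provider_options["backend_path"] = (
            "libQnnGpu.so" if platform.system() == "Linux" else "QnnGpu.dll"
        )
        if share_ep_contexts:
            # Option name taken from this PR; possibly unsupported upstream.
            provider_options["enable_gpu_weight_sharing"] = "1"
    return provider_options
```

The resulting dict would be passed as the QNNExecutionProvider's options when creating an inference session; non-GPU devices fall through and get no GPU-specific options.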

Is enable_gpu_weight_sharing a supported provider option? The corresponding PR to onnxruntime-qnn (onnxruntime/onnxruntime-qnn#67) does not add it as one.

@jambayk
Contributor

jambayk commented Apr 9, 2026

Changing to draft since there has been no update on this PR since February.

@jambayk marked this pull request as draft on April 9, 2026 16:45
@unnim-qti force-pushed the dev/vjatoth-qti/qnn-gpu-weight-sharing branch from 24f1374 to b4fc7bb on April 12, 2026 22:14
