[ET-VK][conv2d_dw] Extract depthwise dispatch into Conv2dDW.cpp with device-based tile selection #18301

Merged
SS-JIA merged 1 commit into gh/SS-JIA/492/orig from gh/SS-JIA/493/orig
Mar 18, 2026

@pytorchbot
Collaborator

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #18293 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/493/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/493/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/492/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/493/orig
Differential Revision: D97058158
@diff-train-skip-merge

…device-based tile selection

Pull Request resolved: #18293

Profiling showed that depthwise conv2d is 5-15x slower on Mali GPUs than on
Adreno due to register pressure from the 4x2 output tile (17 vec4 registers per
thread). Benchmarking confirmed that reducing the tile to 1x1 (7 vec4 registers)
gives a 4-15x speedup on Mali with no regression on Adreno.
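
The message does not say how the two register counts are derived. The sketch below is one accounting that happens to reproduce both figures, under assumptions that are not in the original: a 3x3 kernel, stride 1, and a thread that keeps one vec4 accumulator per output texel plus one row of input texels and one row of kernel weights live at a time. The helper name `estimate_vec4_registers` is hypothetical and is not part of ExecuTorch.

```cpp
// Hypothetical helper (not from the ExecuTorch source): rough count of vec4
// registers live per thread for a depthwise conv2d output tile.
// Assumptions: accumulators for the whole tile stay live, while inputs and
// weights are loaded one kernel row at a time.
constexpr int estimate_vec4_registers(
    int tile_w, int tile_h, int kernel_w, int stride = 1) {
  // One accumulator per output texel in the tile.
  const int accumulators = tile_w * tile_h;
  // Input texels needed to cover one output row of the tile.
  const int input_row = (tile_w - 1) * stride + kernel_w;
  // One row of kernel weights.
  const int weight_row = kernel_w;
  return accumulators + input_row + weight_row;
}

// Under these assumptions the model matches the counts quoted in the message.
static_assert(estimate_vec4_registers(4, 2, 3) == 17, "4x2 tile, 3x3 kernel");
static_assert(estimate_vec4_registers(1, 1, 3) == 7, "1x1 tile, 3x3 kernel");
```

If the real shader keeps a different working set, the formula changes, but the trend is the same: the accumulator term grows with the tile area, which is what the 1x1 tile removes.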

This change extracts depthwise conv2d dispatch logic from Convolution.cpp into a
new Conv2dDW.cpp (following the Conv2dPW.cpp pattern), and adds device-based
tile size selection: b1x1 on Mali, b4x2 (current default) on Adreno.
ghstack-source-id: 353940602
@exported-using-ghexport

Differential Revision: [D97058158](https://our.internmc.facebook.com/intern/diff/D97058158/)
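
As an illustration of the device-based tile selection described in the message, here is a minimal sketch. It is not the code in Conv2dDW.cpp: the names `GpuFamily`, `DwTile`, `classify_device`, and `select_dw_tile` are hypothetical, and matching on a device-name substring is an assumption about how the GPU family could be detected. Only the `b1x1` and `b4x2` variant names and the Mali/Adreno mapping come from the message.

```cpp
#include <string>

// Hypothetical types for illustration only; the actual ExecuTorch Vulkan
// backend may represent device families and shader variants differently.
enum class GpuFamily { Adreno, Mali, Other };

struct DwTile {
  int width;          // output texels per thread along x
  int height;         // output texels per thread along y
  const char* suffix; // shader variant name used in the PR description
};

// Assumption: the family can be inferred from a device name string, such as
// the one a Vulkan driver reports for the physical device.
inline GpuFamily classify_device(const std::string& device_name) {
  if (device_name.find("Mali") != std::string::npos) {
    return GpuFamily::Mali;
  }
  if (device_name.find("Adreno") != std::string::npos) {
    return GpuFamily::Adreno;
  }
  return GpuFamily::Other;
}

// Mali gets the small tile to reduce register pressure. Everything else
// keeps the 4x2 tile, which the message calls the current default; applying
// it to unknown devices is an assumption of this sketch.
inline DwTile select_dw_tile(GpuFamily family) {
  if (family == GpuFamily::Mali) {
    return {1, 1, "b1x1"};
  }
  return {4, 2, "b4x2"};
}
```

For example, `select_dw_tile(classify_device("Mali-G78"))` yields the `b1x1` tile, while a name containing "Adreno" yields `b4x2`.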
@pytorchbot pytorchbot requested a review from SS-JIA as a code owner March 18, 2026 18:43
@pytorch-bot

pytorch-bot bot commented Mar 18, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18301

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 18 Pending

As of commit 22b359f with merge base ed57040:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the "CLA Signed" label Mar 18, 2026
@SS-JIA SS-JIA merged commit ee0ca9c into gh/SS-JIA/492/orig Mar 18, 2026
135 of 143 checks passed
@SS-JIA SS-JIA deleted the gh/SS-JIA/493/orig branch March 18, 2026 19:35

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

2 participants