Commit 1ee58ed
[ez][ET-VK][q8ta_conv2d_pw] Halve accumulator to lift Adreno occupancy
Pull Request resolved: pytorch#19396
The pointwise quantized conv shader allocated ivec4 out_accum[4][2] = 32 int32 accumulators per thread, which on Adreno 740 pinned 28 full-precision registers per thread and capped ALU fiber occupancy at 37%. AOC reported 26.7% exposed long-latency stalls, evidence that occupancy was too low to hide texture and SSBO latency. Halve the accumulator to 16 ints by reducing TILE_N4 from 2 to 1 (each thread now covers 4 widths × 4 output channels = a single 4×4 output block). The compensating dispatch change is in pick_q8ta_conv2d_pw_global_wg_size: global_wg.x doubles since each thread covers half as many output channel blocks as before. Each thread still loads 1 input ivec4 (4 widths) per K-iter, preserving the natural int8x4 packing alignment, so arithmetic intensity drops only 25% (2.67 → 2.0 MAC/B, in contrast to the variant where TILE_M is halved which drops AI by 50%).
ghstack-source-id: 379519735
@exported-using-ghexport
Differential Revision: [D103770023](https://our.internmc.facebook.com/intern/diff/D103770023/)1 parent 24da2f6 commit 1ee58ed
2 files changed
Lines changed: 14 additions & 14 deletions
Lines changed: 9 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| 25 | + | |
| 26 | + | |
25 | 27 | | |
26 | 28 | | |
27 | 29 | | |
28 | 30 | | |
29 | 31 | | |
30 | | - | |
| 32 | + | |
31 | 33 | | |
32 | 34 | | |
33 | 35 | | |
34 | | - | |
| 36 | + | |
35 | 37 | | |
36 | 38 | | |
37 | 39 | | |
| |||
86 | 88 | | |
87 | 89 | | |
88 | 90 | | |
89 | | - | |
90 | | - | |
91 | | - | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
92 | 94 | | |
93 | 95 | | |
94 | 96 | | |
| |||
137 | 139 | | |
138 | 140 | | |
139 | 141 | | |
140 | | - | |
| 142 | + | |
141 | 143 | | |
142 | 144 | | |
143 | 145 | | |
144 | | - | |
| 146 | + | |
145 | 147 | | |
146 | 148 | | |
147 | 149 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
36 | | - | |
37 | | - | |
38 | | - | |
39 | | - | |
40 | | - | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
41 | 39 | | |
42 | 40 | | |
43 | 41 | | |
44 | 42 | | |
45 | 43 | | |
46 | 44 | | |
47 | | - | |
48 | | - | |
| 45 | + | |
| 46 | + | |
49 | 47 | | |
50 | 48 | | |
51 | 49 | | |
| |||
0 commit comments