[ET-VK] Fix staging buffer allocation to check all memory types for HOST_CACHED #18291
SS-JIA wants to merge 2 commits into gh/SS-JIA/491/base
`test_host_cached_available()` only checked `memoryTypes[0]` to determine if HOST_CACHED memory was available. On Pixel devices, `memoryTypes[0]` is DEVICE_LOCAL without HOST_CACHED, so the function incorrectly returned `SEQUENTIAL_WRITE_BIT`. This caused DEVICE_TO_HOST staging buffers to be allocated in write-combining (uncached) memory, making CPU reads during COPY_OUTPUTS ~170x slower than necessary (~40ms vs ~237us on S24).

The fix iterates over all memory types to correctly detect HOST_CACHED support.

On-device profiling of edgetam_first_frame_fp16_vulkan.pte confirms the fix:

- Pixel 8 Pro COPY_OUTPUTS: 40ms -> 6.3ms (-84%)
- Pixel 9 Pro XL COPY_OUTPUTS: 40ms -> 2.5ms (-94%)
- Pixel 8 Pro Method::execute: 492ms -> 464ms (-5.7%)
- Pixel 9 Pro XL Method::execute: 445ms -> 411ms (-7.6%)

Differential Revision: [D97058156](https://our.internmc.facebook.com/intern/diff/D97058156/)
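For reviewers who want to see the difference between the two checks in isolation, here is a minimal sketch. It is not the code in this diff. The struct and constant names are stand-ins that mirror `VkPhysicalDeviceMemoryProperties` and `VkMemoryPropertyFlagBits` so the snippet builds without the Vulkan headers, and the three-entry memory layout is an assumed example of a device whose first memory type is DEVICE_LOCAL only, not a dump from a real Pixel.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the Vulkan definitions (assumption: used only so this sketch
// compiles without <vulkan/vulkan.h>). Bit values match VkMemoryPropertyFlagBits.
constexpr uint32_t kDeviceLocal = 0x1;   // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisible = 0x2;   // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
constexpr uint32_t kHostCached = 0x8;    // VK_MEMORY_PROPERTY_HOST_CACHED_BIT

struct MemoryType {
  uint32_t propertyFlags;
};

struct MemoryProperties {
  uint32_t memoryTypeCount;
  MemoryType memoryTypes[32];  // 32 == VK_MAX_MEMORY_TYPES
};

// Old behavior as described above: only memoryTypes[0] is inspected.
bool host_cached_first_type_only(const MemoryProperties& props) {
  return props.memoryTypeCount > 0 &&
      (props.memoryTypes[0].propertyFlags & kHostCached) != 0;
}

// Fixed behavior as described above: every reported memory type is inspected.
bool host_cached_any_type(const MemoryProperties& props) {
  for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
    if ((props.memoryTypes[i].propertyFlags & kHostCached) != 0) {
      return true;
    }
  }
  return false;
}

// Hypothetical layout: type 0 is DEVICE_LOCAL only, and a HOST_CACHED type
// appears later in the list.
MemoryProperties make_example_layout() {
  MemoryProperties props{};
  props.memoryTypeCount = 3;
  props.memoryTypes[0].propertyFlags = kDeviceLocal;
  props.memoryTypes[1].propertyFlags = kHostVisible | kHostCoherent;
  props.memoryTypes[2].propertyFlags = kHostVisible | kHostCoherent | kHostCached;
  return props;
}

// Usage: the first-type-only check misses the cached type at index 2,
// while the full scan finds it.
inline void check_example_layout() {
  const MemoryProperties props = make_example_layout();
  assert(!host_cached_first_type_only(props));
  assert(host_cached_any_type(props));
}
```

On a layout like this, the first-type-only check reports no HOST_CACHED memory, which is the condition that led to the `SEQUENTIAL_WRITE_BIT` path for readback buffers.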