vello_hybrid: Make probe more asynchronous#1671
Conversation
5e2cd8b to
dfbef54
Compare
|
I don't know why, but CI doesn't seem to like running the test. :/ I will have to look into it tomorrow... |
597e6f1 to
47475a6
Compare
This PR introduces two changes to our CI for testing sparse strips in CI: 1) Instead of having one single job, we split them up into three jobs: The first job for checking the examples, the second for running the tests without WASM and the third for running the tests with WASM. This cuts the "bottleneck" time for this test specifically from around 13 minutes to around 6-7 minutes. This won't have an immediate effect since the overall bottleneck is still Windows CI, but it's an easy change and will likely be worth it in the future once we reduce the time of Windows CI. 2) Use MacOS-based runners instead of Ubuntu for running the WASM tests. When working on linebender#1671, I have encountered the problem that the probe test does not pass anymore [due to some weird timeout](https://github.com/linebender/vello/actions/runs/26457569847/job/77896141078). <img width="1318" height="443" alt="image" src="https://github.com/user-attachments/assets/d1b4264b-30a2-4eb8-b3a8-7f2a55d624d9" /> I've spent more than a day trying to debug this now, but have not gotten further than concluding that for some reason, invoking asynchronous methods like `setTimeout` or `requestAnimationFrame` causes those timeouts to occur, regardless of what happens in there (even when removing the actual probing, it still happened when I invoked those methods). Locally everything works fine, and apparently it also works fine when switching to a MacOS-based runners. I'm fairly certain that this is some weirdness with the Github CI runners and not with probing failing completely on Linux (though I admittedly haven't tested it since I don't have a linux laptop), so I think this is the easiest thing to do for now. I'd also like to get to the bottom of this, but I've already spent more time on this than I can really justify, so I hope that this is fine doing for now.
47475a6 to
4a4e046
Compare
4a4e046 to
77b641f
Compare
| // Just to make sure they are kept alive until the probe | ||
| // finished. This is _probably_ not necessary because WebGl is a synchronous API | ||
| // so the `delete_` commands should only be executed once rendering has finished, but | ||
| // safe is safe. |
There was a problem hiding this comment.
Could this be expanded upon. Is my assumption that this comment means "kept alive via a reference to prevent garbage collection of resources"?
There was a problem hiding this comment.
So basically, if we don't keep it around it would be dropped after probe_inner is done:
vello/sparse_strips/vello_hybrid/src/render/webgl.rs
Lines 616 to 620 in a7916e4
I was just a bit worried if it's a problem if the drop is already triggered at that point, however, I'm pretty sure it's fine because as mentioned in the comment, WebGL is conceptually synchronous, meaning that if the fence_sync and the transfer to pixel pack buffer is called before the end of the method, it is guaranteed that those operations have finished before the deletions are triggered. If you agree with that, then happy to revert that part!
There was a problem hiding this comment.
I am happy to remove (also, I would really hope that tests would catch something like this if it's wrong after removal)
| /// which can be checked again in the future. Otherwise, the probe result or an error will be | ||
| /// returned. | ||
| pub fn try_finish(mut self) -> Result<WebGlProbeStatus, WebGlProbeError> { | ||
| let status = self.gl.client_wait_sync_with_u32( |
There was a problem hiding this comment.
Luckily clientWaitSync is supported by Safari 15.
| self.gl.delete_framebuffer(Some(&probe_framebuffer)); | ||
| self.gl.delete_texture(Some(&probe_texture)); | ||
| Ok(pixels) | ||
| render_result?; |
There was a problem hiding this comment.
Is it worth a comment that we check for a valid render result here because WebGlProbeResources drop frees resources?
Alternatively could the WebGlProbeResources be created above render_scene such that the ? can be applied right there.
There was a problem hiding this comment.
Yes this is the reason, will add a comment.
66ca6bf to
99d8ff3
Compare
99d8ff3 to
fbd0db0
Compare
Context
Right now, the probing API is fully synchronous: We first construct the scene, render into using WebGL, and then cause a full pipeline stall by directly requesting a read-back from GPU to CPU memory. While it's incredibly hard to get reliable measurements, this does seem to incur a large latency on low-tier devices. Here are some numbers (again, they can vary wildly from run to run for reasons unknown to me, but overall they are in the same ballpark):
Vivo Y21:

Xiaomi RedNote 11:

Solution
Therefore, this PR proposes to switch to a slightly different approach: It makes use of WebGL Sync objects and pixel pack buffers to ensure that the vast majority of the work can happen asynchronously in the background without stalling the CPU. On the CPU side, we can then simply continuously poll the Sync object (for example in a RAF callback) to see whether the operation has finished (by calling `clientWaitSync with a timeout of 0, such that it returns immediately in case it's not done yet. I've done some crude measurements to confirm that this is very cheap). If it's not done yet, we can simply do other stuff, so the whole point is to avoid blocking while rendering is still in progress.
When applying this, basically the only synchronous parts are 1) scene construction + submitting commands to the GPU (
start_probein the screenshots below) and 2) doing the final read-back from the pixel pack buffer to the CPU (readbackin the screenshots below). While the overall time can be longer sometimes, the synchronous parts seem to be much faster than before, therefore this seems like a win to me. Especially if we plan on extending the probe scene in the future.Vivo Y21:

Xiaomi RedNote 11:
