Fix: execution context queue stress tests failures #16472
With this patch, the spec no longer gets stuck on my machine using seed … There are two … So apparently they are waiting for randomness.
Whenever I try to fix the issue, mingw fails, this time on ARM64 😡 Why is it using urandom?! The stdlib should always use getrandom on Linux 😕
Answering myself: because the macro that checks for the libc method doesn't work in older Crystal releases! So, multiple fixes:
Aside: why is reading from urandom failing with EAGAIN? It should never block. Maybe it's wrong to make the fd non-blocking, and we should always do blocking reads instead, since they should never block (readiness notifications might not work for it).
The spec always fails on CI for this specific target.
# Runs a multithreaded test by starting *n* threads, waiting for all the
# threads to have been started the *publish* proc.
Suggested change:
- # Runs a multithreaded test by starting *n* threads, waiting for all the
- # threads to have been started the *publish* proc.
+ # Runs a multithreaded test by starting *n* threads, waiting for all the
+ # threads to have been started, then runs the *publish* proc.
end

# See `#split`.
def self.split : Random
suggestion: We should probably add this in a separate PR because it's adding a new public feature.
Refactors and abstracts the stress test specs of `Fiber::ExecutionContext::Runnables` and `Fiber::ExecutionContext::GlobalQueue` so that no loop can run forever: the "thread setup" part has been removed, so the main fiber no longer blocks waiting for the threads to be ready; and each thread's loop eventually times out and the thread returns, so the main fiber won't block while joining the threads.

I abstracted a helper because the different tests used the same structure, and duplicating the logic was painful and noisy.
This fixes the recurring CI failures that occurred often on Darwin, and that I just reproduced on Linux by running these specs in tight loops, multiple times in parallel, to overload the CPU cores.
Might fix #16470, or at least make it fail instead of hanging for 6h.
Related to #15630.
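The shape of the stress-test helper described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the helper from this PR. The name `stress_threads`, its parameters and its `Array(Bool)` return value are hypothetical. It also uses Crystal's internal, undocumented `Thread` class, as the stdlib's own specs do, instead of the `Fiber::ExecutionContext` queues under test. The sketch only shows the two properties the PR relies on. There is no readiness barrier before the threads start working, and a shared deadline bounds every thread's loop, so joining the threads can't hang.

```crystal
# Hypothetical helper (illustrative only, not the PR's actual spec helper).
# Starts *n* threads that each call *block* until it has returned true
# *iterations* times or *timeout* has elapsed, then joins all of them.
# Returns, per thread, whether it completed all its iterations in time.
def stress_threads(n : Int32, iterations : Int32, timeout : Time::Span, &block : Int32 -> Bool) : Array(Bool)
  deadline = Time.monotonic + timeout
  results = Array(Bool).new(n, false)

  # No "thread setup" barrier: each thread starts working right away and the
  # main thread never waits for the others to report that they are ready.
  # Thread is Crystal's internal (undocumented) OS thread wrapper.
  threads = Array.new(n) do |index|
    Thread.new do
      completed = 0
      # Bounded loop: stop after enough successes OR once the deadline passed.
      while completed < iterations && Time.monotonic < deadline
        completed += 1 if block.call(index)
      end
      # Each thread writes only its own slot; the array is never resized.
      results[index] = completed == iterations
    end
  end

  # Safe to join: every thread returns shortly after the deadline at the latest.
  threads.each(&.join)
  results
end

counter = Atomic(Int32).new(0)
ok = stress_threads(4, 1_000, 5.seconds) { |_i| counter.add(1); true }
puts ok.all?      # every thread completed its iterations
puts counter.get  # 4 threads x 1_000 iterations
```

With a block that never reports success, for example `{ |_i| false }`, every thread gives up once the deadline passes and the returned array is all `false`. A spec can then fail on an assertion instead of hanging CI for hours.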