GPU: Replace OpenMP parallelization with TBB - WIP #13997
Force-pushed from f6b40be to be91a01.
Error while checking build/O2/fullCI for be91a01 at 2025-02-21 21:57.
Force-pushed from be91a01 to 184803f.
Error while checking build/O2/fullCI_slc9 for 184803f at 2025-02-21 23:50.
Force-pushed from 184803f to 158afe1.
Error while checking build/O2/fullCI for 184803f at 2025-02-22 00:20.
Error while checking build/O2/fullCI_slc9 for 158afe1 at 2025-02-22 13:21.
Force-pushed from 158afe1 to 357e76e.
Error while checking build/O2/fullCI_slc9 for 357e76e at 2025-02-22 16:25.
Force-pushed from 357e76e to b21fbd8.
Work in progress, don't merge yet!
For now, I just want to test it in CI.
Background for this PR: OpenMP has no proper scheduling approach for nested loops, or for loops with reduced parallelism. Instead of using a thread pool, it keeps spawning and terminating threads.
For me, that amounts to on the order of 10k thread spawns on an EPN node for 1 TF.
I could fix that by hacking GCC's libgomp, but the GCC developers are not interested in taking that patch, since their behavior is compliant with the OpenMP spec, which is IMHO broken.
So I decided to do away with OpenMP entirely.