There's a missing feature that should be considered for the SparseBLAS: threading control (and other compute resource control).
The Intel MKL dense BLAS has mkl_set_num_threads and the very important mkl_set_num_threads_local. OpenBLAS has a similar pair of methods: openblas_set_num_threads and openblas_set_num_threads_local. Sadly, I think the corresponding methods are missing from the AMD AOCL library, Apple Accelerate, and the ARM BLAS.
These methods allow a user application to create multiple threads of its own, each of which can use a different number of threads when calling the dense BLAS. The problem is that none of these methods are in the BLAS or CBLAS standards, which is a mess. Packages that use the BLAS (Julia, Python, R, Eigen, my solvers, etc.) must rely on difficult-to-write CMake scripts to detect which BLAS is in use and which functions (if any) to call. There is not even a standard way to tell whether the dense BLAS is multithreaded, either at compile time or at run time.
I have the same feature in GraphBLAS, via a GxB_Context object that lets the user application set some state (in thread-local memory) so that each of the user's threads can call GraphBLAS in parallel, and each of those calls can use different compute resources (a different number of OpenMP threads, or different GPUs). I do this without adding to the API of any other GraphBLAS method. See Chapter 8 of my user guide to SuiteSparse:GraphBLAS: https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/stable/Doc/GraphBLAS_UserGuide.pdf .
The SparseBLAS should do the same.
In addition, it would be wonderful if the Fortran BLAS and CBLAS both tackled this API issue, but that's another discussion. At the very least, it would be great if we didn't end up with the same problem in the SparseBLAS.