-
Notifications
You must be signed in to change notification settings - Fork 8
onemkl GPU version SpMV #52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
4e4eab7 to
c8a8cc7
Compare
c8a8cc7 to
17d52b4
Compare
| template <typename T, std::size_t Alignment = 0> | ||
| class mkl_allocator { | ||
| public: | ||
| using value_type = T; | ||
| using pointer = T*; | ||
| using const_pointer = const T*; | ||
| using reference = T&; | ||
| using const_reference = const T&; | ||
| using size_type = std::size_t; | ||
| using difference_type = std::ptrdiff_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have a typename T on the allocator? don't we have potential for creating all sorts of kinds of things ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It currently follows the std::allocator design. In the operation, we will use allocator<char> to create the workspace.
| pointer allocate(std::size_t size) { | ||
| return sycl::malloc_device<value_type>(size, *(this->queue())); | ||
| } | ||
|
|
||
| void deallocate(pointer ptr, std::size_t n = 0) { | ||
| if (ptr != nullptr) { | ||
| sycl::free(ptr, *(this->queue())); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that allocate/deallocate should be templated, and not the allocator class, right ?
| using difference_type = std::ptrdiff_t; | ||
|
|
||
| mkl_allocator() noexcept { | ||
| auto* queue = new sycl::queue{sycl::default_selector_v}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elsewhere we are using sycl::cpu_selector_v right now, but should probably switch to this default selector ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh i looked further and it seems you are handling that in spmv by using the queue that the state object is introduced with ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yhmtsai @BenBrock @upsj @YvanMokwinski and any others who are interested in design of state/policy/allocator interactions here. We should probably huddle sometime soon and discuss ownership and interaction of queue (stream) indicating device intent between the two objects that will be input into each operation:
- execution policy -- we originally designed around idea that policy should hold the queue(stream) for the operation
- state -- holds an optional allocator and any other stateful objects to be reused. The allocator needs a queue(stream), so should allocator/state take in an execution policy? If state has its own queue(stream), it seems like it will be possible to end up with multiple queues in the spmv operation -- one from policy and one from state... which one is to be used ? what if they are different ? for sycl::queues, it will affect ordering of operations, for streams, I guess if you are always using the default stream, it shouldn't matter so much, but as soon as a user creates their own stream, it could go bad...
| class spmv_state_t { | ||
| public: | ||
| spmv_state_t() : spmv_state_t(mkl::mkl_allocator<char>{}) {} | ||
|
|
||
| spmv_state_t(sycl::queue* q) : spmv_state_t(mkl::mkl_allocator<char>{q}) {} | ||
|
|
||
| spmv_state_t(mkl::mkl_allocator<char> alloc) : alloc_(alloc) {} | ||
|
|
||
| sycl::queue* queue() { | ||
| return alloc_.queue(); | ||
| } | ||
|
|
||
| private: | ||
| mkl::mkl_allocator<char> alloc_; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we switch to using uint8_t instead of char as a 1 byte = 8 bits intent ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, I'm not sure we want a template T on the allocator, but rather on allocate the allocate/deallocate member functions, right ?
| mkl_allocator(sycl::queue* q) noexcept | ||
| : queue_manager_(q, [](sycl::queue* q) {}) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| mkl_allocator(sycl::queue* q) noexcept | |
| : queue_manager_(q, [](sycl::queue* q) {}) {} | |
| /* taking a shallow copy of queue from elsewhere, so we don't own destruction */ | |
| mkl_allocator(sycl::queue* q) noexcept | |
| : queue_manager_(q, [](sycl::queue* q) {}) {} |
|
|
||
| #ifdef SPBLAS_ENABLE_ONEMKL_SYCL | ||
| #include "onemkl/device_vector.hpp" | ||
| #else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is generalizing the rocsparse device spmv_test to just a device/spmv_test.cpp for rocsparse/mkl_sycl and in the future others ? Should we change the name from rocsparse/ folder to device/ or accelerator/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I put it into #40 (change it to device). I tended to avoid the same changes in two prs because it might make the reviews hard to exchange the idea in the same pr. I can move that into here in case this pr moves more quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can I suggest we shift it to thrust_device, then if we have other device examples that are specific for say, sycl or rocm or cuda we could have sycl_device, rocm_device or cuda_device folders as well ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do different folders here contain the tests with different vector allocation or specific functions?
If you mean the specific functions, I might put them into device/<vendor>
device contains the uniform test for all backend, and device/<vendor> contains the tests just for specific vendor.
Both ways are clear, so I do not mind choosing another one unless one of them makes the CMake setting worse. (I do not think so, but just in case)
Summary:
This PR make onemkl also supports Intel GPU.
Details:
ONEMKL_DEVICE_SELECTOR.Without the queue input, it will assume all default selector to select the same device with the same context such that all memory can be accessed by the other default queue.
Merge Checklist: