onemkl GPU version SpMV #52

yhmtsai · 2025-04-03T00:39:06Z

Summary:
This PR makes the oneMKL backend support Intel GPUs as well.

Details:

add mkl_allocator
add the state to spmv
add an incomplete thrust::device_vector for Intel GPUs (@BenBrock I think you mentioned you already have something, or that there is a package we can use?). I only implemented the small part needed by the current tests.
It adds another target, spblas-gpu-tests, because we can test both CPU and GPU via ONEMKL_DEVICE_SELECTOR.
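To make the "tiny part" of device_vector concrete, here is a minimal sketch of a device_vector-like container parameterized on an allocator. The names `device_vector_sketch` and `to_host` are hypothetical and are not Thrust's API or the code in this PR; `std::allocator` stands in for a device allocator such as mkl_allocator so the sketch runs on the host. With real device memory, the copies marked below would have to go through a queue memcpy instead of `std::copy`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical minimal device_vector-like container (not Thrust's API).
// The Allocator parameter is where a device allocator would plug in;
// std::allocator is used here only so the sketch runs on the host.
template <typename T, typename Allocator = std::allocator<T>>
class device_vector_sketch {
  static_assert(std::is_trivially_copyable<T>::value,
                "the sketch only supports trivially copyable element types");
  using traits = std::allocator_traits<Allocator>;

public:
  // Allocate n elements and fill them with `value`.
  explicit device_vector_sketch(std::size_t n, const T& value = T{},
                                Allocator alloc = Allocator{})
      : alloc_(alloc), size_(n), data_(traits::allocate(alloc_, n)) {
    std::uninitialized_fill_n(data_, n, value);
  }

  // Copy host data in. On a real device this would be a queue memcpy.
  explicit device_vector_sketch(const std::vector<T>& host,
                                Allocator alloc = Allocator{})
      : device_vector_sketch(host.size(), T{}, alloc) {
    std::copy(host.begin(), host.end(), data_);
  }

  // Trivially copyable T has a trivial destructor, so only deallocation is needed.
  ~device_vector_sketch() { traits::deallocate(alloc_, data_, size_); }

  // Non-copyable to keep ownership of the single allocation simple.
  device_vector_sketch(const device_vector_sketch&) = delete;
  device_vector_sketch& operator=(const device_vector_sketch&) = delete;

  std::size_t size() const { return size_; }
  T* data() { return data_; }

  // Copy the contents back into a host std::vector (a memcpy on a real device).
  std::vector<T> to_host() const { return std::vector<T>(data_, data_ + size_); }

private:
  Allocator alloc_;
  std::size_t size_;
  T* data_;
};
```

A test would build one of these from host data, pass `data()` to the SpMV call, and compare `to_host()` against a reference result.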

When no queue is passed in, it assumes that every default selector picks the same device with the same context, so memory allocated through one default queue can be accessed through any other default queue.

Merge Checklist:

Passing CI
Update documentation or README.md
Additional Test/example added (if applicable) and passing
At least one reviewer approval
(optional) Clang sanitizer scan run and triaged
Clang formatter applied (verified as part of passing CI)

spencerpatty · 2025-04-18T16:09:36Z

include/spblas/vendor/onemkl_sycl/mkl_allocator.hpp

+template <typename T, std::size_t Alignment = 0>
+class mkl_allocator {
+public:
+  using value_type = T;
+  using pointer = T*;
+  using const_pointer = const T*;
+  using reference = T&;
+  using const_reference = const T&;
+  using size_type = std::size_t;
+  using difference_type = std::ptrdiff_t;


Why do we have a typename T on the allocator? Doesn't that open the door to instantiating it for all sorts of types?

It currently follows the std::allocator design. In the operation, we will use allocator<char> to create the workspace.
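A minimal sketch of what "following the std::allocator design" buys: a typed allocator with a converting constructor can be rebound through `std::allocator_traits` to a `char` allocator for the workspace, whatever value type it was created for. `typed_allocator_sketch` and `use_char_workspace` are hypothetical names; `::operator new` is a host stand-in for the `sycl::malloc_device` call in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical std::allocator-style allocator. In the PR the storage comes from
// sycl::malloc_device<T>(n, queue); ::operator new is a host stand-in here.
template <typename T>
class typed_allocator_sketch {
public:
  using value_type = T;

  typed_allocator_sketch() noexcept = default;

  // Converting constructor: lets an allocator for one type be rebound to another,
  // e.g. a double-typed allocator to the char-typed one used for workspaces.
  template <typename U>
  typed_allocator_sketch(const typed_allocator_sketch<U>&) noexcept {}

  T* allocate(std::size_t n) {
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }

  void deallocate(T* p, std::size_t /*n*/) noexcept { ::operator delete(p); }
};

// Stateless stand-in: all instances are interchangeable.
template <typename T, typename U>
bool operator==(const typed_allocator_sketch<T>&,
                const typed_allocator_sketch<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const typed_allocator_sketch<T>& a,
                const typed_allocator_sketch<U>& b) noexcept {
  return !(a == b);
}

// Hypothetical operation: the workspace is raw bytes obtained through the
// char rebind of whatever allocator the caller supplied.
template <typename Allocator>
void use_char_workspace(const Allocator& alloc, std::size_t bytes) {
  using char_alloc =
      typename std::allocator_traits<Allocator>::template rebind_alloc<char>;
  char_alloc workspace_alloc(alloc);
  char* workspace = workspace_alloc.allocate(bytes);
  // ... a vendor routine would receive `workspace` here ...
  workspace_alloc.deallocate(workspace, bytes);
}
```

The trade-off is that the class carries a type even though the operation only ever needs bytes; in exchange, the allocator can be handed to standard containers unchanged.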

spencerpatty · 2025-04-18T16:10:17Z

include/spblas/vendor/onemkl_sycl/mkl_allocator.hpp

+  pointer allocate(std::size_t size) {
+    return sycl::malloc_device<value_type>(size, *(this->queue()));
+  }
+
+  void deallocate(pointer ptr, std::size_t n = 0) {
+    if (ptr != nullptr) {
+      sycl::free(ptr, *(this->queue()));
+    }
+  }


It seems that allocate/deallocate should be templated rather than the allocator class, right?
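For comparison, a minimal sketch of the alternative the reviewer suggests: a non-template allocator class whose `allocate`/`deallocate` member functions are templates. `untyped_allocator_sketch` is a hypothetical name, and `::operator new`/`::operator delete` stand in for `sycl::malloc_device`/`sycl::free`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical untyped allocator: only allocate/deallocate are templates.
// A SYCL backend would call sycl::malloc_device<T>(n, q) and sycl::free(p, q);
// ::operator new/delete stand in for them here.
class untyped_allocator_sketch {
public:
  template <typename T>
  T* allocate(std::size_t n) {
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }

  template <typename T>
  void deallocate(T* p, std::size_t /*n*/) noexcept {
    ::operator delete(p);
  }
};
```

One object can serve both `alloc.allocate<double>(n)` for values and `alloc.allocate<char>(bytes)` for a workspace. The cost is that it no longer meets the standard Allocator requirements (no `value_type`, no `allocate(n)`), so it cannot be passed directly to containers such as `std::vector`.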

spencerpatty · 2025-04-18T16:11:08Z

include/spblas/vendor/onemkl_sycl/mkl_allocator.hpp

+  using difference_type = std::ptrdiff_t;
+
+  mkl_allocator() noexcept {
+    auto* queue = new sycl::queue{sycl::default_selector_v};


Elsewhere we are using sycl::cpu_selector_v right now, but we should probably switch to this default selector...

Ah, I looked further, and it seems you handle that in spmv by using the queue the state object was constructed with...

@yhmtsai @BenBrock @upsj @YvanMokwinski and anyone else interested in the design of the state/policy/allocator interactions here: we should probably meet soon to discuss the ownership of, and the interaction around, the queue (stream) that indicates device intent, since two objects will be passed into each operation:

execution policy: we originally designed around the idea that the policy holds the queue (stream) for the operation.

state: holds an optional allocator and any other stateful objects to be reused. The allocator needs a queue (stream), so should the allocator/state take an execution policy? If the state has its own queue (stream), we could end up with two queues in the spmv operation, one from the policy and one from the state. Which one should be used, and what if they differ? For sycl::queues, this affects the ordering of operations. For streams, it shouldn't matter much as long as everything uses the default stream, but as soon as a user creates their own stream, things could go wrong.
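To make the two-queue question concrete, here is a minimal sketch of one possible rule, not a decision from this thread: the policy's queue wins, and a state bound to a different queue is rejected rather than silently mixing queues. `queue_stub`, `policy_sketch`, `state_sketch`, and `select_queue` are all hypothetical; `queue_stub` stands in for a `sycl::queue` or a CUDA/HIP stream, where only identity matters.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for sycl::queue or a CUDA/HIP stream; only its identity matters here.
struct queue_stub {
  int id;
};

// Hypothetical: the execution policy carries the queue the operation runs on.
struct policy_sketch {
  queue_stub* queue;
};

// Hypothetical: the state may carry the queue its allocator was created with.
struct state_sketch {
  queue_stub* queue = nullptr;
};

// One possible rule: use the policy's queue, and reject a state that is bound
// to a different queue instead of silently running work on two queues.
inline queue_stub* select_queue(const policy_sketch& policy,
                                const state_sketch& state) {
  if (state.queue != nullptr && state.queue != policy.queue) {
    throw std::invalid_argument("state and policy refer to different queues");
  }
  return policy.queue;
}
```

Other rules are equally plausible (for example, the state adopting the policy's queue on first use); the sketch only shows where the check would sit.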

spencerpatty · 2025-04-18T16:14:08Z

include/spblas/vendor/onemkl_sycl/spmv_impl.hpp

+class spmv_state_t {
+public:
+  spmv_state_t() : spmv_state_t(mkl::mkl_allocator<char>{}) {}
+
+  spmv_state_t(sycl::queue* q) : spmv_state_t(mkl::mkl_allocator<char>{q}) {}
+
+  spmv_state_t(mkl::mkl_allocator<char> alloc) : alloc_(alloc) {}
+
+  sycl::queue* queue() {
+    return alloc_.queue();
+  }
+
+private:
+  mkl::mkl_allocator<char> alloc_;
+};


Should we switch to uint8_t instead of char, to make the "1 byte = 8 bits" intent explicit?

Again, I'm not sure we want the template parameter T on the allocator class; shouldn't it be on the allocate/deallocate member functions instead?
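On the char vs. uint8_t question, both spellings size a workspace in single bytes; the difference is intent. The standard guarantees checked below are real, while `workspace_elements` is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// sizeof(char) is 1 by definition, so a char-typed workspace is already sized in bytes.
static_assert(sizeof(char) == 1, "guaranteed by the standard");
// A byte has at least 8 bits. std::uint8_t, where the platform provides it, is
// exactly 8 bits with no padding, which forces sizeof(std::uint8_t) == 1.
static_assert(CHAR_BIT >= 8, "guaranteed by the standard");
static_assert(sizeof(std::uint8_t) == 1, "holds wherever std::uint8_t exists");

// Hypothetical helper: number of Byte elements needed to hold `bytes` bytes.
template <typename Byte>
constexpr std::size_t workspace_elements(std::size_t bytes) {
  return (bytes + sizeof(Byte) - 1) / sizeof(Byte);
}
```

C++17's `std::byte` is a third option if the goal is to express raw storage rather than small integers.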

spencerpatty · 2025-04-18T16:33:00Z

include/spblas/vendor/onemkl_sycl/mkl_allocator.hpp

+  mkl_allocator(sycl::queue* q) noexcept
+      : queue_manager_(q, [](sycl::queue* q) {}) {}


Suggested change

-  mkl_allocator(sycl::queue* q) noexcept
-      : queue_manager_(q, [](sycl::queue* q) {}) {}
+  /* taking a shallow copy of queue from elsewhere, so we don't own destruction */
+  mkl_allocator(sycl::queue* q) noexcept
+      : queue_manager_(q, [](sycl::queue* q) {}) {}
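A minimal sketch of the ownership distinction this comment documents, assuming `queue_manager_` is a `std::shared_ptr` (the diff does not show its type). `counted_queue` and `allocator_core_sketch` are hypothetical stand-ins for `sycl::queue` and mkl_allocator; the queue counts its destructions so you can observe which constructor owns it.

```cpp
#include <cassert>
#include <memory>

// Stand-in for sycl::queue that counts destructions so ownership is observable.
struct counted_queue {
  static inline int destroyed = 0;  // C++17 inline variable
  ~counted_queue() { ++destroyed; }
};

// Hypothetical allocator core, assuming queue_manager_ is a std::shared_ptr.
class allocator_core_sketch {
public:
  // Owning: the allocator created the queue, so the default deleter destroys it.
  allocator_core_sketch() : queue_manager_(new counted_queue{}) {}

  // Non-owning: a shallow copy of a queue from elsewhere. The no-op deleter
  // means the allocator never destroys a queue it does not own.
  explicit allocator_core_sketch(counted_queue* q)
      : queue_manager_(q, [](counted_queue*) {}) {}

  counted_queue* queue() const { return queue_manager_.get(); }

private:
  std::shared_ptr<counted_queue> queue_manager_;
};
```

The sketch leaves the constructors without `noexcept`, because constructing a `std::shared_ptr` from a raw pointer allocates a control block and may throw `std::bad_alloc`.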

spencerpatty · 2025-04-18T16:35:10Z

test/gtest/rocsparse/spmv_test.cpp

+
+#ifdef SPBLAS_ENABLE_ONEMKL_SYCL
+#include "onemkl/device_vector.hpp"
+#else


So this generalizes the rocsparse device spmv_test into a device/spmv_test.cpp for rocsparse, mkl_sycl, and other backends in the future? Should we rename the rocsparse/ folder to device/ or accelerator/?

Yes, I did that in #40 (renaming it to device). I try to avoid making the same change in two PRs, because that makes it hard to keep the discussion of one idea in a single PR. I can move the change here if this PR moves more quickly.

Can I suggest we rename it to thrust_device? Then, if we have other device examples that are specific to, say, SYCL, ROCm, or CUDA, we could have sycl_device, rocm_device, or cuda_device folders as well...

Do the different folders hold tests that differ in how vectors are allocated, or tests for vendor-specific functions?
If you mean vendor-specific functions, I would put them into device/<vendor>:
device contains the uniform tests for all backends, and device/<vendor> contains tests for a specific vendor only.
Both layouts are clear, so I don't mind choosing the other one unless it makes the CMake setup worse (I don't think it would, but just in case).


BenBrock · 2025-04-25T18:06:44Z

@yhmtsai Rather than implementing device_vector and other Thrust utilities directly in this repo, I think it's better to create an external repo that contains those utilities. I've started working on a draft of that here: #53.

yhmtsai requested review from BenBrock and spencerpatty April 3, 2025 00:39

yhmtsai self-assigned this Apr 3, 2025

yhmtsai changed the title ~~onemkl GPU version~~ onemkl GPU version SpMV Apr 3, 2025

yhmtsai added 3 commits April 17, 2025 13:52

add mkl_allocator

26884c2

use allocator in oneMKL spmv. It still keeps no state input

90cd049

add the incompleted thrust impl for oneMKL and enable gpu spmv test

ce922af

yhmtsai force-pushed the dev/yhmtsai/onemkl_gpu branch from 4e4eab7 to c8a8cc7 April 17, 2025 11:52

add gpu intel ci

17d52b4

yhmtsai force-pushed the dev/yhmtsai/onemkl_gpu branch from c8a8cc7 to 17d52b4 April 17, 2025 14:22

spencerpatty reviewed Apr 18, 2025


include/spblas/vendor/onemkl_sycl/mkl_allocator.hpp (resolved)

yhmtsai added 2 commits April 24, 2025 19:20

add -fsycl

a69e41b

add comment on the queue shallow copy and use proper path for sycl.hpp

4bb2d5c



4 participants