Skip to content

Conversation

@yhmtsai
Copy link
Contributor

@yhmtsai yhmtsai commented Apr 3, 2025

Summary:
This PR make onemkl also supports Intel GPU.

Details:

  • add mkl_allocator
  • add the state to spmv
  • add the incomplete thrust::device_vector for Intel GPU (@BenBrock I thought you mentioned you have something already or there is package we can use?) I only implement tiny part to fit the current test usage.
  • It adds the another target spblas-gpu-tests because we can test cpu and gpu via ONEMKL_DEVICE_SELECTOR.

Without the queue input, it will assume all default selector to select the same device with the same context such that all memory can be accessed by the other default queue.

Merge Checklist:

  • Passing CI
  • Update documentation or README.md
  • Additional Test/example added (if applicable) and passing
  • At least one reviewer approval
  • (optional) Clang sanitizer scan run and triaged
  • Clang formatter applied (verified as part of passing CI)

@yhmtsai yhmtsai requested review from BenBrock and spencerpatty April 3, 2025 00:39
@yhmtsai yhmtsai self-assigned this Apr 3, 2025
@yhmtsai yhmtsai changed the title onemkl GPU version onemkl GPU version SpMV Apr 3, 2025
@yhmtsai yhmtsai force-pushed the dev/yhmtsai/onemkl_gpu branch from 4e4eab7 to c8a8cc7 Compare April 17, 2025 11:52
@yhmtsai yhmtsai force-pushed the dev/yhmtsai/onemkl_gpu branch from c8a8cc7 to 17d52b4 Compare April 17, 2025 14:22
Comment on lines +8 to +17
template <typename T, std::size_t Alignment = 0>
class mkl_allocator {
public:
using value_type = T;
using pointer = T*;
using const_pointer = const T*;
using reference = T&;
using const_reference = const T&;
using size_type = std::size_t;
using difference_type = std::ptrdiff_t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have a typename T on the allocator? don't we have potential for creating all sorts of kinds of things ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It currently follows the std::allocator design. In the operation, we will use allocator<char> to create the workspace.

Comment on lines +41 to +49
pointer allocate(std::size_t size) {
return sycl::malloc_device<value_type>(size, *(this->queue()));
}

void deallocate(pointer ptr, std::size_t n = 0) {
if (ptr != nullptr) {
sycl::free(ptr, *(this->queue()));
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that allocate/deallocate should be templated, and not the allocator class, right ?

using difference_type = std::ptrdiff_t;

mkl_allocator() noexcept {
auto* queue = new sycl::queue{sycl::default_selector_v};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsewhere we are using sycl::cpu_selector_v right now, but should probably switch to this default selector ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh i looked further and it seems you are handling that in spmv by using the queue that the state object is introduced with ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yhmtsai @BenBrock @upsj @YvanMokwinski and any others who are interested in design of state/policy/allocator interactions here. We should probably huddle sometime soon and discuss ownership and interaction of queue (stream) indicating device intent between the two objects that will be input into each operation:

  1. execution policy -- we originally designed around idea that policy should hold the queue(stream) for the operation
  2. state -- holds an optional allocator and any other stateful objects to be reused. The allocator needs a queue(stream), so should allocator/state take in an execution policy? If state has its own queue(stream), it seems like it will be possible to end up with multiple queues in the spmv operation -- one from policy and one from state... which one is to be used ? what if they are different ? for sycl::queues, it will affect ordering of operations, for streams, I guess if you are always using the default stream, it shouldn't matter so much, but as soon as a user creates their own stream, it could go bad...

Comment on lines +28 to +42
class spmv_state_t {
public:
spmv_state_t() : spmv_state_t(mkl::mkl_allocator<char>{}) {}

spmv_state_t(sycl::queue* q) : spmv_state_t(mkl::mkl_allocator<char>{q}) {}

spmv_state_t(mkl::mkl_allocator<char> alloc) : alloc_(alloc) {}

sycl::queue* queue() {
return alloc_.queue();
}

private:
mkl::mkl_allocator<char> alloc_;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we switch to using uint8_t instead of char as a 1 byte = 8 bits intent ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, I'm not sure we want a template T on the allocator, but rather on allocate the allocate/deallocate member functions, right ?

Comment on lines +28 to +29
mkl_allocator(sycl::queue* q) noexcept
: queue_manager_(q, [](sycl::queue* q) {}) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mkl_allocator(sycl::queue* q) noexcept
: queue_manager_(q, [](sycl::queue* q) {}) {}
/* taking a shallow copy of queue from elsewhere, so we don't own destruction */
mkl_allocator(sycl::queue* q) noexcept
: queue_manager_(q, [](sycl::queue* q) {}) {}

Comment on lines +6 to +9

#ifdef SPBLAS_ENABLE_ONEMKL_SYCL
#include "onemkl/device_vector.hpp"
#else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is generalizing the rocsparse device spmv_test to just a device/spmv_test.cpp for rocsparse/mkl_sycl and in the future others ? Should we change the name from rocsparse/ folder to device/ or accelerator/ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I put it into #40 (change it to device). I tended to avoid the same changes in two prs because it might make the reviews hard to exchange the idea in the same pr. I can move that into here in case this pr moves more quickly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can I suggest we shift it to thrust_device, then if we have other device examples that are specific for say, sycl or rocm or cuda we could have sycl_device, rocm_device or cuda_device folders as well ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do different folders here contain the tests with different vector allocation or specific functions?
If you mean the specific functions, I might put them into device/<vendor>
device contains the uniform test for all backend, and device/<vendor> contains the tests just for specific vendor.
Both ways are clear, so I do not mind choosing another one unless one of them makes the CMake setting worse. (I do not think so, but just in case)

@BenBrock
Copy link
Collaborator

@yhmtsai Rather than implementing device_vector and other Thrust utilities directly in the repo, I think ti's better to create an external repo that will contain those utilities. I've started working on a draft of that here: #53.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants