Clean up CMake, add option for SYCL reference. #55
Conversation
for (auto elem_idx = lid; elem_idx < row.size();
     elem_idx += lsz) {
So it looks like you are doing subgroup vector parallelism over the nonzeros in the row of A? For SpMM there might be scenarios where it is better to do subgroup vector parallelism over the elements of B, especially when B has more than 32 columns. It is rarer than we think for a sparse matrix to have on average 32 or more nonzeros per row, so this row.size() bound often means we are not using the full parallelism here.
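For illustration, a minimal sketch of that alternative, assuming each subgroup owns one row of A (index `row_idx`), `row[i]` yields a `(value, column)` pair, `b` and `c` are row-major buffers with `n` columns, and `lid`/`lsz` are the subgroup-local id and size as in the snippet above; all names beyond the snippet are assumptions, not the PR's actual code:

```cpp
// Sketch: lanes stride over the n columns of B/C, so the available
// parallelism is bounded by n rather than by row.size().
for (auto col = lid; col < n; col += lsz) {
  value_t acc = 0;
  // Each lane walks the row's nonzeros serially, accumulating its column.
  for (std::size_t i = 0; i < row.size(); ++i) {
    auto&& [a_val, a_col] = row[i];  // hypothetical (value, column) accessor
    acc += a_val * b[a_col * n + col];
  }
  c[row_idx * n + col] += acc;
}
```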
What would be ideal is to have essentially two algorithms that could be selected at runtime: one with vector parallelism over the dense matrix elements, and another where each subgroup owns one or more rows of the sparse matrix and does some sort of segmented reduction. (Of course, SYCL doesn't have a segmented scan yet, but we can do a full prefix scan over the set and then subtract the per-segment prefixes to get a segmented scan.)
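A minimal sketch of that scan-then-subtract trick, assuming each lane holds one partial product `v` and knows `seg_start`, the subgroup lane index where its row segment begins (the function name and this setup are hypothetical, and the helper must be called by all lanes of the subgroup):

```cpp
#include <sycl/sycl.hpp>

// Full inclusive scan over the subgroup, then subtract the prefix that
// accumulated before this lane's segment to recover a segmented scan.
float segmented_scan(sycl::sub_group sg, float v, unsigned seg_start) {
  float full = sycl::inclusive_scan_over_group(sg, v, sycl::plus<float>());
  // Shuffle in the scan value of the last lane before the segment begins.
  unsigned prev = (seg_start == 0) ? 0u : seg_start - 1u;
  float before = sycl::select_from_group(sg, full, prev);
  return (seg_start == 0) ? full : full - before;
}
```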
I agree. Currently we've got two algorithms, the "split k" and "split j" methods. I'm running some benchmarks now, and those will hopefully tell us when to call which method (and also generally illuminate their performance characteristics).
double gb = 1e-9 * (nnz * sizeof(value_t) + nnz * sizeof(index_t) +
                    (m + 1) * sizeof(offset_t) + k * n * sizeof(value_t) +
                    m * n * sizeof(value_t));
SpMM is one of the few sparse algorithms with the potential to get into the compute-bound region instead of just being memory bound, so calculating GFLOP/s is also helpful. All the others should just be judged against the memory-bandwidth limit implied by the GB moved.
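For reference, a sketch of the corresponding GFLOP/s number, assuming one multiply and one add per nonzero of A per column of B (so 2 * nnz * n flops in total) and a measured kernel time `seconds`; the `seconds` variable is an assumption, while `nnz`, `n`, and `gb` come from the snippet above:

```cpp
// SpMM flop count: each nonzero of A is multiplied and accumulated once
// per column of B, i.e. 2 * nnz * n floating-point operations in total.
double flops = 2.0 * static_cast<double>(nnz) * static_cast<double>(n);
double gflops = 1e-9 * flops / seconds;
double gb_per_s = gb / seconds;  // gb from the snippet above
```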
It can happen because of the potential reuse of the dense B matrix from cache, if we are careful, while streaming the A matrix and limiting accesses to C (along with trying not to cache C at all).
In my opinion, for measuring peak performance of a kernel, it is a good idea to have an untimed warmup loop with several iterations, then a timed loop that in aggregate takes on the order of seconds (or at least milliseconds) to run, with the average time per run computed and recorded. This increases the chance of repeatable, stable measurements across runs and over time, and makes them much more comparable.
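A minimal harness sketch along those lines; `run_kernel` is a hypothetical stand-in for the benchmarked call and is assumed to block until the kernel completes (e.g. by waiting on the queue):

```cpp
#include <chrono>

template <typename F>
double average_seconds(F&& run_kernel,
                       int warmup_iters = 10, int timed_iters = 100) {
  for (int i = 0; i < warmup_iters; ++i)
    run_kernel();                        // untimed warmup iterations
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < timed_iters; ++i)
    run_kernel();                        // timed in aggregate
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count() / timed_iters;  // average seconds per run
}
```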
I've made a few updates that do both things: compute GFLOP/s in addition to the achieved bandwidth, and do up to a 2-second warmup before timing.
…device at runtime.
Summary:
Add experimental support for SYCL reference backend.
Details:
Merge Checklist: