`spblas-reference/notes/spmv.hpp`, lines 10 to 24 (at commit 0930680):
```cpp
operation_info_t info;
device_policy policy;
multiply_inspect(info, policy, a, x, y);
multiply_inspect(info, policy, transposed(a), x, y);
// Allocate more memory for y based on `info`
while (/* ... */) {
  multiply_execute(info, policy, a, x, y);
  // do something with y, update x...
  multiply_execute(info, policy, transposed(a), y, x);
  // Maybe do some more stuff...
}
```
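As an illustration of what the info object in the loop above could hold, here is a minimal, self-contained sketch. It assumes a hypothetical owning `csr` struct in place of a view, and hypothetical `multiply_inspect_transposed` / `multiply_execute_transposed` functions in place of the `transposed(a)` overloads; none of these names come from the spblas-reference code. Inspection builds a CSR copy of A^T once, so every later transposed multiply becomes a row-wise gather instead of a scatter; without inspection the call still works, just through the slower scatter path.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical owning CSR container standing in for csr_view<T, I, O>.
struct csr {
  std::size_t rows = 0, cols = 0;
  std::vector<std::size_t> rowptr;  // size rows + 1
  std::vector<std::size_t> colind;  // size nnz
  std::vector<double> values;       // size nnz
};

// Hypothetical info object: inspection stores A^T in CSR form.
struct operation_info_t {
  bool has_transpose = false;
  csr at;  // A^T, filled by multiply_inspect_transposed
};

// Inspection: build A^T by counting entries per column, then scattering.
void multiply_inspect_transposed(operation_info_t& info, const csr& a) {
  csr& t = info.at;
  t.rows = a.cols;
  t.cols = a.rows;
  t.rowptr.assign(a.cols + 1, 0);
  for (std::size_t k = 0; k < a.colind.size(); ++k) ++t.rowptr[a.colind[k] + 1];
  for (std::size_t j = 0; j < a.cols; ++j) t.rowptr[j + 1] += t.rowptr[j];
  t.colind.resize(a.colind.size());
  t.values.resize(a.values.size());
  std::vector<std::size_t> next(t.rowptr.begin(), t.rowptr.end() - 1);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t k = a.rowptr[i]; k < a.rowptr[i + 1]; ++k) {
      const std::size_t dst = next[a.colind[k]]++;
      t.colind[dst] = i;
      t.values[dst] = a.values[k];
    }
  info.has_transpose = true;
}

// Plain row-wise CSR SpMV: y = A * x.
void spmv(const csr& a, const std::vector<double>& x, std::vector<double>& y) {
  y.assign(a.rows, 0.0);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t k = a.rowptr[i]; k < a.rowptr[i + 1]; ++k)
      y[i] += a.values[k] * x[a.colind[k]];
}

// Execution: y = A^T * x, using the inspected transpose when available.
void multiply_execute_transposed(const operation_info_t& info, const csr& a,
                                 const std::vector<double>& x,
                                 std::vector<double>& y) {
  if (info.has_transpose) {
    spmv(info.at, x, y);  // gather over rows of A^T
    return;
  }
  y.assign(a.cols, 0.0);  // fallback: scatter over rows of A
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t k = a.rowptr[i]; k < a.rowptr[i + 1]; ++k)
      y[a.colind[k]] += a.values[k] * x[i];
}
```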
I like this idea of having an info type that is directly associated with some matrix structure and that is filled with zero or more inspection-based optimizations (that is, it houses "stateful + read-only" optimizations). I wonder whether our multiply functions could take a hybrid `matrix_obj` object, consisting of either a `matrix_view` alone or a `matrix_view` plus an associated `matrix_info_t`, used something like the following snippet:
```cpp
csr_view<T,I,O> A(...);
matrix_info_t A_info(...);
multiply_inspect(matrix_obj{A, A_info}, descriptor, x, y, /* backend stuff */);
multiply_execute(matrix_obj{A, A_info}, descriptor, x, y, /* backend stuff */);
```

Or we could skip the inspection entirely, at the cost of some performance:
```cpp
csr_view<T,I,O> A(...);
multiply_execute(matrix_obj{A}, descriptor, x, y, /* backend stuff */);
```

The benefit is that, for the sparse * sparse operation, we could have `A_info` and `B_info` containing useful (read-only, stateful) data about A and B for use while creating C, plus a separate `multistage_info_t` that is specific to the multi-stage operation (stateful + read/write data):
```cpp
csr_view<T,I,O> A(...);
matrix_info_t A_info(...);
csr_view<T,I,O> B(...);
matrix_info_t B_info(...);
csr_view<T,I,O> C(...);
multiply_info_t mult_info; // C = A * B^T
multiply_inspect(matrix_obj{C}, matrix_obj{A, A_info}, transpose(matrix_obj{B, B_info}), desc, /* backend stuff */); // fills A_info and/or B_info
multiply_execute_stage1(matrix_obj{C}, matrix_obj{A, A_info}, transpose(matrix_obj{B, B_info}), mult_info, /* backend stuff */); // fills mult_info and C
multiply_execute_stage2(matrix_obj{C}, matrix_obj{A, A_info}, transpose(matrix_obj{B, B_info}), mult_info, /* backend stuff */); // fills mult_info and C
multiply_execute_stage3(matrix_obj{C}, matrix_obj{A, A_info}, transpose(matrix_obj{B, B_info}), mult_info, /* backend stuff */); // fills mult_info and C
```

`mult_info` might house the stateful + read/write optimizations pertaining to the multi-stage multiply process, while `A_info` and `B_info` might hold stateful + read-only optimizations about A and/or B.
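A rough sketch of how a read/write `mult_info` could carry state between stages, assuming hypothetical owning `csr` storage and computing C = A * B. To keep it short, the transpose of B, the `matrix_obj` wrappers, the read-only `A_info`/`B_info`, and the third stage are left out, and `multiply_info_t` here only holds C's row pointer. Stage 1 is a symbolic pass that counts the nonzeros in each row of C; stage 2 uses that row pointer to size C and accumulates the values (columns within a row appear in first-touch order, not sorted).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical owning CSR storage (stand-in for csr_view<T, I, O>).
struct csr {
  std::size_t rows = 0, cols = 0;
  std::vector<std::size_t> rowptr, colind;
  std::vector<double> values;
};

// Hypothetical per-operation, read/write state shared by the stages.
struct multiply_info_t {
  std::vector<std::size_t> c_rowptr;  // row pointer of C, written by stage 1
};

constexpr std::size_t npos = static_cast<std::size_t>(-1);

// Stage 1 (symbolic): count distinct column indices in each row of C = A * B.
void multiply_execute_stage1(const csr& a, const csr& b, multiply_info_t& info) {
  std::vector<std::size_t> last_row(b.cols, npos);  // last C row that touched column j
  info.c_rowptr.assign(a.rows + 1, 0);
  for (std::size_t i = 0; i < a.rows; ++i) {
    std::size_t count = 0;
    for (std::size_t ka = a.rowptr[i]; ka < a.rowptr[i + 1]; ++ka) {
      const std::size_t r = a.colind[ka];
      for (std::size_t kb = b.rowptr[r]; kb < b.rowptr[r + 1]; ++kb) {
        const std::size_t j = b.colind[kb];
        if (last_row[j] != i) {
          last_row[j] = i;
          ++count;
        }
      }
    }
    info.c_rowptr[i + 1] = info.c_rowptr[i] + count;
  }
}

// Stage 2 (numeric): fill C using the row pointer computed in stage 1.
// Precondition: multiply_execute_stage1 has already run on the same A and B.
void multiply_execute_stage2(const csr& a, const csr& b,
                             const multiply_info_t& info, csr& c) {
  c.rows = a.rows;
  c.cols = b.cols;
  c.rowptr = info.c_rowptr;
  c.colind.resize(c.rowptr.back());
  c.values.resize(c.rowptr.back());
  std::vector<std::size_t> slot(b.cols, npos);  // position of column j in C's arrays
  for (std::size_t i = 0; i < a.rows; ++i) {
    const std::size_t row_start = c.rowptr[i];
    std::size_t next = row_start;
    for (std::size_t ka = a.rowptr[i]; ka < a.rowptr[i + 1]; ++ka) {
      const std::size_t r = a.colind[ka];
      const double av = a.values[ka];
      for (std::size_t kb = b.rowptr[r]; kb < b.rowptr[r + 1]; ++kb) {
        const std::size_t j = b.colind[kb];
        if (slot[j] == npos || slot[j] < row_start) {  // first time j appears in row i
          slot[j] = next;
          c.colind[next] = j;
          c.values[next] = 0.0;
          ++next;
        }
        c.values[slot[j]] += av * b.values[kb];
      }
    }
  }
}
```

In a real interface the caller would probably allocate C's arrays between the stages from `c_rowptr.back()`; here stage 2 resizes C itself for brevity.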
Does this idea make sense? Does anyone see any usage issues? Is it too ugly? I worry that we will end up with too many overloads if `A` and (optionally) `A_info` etc. are passed as separate inputs. This approach also lets us distinguish between matrix inputs plus their info and per-operation info data.
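To make the overload concern concrete, here is a small C++17 sketch in which every name is hypothetical and `csr` is an owning stand-in for `csr_view`. `matrix_obj` holds a reference to the matrix plus an optional pointer to a read-only `matrix_info_t`; class template argument deduction lets the call site write either `matrix_obj{A}` or `matrix_obj{A, A_info}`, so a single `multiply_execute` template covers both cases. The info here only caches the indices of non-empty rows, as a placeholder for a real inspection result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical owning CSR storage (stand-in for csr_view<T, I, O>).
struct csr {
  std::size_t rows = 0, cols = 0;
  std::vector<std::size_t> rowptr, colind;
  std::vector<double> values;
};

// Hypothetical read-only inspection result tied to one matrix.
struct matrix_info_t {
  std::vector<std::size_t> nonempty_rows;  // rows with at least one entry
};

// Hypothetical inspection step that fills matrix_info_t.
matrix_info_t matrix_inspect(const csr& a) {
  matrix_info_t info;
  for (std::size_t i = 0; i < a.rows; ++i)
    if (a.rowptr[i + 1] > a.rowptr[i]) info.nonempty_rows.push_back(i);
  return info;
}

// Hypothetical hybrid object: non-owning reference to the matrix plus an
// optional, non-owning pointer to its info. CTAD deduces M from either constructor.
template <class M>
struct matrix_obj {
  const M& mat;
  const matrix_info_t* info = nullptr;
  matrix_obj(const M& m) : mat(m) {}
  matrix_obj(const M& m, const matrix_info_t& i) : mat(m), info(&i) {}
};

// One signature serves both matrix_obj{A} and matrix_obj{A, A_info}.
template <class M>
void multiply_execute(matrix_obj<M> a, const std::vector<double>& x,
                      std::vector<double>& y) {
  const M& m = a.mat;
  y.assign(m.rows, 0.0);
  auto do_row = [&](std::size_t i) {
    for (std::size_t k = m.rowptr[i]; k < m.rowptr[i + 1]; ++k)
      y[i] += m.values[k] * x[m.colind[k]];
  };
  if (a.info) {
    for (std::size_t i : a.info->nonempty_rows) do_row(i);  // skip empty rows
  } else {
    for (std::size_t i = 0; i < m.rows; ++i) do_row(i);
  }
}
```

With `A` and `A_info` as separate parameters, an operation whose two matrix operands may each come with or without info would need four signatures; with the wrapper it needs one, and the branch on `info` moves inside the implementation.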