Currently, `dcompute.std.index` exposes its device helpers as template/property functions, e.g. `GlobalIndex.x()()`, so the helper body is instantiated into the importing kernel module and dcompute's own source code does not have to be compiled.
By contrast, `dcompute.std.sync.barrier()` is currently an ordinary function, which means the module that defines it has to be compiled explicitly.
So would it be more reasonable to change `dcompute.std.sync.barrier()` into a zero-argument template like `GlobalIndex.x()()`?
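
To make the difference concrete, here is a minimal host-side sketch of the two declaration styles. The names `barrierPlain` and `barrierTemplated` are hypothetical stand-ins with empty bodies, not dcompute's actual source, and the comments assume the usual D behaviour that a template instance is emitted into the module that instantiates it.

```d
// Hypothetical stand-ins for illustration; not dcompute's actual source.

// Ordinary function: an importing module only references the symbol, so the
// module that defines it must itself be compiled and linked.
void barrierPlain() {}

// Zero-parameter template: the body is instantiated (and its code emitted)
// in whichever module calls it.
void barrierTemplated()() {}

// Compile-time check that only the second declaration is a template.
static assert(!__traits(isTemplate, barrierPlain));
static assert(__traits(isTemplate, barrierTemplated));

void main()
{
    barrierPlain();      // needs the defining module's object code at link time
    barrierTemplated();  // instantiated here; the call syntax is unchanged
}
```

Since the call site looks the same in both cases, switching `barrier()` to the template form should not require changes in existing kernels that call it directly, although code that takes its address or refers to it as a non-template symbol might be affected.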