We currently duplicate quite a bit of functionality that could be reused from CUDA.jl (or GPUCompiler.jl) if those packages were more extensible/reusable:
- Launch syntax (integrate with `@cuda` or provide something similar)
- Argument conversion during launch (`cudaconvert` and friends, skipping of ghost values, detecting CPU array inputs, etc.)
- Compilation cache (GPUCompiler.jl's is very much CodeInstance-oriented though, so not sure if this would work, but the current `_compilation_cache` in cuTile.jl is naive and slow)
- Reflection utilities (hooked compilation etc.; maybe not worth the integration effort)