Conversation
44afac1 to
86a878a
Compare
It now behaves polymorphically and will create device fields when that is the `next` type and host fields otherwise
As libx3d2_backends links libx3d2 and xcompact/tests link both the linking order must be libx3d2_backends then libx3d2 to prevent duplicate symbols during linking.
This is to allow initialisation of the base class whether called by OMP/CPU or the OMP/TGT object
The code runs through the test successfully - need to confirm offload and check for data movement
This means data resides on the device only
Note this is only working on AMD GPUs (although it should be supported on NVIDIA GPUs w/Cray compiler)
ia267
left a comment
There was a problem hiding this comment.
Couple of things to address before merging.
| use m_common, only: dp, pi, DIR_X | ||
| use m_mesh, only: mesh_t | ||
|
|
||
| use m_omptg_allocator, only: omptgt_allocator_t |
There was a problem hiding this comment.
typo: should be m_omptgt_allocator
I guess this file hasn't been added to CMakeListst.txt so it wasn't pickedup.
| set(CMAKE_Fortran_FLAGS_DEBUG "-g -Og -Wall -Wpedantic -Werror -Wimplicit-interface -Wimplicit-procedure -Wno-unused-dummy-argument") | ||
| set(CMAKE_Fortran_FLAGS_RELEASE "-O3 -ffast-math") | ||
| if (OMP_TGT) | ||
| # A bit of a hack - hardcoded for MI300A |
There was a problem hiding this comment.
Could we use CMake cache varibales (e.g. OMP_TGT_ARCH) so we don't need to edit CMake files?
There was a problem hiding this comment.
Good point - this was added for development, and not intended as the actual implementation
| target_link_libraries(${test_name} PRIVATE OpenMP::OpenMP_Fortran) | ||
|
|
||
| if(${backend} STREQUAL omp_tgt) | ||
| # Note this is somewhat of a hack - hardcoded to build against MI300A |
There was a problem hiding this comment.
Could we use CMake cache variables instead?
There was a problem hiding this comment.
We can have a list of options as follow:
set(gpu_archs gfx942 CACHE STRING "")
set_property(CACHE gpu_archs PROPERTY STRINGS gfx942 gfx900 gfx902 gfx906 gfx908 gfx909 gfx90a gfx90c)
One other alternative is to add an additional option of the form cmake .. -Dgpu_arch=gfx942
This would avoid updating the list of toggled option every time there is a new architecture coming out but put the responsibility of setting the correct input on the user.
In both case, we should be able to set the parameter as follow
target_compile_options(${test_name} PRIVATE "--offload-arch=${gpu_arch}")
| end subroutine | ||
|
|
||
| ! Deallocates device-resident memory before deallocating the base type | ||
| subroutine destroy(self) |
There was a problem hiding this comment.
This destroy subroutine is not bound to omptgt_allocator_t. Need to add procedure :: destroy => destroy to omptgt_allocator_t
| !$omp target teams distribute parallel do private(out_i, out_j, out_k) collapse(3) map(to:u) has_device_addr(u_) | ||
| do k = 1, dims(3) | ||
| do j = 1, dims(2) | ||
| do i = 1, dims(1) | ||
| call get_index_reordering(out_i, out_j, out_k, i, j, k, & | ||
| dir_from, dir_to, SZ, cart_padded) | ||
| u_(out_i, out_j, out_k) = u(i, j, k) | ||
| end do | ||
| end do | ||
| end do | ||
| !$omp end target teams distribute parallel do |
There was a problem hiding this comment.
Observation: Re-ordering without using shared (scratchpad) memory is likely to be inefficient on GPU due to non-coalesced memory access
This branch implements the basic offloading of data using openMP with the ability to perform the
vecaddoperation.