Initial OpenMP/GPU support by pbartholomew08 · Pull Request #228 · xcompact3d/x3d2

pbartholomew08 · 2025-09-30T10:48:25Z

This branch implements basic data offloading using OpenMP, with the ability to perform the vecadd operation on the device.

ia267

Couple of things to address before merging.

ia267 · 2026-04-06T03:41:04Z

+  use m_common, only: dp, pi, DIR_X
+  use m_mesh, only: mesh_t
+
+  use m_omptg_allocator, only: omptgt_allocator_t


typo: should be m_omptgt_allocator

I guess this file hasn't been added to CMakeLists.txt, so the typo wasn't picked up.

ia267 · 2026-04-06T04:00:50Z

  set(CMAKE_Fortran_FLAGS_DEBUG "-g -Og -Wall -Wpedantic -Werror -Wimplicit-interface -Wimplicit-procedure -Wno-unused-dummy-argument")
  set(CMAKE_Fortran_FLAGS_RELEASE "-O3 -ffast-math")
+  if (OMP_TGT)
+    # A bit of a hack - hardcoded for MI300A


Could we use CMake cache variables (e.g. OMP_TGT_ARCH) so we don't need to edit CMake files?

Good point - this was added for development, and not intended as the actual implementation

ia267 · 2026-04-06T04:02:17Z

    target_link_libraries(${test_name} PRIVATE OpenMP::OpenMP_Fortran)
+
+    if(${backend} STREQUAL omp_tgt)
+      # Note this is somewhat of a hack - hardcoded to build against MI300A


Could we use CMake cache variables instead?

We can have a list of options as follows:

set(gpu_arch gfx942 CACHE STRING "")
set_property(CACHE gpu_arch PROPERTY STRINGS gfx942 gfx900 gfx902 gfx906 gfx908 gfx909 gfx90a gfx90c)

An alternative is to add a free-form option of the form cmake .. -Dgpu_arch=gfx942

This would avoid updating the list of allowed values every time a new architecture comes out, but it puts the responsibility for setting a correct value on the user.

In both cases, we should be able to set the parameter as follows:
target_compile_options(${test_name} PRIVATE "--offload-arch=${gpu_arch}")
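
The two suggestions above can be combined in one place. The following is a minimal sketch, not the PR's actual implementation: it assumes the variable is named gpu_arch (as in the comment above), that gfx942 remains the default because the branch was developed on MI300A, and that the compiler accepts --offload-arch at both compile and link time. The list of suggested values is illustrative only.

```cmake
# Sketch only: expose the offload architecture as a cache variable so it can
# be set with `cmake .. -Dgpu_arch=<arch>` instead of editing CMake files.
# The default (gfx942) is an assumption carried over from the hardcoded value.
set(gpu_arch "gfx942" CACHE STRING "GPU architecture passed to --offload-arch")

# Optional: values offered as a drop-down in cmake-gui/ccmake.
# CMake does not enforce this list, so other architectures are still accepted.
set_property(CACHE gpu_arch PROPERTY STRINGS gfx942 gfx90a gfx908 gfx906)

# Fail early if the user cleared the variable.
if(NOT gpu_arch)
  message(FATAL_ERROR "gpu_arch is empty; configure with e.g. -Dgpu_arch=gfx942")
endif()

# Apply the same architecture when compiling and when linking the test.
# (Passing the flag at link time is an assumption about the toolchain.)
target_compile_options(${test_name} PRIVATE "--offload-arch=${gpu_arch}")
target_link_options(${test_name} PRIVATE "--offload-arch=${gpu_arch}")
```

Because the STRINGS property is only a hint for the GUI tools, this keeps the convenience of a predefined list without requiring an update for every new architecture.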

ia267 · 2026-04-06T04:28:00Z

+  end subroutine
+
+  ! Deallocates device-resident memory before deallocating the base type
+  subroutine destroy(self)


This destroy subroutine is not bound to omptgt_allocator_t. We need to add procedure :: destroy => destroy to omptgt_allocator_t.

CFD-Xing · 2026-04-07T10:01:19Z

+    !$omp target teams distribute parallel do private(out_i, out_j, out_k) collapse(3) map(to:u) has_device_addr(u_)
+    do k = 1, dims(3)
+      do j = 1, dims(2)
+        do i = 1, dims(1)
+          call get_index_reordering(out_i, out_j, out_k, i, j, k, &
+                                    dir_from, dir_to, SZ, cart_padded)
+          u_(out_i, out_j, out_k) = u(i, j, k)
+        end do
+      end do
+    end do
+    !$omp end target teams distribute parallel do


Observation: Re-ordering without using shared (scratchpad) memory is likely to be inefficient on GPU due to non-coalesced memory access

pbartholomew08 force-pushed the omp_gpu branch from 44afac1 to 86a878a · January 9, 2026 10:17

pbartholomew08 added 26 commits April 2, 2026 11:16

WIP: Implement a OpenMP target field type and allocator

265a028

Move OpenMP target offloads to omp/target directory

c8ae25e

Optionally build OpenMP Target backend

9dad110

Fix types in OpenMP target block allocator

1be2573

It now behaves polymorphically and will create device fields when that is the `next` type and host fields otherwise

WIP on OMP target vecadd

d1da9cf

Correcting link order

68b0fef

As libx3d2_backends links libx3d2 and xcompact/tests link both the linking order must be libx3d2_backends then libx3d2 to prevent duplicate symbols during linking.

Cleaning up test_vecadd

63dc2ac

The omp backend must assign its allocator based on class

e170514

This is to allow initialisation of the base class whether called by OMP/CPU or the OMP/TGT object

Don't declare the method as a module function

1ad568c

Need to allocate the new field pointer

833e6f5

Specify the target mapping operations when creating a field

85005a7

Initially 'working' OMP target vec add

842fc8d

The code runs through the test successfully - need to confirm offload and check for data movement

Remove debugging print statement

59b40ed

We only need the 3-D view of data on the device

a48a61c

Remove duplicate entry from CMakeLists sources

19acbc9

Mark index calculations as offloadable

c713318

Add support for get/set fields with OMP target

13f435e

WIP - attempting simplified OMP calls

06c1877

WIP allocating memory using OpenMP API

105d7d4

This means data resides on the device only

Continuing ...

6ef42de

Trying to map pointers to target...

362e992

Initial working version of OMPTARGET vec add

dbf7ee0

Note this is only working on AMD GPUs (although it should be supported on NVIDIA GPUs w/Cray compiler)

Restore IBM module

8fbf90c

Minor formatting change

46ca117

Adding support for OMP offload of timestepping

559642d

Update OMPTGT test definitions

9755b76

pbartholomew08 force-pushed the omp_gpu branch from 92b6ffe to 9755b76 · April 2, 2026 10:32

Run fprettify

794a8f3

pbartholomew08 marked this pull request as ready for review April 2, 2026 10:54

pbartholomew08 requested a review from ia267 April 2, 2026 10:54

ia267 requested changes Apr 6, 2026

CFD-Xing reviewed Apr 7, 2026

Comment thread tests/performance/perf_cuda_reorder.f90 Outdated

CFD-Xing reviewed Apr 7, 2026

Fixing CUDA syntax

083c5f0

pbartholomew08 wants to merge 28 commits into xcompact3d:main from pbartholomew08:omp_gpu
