Conversation
Pull request overview
Adds a build-time option to run libmpdata++ with MPI_THREAD_SERIALIZED instead of MPI_THREAD_MULTIPLE, and introduces additional synchronization in halo-exchange paths intended to avoid unsafe concurrent MPI calls.
Changes:
- Adds a `LIBMPDATAXX_MPI_THREAD_MULTIPLE` CMake option (default ON) and a corresponding compile definition.
- Initializes MPI with `MPI_THREAD_MULTIPLE` or `MPI_THREAD_SERIALIZED` based on that option, updating the related error handling.
- Adds per-boundary-condition synchronization and adjusts boundary-condition ordering for distributed/cyclic cases.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| libmpdata++/solvers/detail/solver_common.hpp | Adjusts boundary-condition call ordering (rank-parity swap) and updates inline comments. |
| libmpdata++/solvers/detail/solver_3d.hpp | Adds per-bcond barriers in multiple halo-exchange routines and minor indexing change. |
| libmpdata++/solvers/detail/solver_2d.hpp | Adds per-bcond barriers in multiple halo-exchange routines. |
| libmpdata++/solvers/detail/solver_1d.hpp | Adds per-bcond barriers in halo-exchange routines. |
| libmpdata++/concurr/detail/distmem.hpp | Switches MPI init thread level between MULTIPLE/SERIALIZED and updates the thread-level check + error messages. |
| libmpdata++-config.cmake | Introduces CMake option and sets compile definitions/messages for MPI threading mode selection. |
```cpp
{
  bc->single_threaded ? bc->fill_halos_sclr(arr, range_ijk[1]^ext, range_ijk[2]^ext, deriv)
                      : bc->fill_halos_sclr(arr, range_ijk[1]^ext, range_ijk[2]^ext, deriv);
#if !defined(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
  if(this->mem->distmem.size() > 0) this->mem->barrier(); // to avoid simultaneous MPI calls (not needed if MPI_THREADS_MULTIPLE is supported and used, but many MPI implementations do not support it), as each bcond is handled by different thread
```
The barrier used to serialize per-boundary-condition execution is currently compiled only when LIBMPDATAXX_MPI_THREAD_MULTIPLE is defined, but the need to serialize MPI calls is exactly when MPI_THREAD_SERIALIZED is used (i.e., when LIBMPDATAXX_MPI_THREAD_MULTIPLE is not defined). As written, serialized builds can still issue concurrent MPI calls from different threads (violating MPI_THREAD_SERIALIZED), while MULTIPLE builds pay for extra barriers.
Suggested fix: invert the preprocessor condition (use #ifndef LIBMPDATAXX_MPI_THREAD_MULTIPLE or an explicit SERIALIZED macro) and consider checking distmem.size() > 1 to avoid unnecessary barriers in single-rank runs.
Suggested change:

```diff
-if(this->mem->distmem.size() > 0) this->mem->barrier(); // to avoid simultaneous MPI calls (not needed if MPI_THREADS_MULTIPLE is supported and used, but many MPI implementations do not support it), as each bcond is handled by different thread
+if(this->mem->distmem.size() > 1) this->mem->barrier(); // to avoid simultaneous MPI calls (not needed if MPI_THREADS_MULTIPLE is supported and used, but many MPI implementations do not support it), as each bcond is handled by different thread
```
```cpp
bc->fill_halos_sclr(arr, range_ijk[1]^ext, deriv);
#if !defined(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
if(this->mem->distmem.size() > 0) this->mem->barrier();
#endif
}
```
These per-bcond barriers intended to avoid simultaneous MPI calls are enabled only under LIBMPDATAXX_MPI_THREAD_MULTIPLE, but simultaneous MPI calls are allowed in MPI_THREAD_MULTIPLE mode and are not allowed in MPI_THREAD_SERIALIZED mode. This appears inverted and can lead to undefined behavior/deadlocks in serialized builds.
Suggested fix: invert the #ifdef so the barriers are present when building for MPI_THREAD_SERIALIZED (and optionally guard with distmem.size() > 1).
```cpp
for (auto &bc : this->bcs[0]) {
  bc->fill_halos_sclr(arr, deriv);
#if !defined(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
  if(this->mem->distmem.size() > 0) this->mem->barrier();
```
Same issue as in other solvers: the extra barriers meant to prevent concurrent MPI calls are compiled only when LIBMPDATAXX_MPI_THREAD_MULTIPLE is defined, but they are needed when requesting MPI_THREAD_SERIALIZED (macro not defined). In serialized mode this can still allow multiple threads to enter MPI concurrently.
Suggested fix: invert the preprocessor condition and consider using distmem.size() > 1 to avoid overhead for single-rank runs.
Suggested change:

```diff
-if(this->mem->distmem.size() > 0) this->mem->barrier();
+if (this->mem->distmem.size() > 1) this->mem->barrier();
```
```cpp
if (boost::mpi::environment::thread_level() !=
#ifdef LIBMPDATAXX_MPI_THREAD_MULTIPLE
  boost::mpi::threading::multiple
#else
  boost::mpi::threading::serialized
#endif
)
{
#ifdef LIBMPDATAXX_MPI_THREAD_MULTIPLE
  throw std::runtime_error("libmpdata++: failed to initialise MPI environment with MPI_THREAD_MULTIPLE");
#else
  throw std::runtime_error("libmpdata++: failed to initialise MPI environment with MPI_THREAD_SERIALIZED");
#endif
```
When building in serialized mode, this requires boost::mpi::environment::thread_level() to be exactly serialized. However MPI is allowed to provide a higher thread level than requested (e.g., multiple), which should still be acceptable for a build targeting MPI_THREAD_SERIALIZED. With the current != check, users/tests that initialize MPI with MPI_THREAD_MULTIPLE (many existing tests do) will incorrectly throw.
Suggested fix: compare against the requested level using an order check (provided < requested) rather than equality, or explicitly accept multiple when serialized is requested.
```cmake
if(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
  set(libmpdataxx_CXX_FLAGS_DEBUG "${libmpdataxx_CXX_FLAGS_DEBUG} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
  set(libmpdataxx_CXX_FLAGS_RELEASE "${libmpdataxx_CXX_FLAGS_RELEASE} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
  message(STATUS "libmpdata++ will be built with multithreaded MPI calls enabled (MPI_THREAD_MULTIPLE). Set -DLIBMPDATAXX_MPI_THREAD_MULTIPLE=OFF to use MPI_THREAD_SERIALIZED instead.")
else()
```
The new LIBMPDATAXX_MPI_THREAD_MULTIPLE option introduces a second supported MPI threading mode (SERIALIZED), but the test configuration doesn’t appear to exercise this path. Given many tests explicitly call MPI::Init_thread(MPI_THREAD_MULTIPLE), it would be easy for serialized builds to silently regress.
Suggested fix: add a CI/test matrix entry that configures with -DLIBMPDATAXX_MPI_THREAD_MULTIPLE=OFF and runs at least one MPI-enabled test (ideally covering 3D remote bconds / halo exchange ordering).
Suggested change:

```diff
 if(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
+  # Explicitly record the selected MPI threading mode for external tooling / CI
+  set(LIBMPDATAXX_MPI_THREAD_MODE "MULTIPLE" CACHE STRING "Selected MPI threading mode for libmpdata++ (MULTIPLE or SERIALIZED)" FORCE)
   set(libmpdataxx_CXX_FLAGS_DEBUG "${libmpdataxx_CXX_FLAGS_DEBUG} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
   set(libmpdataxx_CXX_FLAGS_RELEASE "${libmpdataxx_CXX_FLAGS_RELEASE} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
   message(STATUS "libmpdata++ will be built with multithreaded MPI calls enabled (MPI_THREAD_MULTIPLE). Set -DLIBMPDATAXX_MPI_THREAD_MULTIPLE=OFF to use MPI_THREAD_SERIALIZED instead.")
 else()
+  # Explicitly record the serialized MPI threading mode and provide a dedicated macro
+  set(LIBMPDATAXX_MPI_THREAD_MODE "SERIALIZED" CACHE STRING "Selected MPI threading mode for libmpdata++ (MULTIPLE or SERIALIZED)" FORCE)
+  set(libmpdataxx_CXX_FLAGS_DEBUG "${libmpdataxx_CXX_FLAGS_DEBUG} -DLIBMPDATAXX_MPI_THREAD_SERIALIZED")
+  set(libmpdataxx_CXX_FLAGS_RELEASE "${libmpdataxx_CXX_FLAGS_RELEASE} -DLIBMPDATAXX_MPI_THREAD_SERIALIZED")
```
```cpp
// with distributed memory and cyclic boundary conditions,
// leftmost node must send left first, as
// rightmost node is waiting
// if (d == 0 && this->mem->distmem.size() > 0 && this->mem->distmem.rank() == 0)
// with serialized MPI, matching left/right calls need to be done in order, doesn't hurt in multithreaded case as well
if (d == 0 && this->mem->distmem.size() > 0 && this->mem->distmem.rank() % 2 == 0)
  std::swap(bcl, bcr);
```
This commented-out previous condition plus the updated explanation is confusing and doesn’t match the new behavior (rank parity swap). Keeping dead code in comments makes it hard to tell what ordering is actually required (e.g., deadlock avoidance vs thread-serialized MPI).
Suggested fix: remove the commented-out if (...) rank() == 0 line and update the comment to precisely document why parity-based swapping is required and in which configurations (e.g., MPI with remote bconds, halo==1 send/recv patterns).
```cpp
for (auto &bc : this->bcs[0]) //bc->fill_halos_vctr_alng(arrvec, j, k, ad);
{
  bc->single_threaded ? bc->fill_halos_vctr_alng(arrvec, j, k, ad) : bc->fill_halos_vctr_alng(arrvec, j, k, ad);
#if !defined(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
  if(this->mem->distmem.size() > 0) this->mem->barrier();
#endif
```
The commented-out call (//bc->fill_halos_vctr_alng...) and the ternary that calls the same function in both branches add noise without changing behavior. This makes the intent of the change (ordering/barriers) harder to follow.
Suggested fix: remove the commented-out line and replace the redundant ternary with a single unconditional call inside the loop body.