Mpi serialized option #503

Merged: pdziekan merged 7 commits into igfuw:master from pdziekan:mpi_serialized on Apr 15, 2026

Conversation

@pdziekan (Contributor)

No description provided.

Copilot AI review requested due to automatic review settings March 31, 2026 14:18
Copilot AI left a comment

Pull request overview

Adds a build-time option to run libmpdata++ with MPI_THREAD_SERIALIZED instead of MPI_THREAD_MULTIPLE, and introduces additional synchronization in halo-exchange paths intended to avoid unsafe concurrent MPI calls.

Changes:

  • Adds LIBMPDATAXX_MPI_THREAD_MULTIPLE CMake option (default ON) and corresponding compile definition.
  • Initializes MPI with MPI_THREAD_MULTIPLE or MPI_THREAD_SERIALIZED based on that option, updating related error handling.
  • Adds additional per-boundary-condition synchronization and adjusts boundary-condition ordering for distributed/cyclic cases.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Summary per file:

  • libmpdata++/solvers/detail/solver_common.hpp: Adjusts boundary-condition call ordering (rank-parity swap) and updates inline comments.
  • libmpdata++/solvers/detail/solver_3d.hpp: Adds per-bcond barriers in multiple halo-exchange routines and a minor indexing change.
  • libmpdata++/solvers/detail/solver_2d.hpp: Adds per-bcond barriers in multiple halo-exchange routines.
  • libmpdata++/solvers/detail/solver_1d.hpp: Adds per-bcond barriers in halo-exchange routines.
  • libmpdata++/concurr/detail/distmem.hpp: Switches the MPI init thread level between MULTIPLE and SERIALIZED and updates the thread-level check and error messages.
  • libmpdata++-config.cmake: Introduces the CMake option and sets compile definitions and messages for MPI threading mode selection.


{
bc->single_threaded ? bc->fill_halos_sclr(arr, range_ijk[1]^ext, range_ijk[2]^ext, deriv) : bc->fill_halos_sclr(arr, range_ijk[1]^ext, range_ijk[2]^ext, deriv);
#if !defined(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
if(this->mem->distmem.size() > 0) this->mem->barrier(); // to avoid simultaneous MPI calls (not needed if MPI_THREADS_MULTIPLE is supported and used, but many MPI implementations do not support it), as each bcond is handled by different thread
Copilot AI Mar 31, 2026
The barrier used to serialize per-boundary-condition execution is currently compiled only when LIBMPDATAXX_MPI_THREAD_MULTIPLE is defined, but the need to serialize MPI calls is exactly when MPI_THREAD_SERIALIZED is used (i.e., when LIBMPDATAXX_MPI_THREAD_MULTIPLE is not defined). As written, serialized builds can still issue concurrent MPI calls from different threads (violating MPI_THREAD_SERIALIZED), while MULTIPLE builds pay for extra barriers.

Suggested fix: invert the preprocessor condition (use #ifndef LIBMPDATAXX_MPI_THREAD_MULTIPLE or an explicit SERIALIZED macro) and consider checking distmem.size() > 1 to avoid unnecessary barriers in single-rank runs.

Suggested change (removed line prefixed with -, added line with +):
- if(this->mem->distmem.size() > 0) this->mem->barrier(); // to avoid simultaneous MPI calls (not needed if MPI_THREADS_MULTIPLE is supported and used, but many MPI implementations do not support it), as each bcond is handled by different thread
+ if(this->mem->distmem.size() > 1) this->mem->barrier(); // to avoid simultaneous MPI calls (not needed if MPI_THREADS_MULTIPLE is supported and used, but many MPI implementations do not support it), as each bcond is handled by different thread

Comment on lines +49 to +53
bc->fill_halos_sclr(arr, range_ijk[1]^ext, deriv);
#if !defined(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
if(this->mem->distmem.size() > 0) this->mem->barrier();
#endif
}
Copilot AI Mar 31, 2026
These per-bcond barriers intended to avoid simultaneous MPI calls are enabled only under LIBMPDATAXX_MPI_THREAD_MULTIPLE, but simultaneous MPI calls are allowed in MPI_THREAD_MULTIPLE mode and are not allowed in MPI_THREAD_SERIALIZED mode. This appears inverted and can lead to undefined behavior/deadlocks in serialized builds.

Suggested fix: invert the #ifdef so the barriers are present when building for MPI_THREAD_SERIALIZED (and optionally guard with distmem.size() > 1).

for (auto &bc : this->bcs[0]) {
bc->fill_halos_sclr(arr, deriv);
#if !defined(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
if(this->mem->distmem.size() > 0) this->mem->barrier();
Copilot AI Mar 31, 2026
Same issue as in other solvers: the extra barriers meant to prevent concurrent MPI calls are compiled only when LIBMPDATAXX_MPI_THREAD_MULTIPLE is defined, but they are needed when requesting MPI_THREAD_SERIALIZED (macro not defined). In serialized mode this can still allow multiple threads to enter MPI concurrently.

Suggested fix: invert the preprocessor condition and consider using distmem.size() > 1 to avoid overhead for single-rank runs.

Suggested change (removed line prefixed with -, added line with +):
- if(this->mem->distmem.size() > 0) this->mem->barrier();
+ if (this->mem->distmem.size() > 1) this->mem->barrier();

Comment thread: libmpdata++/concurr/detail/distmem.hpp (Outdated)
Comment on lines +195 to +207
if (boost::mpi::environment::thread_level() !=
#ifdef LIBMPDATAXX_MPI_THREAD_MULTIPLE
boost::mpi::threading::multiple
#else
boost::mpi::threading::serialized
#endif
)
{
#ifdef LIBMPDATAXX_MPI_THREAD_MULTIPLE
throw std::runtime_error("libmpdata++: failed to initialise MPI environment with MPI_THREAD_MULTIPLE");
#else
throw std::runtime_error("libmpdata++: failed to initialise MPI environment with MPI_THREAD_SERIALIZED");
#endif
Copilot AI Mar 31, 2026
When building in serialized mode, this requires boost::mpi::environment::thread_level() to be exactly serialized. However MPI is allowed to provide a higher thread level than requested (e.g., multiple), which should still be acceptable for a build targeting MPI_THREAD_SERIALIZED. With the current != check, users/tests that initialize MPI with MPI_THREAD_MULTIPLE (many existing tests do) will incorrectly throw.

Suggested fix: compare against the requested level using an order check (provided < requested) rather than equality, or explicitly accept multiple when serialized is requested.

Comment thread: libmpdata++-config.cmake
Comment on lines +147 to +151
if(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
set(libmpdataxx_CXX_FLAGS_DEBUG "${libmpdataxx_CXX_FLAGS_DEBUG} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
set(libmpdataxx_CXX_FLAGS_RELEASE "${libmpdataxx_CXX_FLAGS_RELEASE} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
message(STATUS "libmpdata++ will be built with multithreaded MPI calls enabled (MPI_THREAD_MULTIPLE). Set -DLIBMPDATAXX_MPI_THREAD_MULTIPLE=OFF to use MPI_THREAD_SERIALIZED instead.")
else()
Copilot AI Mar 31, 2026
The new LIBMPDATAXX_MPI_THREAD_MULTIPLE option introduces a second supported MPI threading mode (SERIALIZED), but the test configuration doesn’t appear to exercise this path. Given many tests explicitly call MPI::Init_thread(MPI_THREAD_MULTIPLE), it would be easy for serialized builds to silently regress.

Suggested fix: add a CI/test matrix entry that configures with -DLIBMPDATAXX_MPI_THREAD_MULTIPLE=OFF and runs at least one MPI-enabled test (ideally covering 3D remote bconds / halo exchange ordering).

Suggested change (removed lines prefixed with -, added lines with +):
- if(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
- set(libmpdataxx_CXX_FLAGS_DEBUG "${libmpdataxx_CXX_FLAGS_DEBUG} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
- set(libmpdataxx_CXX_FLAGS_RELEASE "${libmpdataxx_CXX_FLAGS_RELEASE} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
- message(STATUS "libmpdata++ will be built with multithreaded MPI calls enabled (MPI_THREAD_MULTIPLE). Set -DLIBMPDATAXX_MPI_THREAD_MULTIPLE=OFF to use MPI_THREAD_SERIALIZED instead.")
- else()
+ if(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
+ # Explicitly record the selected MPI threading mode for external tooling / CI
+ set(LIBMPDATAXX_MPI_THREAD_MODE "MULTIPLE" CACHE STRING "Selected MPI threading mode for libmpdata++ (MULTIPLE or SERIALIZED)" FORCE)
+ set(libmpdataxx_CXX_FLAGS_DEBUG "${libmpdataxx_CXX_FLAGS_DEBUG} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
+ set(libmpdataxx_CXX_FLAGS_RELEASE "${libmpdataxx_CXX_FLAGS_RELEASE} -DLIBMPDATAXX_MPI_THREAD_MULTIPLE")
+ message(STATUS "libmpdata++ will be built with multithreaded MPI calls enabled (MPI_THREAD_MULTIPLE). Set -DLIBMPDATAXX_MPI_THREAD_MULTIPLE=OFF to use MPI_THREAD_SERIALIZED instead.")
+ else()
+ # Explicitly record the serialized MPI threading mode and provide a dedicated macro
+ set(LIBMPDATAXX_MPI_THREAD_MODE "SERIALIZED" CACHE STRING "Selected MPI threading mode for libmpdata++ (MULTIPLE or SERIALIZED)" FORCE)
+ set(libmpdataxx_CXX_FLAGS_DEBUG "${libmpdataxx_CXX_FLAGS_DEBUG} -DLIBMPDATAXX_MPI_THREAD_SERIALIZED")
+ set(libmpdataxx_CXX_FLAGS_RELEASE "${libmpdataxx_CXX_FLAGS_RELEASE} -DLIBMPDATAXX_MPI_THREAD_SERIALIZED")
Comment on lines 104 to 110
// with distributed memory and cyclic boundary conditions,
// leftmost node must send left first, as
// rightmost node is waiting
if (d == 0 && this->mem->distmem.size() > 0 && this->mem->distmem.rank() == 0)
// if (d == 0 && this->mem->distmem.size() > 0 && this->mem->distmem.rank() == 0)
// with serialized MPI, matching left/right calls need to be done in order, doesn't hurt in multithreaded case as well
if (d == 0 && this->mem->distmem.size() > 0 && this->mem->distmem.rank() % 2 == 0)
std::swap(bcl, bcr);
Copilot AI Mar 31, 2026
This commented-out previous condition plus the updated explanation is confusing and doesn’t match the new behavior (rank parity swap). Keeping dead code in comments makes it hard to tell what ordering is actually required (e.g., deadlock avoidance vs thread-serialized MPI).

Suggested fix: remove the commented-out if (...) rank() == 0 line and update the comment to precisely document why parity-based swapping is required and in which configurations (e.g., MPI with remote bconds, halo==1 send/recv patterns).

Comment on lines +78 to +83
for (auto &bc : this->bcs[0]) //bc->fill_halos_vctr_alng(arrvec, j, k, ad);
{
bc->single_threaded ? bc->fill_halos_vctr_alng(arrvec, j, k, ad) : bc->fill_halos_vctr_alng(arrvec, j, k, ad);
#if !defined(LIBMPDATAXX_MPI_THREAD_MULTIPLE)
if(this->mem->distmem.size() > 0) this->mem->barrier();
#endif
Copilot AI Mar 31, 2026
The commented-out call (//bc->fill_halos_vctr_alng...) and the ternary that calls the same function in both branches add noise without changing behavior. This makes the intent of the change (ordering/barriers) harder to follow.

Suggested fix: remove the commented-out line and replace the redundant ternary with a single unconditional call inside the loop body.

@pdziekan changed the title from "Mpi serialized option; TODO: add it to tests" to "Mpi serialized option" on Apr 9, 2026
@pdziekan merged commit 3e6eb3a into igfuw:master on Apr 15, 2026
29 checks passed