`MPIPatternP2P` has a bug in one of its `MPI_Isend` calls (`err = MPIIsend<MemorySpace::HOST>(&numGhostIndicesInProc,`). The send buffer, `numGhostIndicesInProc`, is local to the for-loop, so it goes out of scope before the `MPI_Isend` completes the handshake with the target processor. MPI requires the buffer passed to a nonblocking send to stay valid and unmodified until the request completes. We lucked out in most cases because only a single integer is sent, so the handshake had probably finished before `numGhostIndicesInProc` went out of scope. In some corner cases, however, it causes an `MPI_Waitall` failure in which the run hangs indefinitely, because the `MPI_Isend` cannot complete the handshake. The fix is easy: replace `numGhostIndicesInProc` with `d_numGhostIndicesInGhostProcs[iGhostProc]`, whose storage outlives the loop iteration. This bug might also be behind some of the crashes we observed in DFT-FE.
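
The lifetime issue can be illustrated without MPI. The sketch below is only an analogy, not the dft-efe code: `DeferredSender` and `sendGhostCounts` are hypothetical names. `post()` records the buffer address the way `MPI_Isend` does, and the value is only read in `waitAll()`, the way a send may only complete inside `MPI_Waitall`. It shows the fixed pattern and leaves the buggy one as a comment, since running it would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a nonblocking send. post() only stores the
// buffer address; the data is read later, in waitAll().
class DeferredSender
{
public:
  void
  post(const int *buffer)
  {
    assert(buffer != nullptr);
    d_pending.push_back(buffer);
  }

  // Completes all posted sends and returns the values that were "sent".
  std::vector<int>
  waitAll()
  {
    std::vector<int> delivered;
    for (const int *buffer : d_pending)
      delivered.push_back(*buffer); // the buffer must still be alive here
    d_pending.clear();
    return delivered;
  }

private:
  std::vector<const int *> d_pending;
};

// Fixed pattern: each posted buffer is an element of a container that
// outlives the loop, mirroring d_numGhostIndicesInGhostProcs[iGhostProc].
std::vector<int>
sendGhostCounts(const std::vector<int> &numGhostIndicesInGhostProcs)
{
  DeferredSender sender;
  for (std::size_t iGhostProc = 0;
       iGhostProc < numGhostIndicesInGhostProcs.size();
       ++iGhostProc)
    {
      // Buggy variant (undefined behavior, shown for contrast only):
      //   const int numGhostIndicesInProc =
      //     numGhostIndicesInGhostProcs[iGhostProc];
      //   sender.post(&numGhostIndicesInProc); // dangles after this iteration
      sender.post(&numGhostIndicesInGhostProcs[iGhostProc]);
    }
  return sender.waitAll();
}
```

In the real code the same idea applies: the address passed to `MPI_Isend` should point into storage that is neither destroyed nor modified until the matching `MPI_Waitall` has returned.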
Location: `dft-efe/src/utils/MPIPatternP2P.t.cpp`, line 1131 in c5060be.