
Benchmark combining MPI and LPF either skips LPF run or returns error #62

@KADichev


With the attached files, the LPF experiments do not work:

  • with lpf_hook, the peer list in src/MPI/ibverbs.cpp appears to be empty, so no communication is scheduled at all (each lpf_put is effectively a no-op); see synthetic4.cpp below
  • with lpf_exec, synthetic3.cpp fails at runtime with the following error:

build/_deps/lpf-src/build/bin/lpfrun -engine ibverbs -n 2 build/synthetic3 64 100
LPF Backend Error! lpf_exec( LPF_ROOT, LPF_MAX_P, &spmd, args)
[srv01:716807] *** Process received signal ***
[srv01:716807] Signal: Aborted (6)
[srv01:716807] Signal code:  (-6)
[srv01:716807] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x4000304ba7dc]
[srv01:716807] [ 1] /lib/aarch64-linux-gnu/libc.so.6(+0x7f1f0)[0x40003096f1f0]
[srv01:716807] [ 2] /lib/aarch64-linux-gnu/libc.so.6(raise+0x1c)[0x40003092a67c]
[srv01:716807] [ 3] /lib/aarch64-linux-gnu/libc.so.6(abort+0xe4)[0x400030917130]
[srv01:716807] [ 4] build/synthetic3(main+0x130)[0xaaaabdad7ab0]
[srv01:716807] [ 5] /lib/aarch64-linux-gnu/libc.so.6(+0x273fc)[0x4000309173fc]
[srv01:716807] [ 6] /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0x4000309174cc]
[srv01:716807] [ 7] build/synthetic3(_start+0x30)[0xaaaabdad6d30]
[srv01:716807] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node srv01 exited on signal 6 (Aborted).


Compiled with:
lpfcxx -engine ibverbs synthetic3.cpp -o synthetic3

Run with:
lpfrun -engine ibverbs synthetic3 64 100

Attachments:
  • synthetic3.cpp
  • synthetic4.cpp

Labels: bug (Something isn't working), question (Further information is requested)