check process binding #312
|
I think I found a problem.
[satishk@tcn3 ~]$ hwloc-calc -p -H package.numanode core:0-5
Package:0.NUMANode:0
[satishk@tcn3 ~]$ module unload hwloc/2.9.1-GCCcore-12.3.0
[satishk@tcn3 ~]$ module load hwloc/2.8.0-GCCcore-12.2.0
The following have been reloaded with a version change:
1) GCCcore/12.3.0 => GCCcore/12.2.0
2) libpciaccess/0.17-GCCcore-12.3.0 => libpciaccess/0.17-GCCcore-12.2.0
3) libxml2/2.11.4-GCCcore-12.3.0 => libxml2/2.10.3-GCCcore-12.2.0
4) numactl/2.0.16-GCCcore-12.3.0 => numactl/2.0.16-GCCcore-12.2.0
[satishk@tcn3 ~]$ hwloc-calc -p -H package.numanode core:0-5
unsupported (non-normal) --hierarchical type numanode
Somewhere between |
|
good catch. |
|
@satishskamath fallback added. i also added a check for the number of nodes. |
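For illustration only, a fallback of this kind could be sketched as trying a list of commands in order and keeping the first one that succeeds. The actual fallback used in this PR is not shown in the conversation, so the helper name `first_successful` and the idea of detecting the unsupported case through a non-zero exit status are assumptions.

```python
import subprocess


def first_successful(commands, run=subprocess.run):
    """Run each command in order; return stripped stdout of the first that exits with 0.

    Returns None if every command fails or cannot be started.
    `commands` is a list of argument lists, e.g. [["hwloc-calc", "-p", ...], ...].
    """
    for cmd in commands:
        try:
            result = run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            # Executable not found: move on to the next candidate.
            continue
        if result.returncode == 0:
            return result.stdout.strip()
    return None
```

The first candidate would be the `hwloc-calc -p -H package.numanode core:0-5` call from the report above; whatever follows it in the list is up to the caller.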
casparvl
left a comment
Since this requires hwloc-calc, we should make sure the check only runs if hwloc-calc is available, and skip it with a printed warning if it isn't.
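A minimal sketch of such a guard, assuming the availability test is done from Python with `shutil.which`. The function names and the `check_cmd` placeholder are hypothetical and not taken from this PR; note also that this looks at the PATH of the process running the Python code, which may differ from the environment inside the job.

```python
import shutil


def hwloc_calc_available(which=shutil.which):
    """Return True if an `hwloc-calc` executable is found on PATH."""
    return which("hwloc-calc") is not None


def binding_check_cmds(check_cmd, which=shutil.which):
    """Return the commands to run for the binding check.

    `check_cmd` is a placeholder for whatever command the test would run.
    If hwloc-calc is missing, return a single command that only prints a warning.
    """
    if hwloc_calc_available(which):
        return [check_cmd]
    # Skip the check but leave a trace on stderr.
    return ['echo "WARNING: hwloc-calc not found, skipping process binding check" >&2']
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use it defaults to `shutil.which`.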
|
Checked OpenFOAM script. Output in rfm.err: |
|
@satishskamath is there nothing in the rfm.err file? |
|
@smoors Sorry, something went wrong during copy paste. I think this can be merged. |
|
Another concern, which can probably be addressed in a separate PR, is the time this check takes. I assume it currently runs by default for all tests; for large-scale tests (> 4 nodes), does the overhead become large? |
|
Last check: all OpenFOAM tests were generated properly: |
this runs a short test in a prerun cmd to get the process binding, which is checked with the check_process_binding.py script. the results are written into the job error file.
fixes #307
Important
the test currently doesn't fail on binding error, as we don't yet have a bullet-proof solution for setting the binding in all cases (see also the discussion in #305). so, for now, both the errors and warnings are printed as warnings on screen. sanity checks can be added in a follow-up PR.
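as a rough illustration of that behaviour (not the code in this PR), errors and warnings can both be routed through Python's `warnings` module so that nothing raises. the function name `report_binding_issues` and the message prefixes are made up for this sketch.

```python
import warnings


def report_binding_issues(errors, warn_msgs):
    """Emit every binding error and warning as a Python warning; never raise.

    Returns the number of messages emitted.
    """
    count = 0
    for msg in errors:
        # Errors are downgraded to warnings so the test does not fail.
        warnings.warn(f"process binding error: {msg}")
        count += 1
    for msg in warn_msgs:
        warnings.warn(f"process binding warning: {msg}")
        count += 1
    return count
```

turning the errors into real failures later would then only require changing the first loop.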
example output:
Note
i managed to get the correct launcher run command by updating the job resources in the assign_tasks_per_compute_unit function. this also allowed simplifying the openfoam test and making it more robust.