Describe the bug
While testing issue #4058, I found that different parallel settings give completely different results for the same INPUT:
OMP_NUM_THREADS=1 mpirun -np 16 abacus | tee out.log
OMP_NUM_THREADS=2 mpirun -np 16 abacus | tee out.log
OMP_NUM_THREADS=2 mpirun -np 8 abacus | tee out.log
OMP_NUM_THREADS=4 mpirun -np 4 abacus | tee out.log
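
All four commands above write to the same out.log, so each run overwrites the previous log. As a minimal sketch for keeping the outputs side by side, the following Python helper builds the same four commands with a separate log file per setting. The log naming scheme and the helper name build_command are assumptions made for illustration; they are not part of ABACUS.

```python
import shlex

# (OMP threads, MPI ranks) pairs taken from the four commands above.
SETTINGS = [(1, 16), (2, 16), (2, 8), (4, 4)]


def build_command(omp_threads, mpi_ranks, binary="abacus"):
    """Return a shell command string that logs to a per-setting file.

    The log file name is a hypothetical convention chosen so that
    runs with different parallel settings do not overwrite each other.
    """
    log_name = f"out_omp{omp_threads}_np{mpi_ranks}.log"
    return (
        f"OMP_NUM_THREADS={omp_threads} mpirun -np {mpi_ranks} "
        f"{shlex.quote(binary)} | tee {shlex.quote(log_name)}"
    )


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be
    # reviewed or pasted into a job script.
    for omp, ranks in SETTINGS:
        print(build_command(omp, ranks))
```

With one log per setting, the outputs can then be compared directly, for example with diff.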

See the link for more details.
Expected behavior
No response
To Reproduce
No response
Environment
No response
Additional Context
No response
Task list for Issue attackers (only for developers)