ABACUS MD simulation with LCAO runs out of memory after ~37 steps #7209
Description
Details
I am running an MD simulation with LCAO of a silicate melt (SiO₂) system with ~324 atoms. The calculation runs on a single node with 128 CPU cores and 256 GB of memory. The version is abacus-develop.
The simulation starts normally, but after about one hour (~37 MD steps) it crashes due to memory exhaustion.
Because the job runs fine at the beginning and fails only after several MD steps, memory usage appears to grow during the simulation rather than exceed a static limit from the start.
I would like to ask:
- Could this behavior be related to a compilation issue (e.g., MPI / ELPA / ScaLAPACK / memory handling)?
- Is there any known issue of memory accumulation or a memory leak in MD simulations (especially with LCAO)?
- What is the recommended way to solve it in this case?
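To confirm that memory grows from step to step rather than being too large from the start, it may help to log the resident memory of the running process once per MD step and fit a straight line to the samples. The following is a minimal sketch under assumptions: it is not part of ABACUS, the helper names `rss_mib`, `growth_per_step`, and `steps_until_limit` are hypothetical, and `rss_mib` only works on Linux because it reads `/proc/<pid>/status`. For an MPI run, the value would have to be summed over all ranks on the node.

```python
def rss_mib(pid):
    """Return the resident set size of `pid` in MiB (Linux only, via /proc)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # the kernel reports kB
    return 0.0  # no VmRSS entry (e.g. a kernel thread)


def growth_per_step(samples):
    """Least-squares slope of memory (MiB) versus MD step.

    `samples` is a list of (step, memory_mib) pairs collected by the user.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(step for step, _ in samples) / n
    mean_y = sum(mem for _, mem in samples) / n
    sxx = sum((step - mean_x) ** 2 for step, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover at least two distinct steps")
    sxy = sum((step - mean_x) * (mem - mean_y) for step, mem in samples)
    return sxy / sxx


def steps_until_limit(samples, limit_mib):
    """Extrapolate how many more steps fit below `limit_mib`.

    Returns None when memory is not growing.
    """
    slope = growth_per_step(samples)
    if slope <= 0:
        return None
    _, last_mem = samples[-1]
    return (limit_mib - last_mem) / slope
```

A clearly positive slope that extrapolates to the node's memory limit near the step where the job dies would support the accumulation hypothesis; a flat profile would instead point to a single large allocation.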
Task list for Issue attackers (only for developers)
- Reproduce the performance issue on a similar system or environment.
- Identify the specific section of the code causing the performance issue.
- Investigate the issue and determine the root cause.
- Research best practices and potential solutions for the identified performance issue.
- Implement the chosen solution to address the performance issue.
- Test the implemented solution to ensure it improves performance without introducing new issues.
- Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- Review and incorporate any relevant feedback from users or developers.
- Merge the improved solution into the main codebase and notify the issue reporter.