
Fix: add runtime check for cudaHostRegister in EXX PW #7157

Merged
mohanchen merged 5 commits into deepmodeling:develop from Flying-dragon-boxing:fix/exx_pw_cuda_host_register
Mar 28, 2026


@Flying-dragon-boxing
Collaborator

Add a runtime check (PARAM.inp.device == "gpu") before calling cudaHostRegister and cudaHostUnregister in EXX PW calculations. This fixes #7119.
When ABACUS is compiled with CUDA enabled but runs on a CPU device (e.g., on a cluster with both CPU and GPU nodes where the code is compiled only once), cudaHostRegister fails because no GPU is available. As a result, EXX PW calculations crash on CPU-only nodes whenever the binary was built with CUDA support.

Add runtime check (PARAM.inp.device == "gpu") before calling
cudaHostRegister and cudaHostUnregister to avoid failures when
CUDA is enabled but running on CPU device (e.g., on clusters with
both CPU and GPU nodes but only compiling once).
Copilot AI review requested due to automatic review settings March 28, 2026 03:02

Copilot AI left a comment


Pull request overview

Adds a runtime guard to prevent CUDA host-memory registration calls from running during EXX PW calculations when the user has selected CPU execution, addressing crashes seen on CPU-only nodes in CUDA-enabled builds (issue #7119).

Changes:

  • Wrap cudaHostRegister with if (PARAM.inp.device == "gpu") in get_exx_potential.
  • Wrap cudaHostUnregister with if (PARAM.inp.device == "gpu") in get_exx_potential.
  • Apply the same guards in get_exx_stress_potential.
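The guard pattern described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: `Input` is a hypothetical stand-in for `PARAM.inp`, `fake_host_register` and `fake_host_unregister` are hypothetical stubs standing in for `cudaHostRegister` and `cudaHostUnregister` so that the sketch compiles without the CUDA toolkit, and `exx_step_sketch` is an invented name rather than `get_exx_potential` itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counters used only to observe how often the stubs are called.
static int g_register_calls = 0;
static int g_unregister_calls = 0;

// Hypothetical stub standing in for cudaHostRegister (assumption: the real
// call would come from <cuda_runtime.h> and pin the host buffer).
inline int fake_host_register(void* ptr, std::size_t bytes)
{
    assert(ptr != nullptr && bytes > 0);
    ++g_register_calls;
    return 0;
}

// Hypothetical stub standing in for cudaHostUnregister.
inline int fake_host_unregister(void* ptr)
{
    assert(ptr != nullptr);
    ++g_unregister_calls;
    return 0;
}

// Hypothetical stand-in for PARAM.inp; only the device field is modelled.
struct Input
{
    std::string device = "cpu";
};

// Sketch of the guarded region: the buffer is registered and unregistered
// only when the user selected the GPU at run time, even though the binary
// was built with CUDA support.
void exx_step_sketch(const Input& inp, std::vector<double>& buf)
{
    const bool on_gpu = (inp.device == "gpu");

    if (on_gpu)
    {
        fake_host_register(buf.data(), buf.size() * sizeof(double));
    }

    // Placeholder for the actual EXX work on the buffer.
    for (double& x : buf)
    {
        x += 1.0;
    }

    // Same condition as above, so every register has a matching unregister.
    if (on_gpu)
    {
        fake_host_unregister(buf.data());
    }
}
```

Evaluating the condition once and reusing it for both calls keeps registration and unregistration paired; with `device` set to "cpu" neither stub is reached, which is the behaviour the PR aims for on CPU-only nodes.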


Flying-dragon-boxing and others added 2 commits March 28, 2026 11:43
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Flying-dragon-boxing Flying-dragon-boxing marked this pull request as ready for review March 28, 2026 05:18
@Flying-dragon-boxing Flying-dragon-boxing marked this pull request as draft March 28, 2026 05:54
@Flying-dragon-boxing Flying-dragon-boxing marked this pull request as ready for review March 28, 2026 07:53
@Flying-dragon-boxing
Collaborator Author

Please merge this first, and then #7166.

Collaborator

@mohanchen mohanchen left a comment


LGTM

@mohanchen mohanchen merged commit e3eee6f into deepmodeling:develop Mar 28, 2026
15 checks passed


Development

Successfully merging this pull request may close these issues.

ABACUS compiled with GPU support crashes when running EXX PW

3 participants