
disable pinned host memory for pod #318

Merged: AtlantaPepsi merged 2 commits into ROCm:candidate from AtlantaPepsi:egm_disable on Jun 1, 2026

@AtlantaPepsi
Contributor

Motivation

Disable pinned host memory for cross-pod access.

Technical Details

Fabric handle export and exchange are not yet well supported or thoroughly tested. In the current version (1.67), a cross-pod transfer involving a GPU executor will cleanly error out.

Test Plan

Test Result

Submission Checklist

AtlantaPepsi requested a review from a team as a code owner on June 1, 2026 18:34
Copilot AI review requested due to automatic review settings on June 1, 2026 18:34
Contributor

Copilot AI left a comment


Pull request overview

This PR adds upfront validation to reject cross-rank (“pod”) transfers that would require a GPU executor to access remote host (pinned) memory via HIP/CUDA fabric-handle exchange, which is currently unsupported and can lead to crashes.

Changes:

  • Add config validation that errors out when a GPU executor is used for a cross-rank transfer involving remote CPU/host memory.
  • Remove stale commented-out GetMemLocation calls in the pod-communication allocation path.
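The validation described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the types `MemType`, `ExeType`, `MemDevice`, `ExeDevice`, `Transfer` and the function `CheckPodHostAccess` are all hypothetical simplifications, and the real structures in src/header/TransferBench.hpp may differ. It assumes each memory location and executor carries the rank (pod) that owns it, and rejects a transfer when a GPU executor would have to touch pinned host memory on another rank.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only; TransferBench's real
// structures in src/header/TransferBench.hpp are not reproduced here.
enum class MemType { CPU, GPU };  // CPU = pinned host memory
enum class ExeType { CPU, GPU };

struct MemDevice { MemType type; int rank; };  // rank that owns the memory
struct ExeDevice { ExeType type; int rank; };  // rank that runs the executor

struct Transfer {
  ExeDevice exe;
  std::vector<MemDevice> srcs;
  std::vector<MemDevice> dsts;
};

// True if the memory is host memory owned by a different rank than the executor.
static bool IsRemoteHost(const MemDevice& mem, const ExeDevice& exe) {
  return mem.type == MemType::CPU && mem.rank != exe.rank;
}

// Returns an empty string if the transfer passes this check, otherwise an error
// message. Only GPU executors are checked, since they would need an exported
// fabric handle to reach pinned host memory on another rank.
std::string CheckPodHostAccess(const Transfer& t) {
  if (t.exe.type != ExeType::GPU) return "";
  for (const MemDevice& m : t.srcs)
    if (IsRemoteHost(m, t.exe))
      return "GPU executor cannot read host memory on rank " + std::to_string(m.rank);
  for (const MemDevice& m : t.dsts)
    if (IsRemoteHost(m, t.exe))
      return "GPU executor cannot write host memory on rank " + std::to_string(m.rank);
  return "";
}
```

Running a check like this over every transfer before any memory is allocated is what lets an unsupported configuration error out cleanly, rather than failing later during fabric-handle exchange.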


Comment thread: src/header/TransferBench.hpp
AtlantaPepsi merged commit 26c9cf8 into ROCm:candidate on Jun 1, 2026
9 of 10 checks passed
AtlantaPepsi deleted the egm_disable branch on June 1, 2026 19:55

3 participants