runsc: serialize RDMA and PCI sysfs data at boot #13114
Draft
atoniolo76 wants to merge 1 commit into
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
trantoji
reviewed
May 19, 2026
Contributor
Hey @atoniolo76,
This PR is serializing and virtualizing three things: PCIe topology, NUMA layout, and other RDMA info. Are you open to splitting this PR into three?
What are your thoughts on the following structure:
- Infra to serialize and virtualize the minimal set of sysfs nodes, gated behind a flag, to get `ib_write_cuda_bw` to work. (I'm assuming this tool is "dumb" and does not need NUMA or PCIe topology information; we can simply target a net interface and a CUDA device.)
- Serialize and virtualize the PCIe topology and NUMA layout, keeping the code vendor-agnostic. Applications need the PCIe topology (bridges, NICs, and accelerator locations) so that threads can initiate data flows that take the fast path (PCIe P2P). They need the NUMA layout to optimize execution, i.e., to pin their control threads to the local CPU socket so that copies do not traverse the CPU socket interconnect.
- Infra to add vendor-specific quirks to the virtualized sysfs to get NCCL and CX device working.
Splitting it this way separates the generic hardware platform from the vendor-specific middleware heuristics.
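To make the second item concrete, here is a rough, vendor-agnostic sketch of how a consumer might rank device pairs from a virtualized PCIe topology. It assumes each device is identified by its resolved sysfs device path (the target of `/sys/bus/pci/devices/<BDF>`), and counts the path components separating two devices from their closest common ancestor. The function name `pciHops` and the sample paths are hypothetical; this is not NCCL's actual distance algorithm, only an illustration of why the bridge hierarchy has to be preserved.

```go
package main

import (
	"fmt"
	"strings"
)

// pciHops returns how many path components separate two devices from their
// closest common ancestor in the sysfs device tree. A smaller value suggests
// the devices sit behind the same switch or bridge. Hypothetical helper.
func pciHops(a, b string) int {
	pa := strings.Split(strings.Trim(a, "/"), "/")
	pb := strings.Split(strings.Trim(b, "/"), "/")

	// Count the leading components the two paths share.
	common := 0
	for common < len(pa) && common < len(pb) && pa[common] == pb[common] {
		common++
	}
	return (len(pa) - common) + (len(pb) - common)
}

func main() {
	// Sample paths, invented for illustration: a NIC and a GPU behind the
	// same PCIe switch, and a second GPU under a different root complex.
	nic := "/sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:00.0/0000:3d:00.0"
	gpu := "/sys/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/0000:3c:01.0/0000:3e:00.0"
	far := "/sys/devices/pci0000:ae/0000:ae:00.0/0000:af:00.0"

	fmt.Println("nic <-> gpu:", pciHops(nic, gpu))
	fmt.Println("nic <-> far:", pciHops(nic, far))
}
```

A pair that shares a switch scores lower than a pair that only meets at `/sys/devices`, which is the property the virtualized tree would need to keep for P2P-aware placement to work.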
Snapshot host RDMA/InfiniBand and PCI device sysfs attributes before pivot_root (while the host sysfs is still accessible), serialize them as JSON, and reconstruct them as virtual kernfs entries inside the sentry. This gives NCCL and libibverbs the topology information they need for device discovery and PCI distance computation without granting the sandbox access to the real host sysfs.
Key components: