fix: retain GPU library paths when symlinks resolve outside mountpoint #165
Open
blinkagent[bot] wants to merge 1 commit into main
When the host uses /usr/lib64 (common on RHEL/Amazon Linux) and it is mounted into the outer container at /var/coder/usr/lib, symlinks inside the directory may use absolute paths referencing the original host path (e.g. /usr/lib64/libnvidia-ml.so.545.23.08). The recursiveSymlinks function would discard the entire symlink chain (including the original file within the mountpoint) when it encountered a target outside the mountpoint, returning nil. This changes the behavior to return all paths collected so far within the mountpoint instead of discarding them. The GPU libraries are still valid bind mount sources at their paths within the mountpoint. Fixes #164
Problem
When the host uses /usr/lib64 (common on RHEL/Amazon Linux) and it is mounted into the outer container at /var/coder/usr/lib, the usrLibGPUs() function fails to detect NVIDIA libraries.

The root cause is in recursiveSymlinks(): on these systems, symlinks inside /usr/lib64 use absolute paths referencing the original host path (e.g., libnvidia-ml.so.1 -> /usr/lib64/libnvidia-ml.so.545.23.08). When the directory is mounted at /var/coder/usr/lib, these absolute symlink targets don't start with the mountpoint prefix. The function previously returned nil when it encountered such a target, discarding the entire symlink chain, including the original file within the mountpoint.

This meant CODER_ADD_GPU=true would pass through /dev/nvidia* devices but mount zero NVIDIA libraries, causing nvidia-smi to fail in the inner container.

Fix
Changed recursiveSymlinks() to break out of the loop instead of return nil, nil when a symlink target resolves outside the mountpoint. This retains all paths collected so far within the mountpoint, which are still valid bind mount sources.

The change is a single line: return nil, nil → break.

Test
Added TestGPUs_UsrLib64Symlinks, which creates a real filesystem layout simulating the /usr/lib64 scenario, including a .so file. It verifies that all three paths are detected as GPU bind mounts.
Fixes #164