Amazon linux support #127

Closed
shivakunv wants to merge 1 commit into main from amazonlinuxsupport

Conversation

@shivakunv
Contributor

No description provided.

@shivakunv shivakunv self-assigned this Sep 26, 2024
@shivakunv shivakunv force-pushed the amazonlinuxsupport branch 2 times, most recently from f6096c6 to 57fdf0b October 4, 2024 06:46
Comment thread .github/workflows/ci.yaml Outdated
@shivakunv shivakunv marked this pull request as ready for review October 7, 2024 16:31
@copy-pr-bot

copy-pr-bot Bot commented Oct 26, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@shivakunv shivakunv force-pushed the amazonlinuxsupport branch 2 times, most recently from dedd781 to 31a2314 October 26, 2024 18:35
@shivakunv shivakunv force-pushed the amazonlinuxsupport branch 2 times, most recently from 6d553ac to bc5998b October 26, 2024 19:16
@shivakunv shivakunv force-pushed the amazonlinuxsupport branch 3 times, most recently from 426a2cf to 49b283c October 28, 2024 16:51
Comment thread amzn2023/Dockerfile
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/install.sh Outdated
Comment thread .common-ci.yml Outdated
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/Dockerfile
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/empty
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/install.sh Outdated
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/install.sh Outdated
Comment thread amzn2023/install.sh Outdated
Comment thread Makefile Outdated
Comment thread Makefile Outdated
Comment thread Makefile Outdated
Comment thread Makefile Outdated
Comment thread amzn2023/nvidia-driver Outdated
Comment thread amzn2023/nvidia-driver Outdated
@shivakunv shivakunv force-pushed the amazonlinuxsupport branch 2 times, most recently from eb0282a to c46bcaf October 28, 2024 19:27
Comment thread .github/workflows/image.yaml Outdated
@shivakunv shivakunv force-pushed the amazonlinuxsupport branch 2 times, most recently from 3d7b0bb to d179a93 October 29, 2024 15:38
Comment thread versions.mk Outdated
@shivakunv shivakunv force-pushed the amazonlinuxsupport branch 2 times, most recently from 74b2554 to 539af6c October 30, 2024 20:24
@shivakunv
Contributor Author

@cdesiniotis and @tariq1890 PTAL

Comment thread .nvidia-ci.yml Outdated
Comment thread Makefile Outdated
Comment thread amzn2023/Dockerfile Outdated
# due to cuda repo cache issue , nvidia-fabric-manager refers to 565 version only
# install fabric-manager and nvidia-nscq
RUN if [ "$DRIVER_TYPE" != "vgpu" ] && [ "$TARGETARCH" != "arm64" ]; then \
dnf install -y nvidia-fabric-manager libnvidia-nscq-${DRIVER_BRANCH}-${DRIVER_VERSION}-1; fi
Contributor

We can't commit this as-is. I just looked at the packages uploaded here, so the following should work:

Suggested change
- dnf install -y nvidia-fabric-manager libnvidia-nscq-${DRIVER_BRANCH}-${DRIVER_VERSION}-1; fi
+ dnf install -y nvidia-fabricmanager-${DRIVER_BRANCH}-${DRIVER_VERSION}-1 libnvidia-nscq-${DRIVER_BRANCH}-${DRIVER_VERSION}-1; fi

Note that you have nvidia-fabric-manager, when it should be nvidia-fabricmanager*

Contributor Author

@shivakunv shivakunv Oct 31, 2024

Removed 560.35.03, since the fabric manager for 560.35.03 is not available.

dnf list *fabric* 
24.23 cuda-drivers-fabricmanager.x86_64     565.57.01-1           cuda-amzn2023-x86_64
24.23 cuda-drivers-fabricmanager-555.x86_64 555.42.06-1           cuda-amzn2023-x86_64
24.23 cuda-drivers-fabricmanager-560.x86_64 560.35.03-1           cuda-amzn2023-x86_64
24.23 cuda-drivers-fabricmanager-565.x86_64 565.57.01-1           cuda-amzn2023-x86_64
24.23 libfabric.x86_64                      1.14.0-2.amzn2023.0.2 amazonlinux
24.23 libfabric-devel.x86_64                1.14.0-2.amzn2023.0.2 amazonlinux
24.23 nvidia-fabric-manager.x86_64          565.57.01-1           cuda-amzn2023-x86_64
24.23 nvidia-fabric-manager-devel.x86_64    565.57.01-1           cuda-amzn2023-x86_64

Added a conditional check for both packages and their installation.

Contributor Author

Removed the if condition; the Dockerfile now installs nvidia-fabric-manager-${DRIVER_VERSION}-1.

Comment thread amzn2023/Dockerfile Outdated
Comment thread amzn2023/Dockerfile Outdated
@shivakunv shivakunv force-pushed the amazonlinuxsupport branch 4 times, most recently from 400da5c to ed626af October 31, 2024 07:15
@shivakunv
Contributor Author

PTAL @cdesiniotis @tariq1890

Comment thread amzn2023/Dockerfile Outdated
# Initialize the fabric manager package variable
FABRIC_PACKAGE=""; \
if dnf list nvidia-fabric-manager-${DRIVER_VERSION}-1 &>/dev/null; then \
FABRIC_PACKAGE="nvidia-fabric-manager-${DRIVER_VERSION}-1"; \
Contributor

Looking at https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/x86_64/, the only fabric manager packages available are named nvidia-fabric-manager-${DRIVER_VERSION}-1. Let's remove the conditional here and always use that package name. If the name ever changes, our builds will break and we will know right away.
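
The fail-fast approach described in this comment could be sketched roughly as follows. This is an illustrative sketch only, not the code from the PR: the helper name `fabric_manager_pkg` is hypothetical, the default version is just the example value from the `dnf list` output earlier in the conversation, and the actual `dnf install` call is left as a comment because it needs the CUDA amzn2023 repository to be configured.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of the PR): build the pinned package spec
# from the naming pattern quoted in the review comment.
fabric_manager_pkg() {
    printf 'nvidia-fabric-manager-%s-1' "$1"
}

# Example value only, taken from the dnf listing earlier in this thread.
DRIVER_VERSION="${DRIVER_VERSION:-565.57.01}"
pkg="$(fabric_manager_pkg "${DRIVER_VERSION}")"
echo "${pkg}"

# In the Dockerfile RUN step the install would be unconditional, so a renamed
# or missing package makes dnf exit non-zero and the image build fails:
#   dnf install -y "${pkg}"
```

Dropping the `dnf list` probe means there is no silent fallback path: if the package name ever changes, the build breaks immediately instead of producing an image without fabric manager.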

Contributor Author

Removed the if condition; the Dockerfile now installs nvidia-fabric-manager-${DRIVER_VERSION}-1.

Comment thread amzn2023/nvidia-driver
Comment on lines +402 to +406
if [ -f /sys/module/nvidia_fs/refcnt ]; then
nvidia_fs_refs=$(< /sys/module/nvidia_fs/refcnt)
rmmod_args+=("nvidia-fs")
((++nvidia_deps))
fi
Contributor

Is this change required? We have a separate sidecar container for loading / unloading nvidia-fs.

Signed-off-by: shiva kumar <shivaku@nvidia.com>
@rahulait
Contributor

@shivakunv do we need these changes? Or are we fine with closing this PR if they are no longer needed?

@shivakunv
Contributor Author

Closing this since the release to amzn2023 is not planned.

@shivakunv shivakunv closed this Mar 24, 2026
@shivakunv shivakunv deleted the amazonlinuxsupport branch March 24, 2026 12:59

4 participants