Open
Description
I receive the following error:
nvidia-smi not installed
/etc/google-dataproc/startup-scripts/dataproc-initialization-script-0: line 293: is_cudnn8: command not found
Files removed: 308 (2317.2 MB)
Writing to /root/.config/pip/pip.conf
gpg: keybox '/usr/share/keyrings/adoptium.gpg' created
gpg: directory '/root/.gnupg' created
gpg: /root/.gnupg/trustdb.gpg: trustdb created
gpg: key 843C48A565F8F04B: public key "Adoptium GPG Key (DEB/RPM Signing Key) <temurin-dev@eclipse.org>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: key C0BA5CE6DC6315A3: public key "Artifact Registry Repository Signer <artifact-registry-repository-signer@google.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: keybox '/usr/share/keyrings/docker-keyring.gpg' created
gpg: key 8D81803C0EBFCD88: public key "Docker Release (CE deb) <docker@docker.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
/etc/apt/sources.list.d/google-cloud.list
gpg: keybox '/usr/share/keyrings/cloud.google.gpg' created
gpg: key C0BA5CE6DC6315A3: public key "Artifact Registry Repository Signer <artifact-registry-repository-signer@google.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
Reading package lists...
Building dependency tree...
Reading state information...
0 upgraded, 0 newly installed, 0 to remove and 31 not upgraded.
Canceled hold on systemd.
Canceled hold on libsystemd0.
real 0m8.038s
user 0m4.309s
sys 0m1.458s
nvidia-smi not installed
acl:
- entity: project-owners-****************
projectTeam:
projectNumber: '****************'
team: owners
role: OWNER
- entity: project-editors-****************
projectTeam:
projectNumber: '****************'
team: editors
role: OWNER
- entity: project-viewers-****************
projectTeam:
projectNumber: '****************'
team: viewers
role: READER
- email: ****************-compute@developer.gserviceaccount.com
entity: user-****************-compute@developer.gserviceaccount.com
role: OWNER
bucket: dataproc-temp-europe-west1-****************-jupf0b8s
component_count: 7
content_type: application/octet-stream
crc32c_hash: pOhoiw==
creation_time: 2025-04-22T07:38:19+0000
etag: CPzuiYyR64wDEAE=
generation: '1745307499132796'
metageneration: 1
name: dpgce-packages/nvidia/NVIDIA-Linux-x86_64-550.142.run
size: 307296728
storage_class: STANDARD
storage_class_update_time: 2025-04-22T07:38:19+0000
storage_url: gs://dataproc-temp-europe-west1-****************-jupf0b8s/dpgce-packages/nvidia/NVIDIA-Linux-x86_64-550.142.run#1745307499132796
update_time: 2025-04-22T07:38:19+0000
Copying gs://dataproc-temp-europe-west1-****************-jupf0b8s/dpgce-packages/nvidia/NVIDIA-Linux-x86_64-550.142.run to file:///mnt/shm/userspace.run
....
Average throughput: 691.9MiB/s
real 0m2.103s
user 0m2.386s
sys 0m1.030s
real 0m20.568s
user 0m5.847s
sys 0m5.105s
/opt/install-dpgce /
acl:
- entity: project-owners-****************
projectTeam:
projectNumber: '****************'
team: owners
role: OWNER
- entity: project-editors-****************
projectTeam:
projectNumber: '****************'
team: editors
role: OWNER
- entity: project-viewers-****************
projectTeam:
projectNumber: '****************'
team: viewers
role: READER
- email: ****************-compute@developer.gserviceaccount.com
entity: user-****************-compute@developer.gserviceaccount.com
role: OWNER
bucket: dataproc-temp-europe-west1-****************-jupf0b8s
content_type: application/x-tar
crc32c_hash: hUkg3A==
creation_time: 2025-04-22T07:40:18+0000
etag: CKHpisWR64wDEAE=
generation: '1745307618686113'
md5_hash: u5lHOXdDD2qH/CYP1wGVxw==
metageneration: 1
name: dpgce-packages/nvidia/kmod/debian12/6.1.0-32-cloud-amd64/unsigned/kmod_debian12_550.142.tar.gz
size: 25508565
storage_class: STANDARD
storage_class_update_time: 2025-04-22T07:40:18+0000
storage_url: gs://dataproc-temp-europe-west1-****************-jupf0b8s/dpgce-packages/nvidia/kmod/debian12/6.1.0-32-cloud-amd64/unsigned/kmod_debian12_550.142.tar.gz#1745307618686113
update_time: 2025-04-22T07:40:18+0000
cache hit
opt/install-dpgce/open-gpu-kernel-modules/kernel-open/build.log
opt/install-dpgce/open-gpu-kernel-modules/kernel-open/build_error.log
lib/modules/6.1.0-32-cloud-amd64/kernel/drivers/video/nvidia-uvm.ko
lib/modules/6.1.0-32-cloud-amd64/kernel/drivers/video/nvidia-drm.ko
lib/modules/6.1.0-32-cloud-amd64/kernel/drivers/video/nvidia-peermem.ko
lib/modules/6.1.0-32-cloud-amd64/kernel/drivers/video/nvidia-modeset.ko
lib/modules/6.1.0-32-cloud-amd64/kernel/drivers/video/nvidia.ko
/
NVIDIA GPU driver provided by NVIDIA was installed successfully
acl:
- entity: project-owners-****************
projectTeam:
projectNumber: '****************'
team: owners
role: OWNER
- entity: project-editors-****************
projectTeam:
projectNumber: '****************'
team: editors
role: OWNER
- entity: project-viewers-****************
projectTeam:
projectNumber: '****************'
team: viewers
role: READER
- email: ****************-compute@developer.gserviceaccount.com
entity: user-****************-compute@developer.gserviceaccount.com
role: OWNER
bucket: dataproc-temp-europe-west1-****************-jupf0b8s
component_count: 32
content_type: application/octet-stream
crc32c_hash: ROiILQ==
creation_time: 2025-04-22T07:54:44+0000
etag: CL+w9OGU64wDEAE=
generation: '1745308484442175'
metageneration: 1
name: dpgce-packages/nvidia/cuda_12.6.3_560.35.05_linux.run
size: 4446722669
storage_class: STANDARD
storage_class_update_time: 2025-04-22T07:54:44+0000
storage_url: gs://dataproc-temp-europe-west1-****************-jupf0b8s/dpgce-packages/nvidia/cuda_12.6.3_560.35.05_linux.run#1745308484442175
update_time: 2025-04-22T07:54:44+0000
Copying gs://dataproc-temp-europe-west1-****************-jupf0b8s/dpgce-packages/nvidia/cuda_12.6.3_560.35.05_linux.run to file:///mnt/shm/cuda.run
.................
Average throughput: 1.3GiB/s
real 0m4.643s
user 0m16.077s
sys 0m20.338s
real 2m39.479s
user 2m19.921s
sys 0m48.780s
Selecting previously unselected package cuda-keyring.
(Reading database ... 166259 files and directories currently installed.)
Preparing to unpack /mnt/shm/cuda-keyring.deb ...
Unpacking cuda-keyring (1.1-1) ...
Setting up cuda-keyring (1.1-1) ...
unable to rmmod nvidia_uvm
unable to rmmod nvidia_drm
unable to rmmod nvidia_modeset
unable to rmmod nvidia
/opt/install-dpgce /
ERROR: (gcloud.storage.objects.describe) gs://dataproc-temp-europe-west1-****************-jupf0b8s/dpgce-packages%2Fnvidia%2Fnccl%2Fdebian12%2Fnccl-build_debian12_2.23.4-1%2Bcuda12.6.tar.gz not found: 404.
Copying file:///opt/install-dpgce/nccl-build_debian12_2.23.4-1+cuda12.6.tar.gz.building to gs://dataproc-temp-europe-west1-****************-jupf0b8s/dpgce-packages/nvidia/nccl/debian12/nccl-build_debian12_2.23.4-1+cuda12.6.tar.gz.building
/opt/install-dpgce/nccl /opt/install-dpgce /
real 0m57.433s
user 0m14.465s
sys 0m5.950s
The script reports that it cannot find gs://dataproc-temp-europe-west1-****************-jupf0b8s/dpgce-packages%2Fnvidia%2Fnccl%2Fdebian12%2Fnccl-build_debian12_2.23.4-1%2Bcuda12.6.tar.gz; however, that file exists in the storage bucket.
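For anyone comparing the names by hand, here is a minimal sketch that decodes the percent-encoded object name from the 404 message and compares it with the object name in the "Copying" line that follows it. Both strings are copied from the log above with the bucket prefix omitted; `unquote` comes from Python's standard library. This only compares names and is not a diagnosis of the failure.

```python
from urllib.parse import unquote

# Object name exactly as printed in the 404 message (bucket prefix omitted).
encoded = (
    "dpgce-packages%2Fnvidia%2Fnccl%2Fdebian12%2F"
    "nccl-build_debian12_2.23.4-1%2Bcuda12.6.tar.gz"
)

# Object name from the "Copying ..." line that follows (bucket prefix omitted).
uploaded = (
    "dpgce-packages/nvidia/nccl/debian12/"
    "nccl-build_debian12_2.23.4-1+cuda12.6.tar.gz.building"
)

# %2F decodes to "/" and %2B decodes to "+"; unquote leaves a literal "+" alone.
decoded = unquote(encoded)

print("decoded name :", decoded)
print("uploaded name:", uploaded)
print("same object  :", decoded == uploaded)
print("differs only by '.building' suffix:", uploaded == decoded + ".building")
```

In this log the decoded name and the uploaded name differ only by the `.building` suffix, so it may be worth checking which of the two objects is actually present in the bucket.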
