Description
I am trying to deploy the NVIDIA GPU Operator on a Kubernetes cluster whose worker nodes run Debian 13 (Trixie).
Currently, the GPU Operator attempts to pull the driver image with the tag 580.105.08-debian13, but this image does not exist in the nvcr.io/nvidia/driver repository, leading to an ImagePullBackOff error.
Steps to Reproduce (if applicable)
- Install GPU Operator on a node running Debian 13.
- The operator automatically detects the OS and tries to pull nvcr.io/nvidia/driver:<version>-debian13.
- The pod fails with: failed to resolve image: nvcr.io/nvidia/driver:580.105.08-debian13: not found.
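The requested tag appears to be derived from the driver version plus the host's /etc/os-release fields (ID and VERSION_ID). A minimal sketch of that tag construction, assuming this derivation (the operator's actual logic may differ); the values below simulate a Trixie host rather than reading a live node:

```shell
# Sketch: reproduce the image tag the operator requests, assuming it is
# built as <driver-version>-<ID><VERSION_ID> from /etc/os-release.
DRIVER_VERSION=580.105.08
ID=debian          # ID= field from /etc/os-release on Debian 13
VERSION_ID=13      # VERSION_ID= field from /etc/os-release on Debian 13
echo "nvcr.io/nvidia/driver:${DRIVER_VERSION}-${ID}${VERSION_ID}"
# → nvcr.io/nvidia/driver:580.105.08-debian13
```

This matches the tag seen in the pod events below, which is why the failure follows directly from the host OS rather than from any chart misconfiguration.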
Environment details
- GPU Operator Version: v25.10.1
- Kubernetes Version: v1.34.3
- Node OS: Debian 13 (Trixie)
- GPU Model: PNY Quadro P4000
Possible Workaround / Proposal
- Provide official debian13-based driver images in nvcr.io.
- In the meantime, is there a recommended fallback (e.g., using ubuntu22.04 or debian12 images) that is known to be compatible with Debian 13 hosts?
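Until an official debian13 tag exists, one possible path (a sketch, not verified on Trixie) is to build a custom driver image, e.g. from NVIDIA's gpu-driver-container repository, and point the chart at it. The value names below come from the gpu-operator Helm chart; the operator appears to append the detected "-&lt;os&gt;" suffix to driver.version, so the custom registry would need a 580.105.08-debian13 image. registry.example.com/custom is a placeholder, not a real registry:

```yaml
# values.yaml override for the gpu-operator chart (hypothetical registry).
# Assuming the operator appends "-<os>" to driver.version, this resolves to
# registry.example.com/custom/driver:580.105.08-debian13.
driver:
  repository: registry.example.com/custom   # placeholder private registry
  image: driver
  version: "580.105.08"
```

Whether a driver built this way (or a debian12 image) actually compiles and loads against a Trixie kernel is exactly the open question above.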
# k8s-worker1
lspci -nnk | grep -A3 NVIDIA
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104GL [Quadro P4000] [10de:1bb1] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:11a3]
Kernel modules: nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:11a3]
Kernel modules: snd_hda_intel
# k8s-master1
$ k get po -n gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-l8lp4 0/1 Init:0/1 0 10m
gpu-operator-7569f8b499-7pz7w 1/1 Running 0 10m
gpu-operator-node-feature-discovery-gc-55ffc49ccc-5s5vz 1/1 Running 0 10m
gpu-operator-node-feature-discovery-master-6b5787f695-cqdtd 1/1 Running 0 10m
gpu-operator-node-feature-discovery-worker-27pcx 1/1 Running 0 10m
gpu-operator-node-feature-discovery-worker-gx6dr 1/1 Running 0 10m
gpu-operator-node-feature-discovery-worker-rpzfz 1/1 Running 0 10m
gpu-operator-node-feature-discovery-worker-v28zx 1/1 Running 0 10m
nvidia-container-toolkit-daemonset-spmwx 0/1 Init:0/1 0 10m
nvidia-dcgm-exporter-f9wnw 0/1 Init:0/1 0 10m
nvidia-device-plugin-daemonset-9mhc2 0/1 Init:0/1 0 10m
nvidia-driver-daemonset-5psh9 0/1 ImagePullBackOff 0 10m
nvidia-operator-validator-7h4b7 0/1 Init:0/4 0 10m
$
$
$ k describe pod -n gpu-operator nvidia-driver-daemonset-5psh9
Name: nvidia-driver-daemonset-5psh9
Namespace: gpu-operator
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: nvidia-driver
Node: k8s-worker1/192.168.100.201
Start Time: Wed, 18 Mar 2026 11:48:27 +0900
Labels: app=nvidia-driver-daemonset
app.kubernetes.io/component=nvidia-driver
app.kubernetes.io/managed-by=gpu-operator
controller-revision-hash=7488457bc7
helm.sh/chart=gpu-operator-v25.10.1
nvidia.com/precompiled=false
pod-template-generation=1
Annotations: k8s.v1.cni.cncf.io/network-status:
[{
"name": "cilium",
"interface": "eth0",
"ips": [
"10.210.3.178"
],
"default": true,
"dns": {},
"gateway": [
"10.210.3.137"
]
}]
kubectl.kubernetes.io/default-container: nvidia-driver-ctr
Status: Pending
IP: 10.210.3.178
IPs:
IP: 10.210.3.178
Controlled By: DaemonSet/nvidia-driver-daemonset
Init Containers:
k8s-driver-manager:
Container ID: containerd://cbd2bbea9e025e29161c6aee776260cd820540a668302de1473372460544cea0
Image: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.9.1
Image ID: nvcr.io/nvidia/cloud-native/k8s-driver-manager@sha256:c549346eb993fda62e9bf665aabaacc88abc06b0b24e69635427d4d71c2d5ed4
Port: <none>
Host Port: <none>
Command:
driver-manager
Args:
uninstall_driver
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 18 Mar 2026 11:48:34 +0900
Finished: Wed, 18 Mar 2026 11:48:41 +0900
Ready: True
Restart Count: 0
Environment:
NODE_NAME: (v1:spec.nodeName)
NVIDIA_VISIBLE_DEVICES: void
ENABLE_GPU_POD_EVICTION: true
ENABLE_AUTO_DRAIN: false
DRAIN_USE_FORCE: false
DRAIN_POD_SELECTOR_LABEL:
DRAIN_TIMEOUT_SECONDS: 0s
DRAIN_DELETE_EMPTYDIR_DATA: false
OPERATOR_NAMESPACE: gpu-operator (v1:metadata.namespace)
Mounts:
/host from host-root (ro)
/run/mellanox/drivers from run-mellanox-drivers (rw)
/run/nvidia from run-nvidia (rw)
/sys from host-sys (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pkczh (ro)
Containers:
nvidia-driver-ctr:
Container ID:
Image: nvcr.io/nvidia/driver:580.105.08-debian13
Image ID:
Port: <none>
Host Port: <none>
Command:
nvidia-driver
Args:
init
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Startup: exec [sh -c [ -f /sys/module/nvidia/refcnt ] && nvidia-smi && touch /run/nvidia/validations/.driver-ctr-ready] delay=60s timeout=60s period=10s #success=1 #failure=120
Environment:
NODE_NAME: (v1:spec.nodeName)
NODE_IP: (v1:status.hostIP)
KERNEL_MODULE_TYPE: auto
Mounts:
/dev/log from dev-log (rw)
/host-etc/os-release from host-os-release (ro)
/lib/firmware from nv-firmware (rw)
/run/mellanox/drivers from run-mellanox-drivers (rw)
/run/mellanox/drivers/usr/src from mlnx-ofed-usr-src (rw)
/run/nvidia from run-nvidia (rw)
/run/nvidia-fabricmanager from run-nvidia-fabricmanager (rw)
/run/nvidia-topologyd from run-nvidia-topologyd (rw)
/sys/devices/system/memory/auto_online_blocks from sysfs-memory-online (rw)
/sys/module/firmware_class/parameters/path from firmware-search-path (rw)
/var/log from var-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pkczh (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
run-nvidia:
Type: HostPath (bare host directory volume)
Path: /run/nvidia
HostPathType: DirectoryOrCreate
var-log:
Type: HostPath (bare host directory volume)
Path: /var/log
HostPathType:
dev-log:
Type: HostPath (bare host directory volume)
Path: /dev/log
HostPathType:
host-os-release:
Type: HostPath (bare host directory volume)
Path: /etc/os-release
HostPathType:
run-nvidia-fabricmanager:
Type: HostPath (bare host directory volume)
Path: /run/nvidia-fabricmanager
HostPathType: DirectoryOrCreate
run-nvidia-topologyd:
Type: HostPath (bare host directory volume)
Path: /run/nvidia-topologyd
HostPathType: DirectoryOrCreate
mlnx-ofed-usr-src:
Type: HostPath (bare host directory volume)
Path: /run/mellanox/drivers/usr/src
HostPathType: DirectoryOrCreate
run-mellanox-drivers:
Type: HostPath (bare host directory volume)
Path: /run/mellanox/drivers
HostPathType: DirectoryOrCreate
run-nvidia-validations:
Type: HostPath (bare host directory volume)
Path: /run/nvidia/validations
HostPathType: DirectoryOrCreate
host-root:
Type: HostPath (bare host directory volume)
Path: /
HostPathType:
host-sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType: Directory
firmware-search-path:
Type: HostPath (bare host directory volume)
Path: /sys/module/firmware_class/parameters/path
HostPathType:
sysfs-memory-online:
Type: HostPath (bare host directory volume)
Path: /sys/devices/system/memory/auto_online_blocks
HostPathType:
nv-firmware:
Type: HostPath (bare host directory volume)
Path: /run/nvidia/driver/lib/firmware
HostPathType: DirectoryOrCreate
kube-api-access-pkczh:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
Optional: false
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: nvidia.com/gpu.deploy.driver=true
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
nvidia.com/gpu:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned gpu-operator/nvidia-driver-daemonset-5psh9 to k8s-worker1
Normal AddedInterface 11m multus Add eth0 [10.210.3.178/32] from cilium
Normal Pulling 11m kubelet Pulling image "nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.9.1"
Normal Pulled 11m kubelet Successfully pulled image "nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.9.1" in 4.177s (4.177s including waiting). Image size: 35134858 bytes.
Normal Created 11m kubelet Created container: k8s-driver-manager
Normal Started 11m kubelet Started container k8s-driver-manager
Normal Pulling 8m26s (x5 over 11m) kubelet Pulling image "nvcr.io/nvidia/driver:580.105.08-debian13"
Warning Failed 8m25s (x5 over 11m) kubelet Failed to pull image "nvcr.io/nvidia/driver:580.105.08-debian13": rpc error: code = NotFound desc = failed to pull and unpack image "nvcr.io/nvidia/driver:580.105.08-debian13": failed to resolve image: nvcr.io/nvidia/driver:580.105.08-debian13: not found
Warning Failed 8m25s (x5 over 11m) kubelet Error: ErrImagePull
Normal BackOff 92s (x41 over 10m) kubelet Back-off pulling image "nvcr.io/nvidia/driver:580.105.08-debian13"
Warning Failed 77s (x42 over 10m) kubelet Error: ImagePullBackOff