[Feature]: Support for Debian 13 (Trixie) #2225

@pys0530

Description

I am trying to deploy the NVIDIA GPU Operator on a Kubernetes cluster whose worker nodes run Debian 13 (Trixie).

Currently, the GPU Operator attempts to pull the driver image with the tag 580.105.08-debian13, but this tag does not exist in the nvcr.io/nvidia/driver repository, leading to an ImagePullBackOff error.

Steps to Reproduce (if applicable)

  1. Install GPU Operator on a node running Debian 13.
  2. The operator automatically detects the OS and tries to pull nvcr.io/nvidia/driver:<version>-debian13.
  3. Pod fails with: failed to resolve image: nvcr.io/nvidia/driver:580.105.08-debian13: not found.
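
The tag in step 2 appears to be derived from the node's /etc/os-release: the driver version plus the OS `ID` and `VERSION_ID` concatenated. A minimal sketch of that tag construction (the function name and parsing are illustrative, not the operator's actual code; field names follow os-release(5)):

```python
# Illustrative sketch: build the driver image tag "<version>-<id><version_id>"
# the way the observed pull attempt suggests the operator does.
# driver_image_tag is a hypothetical helper, not a GPU Operator API.

def driver_image_tag(driver_version: str, os_release: str) -> str:
    """Parse os-release key=value lines and append ID + VERSION_ID."""
    fields = {}
    for line in os_release.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value.strip().strip('"')
    return f"{driver_version}-{fields['ID']}{fields['VERSION_ID']}"

# Example: a Debian 13 (Trixie) /etc/os-release excerpt
trixie = 'ID=debian\nVERSION_ID="13"\nPRETTY_NAME="Debian GNU/Linux 13 (trixie)"'
print(driver_image_tag("580.105.08", trixie))  # → 580.105.08-debian13
```

This is why the failure is purely a missing-tag problem: the detection works, but no image exists under the resulting tag.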

Environment details

  • GPU Operator Version: v25.10.1
  • Kubernetes Version: v1.34.3
  • Node OS: Debian 13 (Trixie)
  • GPU Model: PNY Quadro P4000

Possible Workaround / Proposal

  • Provide official debian13-based driver images in nvcr.io.
  • In the meantime, is there a recommended fallback (e.g., using ubuntu22.04 or debian12 images) that is known to be compatible with Debian 13 hosts?
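
As a stopgap until official images exist, the chart's standard `driver.*` values can redirect the pull to another registry. Note the operator appends the detected OS suffix (here `-debian13`) to `driver.version`, so the target repository must contain an image under exactly that tag — e.g. a custom-built driver image pushed to a private registry. The registry name below is hypothetical, and whether such an image loads cleanly against a Trixie kernel is untested here:

```yaml
# Hypothetical values.yaml override (standard gpu-operator chart keys).
# The operator still appends "-debian13", so the fallback image must be
# tagged 580.105.08-debian13 in the repository referenced here.
driver:
  repository: registry.example.com/nvidia   # hypothetical private registry
  image: driver
  version: "580.105.08"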

# k8s-worker1
lspci -nnk | grep -A3 NVIDIA
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104GL [Quadro P4000] [10de:1bb1] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:11a3]
        Kernel modules: nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:11a3]
        Kernel modules: snd_hda_intel

# k8s-master1
$ k get po -n gpu-operator 
NAME                                                          READY   STATUS             RESTARTS   AGE
gpu-feature-discovery-l8lp4                                   0/1     Init:0/1           0          10m
gpu-operator-7569f8b499-7pz7w                                 1/1     Running            0          10m
gpu-operator-node-feature-discovery-gc-55ffc49ccc-5s5vz       1/1     Running            0          10m
gpu-operator-node-feature-discovery-master-6b5787f695-cqdtd   1/1     Running            0          10m
gpu-operator-node-feature-discovery-worker-27pcx              1/1     Running            0          10m
gpu-operator-node-feature-discovery-worker-gx6dr              1/1     Running            0          10m
gpu-operator-node-feature-discovery-worker-rpzfz              1/1     Running            0          10m
gpu-operator-node-feature-discovery-worker-v28zx              1/1     Running            0          10m
nvidia-container-toolkit-daemonset-spmwx                      0/1     Init:0/1           0          10m
nvidia-dcgm-exporter-f9wnw                                    0/1     Init:0/1           0          10m
nvidia-device-plugin-daemonset-9mhc2                          0/1     Init:0/1           0          10m
nvidia-driver-daemonset-5psh9                                 0/1     ImagePullBackOff   0          10m
nvidia-operator-validator-7h4b7                               0/1     Init:0/4           0          10m
$
$
$ k describe pod -n gpu-operator nvidia-driver-daemonset-5psh9 
Name:                 nvidia-driver-daemonset-5psh9
Namespace:            gpu-operator
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      nvidia-driver
Node:                 k8s-worker1/192.168.100.201
Start Time:           Wed, 18 Mar 2026 11:48:27 +0900
Labels:               app=nvidia-driver-daemonset
                      app.kubernetes.io/component=nvidia-driver
                      app.kubernetes.io/managed-by=gpu-operator
                      controller-revision-hash=7488457bc7
                      helm.sh/chart=gpu-operator-v25.10.1
                      nvidia.com/precompiled=false
                      pod-template-generation=1
Annotations:          k8s.v1.cni.cncf.io/network-status:
                        [{
                            "name": "cilium",
                            "interface": "eth0",
                            "ips": [
                                "10.210.3.178"
                            ],
                            "default": true,
                            "dns": {},
                            "gateway": [
                                "10.210.3.137"
                            ]
                        }]
                      kubectl.kubernetes.io/default-container: nvidia-driver-ctr
Status:               Pending
IP:                   10.210.3.178
IPs:
  IP:           10.210.3.178
Controlled By:  DaemonSet/nvidia-driver-daemonset
Init Containers:
  k8s-driver-manager:
    Container ID:  containerd://cbd2bbea9e025e29161c6aee776260cd820540a668302de1473372460544cea0
    Image:         nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.9.1
    Image ID:      nvcr.io/nvidia/cloud-native/k8s-driver-manager@sha256:c549346eb993fda62e9bf665aabaacc88abc06b0b24e69635427d4d71c2d5ed4
    Port:          <none>
    Host Port:     <none>
    Command:
      driver-manager
    Args:
      uninstall_driver
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 18 Mar 2026 11:48:34 +0900
      Finished:     Wed, 18 Mar 2026 11:48:41 +0900
    Ready:          True
    Restart Count:  0
    Environment:
      NODE_NAME:                    (v1:spec.nodeName)
      NVIDIA_VISIBLE_DEVICES:      void
      ENABLE_GPU_POD_EVICTION:     true
      ENABLE_AUTO_DRAIN:           false
      DRAIN_USE_FORCE:             false
      DRAIN_POD_SELECTOR_LABEL:    
      DRAIN_TIMEOUT_SECONDS:       0s
      DRAIN_DELETE_EMPTYDIR_DATA:  false
      OPERATOR_NAMESPACE:          gpu-operator (v1:metadata.namespace)
    Mounts:
      /host from host-root (ro)
      /run/mellanox/drivers from run-mellanox-drivers (rw)
      /run/nvidia from run-nvidia (rw)
      /sys from host-sys (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pkczh (ro)
Containers:
  nvidia-driver-ctr:
    Container ID:  
    Image:         nvcr.io/nvidia/driver:580.105.08-debian13
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      nvidia-driver
    Args:
      init
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Startup:        exec [sh -c [ -f /sys/module/nvidia/refcnt ] && nvidia-smi && touch /run/nvidia/validations/.driver-ctr-ready] delay=60s timeout=60s period=10s #success=1 #failure=120
    Environment:
      NODE_NAME:            (v1:spec.nodeName)
      NODE_IP:              (v1:status.hostIP)
      KERNEL_MODULE_TYPE:  auto
    Mounts:
      /dev/log from dev-log (rw)
      /host-etc/os-release from host-os-release (ro)
      /lib/firmware from nv-firmware (rw)
      /run/mellanox/drivers from run-mellanox-drivers (rw)
      /run/mellanox/drivers/usr/src from mlnx-ofed-usr-src (rw)
      /run/nvidia from run-nvidia (rw)
      /run/nvidia-fabricmanager from run-nvidia-fabricmanager (rw)
      /run/nvidia-topologyd from run-nvidia-topologyd (rw)
      /sys/devices/system/memory/auto_online_blocks from sysfs-memory-online (rw)
      /sys/module/firmware_class/parameters/path from firmware-search-path (rw)
      /var/log from var-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pkczh (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  run-nvidia:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia
    HostPathType:  DirectoryOrCreate
  var-log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  
  dev-log:
    Type:          HostPath (bare host directory volume)
    Path:          /dev/log
    HostPathType:  
  host-os-release:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/os-release
    HostPathType:  
  run-nvidia-fabricmanager:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia-fabricmanager
    HostPathType:  DirectoryOrCreate
  run-nvidia-topologyd:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia-topologyd
    HostPathType:  DirectoryOrCreate
  mlnx-ofed-usr-src:
    Type:          HostPath (bare host directory volume)
    Path:          /run/mellanox/drivers/usr/src
    HostPathType:  DirectoryOrCreate
  run-mellanox-drivers:
    Type:          HostPath (bare host directory volume)
    Path:          /run/mellanox/drivers
    HostPathType:  DirectoryOrCreate
  run-nvidia-validations:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/validations
    HostPathType:  DirectoryOrCreate
  host-root:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  
  host-sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:  Directory
  firmware-search-path:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/module/firmware_class/parameters/path
    HostPathType:  
  sysfs-memory-online:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/devices/system/memory/auto_online_blocks
    HostPathType:  
  nv-firmware:
    Type:          HostPath (bare host directory volume)
    Path:          /run/nvidia/driver/lib/firmware
    HostPathType:  DirectoryOrCreate
  kube-api-access-pkczh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    Optional:                false
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              nvidia.com/gpu.deploy.driver=true
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
                             nvidia.com/gpu:NoSchedule op=Exists
Events:
  Type     Reason          Age                  From               Message
  ----     ------          ----                 ----               -------
  Normal   Scheduled       11m                  default-scheduler  Successfully assigned gpu-operator/nvidia-driver-daemonset-5psh9 to k8s-worker1
  Normal   AddedInterface  11m                  multus             Add eth0 [10.210.3.178/32] from cilium
  Normal   Pulling         11m                  kubelet            Pulling image "nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.9.1"
  Normal   Pulled          11m                  kubelet            Successfully pulled image "nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.9.1" in 4.177s (4.177s including waiting). Image size: 35134858 bytes.
  Normal   Created         11m                  kubelet            Created container: k8s-driver-manager
  Normal   Started         11m                  kubelet            Started container k8s-driver-manager
  Normal   Pulling         8m26s (x5 over 11m)  kubelet            Pulling image "nvcr.io/nvidia/driver:580.105.08-debian13"
  Warning  Failed          8m25s (x5 over 11m)  kubelet            Failed to pull image "nvcr.io/nvidia/driver:580.105.08-debian13": rpc error: code = NotFound desc = failed to pull and unpack image "nvcr.io/nvidia/driver:580.105.08-debian13": failed to resolve image: nvcr.io/nvidia/driver:580.105.08-debian13: not found
  Warning  Failed          8m25s (x5 over 11m)  kubelet            Error: ErrImagePull
  Normal   BackOff         92s (x41 over 10m)   kubelet            Back-off pulling image "nvcr.io/nvidia/driver:580.105.08-debian13"
  Warning  Failed          77s (x42 over 10m)   kubelet            Error: ImagePullBackOff

Metadata

Assignees

No one assigned

    Labels

    feature — issue/PR that proposes a new feature or functionality
    lifecycle/frozen
    needs-triage — issue or PR has not been assigned a priority-px label

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests