fix(gpu): select one CDI GPU by default for Docker and Podman #1477

## Description

Update Docker and Podman GPU sandbox defaults so `--gpu` prefers one CDI GPU device instead of defaulting to `nvidia.com/gpu=all`.

This is part of the GPU roadmap in #1444. `--gpu` requests the active driver's default GPU behavior. For GPU-enabled drivers, that default should inject or allocate one suitable GPU when the runtime supports individual device selection.

## Context

Parent roadmap: #1444

Current local-container behavior maps a GPU request with no explicit `gpu_device` to `nvidia.com/gpu=all` through the shared CDI helper. That makes Docker and Podman inconsistent with Kubernetes and VM behavior, where a default GPU request maps to one GPU.

Docker has priority for implementation because OpenShell's Docker GPU path and CDI discovery are more mature today. Podman should be handled in the same task, but may require additional runtime support or an out-of-band CDI device discovery path. Upstream Podman behavior such as containers/podman#28712 may be relevant.
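As a rough illustration of what an out-of-band discovery path could look like, the sketch below derives fully qualified CDI device names from one already-parsed CDI spec. It assumes the CDI spec layout (a top-level `kind` plus a `devices` list whose entries carry a `name`). The function name `qualified_devices` and the sample data are hypothetical and are not part of OpenShell. Locating and loading spec files (commonly under `/etc/cdi` or `/var/run/cdi`, often as YAML) is left out.

```python
from typing import Any, Dict, List


def qualified_devices(spec: Dict[str, Any], kind: str = "nvidia.com/gpu") -> List[str]:
    """Return fully qualified CDI names (kind=name) from one parsed CDI spec.

    Hypothetical helper: specs of any other kind yield an empty list.
    """
    if spec.get("kind") != kind:
        return []
    names = [device.get("name") for device in spec.get("devices", [])]
    # Skip entries without a usable name instead of emitting "kind=None".
    return [f"{kind}={name}" for name in names if name]


# Illustrative spec data, shaped like a generated NVIDIA CDI spec.
example_spec = {
    "cdiVersion": "0.5.0",
    "kind": "nvidia.com/gpu",
    "devices": [{"name": "0"}, {"name": "1"}, {"name": "all"}],
}

print(qualified_devices(example_spec))
# ['nvidia.com/gpu=0', 'nvidia.com/gpu=1', 'nvidia.com/gpu=all']
```

Whether this kind of spec parsing or a runtime-reported inventory should be authoritative is one of the open questions below.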

## Proposed Scope

- Define local-container default GPU selection semantics for Docker and Podman.
- Change Docker default `--gpu` behavior to prefer one CDI GPU device instead of `nvidia.com/gpu=all`.
- Change Podman default `--gpu` behavior to prefer one CDI GPU device instead of `nvidia.com/gpu=all`.
- Prefer runtime-reported CDI inventory when available.
- Preserve explicit `--gpu-device` behavior as a driver-native advanced option.
- Do not add multi-GPU count support in this task.
- Do not require OpenShell-managed GPU assignment/exclusivity tracking in this task.

## Target Behavior

Default GPU selection should use this order:

1. If the runtime reports individual CDI GPU devices, select one individual device.
2. If reliable CDI inventory is unavailable but individual device IDs are expected to work, fall back to `nvidia.com/gpu=0`.
3. If the runtime/platform only reports or supports `nvidia.com/gpu=all`, such as some WSL2-based setups, use `nvidia.com/gpu=all` as a compatibility fallback.
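The selection order above can be sketched as a small pure function. This is a minimal sketch, not the OpenShell implementation: `select_default_gpu`, its parameters, and the choice of "first reported individual device" are assumptions made for illustration, since the issue only says to select one individual device.

```python
from typing import List, Optional

ALL_DEVICE = "nvidia.com/gpu=all"
INDEX_ZERO_DEVICE = "nvidia.com/gpu=0"


def select_default_gpu(
    inventory: Optional[List[str]],
    individual_ids_expected: bool = True,
) -> str:
    """Pick the default CDI device for a plain `--gpu` request.

    inventory: CDI device names reported by the runtime, or None/empty when
        no reliable inventory is available (hypothetical input shape).
    individual_ids_expected: whether IDs such as nvidia.com/gpu=0 are expected
        to work on this platform when no inventory is available.
    """
    if inventory:
        individual = [dev for dev in inventory if dev != ALL_DEVICE]
        if individual:
            # Rule 1: the runtime reports individual devices; take one.
            # Assumption: the first reported device is a suitable choice.
            return individual[0]
        # Rule 3: the runtime only reports the aggregate device.
        return ALL_DEVICE
    if individual_ids_expected:
        # Rule 2: no reliable inventory, but index-based IDs should work.
        return INDEX_ZERO_DEVICE
    # Rule 3: individual selection is not supported on this platform.
    return ALL_DEVICE
```

For example, `select_default_gpu(["nvidia.com/gpu=0", "nvidia.com/gpu=1", "nvidia.com/gpu=all"])` returns `nvidia.com/gpu=0`, while `select_default_gpu(["nvidia.com/gpu=all"])` returns the compatibility fallback `nvidia.com/gpu=all`.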

Additional behavior:

- `openshell sandbox create --gpu ...` on Docker injects one CDI GPU device when individual device selection is available.
- `openshell sandbox create --gpu ...` on Podman injects one CDI GPU device when individual device selection is available.
- `openshell sandbox create --gpu --gpu-device nvidia.com/gpu=0 ...` continues to pass the explicit CDI device ID through.
- The fallback to `nvidia.com/gpu=all` should be intentional and documented, not the default for platforms with individual device selection.
- Non-zero `gpu_count` remains unsupported unless a driver explicitly implements count-based allocation.
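One way to picture how these rules combine at the request level is the sketch below. The `GpuRequest` type and `resolve_cdi_devices` function are hypothetical; only the `gpu_device` and `gpu_count` field names come from this issue. How a request with `gpu_device` set but no `--gpu` should be treated is not specified here, so the sketch simply returns no devices in that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuRequest:
    """Hypothetical request shape for illustration only."""
    gpu: bool = False
    gpu_device: Optional[str] = None
    gpu_count: int = 0


def resolve_cdi_devices(req: GpuRequest, default_device: str) -> List[str]:
    """Return the CDI device IDs to inject for a sandbox request.

    default_device is the driver's already-selected default, for example the
    result of an inventory-based selection step.
    """
    if not req.gpu:
        # Assumption: without --gpu, nothing is injected.
        return []
    if req.gpu_count != 0:
        # Count-based allocation is unsupported unless a driver implements it.
        raise ValueError("gpu_count is not supported by this driver")
    if req.gpu_device is not None:
        # An explicit --gpu-device is passed through unchanged.
        return [req.gpu_device]
    return [default_device]
```

Keeping the explicit pass-through separate from default selection means a change to the default never alters what `--gpu-device` users receive.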

## Out of Scope

This task fixes how many GPU devices the default selection injects: one individual device instead of all. It does not require OpenShell to track active GPU assignments or prevent two OpenShell sandboxes from selecting the same default GPU.

If multiple sandboxes are created concurrently, it is acceptable for them to select the same default device until a separate allocation/exclusivity task is implemented.

## Open Questions

- Where should CDI inventory discovery live: shared OpenShell core helper, driver-specific code, or both?
- What should Podman use as the authoritative CDI device inventory source before runtime-level enumeration is reliable?
- Should assignment/exclusivity tracking be added later at the driver level or as part of a broader resource allocation model?

## Definition of Done

- [ ] Docker default `--gpu` prefers one individual CDI GPU device when available.
- [ ] Podman default `--gpu` prefers one individual CDI GPU device when available.
- [ ] If reliable CDI inventory is unavailable and individual IDs are expected to work, default selection falls back to `nvidia.com/gpu=0`.
- [ ] If individual selection is unavailable, `nvidia.com/gpu=all` remains available as a documented compatibility fallback.
- [ ] Explicit `--gpu-device` pass-through behavior is preserved for Docker and Podman.
- [ ] Tests cover individual-device default selection, fallback selection, and explicit device pass-through.
- [ ] Docs describe Docker/Podman default GPU behavior, compatibility fallback behavior, and `--gpu-device` as an advanced driver-native option.

