fix(sandbox): restore GPU procfs baseline by elezar · Pull Request #1522 · NVIDIA/OpenShell

elezar · 2026-05-22T13:47:40Z

Summary

Restore CUDA GPU startup compatibility by promoting /proc from
filesystem_policy.read_only to filesystem_policy.read_write when /proc
is part of the active GPU runtime baseline.

This keeps the change intentionally narrow. The existing baseline enrichment
already places /proc in the GPU read-write baseline because CUDA writes
/proc/<pid>/task/<tid>/comm during initialization. The missing behavior was
that an existing read-only /proc entry caused enrichment to skip the
read-write baseline path. This PR restores that promotion and emits an
informational log message when it happens.
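The promotion described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FilesystemPolicy` and `promote_for_gpu_baseline` are hypothetical names, and the real OpenShell policy type and logging (which likely goes through a tracing macro rather than `eprintln!`) may look different.

```rust
/// Hypothetical stand-in for the sandbox filesystem policy; the real
/// OpenShell type is not shown in this PR description.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct FilesystemPolicy {
    pub read_only: Vec<String>,
    pub read_write: Vec<String>,
}

/// Ensures `path` ends up in `read_write`, removing any `read_only` entry
/// for the same path. Returns `true` when the policy was modified.
pub fn promote_for_gpu_baseline(policy: &mut FilesystemPolicy, path: &str) -> bool {
    let was_read_only = policy.read_only.iter().any(|p| p == path);
    let already_read_write = policy.read_write.iter().any(|p| p == path);

    // Nothing to do when the path is already read-write only.
    if already_read_write && !was_read_only {
        return false;
    }

    // Drop the conflicting read-only entry instead of skipping enrichment.
    policy.read_only.retain(|p| p != path);
    if !already_read_write {
        policy.read_write.push(path.to_string());
    }
    if was_read_only {
        // Assumption: stand-in for the informational log the PR mentions.
        eprintln!("promoted {path} from read_only to read_write for GPU runtime compatibility");
    }
    true
}
```

Calling the function a second time on the same policy returns `false`, so a caller could use the return value to decide whether the enriched policy needs to be synced anywhere.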

Broader handling for user-supplied policy conflicts and explicit baseline
conflict controls is left to follow-up work such as #1629.

Related Issue

Fixes #1486

Related follow-up: #1629

Changes

- Promote /proc from read_only to read_write when the GPU read-write
  baseline requires it.
- Preserve existing behavior for other read-only/read-write baseline conflicts.
- Emit an informational log when /proc is promoted for GPU runtime
  compatibility.
- Add a regression test covering GPU baseline enrichment without network
  policy.

Testing

mise exec -- cargo fmt --all
mise exec -- cargo test -p openshell-sandbox --lib baseline_tests -- --nocapture
mise run pre-commit (completed Helm lint, Rust format, Rust check, Rust clippy, markdown lint, and license checks; the python:proto task failed in the parallel run because grpc_tools was missing after the .venv was recreated)
mise run python:proto

Checklist

Follows Conventional Commits
Commits are signed off (DCO)
Architecture/docs updated (not applicable for this minimal runtime fix)

github-actions · 2026-05-22T13:48:05Z

🌿 Preview your docs: https://nvidia-preview-pr-1522.docs.buildwithfern.com/openshell

pimlock

LGTM with a few nits and questions.

pimlock · 2026-05-29T03:30:52Z

+    fn allow_cuda_procfs_writes_allows_descendant_comm_write() {
+        if std::env::var_os(PROCFS_WRITE_HELPER_ENV).is_some() {
+            return;
+        }


From what I can tell, allow_cuda_procfs_writes_allows_descendant_comm_write launches the current test binary with PROCFS_WRITE_HELPER_ENV set, and allow_cuda_procfs_writes_helper only runs the real helper logic in that child process.

Could we add a short comment or rename the helper test to make that harness pattern clearer? I assume a subprocess is needed because Landlock restrictions are irreversible for the current process.
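For readers unfamiliar with the pattern being discussed, here is a rough sketch of a re-exec test harness. It is not the PR's test code: `HELPER_ENV`, `running_as_helper`, and `helper_command` are illustrative names, and the value of the real `PROCFS_WRITE_HELPER_ENV` constant is not shown in this thread. The arguments passed to the child (`<filter> --exact --nocapture`) are standard libtest flags.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical environment variable name, standing in for the PR's
/// PROCFS_WRITE_HELPER_ENV constant.
pub const HELPER_ENV: &str = "EXAMPLE_PROCFS_WRITE_HELPER";

/// Returns true when this process was re-executed as the helper child.
/// The parent test returns early in that case; the helper test does the
/// real (irreversible) sandboxing work only in that case.
pub fn running_as_helper() -> bool {
    std::env::var_os(HELPER_ENV).is_some()
}

/// Builds a command that re-runs the given test binary, selecting only the
/// helper test and marking the child through the environment variable.
/// In a real test, `test_binary` would typically come from
/// `std::env::current_exe()`.
pub fn helper_command(test_binary: &Path, helper_test: &str) -> Command {
    let mut cmd = Command::new(test_binary);
    cmd.arg(helper_test)
        .arg("--exact")
        .arg("--nocapture")
        .env(HELPER_ENV, "1");
    cmd
}
```

The parent test would then call `.status()` on the returned command and assert that the child exited successfully, which keeps any process-wide restriction confined to the child.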

I'm going to remove the implicit broadening of the permissions from this PR and defer it to #1628. We can look at improving the readability of the tests there.

pimlock · 2026-05-29T03:58:47Z

            // TLS handshakes.
            grpc_retry("Policy discovery sync", || {
                grpc_client::discover_and_sync_policy(endpoint, id, sandbox, &discovered)
            })
            .await?
        };

        // Ensure baseline filesystem paths are present for proxy-mode
        // sandboxes.  If the policy was enriched, sync the updated version
        // back to the gateway so users can see the effective policy.
-        let enriched = enrich_proto_baseline_paths(&mut proto_policy);
+        let enriched = enrich_proto_baseline_paths(&mut proto_policy)?;
        if enriched
            && let Some(sandbox_name) = sandbox.as_deref()
            && let Err(e) = grpc_client::sync_policy(endpoint, sandbox_name, &proto_policy).await
        {
            warn!(
                error = %e,
                "Failed to sync enriched policy back to gateway (non-fatal)"
            );
        }


This is outside the changes in this PR, but the comment on BaselinePolicySource led me here (specifically, how enrich_proto_baseline_paths uses Custom).

I think the flow in this function could be improved:

At line 2278, where we get the policy from the server: lines 2313-2325 should probably live there, because that part only applies when the policy comes from the gateway. If we got the default policy or the one from the image, it has already been enriched, so there is no need to do that again.

Unless there is a situation in which this is still useful even after the policy was initially enriched? It may be safer to leave it as is, but it might be worth determining the source in this function, e.g. having the if at line 2277 capture both the policy and its source.
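One way to picture the suggestion of capturing both the policy and its source is the sketch below. Everything in it is hypothetical: `PolicyOrigin`, `Policy`, `enrich`, `resolve_policy`, and `enrich_if_from_gateway` are placeholder names, not OpenShell's `BaselinePolicySource` or `enrich_proto_baseline_paths`, and the enrichment is reduced to adding /proc.

```rust
/// Hypothetical marker for where the policy came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyOrigin {
    Gateway,
    Local,
}

/// Hypothetical, heavily simplified policy.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Policy {
    pub read_write: Vec<String>,
}

/// Placeholder enrichment: adds /proc if missing and reports whether
/// anything changed.
pub fn enrich(policy: &mut Policy) -> bool {
    if policy.read_write.iter().any(|p| p == "/proc") {
        return false;
    }
    policy.read_write.push("/proc".to_string());
    true
}

/// Returns the policy together with its origin. A locally built policy is
/// enriched right away; a gateway policy is returned untouched.
pub fn resolve_policy(from_gateway: Option<Policy>) -> (Policy, PolicyOrigin) {
    match from_gateway {
        Some(policy) => (policy, PolicyOrigin::Gateway),
        None => {
            let mut policy = Policy::default();
            enrich(&mut policy);
            (policy, PolicyOrigin::Local)
        }
    }
}

/// Enriches only gateway policies; `true` means the result should be synced
/// back to the gateway.
pub fn enrich_if_from_gateway(policy: &mut Policy, origin: PolicyOrigin) -> bool {
    origin == PolicyOrigin::Gateway && enrich(policy)
}
```

With this shape the second enrichment pass is skipped for local policies by construction, at the cost of threading the origin through the caller.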

pimlock · 2026-05-29T04:27:03Z

+    let result = ruleset
+        .add_rule(PathBeneath::new(
+            path_fd,
+            make_bitflags!(AccessFs::{WriteFile}),


How much narrower is this WriteFile than standard read_write access? I'm asking mostly because when /proc is added to the policy as read_write, a user analyzing what's allowed sees it in the policy. It's riskier, but explicit.

With this change, /proc stays in the policy as read-only while a narrower write scope is allowed without any mention in the policy.

So the choice is between "wider, but explicit" and "narrower, but implicit". I'm not sure what the risks of the former are, but I think it may be the better option. WDYT?

The read-write permissions that OpenShell applies are taken from AccessFs::from_all: https://docs.rs/landlock/0.4.4/src/landlock/fs.rs.html#104

This includes the from_read: https://docs.rs/landlock/0.4.4/src/landlock/fs.rs.html#116-118
And from_write: https://docs.rs/landlock/0.4.4/src/landlock/fs.rs.html#130-143 (which is ABI version dependent).

Assuming V1 this means that the additional permissions that promoting to read-write provide are:

- WriteFile
- RemoveDir
- RemoveFile
- MakeChar
- MakeDir
- MakeReg
- MakeSock
- MakeFifo
- MakeBlock
- MakeSym
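To make the size of that difference concrete, the sketch below models the Landlock ABI V1 filesystem access rights as plain bit flags (bit positions follow the kernel's LANDLOCK_ACCESS_FS_* constants) and computes what a read-write rule adds over a read-only one. It deliberately avoids the landlock crate so it stays self-contained. The assumption that OpenShell's read-only rules correspond to the V1 read set (Execute, ReadFile, ReadDir) is inferred from this comment rather than checked against the code, and `added_by_promotion` and `right_names` are illustrative names.

```rust
// Landlock ABI V1 filesystem access rights as bit flags.
pub const EXECUTE: u64 = 1 << 0;
pub const WRITE_FILE: u64 = 1 << 1;
pub const READ_FILE: u64 = 1 << 2;
pub const READ_DIR: u64 = 1 << 3;
pub const REMOVE_DIR: u64 = 1 << 4;
pub const REMOVE_FILE: u64 = 1 << 5;
pub const MAKE_CHAR: u64 = 1 << 6;
pub const MAKE_DIR: u64 = 1 << 7;
pub const MAKE_REG: u64 = 1 << 8;
pub const MAKE_SOCK: u64 = 1 << 9;
pub const MAKE_FIFO: u64 = 1 << 10;
pub const MAKE_BLOCK: u64 = 1 << 11;
pub const MAKE_SYM: u64 = 1 << 12;

/// Read-side rights in V1.
pub const READ_V1: u64 = EXECUTE | READ_FILE | READ_DIR;
/// Write-side rights in V1.
pub const WRITE_V1: u64 = WRITE_FILE | REMOVE_DIR | REMOVE_FILE | MAKE_CHAR | MAKE_DIR
    | MAKE_REG | MAKE_SOCK | MAKE_FIFO | MAKE_BLOCK | MAKE_SYM;
/// Every right handled by V1.
pub const ALL_V1: u64 = READ_V1 | WRITE_V1;

/// Rights gained when a path moves from a read-only rule to a read-write rule.
pub fn added_by_promotion() -> u64 {
    ALL_V1 & !READ_V1
}

/// Names of the rights set in `mask`, in bit order.
pub fn right_names(mask: u64) -> Vec<&'static str> {
    const TABLE: [(u64, &str); 13] = [
        (EXECUTE, "Execute"), (WRITE_FILE, "WriteFile"), (READ_FILE, "ReadFile"),
        (READ_DIR, "ReadDir"), (REMOVE_DIR, "RemoveDir"), (REMOVE_FILE, "RemoveFile"),
        (MAKE_CHAR, "MakeChar"), (MAKE_DIR, "MakeDir"), (MAKE_REG, "MakeReg"),
        (MAKE_SOCK, "MakeSock"), (MAKE_FIFO, "MakeFifo"), (MAKE_BLOCK, "MakeBlock"),
        (MAKE_SYM, "MakeSym"),
    ];
    TABLE
        .iter()
        .filter(|(bit, _)| mask & bit != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

Under these assumptions a WriteFile-only rule grants one additional right, while full promotion grants ten, which is the trade-off between the narrower implicit rule and the wider explicit policy entry.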

I think that in the case of OpenShell, where auditing what is allowed/accessed is important, explicit behaviour should be favoured over implicit behaviour. Does this maybe mean that we should keep the broader behaviour for now (the first commit) and then look at how we can better surface the permissions that we are adding? Perhaps adding a finer granularity for the policies makes sense?

I have created #1628 to track looking into this properly instead of adding this as a "quick" fix on top of this PR.

elezar · 2026-06-01T19:29:39Z

Thanks for your initial review @pimlock. After the initial back and forth, I realised that there were a number of edge cases that I was not considering. I believe I was trying to detect user intent with insufficient signal and as such have updated this PR to ALWAYS promote /proc to read-write if GPUs are requested and instead capture explicit intent in #1629 as a follow-up. This PR would unblock the GPU-enabled tests, but I'm happy to continue iterating on it if required.

Signed-off-by: Evan Lezar <elezar@nvidia.com>

elezar requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 22, 2026 13:47

elezar mentioned this pull request May 22, 2026

fix(sandbox): decouple GPU baseline from network policy #1524

Merged

6 tasks

elezar changed the base branch from main to fix/1486-gpu-enrichment-no-network/elezar May 22, 2026 14:06

Base automatically changed from fix/1486-gpu-enrichment-no-network/elezar to main May 27, 2026 08:20

elezar force-pushed the fix/1486-gpu-sandbox-filesystem-policy/elezar branch from 96a1caa to 59e399a Compare May 27, 2026 09:02

elezar mentioned this pull request May 28, 2026

feat(gpu): derive sandbox access requirements from CDI specs #1606

Open

17 tasks

elezar force-pushed the fix/1486-gpu-sandbox-filesystem-policy/elezar branch from 12bde4d to d73e6de Compare May 28, 2026 19:22

pimlock reviewed May 29, 2026

This was referenced May 29, 2026

feat(sandbox): narrow GPU procfs permissions and surface runtime additions #1628

Open

feat(policy): add runtime baseline conflict controls #1629

Open

elezar force-pushed the fix/1486-gpu-sandbox-filesystem-policy/elezar branch 2 times, most recently from 2f3b5b2 to a0171ff Compare June 1, 2026 18:29

elezar changed the title ~~fix(sandbox): restore GPU filesystem baseline~~ fix(sandbox): restore GPU procfs baseline Jun 1, 2026

elezar requested a review from pimlock June 1, 2026 19:29

fix(sandbox): restore GPU procfs baseline

c828f23

Signed-off-by: Evan Lezar <elezar@nvidia.com>

elezar force-pushed the fix/1486-gpu-sandbox-filesystem-policy/elezar branch from a0171ff to c828f23 Compare June 2, 2026 08:24

elezar wants to merge 1 commit into main from fix/1486-gpu-sandbox-filesystem-policy/elezar
