perf(sandbox): streaming SHA256 and spawn_blocking for identity resolution by koiker · Pull Request #555 · NVIDIA/OpenShell

koiker · 2026-03-23T19:51:16Z

Replaces #553 (auto-closed by vouch gate)

Summary

The proxy's TOFU (Trust-On-First-Use) identity resolution performs synchronous /proc scanning and SHA256 hashing of binaries on every cold-cache network request. For large binaries like Node.js (~124 MB), this blocks the async Tokio runtime for nearly a second — stalling all concurrent connections — and allocates the entire file contents in memory just to hash them.

This PR fixes both issues:

Streaming SHA256: Replace std::fs::read() + Sha256::digest() (which loads the full binary into RAM) with a 64 KB buffered streaming read+hash loop. For a 124 MB binary this eliminates a 124 MB heap allocation per cold-cache check.
spawn_blocking wrapper: Wrap the entire evaluate_opa_tcp() call in tokio::task::spawn_blocking() in both handle_tcp_connection and handle_forward_proxy. The identity resolution does heavy synchronous I/O (/proc scanning, file hashing) that must not run on the async executor.
Profiling instrumentation: Add a lightweight file-based perf_log() helper that writes timestamped phase timings to /var/log/openshell-perf.log (or /tmp), providing visibility into proxy latency without depending on the tracing pipeline.
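The streaming read+hash loop described above can be sketched as follows. This is a minimal, dependency-free illustration, not the code in procfs.rs: the `StreamHasher` trait and the FNV-1a hasher are hypothetical stand-ins for `sha2::Sha256` so the example compiles with the standard library alone. Only the 64 KB chunk size comes from the PR text.

```rust
use std::io::{self, Read};

/// Chunk size named in the PR description (64 KB).
const CHUNK_SIZE: usize = 64 * 1024;

/// Minimal incremental-hasher interface (hypothetical). In the real code
/// this role would be played by a SHA256 hasher.
trait StreamHasher {
    fn update(&mut self, data: &[u8]);
    fn finish(self) -> u64;
}

/// 64-bit FNV-1a: a stand-in for illustration only, NOT cryptographically secure.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl StreamHasher for Fnv1a {
    fn update(&mut self, data: &[u8]) {
        for &b in data {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(self) -> u64 {
        self.0
    }
}

/// Feed `reader` to `hasher` in fixed-size chunks, so peak memory stays at
/// CHUNK_SIZE no matter how large the input is.
fn hash_streaming<R: Read, H: StreamHasher>(mut reader: R, mut hasher: H) -> io::Result<u64> {
    let mut buf = vec![0u8; CHUNK_SIZE];
    loop {
        match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => hasher.update(&buf[..n]),
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(hasher.finish())
}

/// Baseline for comparison: hash a buffer that is already fully in memory.
fn hash_whole(data: &[u8]) -> u64 {
    let mut hasher = Fnv1a::new();
    hasher.update(data);
    hasher.finish()
}
```

Passing a `std::fs::File` as `reader` gives the same digest as `hash_whole` on the file's full contents, without ever holding more than one chunk. With the `sha2` crate, `Sha256::new()`, `update()`, and `finalize()` would slot into the same loop.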

Context

Commit f88aecf ("avoid repeated TOFU rehashing for unchanged binaries") added fingerprint-based caching that made the warm path fast (0 ms TOFU, 11 ms total evaluate_opa_tcp). However:

The cold path still reads the entire binary into memory before hashing — a 124 MB allocation for Node.js.
The hashing and /proc I/O run synchronously on the Tokio runtime, blocking all other connections during the ~1 s cold-cache window.
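For readers unfamiliar with the warm path, the following sketch shows how a fingerprint check can skip rehashing. It is an assumption-heavy illustration: this PR does not say which fields f88aecf uses as the fingerprint, so file length plus modification time is assumed here, and `Fingerprint`, `HashCache`, and `get_or_compute` are hypothetical names.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Cheap-to-read file identity (assumed fields: size and mtime).
#[derive(Clone, Debug, PartialEq, Eq)]
struct Fingerprint {
    len: u64,
    modified: SystemTime,
}

impl Fingerprint {
    /// Read the fingerprint from file metadata; no file contents are read.
    fn of(path: &Path) -> io::Result<Self> {
        let meta = fs::metadata(path)?;
        Ok(Fingerprint {
            len: meta.len(),
            modified: meta.modified()?,
        })
    }
}

/// Maps a binary path to its last known fingerprint and digest.
#[derive(Default)]
struct HashCache {
    entries: HashMap<PathBuf, (Fingerprint, String)>,
}

impl HashCache {
    /// Returns `(digest, cache_hit)`. `compute` is the expensive hash and
    /// only runs when the fingerprint is missing or has changed.
    fn get_or_compute<F>(&mut self, path: &Path, compute: F) -> io::Result<(String, bool)>
    where
        F: FnOnce(&Path) -> io::Result<String>,
    {
        let fp = Fingerprint::of(path)?;
        if let Some((cached_fp, digest)) = self.entries.get(path) {
            if *cached_fp == fp {
                return Ok((digest.clone(), true)); // warm path: no hashing
            }
        }
        let digest = compute(path)?; // cold path: full hash
        self.entries.insert(path.to_path_buf(), (fp, digest.clone()));
        Ok((digest, false))
    }
}
```

A metadata-only fingerprint is a heuristic: it is fast, but it trusts that content changes are reflected in size or mtime.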

Profiling Data (Node.js binary, 124 MB)

| Phase | Cold cache | Warm cache |
| --- | --- | --- |
| `file_sha256` | ~890 ms | 0 ms (fingerprint hit) |
| `evaluate_opa_tcp` total | 1002 ms | 11 ms |
| OPA evaluation | 1 ms | 1 ms |
| DNS + TCP connect | 166–437 ms | 166–437 ms |

Files Changed

crates/openshell-sandbox/src/procfs.rs — streaming SHA256 in file_sha256(), phase timing in resolve_tcp_peer_identity() and find_pid_by_socket_inode()
crates/openshell-sandbox/src/proxy.rs — spawn_blocking wrapper around evaluate_opa_tcp() in both call sites, phase timing throughout
crates/openshell-sandbox/src/identity.rs — phase timing in verify_or_cache()

Test Plan

cargo build --release succeeds (cross-compiled for x86_64-unknown-linux-gnu)
Deployed to live NemoClaw sandbox and verified with curl and node requests through the proxy
Cold-cache: ~1 s total (dominated by SHA256 of 124 MB binary, now non-blocking)
Warm-cache: 11 ms total (fingerprint cache hit, unchanged from baseline)
No functional regressions — policy allow/deny decisions unchanged
cargo test -p openshell-sandbox (existing identity and procfs tests)

Signed-off-by: Rafael Koike <koike.rafael@gmail.com>

Made with Cursor

Key changes:
- Replace full file read + SHA256 with streaming 64KB-buffered hash (saves 124MB allocation for node binary)
- Wrap evaluate_opa_tcp in spawn_blocking to prevent blocking tokio runtime during heavy /proc I/O and SHA256 computation
- Add file-based perf logging for profiling proxy latency phases

Profiling data (node binary, 124MB):
- Cold TOFU: ~890ms (read+hash), warm: 0ms (cache hit)
- evaluate_opa_tcp: cold=1002ms, warm=11ms
- OPA evaluation: 1ms
- DNS+TCP connect: 166-437ms

Made-with: Cursor

github-actions · 2026-03-23T19:51:26Z

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

koiker · 2026-03-23T19:51:28Z

I have read the DCO document and I hereby sign the DCO.

koiker · 2026-03-23T19:52:11Z

recheck

johntmyers

Is the perf_log something we actually need to ship, or was it more for validating the improvements?

koiker · 2026-03-24T13:21:18Z

> Is the perf_log something we actually need to ship, or was it more for validating the improvements?

Good question! The perf_log instrumentation isn't required for the two core improvements (streaming SHA256 and spawn_blocking) — those work independently.

That said, I found it valuable during profiling and left it in intentionally. It makes it easy to spot exactly which phase of proxy request handling is taking time (identity resolution, TOFU hashing, OPA evaluation, DNS+TCP connect) without needing to attach external tools like strace or modify the tracing pipeline. The output goes to /var/log/openshell-perf.log and is lightweight (one line per phase).

Happy to remove it if you'd prefer to keep the diff focused on the functional changes, or I can gate it behind a feature flag/compile-time option if that's a better fit. Please let me know your preference.
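For reference while deciding, here is a rough sketch of what a file-based phase-timing helper of this kind can look like. It is not the code from the diff: the function names, the line format, and the explicit `path` parameter are assumptions for illustration. The PR text only states that timestamped phase timings are appended to a log file, one line per phase.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Format one log line as "<unix-ms> <phase> <elapsed-ms>ms" (assumed format).
fn format_perf_line(unix_ms: u128, phase: &str, elapsed: Duration) -> String {
    format!("{} {} {}ms\n", unix_ms, phase, elapsed.as_millis())
}

/// Append one timing line to `path`, creating the file if it does not exist.
fn perf_log_to(path: &Path, phase: &str, elapsed: Duration) -> io::Result<()> {
    let unix_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis())
        .unwrap_or(0); // clock before the epoch: fall back to 0
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(format_perf_line(unix_ms, phase, elapsed).as_bytes())
}

/// Run `f`, log how long it took under `phase`, and return its result.
/// Logging errors are deliberately ignored so instrumentation can never
/// fail a request.
fn timed<T>(path: &Path, phase: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    let _ = perf_log_to(path, phase, start.elapsed());
    out
}
```

If gating is preferred, one option is a Cargo feature (hypothetical name `perf-log`) with `#[cfg(feature = "perf-log")]` around the write, so default builds do no file I/O for profiling.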

koiker requested a review from a team as a code owner March 23, 2026 19:51

johntmyers reviewed Mar 23, 2026

koiker wants to merge 1 commit into NVIDIA:main from koiker:perf/tracing-instrumentation
