perf(sandbox): streaming SHA256 and spawn_blocking for identity resolution #555
koiker wants to merge 1 commit into NVIDIA:main
Key changes:
- Replace full file read + SHA256 with streaming 64KB-buffered hash (saves 124MB allocation for node binary)
- Wrap evaluate_opa_tcp in spawn_blocking to prevent blocking tokio runtime during heavy /proc I/O and SHA256 computation
- Add file-based perf logging for profiling proxy latency phases

Profiling data (node binary, 124MB):
- Cold TOFU: ~890ms (read+hash), warm: 0ms (cache hit)
- evaluate_opa_tcp: cold=1002ms, warm=11ms
- OPA evaluation: 1ms
- DNS+TCP connect: 166-437ms

Made-with: Cursor
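The streaming-hash change described in the commit message can be illustrated with a minimal, dependency-free sketch. It is not the code from the PR: FNV-1a stands in for SHA256 so the example needs no external crates, and the names `streaming_digest`, `fnv1a_update`, and `CHUNK_SIZE` are hypothetical. What it shows is the shape of the loop: one fixed 64 KB buffer is reused for every read, so peak memory stays at the buffer size rather than the file size.

```rust
use std::io::{self, Read};

/// Buffer size taken from the PR description: 64 KB per read.
const CHUNK_SIZE: usize = 64 * 1024;

// FNV-1a 64-bit constants. ASSUMPTION: FNV-1a is only a stand-in here so the
// sketch compiles without crates; it is not cryptographic and not what the PR uses.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold one chunk of bytes into the running hash state.
fn fnv1a_update(state: u64, chunk: &[u8]) -> u64 {
    chunk
        .iter()
        .fold(state, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
}

/// Hash everything `reader` yields while reusing a single fixed-size buffer.
/// Returns (digest, total bytes read).
fn streaming_digest<R: Read>(mut reader: R) -> io::Result<(u64, u64)> {
    let mut buf = vec![0u8; CHUNK_SIZE];
    let mut state = FNV_OFFSET;
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // A read interrupted by a signal is safe to retry.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Only the first `n` bytes of the buffer are valid for this read.
        state = fnv1a_update(state, &buf[..n]);
        total += n as u64;
    }
    Ok((state, total))
}
```

For a real file one would pass `std::fs::File::open(path)?` as the reader. With a SHA256 implementation such as the `sha2` crate, the same loop would presumably feed each chunk to the hasher's `update` method and call `finalize` after the loop.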
All contributors have signed the DCO ✍️ ✅
I have read the DCO document and I hereby sign the DCO.
recheck
johntmyers left a comment:
Is the perf_log something we actually need to ship, or was this more for validating the improvements?
Good question! I found it valuable during profiling and left it in intentionally. It makes it easy to spot exactly which phase of proxy request handling is taking time (identity resolution, TOFU hashing, OPA evaluation, DNS+TCP connect) without needing to attach external tools.

Happy to remove it if you'd prefer to keep the diff focused on the functional changes, or I can gate it behind a feature flag/compile-time option if that's a better fit. Please let me know your preference.
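To make the discussion concrete, here is a rough sketch of what a file-based phase-timing helper of this kind might look like. It is an illustration under stated assumptions, not the `perf_log()` from the PR: the function names, the log line format, and the "first path that opens wins" fallback are all hypothetical.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Format one log line as "<unix-epoch-millis> <phase> <elapsed-ms>ms".
/// ASSUMPTION: this line format is invented for the sketch.
fn format_phase(now: SystemTime, phase: &str, elapsed: Duration) -> String {
    let ts = now
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis())
        .unwrap_or(0);
    format!("{ts} {phase} {}ms", elapsed.as_millis())
}

/// Append one phase timing to the first candidate path that can be opened,
/// e.g. a file under /var/log first and a file under /tmp as the fallback.
fn perf_log_to(candidates: &[&Path], phase: &str, elapsed: Duration) -> io::Result<()> {
    let line = format_phase(SystemTime::now(), phase, elapsed);
    let mut last_err = io::Error::new(io::ErrorKind::NotFound, "no candidate paths");
    for path in candidates {
        match OpenOptions::new().create(true).append(true).open(path) {
            Ok(mut file) => return writeln!(file, "{line}"),
            Err(e) => last_err = e, // remember the failure and try the next path
        }
    }
    Err(last_err)
}
```

A caller would typically capture `std::time::Instant::now()` before a phase and pass `start.elapsed()` afterwards. If the helper were gated at compile time, as offered above, wrapping the call sites in a `#[cfg(feature = "...")]` attribute with a project-chosen feature name would be one way to do it.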
Summary
The proxy's TOFU (Trust-On-First-Use) identity resolution performs synchronous /proc scanning and SHA256 hashing of binaries on every cold-cache network request. For large binaries like Node.js (~124 MB), this blocks the async Tokio runtime for nearly a second, stalling all concurrent connections, and allocates the entire file contents in memory just to hash them.

This PR fixes both issues:

- Replace std::fs::read() + Sha256::digest() (which loads the full binary into RAM) with a 64 KB buffered streaming read+hash loop. For a 124 MB binary this eliminates a 124 MB heap allocation per cold-cache check.
- spawn_blocking wrapper: Wrap the entire evaluate_opa_tcp() call in tokio::task::spawn_blocking() in both handle_tcp_connection and handle_forward_proxy. The identity resolution does heavy synchronous I/O (/proc scanning, file hashing) that must not run on the async executor.
- Add a perf_log() helper that writes timestamped phase timings to /var/log/openshell-perf.log (or /tmp), providing visibility into proxy latency without depending on the tracing pipeline.

Context
Commit f88aecf ("avoid repeated TOFU rehashing for unchanged binaries") added fingerprint-based caching that made the warm path fast (0 ms TOFU, 11 ms total evaluate_opa_tcp). However, hashing and /proc I/O still run synchronously on the Tokio runtime, blocking all other connections during the ~1 s cold-cache window.

Profiling Data (Node.js binary, 124 MB)
Phases measured: file_sha256, evaluate_opa_tcp total

Files Changed
- crates/openshell-sandbox/src/procfs.rs: streaming SHA256 in file_sha256(), phase timing in resolve_tcp_peer_identity() and find_pid_by_socket_inode()
- crates/openshell-sandbox/src/proxy.rs: spawn_blocking wrapper around evaluate_opa_tcp() in both call sites, phase timing throughout
- crates/openshell-sandbox/src/identity.rs: phase timing in verify_or_cache()

Test Plan
- cargo build --release succeeds (cross-compiled for x86_64-unknown-linux-gnu)
- curl and node requests through the proxy
- cargo test -p openshell-sandbox (existing identity and procfs tests)

Signed-off-by: Rafael Koike <koike.rafael@gmail.com>
Made with Cursor
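
The reason for moving evaluate_opa_tcp() off the async executor can be shown with a std-only analogy. This sketch does not use Tokio, so it is a stand-in for `tokio::task::spawn_blocking` rather than a copy of the PR: a plain thread runs the blocking job, and the caller keeps making progress until the result arrives. The names `blocking_identity_check`, `offload`, and `ticks_while_waiting` are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the blocking identity check (/proc scan + hashing).
fn blocking_identity_check(work: Duration) -> &'static str {
    thread::sleep(work); // simulates slow synchronous I/O
    "allow"
}

/// Run `job` on a separate thread and return a receiver for its result,
/// leaving the calling thread free to keep serving other work.
/// In an async handler, `tokio::task::spawn_blocking(job).await` plays this role.
fn offload<T, F>(job: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone; a failed send is ignored.
        let _ = tx.send(job());
    });
    rx
}

/// Count how many 1 ms "ticks" of other work the caller completes
/// while the offloaded job is still in flight.
fn ticks_while_waiting<T>(rx: &mpsc::Receiver<T>) -> (T, u32) {
    let mut ticks = 0;
    loop {
        match rx.try_recv() {
            Ok(value) => return (value, ticks),
            Err(mpsc::TryRecvError::Empty) => {
                ticks += 1; // the caller is not stalled by the job
                thread::sleep(Duration::from_millis(1));
            }
            Err(mpsc::TryRecvError::Disconnected) => {
                panic!("worker exited without sending a result")
            }
        }
    }
}
```

Calling `blocking_identity_check` directly would leave the caller with zero ticks for the whole duration, which is the stall this PR removes for concurrent connections during the cold-cache window.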