perf(pm): handle warm registry cache hits in scheduler #2966
Code Review
This pull request introduces a synchronous cache lookup mechanism, registry_cache_lookup_sync, to the installation scheduler to optimize performance for cache hits by bypassing the download queue. The changes include the implementation of the sync lookup function and updates to the scheduler's logic and tests. Feedback suggests that performing synchronous I/O within the main scheduler loop may block the async executor, potentially leading to performance degradation on slow filesystems, and recommends considering spawn_blocking or a batching mechanism.

    if let Some(cache_path) = registry_cache_lookup_sync(&package.name, &package.version) {
        self.download_done.insert(key, cache_path.clone());
        if let Some(spec) = waiter {
            self.clone_queue.push_back(ReadyClone { spec, cache_path });
        }
        return;
    }

Performing synchronous I/O (registry_cache_lookup_sync) inside the main scheduler loop can block the async executor. While this is intended as a performance optimization to avoid task spawning overhead for cache hits, it introduces a risk of head-of-line blocking for the entire installation process if the filesystem is slow (e.g., network drives or high I/O wait). Since the scheduler is responsible for pumping all downloads and clones, any delay here affects overall throughput. Consider if this trade-off is acceptable for all supported environments, or if a small batching mechanism or spawn_blocking should be used for the probe.
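One way to picture the batching suggestion is the sketch below. It is not the PR's code: it uses a plain `std::thread` worker and an `mpsc` channel as a dependency-free stand-in for tokio's `spawn_blocking`, and `probe_batch`, `ProbeResult`, and the explicit `cache_root` parameter are hypothetical names introduced for illustration. It assumes the `<cache>/<name>/<version>/_resolved` marker layout described in the PR body.

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// One probe result: the package key plus the cache directory on a hit.
/// (Hypothetical type, not part of the PR.)
type ProbeResult = (String, Option<PathBuf>);

/// Probe a whole batch of `(name, version)` pairs on a worker thread, so the
/// filesystem calls never run on the scheduler's own thread. The scheduler
/// can drain the returned receiver with `try_recv` between ticks.
fn probe_batch(cache_root: PathBuf, batch: Vec<(String, String)>) -> mpsc::Receiver<ProbeResult> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for (name, version) in batch {
            let dir = cache_root.join(&name).join(&version);
            // Assumed layout: a package is cached only if `_resolved` exists.
            let hit = dir.join("_resolved").is_file().then_some(dir);
            // Stop early if the scheduler dropped the receiver.
            if tx.send((format!("{name}@{version}"), hit)).is_err() {
                break;
            }
        }
    });
    rx
}
```

This costs one thread per batch instead of one task per package. Whether that beats the inline probe on a fast local disk is exactly the trade-off the review asks about.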
📊 pm-bench-phases

| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.27s | 0.15s | 10.64s | 10.32s | 689M | 326.7K |
| utoo-next | 8.08s | 0.18s | 10.94s | 12.35s | 928M | 120.9K |
| utoo-npm | 8.09s | 0.22s | 11.02s | 12.25s | 978M | 123.4K |
| utoo | 7.90s | 0.23s | 11.89s | 12.52s | 943M | 150.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 17.3K | 19.7K | 1.20G | 7M | 1.88G | 1.76G | 1M |
| utoo-next | 127.2K | 85.6K | 1.17G | 5M | 1.73G | 1.72G | 2M |
| utoo-npm | 123.5K | 90.9K | 1.17G | 5M | 1.73G | 1.72G | 2M |
| utoo | 123.0K | 96.5K | 1.17G | 6M | 1.72G | 1.72G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.13s | 0.06s | 4.29s | 1.15s | 533M | 169.0K |
| utoo-next | 3.15s | 0.11s | 5.50s | 2.28s | 625M | 84.6K |
| utoo-npm | 3.23s | 0.06s | 5.54s | 2.32s | 616M | 88.5K |
| utoo | 2.53s | 0.07s | 6.21s | 1.75s | 656M | 124.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 9.4K | 4.9K | 203M | 3M | 108M | - | 1M |
| utoo-next | 76.5K | 94.1K | 201M | 3M | 7M | 3M | 2M |
| utoo-npm | 77.2K | 93.6K | 201M | 3M | 7M | 3M | 2M |
| utoo | 15.4K | 20.5K | 204M | 3M | 7M | 3M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.93s | 0.22s | 6.46s | 9.98s | 626M | 210.1K |
| utoo-next | 7.28s | 2.23s | 5.57s | 11.20s | 522M | 62.1K |
| utoo-npm | 7.36s | 2.09s | 5.56s | 11.06s | 460M | 61.3K |
| utoo | 6.76s | 1.69s | 5.28s | 10.92s | 476M | 57.3K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 5.4K | 7.2K | 1.00G | 4M | 1.77G | 1.77G | 1M |
| utoo-next | 123.9K | 50.8K | 1001M | 3M | 1.72G | 1.72G | 3M |
| utoo-npm | 113.1K | 49.8K | 1000M | 3M | 1.72G | 1.72G | 3M |
| utoo | 123.5K | 80.8K | 1000M | 3M | 1.72G | 1.72G | 3M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.51s | 0.03s | 0.19s | 2.48s | 134M | 32.7K |
| utoo-next | 2.56s | 0.44s | 0.53s | 3.94s | 79M | 18.6K |
| utoo-npm | 2.48s | 0.07s | 0.53s | 3.84s | 80M | 18.5K |
| utoo | 2.43s | 0.15s | 0.51s | 3.90s | 62M | 14.3K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 266 | 25 | 5M | 9K | 1.93G | 1.74G | 1M |
| utoo-next | 42.1K | 19.0K | 12K | 22K | 1.72G | 1.72G | 2M |
| utoo-npm | 40.1K | 19.1K | 15K | 9K | 1.72G | 1.72G | 2M |
| utoo | 50.6K | 24.0K | 13K | 24K | 1.73G | 1.72G | 2M |

npmmirror.com: no output captured.
GHA run 1 readRun: https://github.com/utooland/utoo/actions/runs/26015795276
Conclusion: do not fold this as-is. Moving only the registry cache-hit probe into the scheduler does not reduce p4 scheduling cost; it likely just shifts the hot path while seeded-cache probe and clone worker costs remain dominant.

Closing this PM performance experiment after the investigation phase. The benchmark data and conclusions are preserved in the PR body/comments; we will split the validated pieces into smaller reviewable PRs for the formal ship path.
What
A/B experiment for p4 warm-link ctx (context switches): let the install scheduler handle registry cache hits synchronously in the main loop before enqueueing download work.
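A minimal sketch of that fast path follows, under stated assumptions. `cache_lookup_sync` is a hypothetical stand-in for the PR's `registry_cache_lookup_sync` (the explicit `cache_root` parameter is added only to keep the example self-contained), the `Scheduler` fields are simplified to plain tuples and strings, and returning the version directory on a hit is an assumption rather than something the PR states.

```rust
use std::collections::{HashMap, VecDeque};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `registry_cache_lookup_sync`: a package counts
/// as cached only if `<cache>/<name>/<version>/_resolved` exists.
fn cache_lookup_sync(cache_root: &Path, name: &str, version: &str) -> Option<PathBuf> {
    let dir = cache_root.join(name).join(version);
    dir.join("_resolved").is_file().then_some(dir)
}

/// Simplified scheduler state; the real scheduler tracks more than this.
#[derive(Default)]
struct Scheduler {
    download_done: HashMap<String, PathBuf>,
    clone_queue: VecDeque<(String, PathBuf)>, // (package key, cache path)
    download_queue: VecDeque<String>,
}

impl Scheduler {
    /// Handle a cache hit inline; only misses are enqueued as download work.
    fn schedule(&mut self, cache_root: &Path, name: &str, version: &str) {
        let key = format!("{name}@{version}");
        if let Some(cache_path) = cache_lookup_sync(cache_root, name, version) {
            self.download_done.insert(key.clone(), cache_path.clone());
            self.clone_queue.push_back((key, cache_path));
            return;
        }
        self.download_queue.push_back(key);
    }
}
```

With a warm cache every call to `schedule` takes the first branch, so no download task is created. The marker probe itself still runs once per package, now on the scheduler's thread.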
Why
In p4 the lockfile and registry cache are already warm, but every registry package still enters a worker task just to check `<cache>/<name>/<version>/_resolved`. This change keeps the scheduler state centralized while removing one per-package tokio worker from the all-cache-hit path.
Notes
- `_resolved` marker check and only updates scheduler-owned state.
- `react@18.2.0` in cache.
Validation
- `cargo fmt`
- `cargo test -p utoo-pm`
- `cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps`
Bench plan
Trigger linux/npmjs phase bench and compare mainly: