perf(pm): handle warm registry cache hits in scheduler #2966

Closed
elrrrrrrr wants to merge 1 commit into perf/pm-resolver-demand-bfs from exp/pm-install-sync-cache-hit-b33d922

@elrrrrrrr
Contributor

What

A/B experiment targeting context switches (ctx) in the p4 warm-link phase: let the install scheduler handle registry cache hits synchronously in the main loop before enqueueing download work.

Why

In p4 the lockfile and registry cache are already warm, but every registry package still enters a worker task just to check <cache>/<name>/<version>/_resolved. This change keeps the scheduler state centralized while removing one per-package tokio worker task from the all-cache-hit path.

Notes

  • Cache miss behavior is unchanged: misses still enqueue download work and use the existing async downloader.
  • The sync probe is limited to the cheap _resolved marker check and only updates scheduler-owned state.
  • Adjusted the scheduler dedupe unit test to avoid depending on whether the local machine happens to have react@18.2.0 in cache.
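The hit/miss split described in these notes can be sketched roughly as follows. This is an illustration only, not the PR's code: `probe_resolved_marker` is a hypothetical stand-in for `registry_cache_lookup_sync` that takes the cache root explicitly, and the `Scheduler` struct with its `download_queue` field is an assumed shape. Pointing the probe at a temporary directory also shows one way a test can avoid depending on what the local machine has cached.

```rust
use std::collections::{HashMap, VecDeque};
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the PR's `registry_cache_lookup_sync`.
/// Unlike the real function, it takes the cache root as a parameter.
fn probe_resolved_marker(cache_root: &Path, name: &str, version: &str) -> Option<PathBuf> {
    let dir = cache_root.join(name).join(version);
    // A single metadata lookup on the marker file; nothing is read or walked.
    dir.join("_resolved").is_file().then_some(dir)
}

/// Toy scheduler state (assumed shape): hits are recorded inline,
/// misses are queued for the existing async downloader.
#[derive(Default)]
struct Scheduler {
    download_done: HashMap<String, PathBuf>,
    download_queue: VecDeque<String>,
}

impl Scheduler {
    fn request(&mut self, cache_root: &Path, name: &str, version: &str) {
        let key = format!("{name}@{version}");
        // Dedupe: skip packages that are already done or already queued.
        if self.download_done.contains_key(&key) || self.download_queue.contains(&key) {
            return;
        }
        match probe_resolved_marker(cache_root, name, version) {
            // Cache hit: only scheduler-owned state is updated.
            Some(path) => {
                self.download_done.insert(key, path);
            }
            // Cache miss: unchanged path, enqueue download work.
            None => self.download_queue.push_back(key),
        }
    }
}

fn main() -> std::io::Result<()> {
    // Seed a throwaway cache so the result does not depend on the real one.
    let root = std::env::temp_dir().join(format!("pm-cache-sketch-{}", std::process::id()));
    let pkg_dir = root.join("react").join("18.2.0");
    fs::create_dir_all(&pkg_dir)?;
    fs::write(pkg_dir.join("_resolved"), b"")?;

    let mut s = Scheduler::default();
    s.request(&root, "react", "18.2.0"); // seeded above, so a hit
    s.request(&root, "lodash", "4.17.21"); // not seeded, so a miss
    println!("hits={} queued={}", s.download_done.len(), s.download_queue.len());
    fs::remove_dir_all(&root)
}
```

The sketch omits the waiter and clone-queue handling that the real scheduler performs on a hit.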

Validation

  • cargo fmt
  • cargo test -p utoo-pm
  • cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Bench plan

Trigger the linux/npmjs phase bench and compare mainly:

| phase | expected signal |
| --- | --- |
| p3 cold install | should be neutral; cache misses still download through the same queue |
| p4 warm link | target phase; expect lower ctx from skipping per-package cache-hit worker tasks |

@elrrrrrrr added the labels "A-Pkg Manager" (Area: Package Manager) and "benchmark" (Run pm-bench on PR) on May 18, 2026
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces a synchronous cache lookup mechanism, registry_cache_lookup_sync, to the installation scheduler to optimize performance for cache hits by bypassing the download queue. The changes include the implementation of the sync lookup function and updates to the scheduler's logic and tests. Feedback suggests that performing synchronous I/O within the main scheduler loop may block the async executor, potentially leading to performance degradation on slow filesystems, and recommends considering spawn_blocking or a batching mechanism.

Comment on lines +390 to +396

```rust
if let Some(cache_path) = registry_cache_lookup_sync(&package.name, &package.version) {
    self.download_done.insert(key, cache_path.clone());
    if let Some(spec) = waiter {
        self.clone_queue.push_back(ReadyClone { spec, cache_path });
    }
    return;
}
```

medium

Performing synchronous I/O (registry_cache_lookup_sync) inside the main scheduler loop can block the async executor. While this is intended as a performance optimization to avoid task spawning overhead for cache hits, it introduces a risk of head-of-line blocking for the entire installation process if the filesystem is slow (e.g., network drives or high I/O wait). Since the scheduler is responsible for pumping all downloads and clones, any delay here affects overall throughput. Consider if this trade-off is acceptable for all supported environments, or if a small batching mechanism or spawn_blocking should be used for the probe.
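One way to read the batching suggestion is to collect pending keys and probe them together off the scheduler thread, so a slow filesystem costs one hand-off per batch rather than one task per package. The sketch below is an assumption-laden illustration, not code from this PR: `probe_batch` and the `Key` alias are hypothetical names, and a plain `std::thread` with an `mpsc` channel stands in for an async runtime's blocking pool (for example tokio's `spawn_blocking`).

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// (name, version) pair identifying one registry package.
type Key = (String, String);

/// Probe a whole batch on one helper thread and stream results back.
/// Hypothetical helper: `std::thread` stands in for a blocking pool.
fn probe_batch(cache_root: PathBuf, batch: Vec<Key>) -> mpsc::Receiver<(Key, Option<PathBuf>)> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for (name, version) in batch {
            let dir = cache_root.join(&name).join(&version);
            // Same cheap marker check as the sync probe, just off the main loop.
            let hit = dir.join("_resolved").is_file().then_some(dir);
            // The receiver may already be gone; a failed send is not an error here.
            let _ = tx.send(((name, version), hit));
        }
    });
    rx
}

fn main() {
    // This directory is assumed not to exist, so every probe reports a miss.
    let root = std::env::temp_dir().join("pm-cache-batch-sketch-missing");
    let batch = vec![
        ("react".to_string(), "18.2.0".to_string()),
        ("lodash".to_string(), "4.17.21".to_string()),
    ];
    // Iteration ends once the helper thread finishes and drops the sender.
    for ((name, version), hit) in probe_batch(root, batch) {
        match hit {
            Some(path) => println!("hit {name}@{version} at {}", path.display()),
            None => println!("miss {name}@{version}: enqueue download"),
        }
    }
}
```

A real scheduler loop would poll the receiver with `try_recv` between other work instead of blocking on it. Whether the extra hand-off is cheaper than the inline probe is exactly what a bench run would have to show.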

@github-actions

📊 pm-bench-phases · d60b3ff · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

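| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 9.27s | 0.15s | 10.64s | 10.32s | 689M | 326.7K |
| utoo-next | 8.08s | 0.18s | 10.94s | 12.35s | 928M | 120.9K |
| utoo-npm | 8.09s | 0.22s | 11.02s | 12.25s | 978M | 123.4K |
| utoo | 7.90s | 0.23s | 11.89s | 12.52s | 943M | 150.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 17.3K | 19.7K | 1.20G | 7M | 1.88G | 1.76G | 1M |
| utoo-next | 127.2K | 85.6K | 1.17G | 5M | 1.73G | 1.72G | 2M |
| utoo-npm | 123.5K | 90.9K | 1.17G | 5M | 1.73G | 1.72G | 2M |
| utoo | 123.0K | 96.5K | 1.17G | 6M | 1.72G | 1.72G | 2M |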

p1_resolve

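| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 2.13s | 0.06s | 4.29s | 1.15s | 533M | 169.0K |
| utoo-next | 3.15s | 0.11s | 5.50s | 2.28s | 625M | 84.6K |
| utoo-npm | 3.23s | 0.06s | 5.54s | 2.32s | 616M | 88.5K |
| utoo | 2.53s | 0.07s | 6.21s | 1.75s | 656M | 124.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 9.4K | 4.9K | 203M | 3M | 108M | - | 1M |
| utoo-next | 76.5K | 94.1K | 201M | 3M | 7M | 3M | 2M |
| utoo-npm | 77.2K | 93.6K | 201M | 3M | 7M | 3M | 2M |
| utoo | 15.4K | 20.5K | 204M | 3M | 7M | 3M | 2M |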

p3_cold_install

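| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 6.93s | 0.22s | 6.46s | 9.98s | 626M | 210.1K |
| utoo-next | 7.28s | 2.23s | 5.57s | 11.20s | 522M | 62.1K |
| utoo-npm | 7.36s | 2.09s | 5.56s | 11.06s | 460M | 61.3K |
| utoo | 6.76s | 1.69s | 5.28s | 10.92s | 476M | 57.3K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 5.4K | 7.2K | 1.00G | 4M | 1.77G | 1.77G | 1M |
| utoo-next | 123.9K | 50.8K | 1001M | 3M | 1.72G | 1.72G | 3M |
| utoo-npm | 113.1K | 49.8K | 1000M | 3M | 1.72G | 1.72G | 3M |
| utoo | 123.5K | 80.8K | 1000M | 3M | 1.72G | 1.72G | 3M |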

p4_warm_link

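| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.51s | 0.03s | 0.19s | 2.48s | 134M | 32.7K |
| utoo-next | 2.56s | 0.44s | 0.53s | 3.94s | 79M | 18.6K |
| utoo-npm | 2.48s | 0.07s | 0.53s | 3.84s | 80M | 18.5K |
| utoo | 2.43s | 0.15s | 0.51s | 3.90s | 62M | 14.3K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 266 | 25 | 5M | 9K | 1.93G | 1.74G | 1M |
| utoo-next | 42.1K | 19.0K | 12K | 22K | 1.72G | 1.72G | 2M |
| utoo-npm | 40.1K | 19.1K | 15K | 9K | 1.72G | 1.72G | 2M |
| utoo | 50.6K | 24.0K | 13K | 24K | 1.73G | 1.72G | 2M |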

npmmirror.com: no output captured.

@elrrrrrrr
Contributor Author

GHA run 1 read

Run: https://github.com/utooland/utoo/actions/runs/26015795276

| phase | utoo wall | utoo ctx (vCtx / iCtx) | same-run utoo-next | same-run bun | read |
| --- | --- | --- | --- | --- | --- |
| p3_cold_install | 6.76s ±1.69 | 123.5K / 80.8K | 7.28s, 123.9K / 50.8K | 6.93s, 5.4K / 7.2K | wall ok but iCtx regresses materially |
| p4_warm_link | 2.43s ±0.15 | 50.6K / 24.0K | 2.56s, 42.1K / 19.0K | 3.51s, 266 / 25 | not positive; ctx is worse than same-run baselines and much worse than #2965 |

Conclusion: do not fold this as-is. Moving only the registry cache-hit probe into the scheduler does not reduce p4 scheduling cost; it likely just shifts the hot path while seeded-cache probe and clone worker costs remain dominant.
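To put numbers on "regresses" and "worse than same-run baselines", the context-switch counts from run 1 can be turned into relative changes against the same-run utoo-next baseline. The snippet below is plain arithmetic over the figures reported above; `pct_change` is a helper name chosen here for illustration.

```rust
/// Relative change of `candidate` against `baseline`, in percent.
fn pct_change(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // (label, utoo-next baseline, utoo on this branch), in thousands of
    // context switches, copied from the run-1 figures above.
    let rows = [
        ("p3 vCtx", 123.9, 123.5),
        ("p3 iCtx", 50.8, 80.8),
        ("p4 vCtx", 42.1, 50.6),
        ("p4 iCtx", 19.0, 24.0),
    ];
    for (label, base, cand) in rows {
        // `+` forces an explicit sign so regressions stand out.
        println!("{label}: {:+.1}%", pct_change(base, cand));
    }
}
```

On these inputs p3 vCtx is essentially flat (about -0.3%), p3 iCtx rises by roughly 59%, and p4 vCtx and iCtx rise by roughly 20% and 26%. These come from a single run, so they indicate direction rather than a precise effect size.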

@elrrrrrrr
Contributor Author

Closing this PM performance experiment after the investigation phase. The benchmark data and conclusions are preserved in the PR body/comments; we will split the validated pieces into smaller reviewable PRs for the formal ship path.

@elrrrrrrr closed this on May 20, 2026