Skip to content

perf(pm): prefetch lockfile downloads by depth#2977

Draft
elrrrrrrr wants to merge 5 commits into
perf/pm-resolver-demand-bfsfrom
exp/pm-install-depth-order-prefetch-b33d922
Draft

perf(pm): prefetch lockfile downloads by depth#2977
elrrrrrrr wants to merge 5 commits into
perf/pm-resolver-demand-bfsfrom
exp/pm-install-depth-order-prefetch-b33d922

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

@elrrrrrrr elrrrrrrr commented May 18, 2026

Summary

  • prefetch registry tarballs from package-lock in depth order instead of HashMap iteration order
  • reuse the same sorted-depth helper for the install loop
  • add a small unit test for deterministic depth ordering

Hypothesis

p3 cold install has an existing lockfile but an empty cache. The previous lockfile prefetch can enqueue deep tarballs ahead of shallow packages, so early ensure_clone calls may wait behind non-critical downloads. Shallow-first prefetch should preserve p4 behavior and improve p3 critical-path latency.

Validation

  • cargo fmt
  • cargo test -p utoo-pm service::install::tests::test_sorted_depths
  • cargo test -p utoo-pm service::install::tests::test_is_prefetchable_registry_tarball
  • cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Benchmark

GHA linux/npmjs, run 26020954979:

phase utoo wall utoo ctx #2974 wall #2974 ctx utoo-next wall bun wall
p0_full_cold 7.45s ±0.05 83.7K / 47.6K 7.58s ±0.13 84.9K / 51.9K 7.96s ±0.07 9.04s ±0.30
p1_resolve 2.25s ±0.02 14.3K / 17.7K 2.44s ±0.03 16.8K / 21.4K 2.94s ±0.05 2.25s ±0.03
p3_cold_install 6.48s ±2.13 80.6K / 44.8K 6.34s ±1.90 78.1K / 44.0K 5.88s ±0.08 6.91s ±0.28
p4_warm_link 2.12s ±0.06 18.3K / 12.1K 2.21s ±0.04 18.4K / 12.1K 2.79s ±0.58 3.59s ±0.02

Conclusion: reject as standalone. Depth-order prefetch does not improve p3 and p4 only matches the #2974 baseline. Keep #2975 as the balanced candidate; #2976 remains a p3-only idea with p4 regression.

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 18, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a prefetching mechanism for package downloads and refactors the installation scheduler to perform disk-intensive operations—such as cloning and extraction—synchronously within a Rayon worker pool. Review feedback suggests removing panic recovery logic in worker threads to ensure unrecoverable bugs propagate correctly, deduplicating package filtering logic into a shared helper, and optimizing registry URL validation by pre-calculating trimmed strings to reduce allocations during lockfile processing.

Comment on lines +479 to +492
rayon::spawn(move || {
let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
clone_package_from_cache_sync(
&job.spec.package.name,
&job.spec.package.version,
&job.spec.package.tarball_url,
&job.cache_path,
&job.spec.target,
)
.map_err(|e| format!("{e:#}"))
}))
.unwrap_or_else(|_| Err("install clone worker panicked".to_string()));
let _ = done_tx.send(OpDone::Clone { target, result });
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of std::panic::catch_unwind here violates the general rule: "Do not implement recovery logic for panics. Panics should be treated as unrecoverable bugs that need to be fixed, not as transient, recoverable errors." If a panic occurs in a worker thread, it should be allowed to propagate as an unrecoverable failure rather than being caught and reported as a standard error.

            rayon::spawn(move || {
                let result = clone_package_from_cache_sync(
                    &job.spec.package.name,
                    &job.spec.package.version,
                    &job.spec.package.tarball_url,
                    &job.cache_path,
                    &job.spec.target,
                )
                .map_err(|e| format!("{e:#}"));
                let _ = done_tx.send(OpDone::Clone { target, result });
            });
References
  1. Do not implement recovery logic for panics. Panics should be treated as unrecoverable bugs that need to be fixed, not as transient, recoverable errors.

Comment on lines +95 to +122
for (path, package) in packages {
if should_omit_package(package, omit) || package.link.is_some() {
continue;
}

let Some(resolved) = package.resolved.as_deref() else {
continue;
};
if !is_prefetchable_registry_tarball(resolved, &registry) {
continue;
}

if package
.cpu
.as_ref()
.is_some_and(|cpu| !is_cpu_compatible(cpu))
|| package.os.as_ref().is_some_and(|os| !is_os_compatible(os))
{
continue;
}

let Some(version) = package.version.as_deref() else {
continue;
};
let name = package.get_name(path);
if name.is_empty() || name == "root" || name == "unknown" {
continue;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The filtering logic for packages (checking omit status, link status, platform compatibility, and name validity) is largely duplicated between prefetch_lock_downloads and install_packages. This increases the risk of logic drift where prefetching might skip packages that the installer expects, or vice versa. Consider extracting this into a shared helper function (e.g., should_process_package) to ensure consistency and improve maintainability.

Comment on lines +68 to +75
fn is_prefetchable_registry_tarball(resolved: &str, registry: &str) -> bool {
[registry, REGISTRY_NPMJS, REGISTRY_NPMMIRROR]
.into_iter()
.map(|registry| registry.trim_end_matches('/'))
.any(|registry| {
resolved.starts_with(registry) && resolved[registry.len()..].starts_with('/')
})
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function is called for every package in the lockfile. Trimming the registry strings and allocating an iterator on every call is inefficient, especially for large dependency trees. Since the registry URLs are constant or fixed for the duration of the command, they should be pre-trimmed outside the loop in prefetch_lock_downloads and passed in as a slice of pre-calculated strings.

@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · 572a962 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 9.57s 0.33s 10.74s 10.81s 759M 342.3K
utoo-next 8.24s 0.31s 10.79s 12.69s 966M 124.4K
utoo-npm 8.22s 0.11s 11.06s 12.82s 982M 117.8K
utoo 7.65s 0.12s 11.36s 12.38s 921M 145.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 14.4K 18.5K 1.21G 6M 1.90G 1.77G 1M
utoo-next 121.7K 79.7K 1.17G 4M 1.73G 1.72G 2M
utoo-npm 124.3K 88.7K 1.17G 5M 1.73G 1.72G 2M
utoo 71.2K 46.6K 641M 3M 1.72G 1.72G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 1.92s 0.09s 4.13s 1.06s 510M 158.2K
utoo-next 3.05s 0.05s 5.65s 2.10s 627M 77.1K
utoo-npm 3.15s 0.05s 5.77s 2.13s 626M 84.5K
utoo 2.46s 0.04s 6.20s 1.77s 661M 122.4K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 8.0K 4.7K 204M 3M 108M - 1M
utoo-next 71.3K 89.9K 201M 2M 7M 3M 2M
utoo-npm 71.4K 92.4K 201M 2M 7M 3M 2M
utoo 14.3K 19.2K 204M 3M 7M 3M 3M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 6.72s 0.22s 6.48s 10.29s 629M 210.8K
utoo-next 5.93s 0.08s 5.08s 10.95s 482M 60.1K
utoo-npm 6.25s 0.95s 5.15s 11.24s 487M 61.3K
utoo 5.32s 0.11s 4.88s 10.40s 500M 57.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 4.1K 6.9K 1.00G 4M 1.77G 1.77G 1M
utoo-next 96.6K 47.8K 999M 2M 1.72G 1.72G 2M
utoo-npm 106.1K 50.8K 999M 3M 1.72G 1.72G 2M
utoo 58.5K 39.5K 998M 2M 1.72G 1.72G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.19s 0.06s 0.18s 2.42s 134M 31.6K
utoo-next 2.51s 0.59s 0.55s 3.85s 78M 18.0K
utoo-npm 2.05s 0.02s 0.52s 3.86s 80M 18.5K
utoo 1.85s 0.04s 0.40s 3.54s 50M 11.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 276 22 4K 28K 1.88G 1.77G 1M
utoo-next 40.9K 18.2K 3K 10K 1.72G 1.72G 2M
utoo-npm 42.3K 20.2K 3K 24K 1.72G 1.72G 2M
utoo 17.4K 11.4K 2K 9K 1.72G 1.72G 2M

npmmirror.com: no output captured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant