perf(pm): prefetch lockfile downloads by depth#2977
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a prefetching mechanism for package downloads and refactors the installation scheduler to perform disk-intensive operations—such as cloning and extraction—synchronously within a Rayon worker pool. Review feedback suggests removing panic recovery logic in worker threads to ensure unrecoverable bugs propagate correctly, deduplicating package filtering logic into a shared helper, and optimizing registry URL validation by pre-calculating trimmed strings to reduce allocations during lockfile processing.
| rayon::spawn(move || { | ||
| let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| { | ||
| clone_package_from_cache_sync( | ||
| &job.spec.package.name, | ||
| &job.spec.package.version, | ||
| &job.spec.package.tarball_url, | ||
| &job.cache_path, | ||
| &job.spec.target, | ||
| ) | ||
| .map_err(|e| format!("{e:#}")) | ||
| })) | ||
| .unwrap_or_else(|_| Err("install clone worker panicked".to_string())); | ||
| let _ = done_tx.send(OpDone::Clone { target, result }); | ||
| }); |
There was a problem hiding this comment.
The use of std::panic::catch_unwind here violates the general rule: "Do not implement recovery logic for panics. Panics should be treated as unrecoverable bugs that need to be fixed, not as transient, recoverable errors." If a panic occurs in a worker thread, it should be allowed to propagate as an unrecoverable failure rather than being caught and reported as a standard error.
rayon::spawn(move || {
let result = clone_package_from_cache_sync(
&job.spec.package.name,
&job.spec.package.version,
&job.spec.package.tarball_url,
&job.cache_path,
&job.spec.target,
)
.map_err(|e| format!("{e:#}"));
let _ = done_tx.send(OpDone::Clone { target, result });
});References
- Do not implement recovery logic for panics. Panics should be treated as unrecoverable bugs that need to be fixed, not as transient, recoverable errors.
| for (path, package) in packages { | ||
| if should_omit_package(package, omit) || package.link.is_some() { | ||
| continue; | ||
| } | ||
|
|
||
| let Some(resolved) = package.resolved.as_deref() else { | ||
| continue; | ||
| }; | ||
| if !is_prefetchable_registry_tarball(resolved, ®istry) { | ||
| continue; | ||
| } | ||
|
|
||
| if package | ||
| .cpu | ||
| .as_ref() | ||
| .is_some_and(|cpu| !is_cpu_compatible(cpu)) | ||
| || package.os.as_ref().is_some_and(|os| !is_os_compatible(os)) | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| let Some(version) = package.version.as_deref() else { | ||
| continue; | ||
| }; | ||
| let name = package.get_name(path); | ||
| if name.is_empty() || name == "root" || name == "unknown" { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
The filtering logic for packages (checking omit status, link status, platform compatibility, and name validity) is largely duplicated between prefetch_lock_downloads and install_packages. This increases the risk of logic drift where prefetching might skip packages that the installer expects, or vice versa. Consider extracting this into a shared helper function (e.g., should_process_package) to ensure consistency and improve maintainability.
| fn is_prefetchable_registry_tarball(resolved: &str, registry: &str) -> bool { | ||
| [registry, REGISTRY_NPMJS, REGISTRY_NPMMIRROR] | ||
| .into_iter() | ||
| .map(|registry| registry.trim_end_matches('/')) | ||
| .any(|registry| { | ||
| resolved.starts_with(registry) && resolved[registry.len()..].starts_with('/') | ||
| }) | ||
| } |
There was a problem hiding this comment.
This function is called for every package in the lockfile. Trimming the registry strings and allocating an iterator on every call is inefficient, especially for large dependency trees. Since the registry URLs are constant or fixed for the duration of the command, they should be pre-trimmed outside the loop in prefetch_lock_downloads and passed in as a slice of pre-calculated strings.
📊 pm-bench-phases ·
|
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.57s | 0.33s | 10.74s | 10.81s | 759M | 342.3K |
| utoo-next | 8.24s | 0.31s | 10.79s | 12.69s | 966M | 124.4K |
| utoo-npm | 8.22s | 0.11s | 11.06s | 12.82s | 982M | 117.8K |
| utoo | 7.65s | 0.12s | 11.36s | 12.38s | 921M | 145.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 14.4K | 18.5K | 1.21G | 6M | 1.90G | 1.77G | 1M |
| utoo-next | 121.7K | 79.7K | 1.17G | 4M | 1.73G | 1.72G | 2M |
| utoo-npm | 124.3K | 88.7K | 1.17G | 5M | 1.73G | 1.72G | 2M |
| utoo | 71.2K | 46.6K | 641M | 3M | 1.72G | 1.72G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 1.92s | 0.09s | 4.13s | 1.06s | 510M | 158.2K |
| utoo-next | 3.05s | 0.05s | 5.65s | 2.10s | 627M | 77.1K |
| utoo-npm | 3.15s | 0.05s | 5.77s | 2.13s | 626M | 84.5K |
| utoo | 2.46s | 0.04s | 6.20s | 1.77s | 661M | 122.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 8.0K | 4.7K | 204M | 3M | 108M | - | 1M |
| utoo-next | 71.3K | 89.9K | 201M | 2M | 7M | 3M | 2M |
| utoo-npm | 71.4K | 92.4K | 201M | 2M | 7M | 3M | 2M |
| utoo | 14.3K | 19.2K | 204M | 3M | 7M | 3M | 3M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.72s | 0.22s | 6.48s | 10.29s | 629M | 210.8K |
| utoo-next | 5.93s | 0.08s | 5.08s | 10.95s | 482M | 60.1K |
| utoo-npm | 6.25s | 0.95s | 5.15s | 11.24s | 487M | 61.3K |
| utoo | 5.32s | 0.11s | 4.88s | 10.40s | 500M | 57.2K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 4.1K | 6.9K | 1.00G | 4M | 1.77G | 1.77G | 1M |
| utoo-next | 96.6K | 47.8K | 999M | 2M | 1.72G | 1.72G | 2M |
| utoo-npm | 106.1K | 50.8K | 999M | 3M | 1.72G | 1.72G | 2M |
| utoo | 58.5K | 39.5K | 998M | 2M | 1.72G | 1.72G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.19s | 0.06s | 0.18s | 2.42s | 134M | 31.6K |
| utoo-next | 2.51s | 0.59s | 0.55s | 3.85s | 78M | 18.0K |
| utoo-npm | 2.05s | 0.02s | 0.52s | 3.86s | 80M | 18.5K |
| utoo | 1.85s | 0.04s | 0.40s | 3.54s | 50M | 11.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 276 | 22 | 4K | 28K | 1.88G | 1.77G | 1M |
| utoo-next | 40.9K | 18.2K | 3K | 10K | 1.72G | 1.72G | 2M |
| utoo-npm | 42.3K | 20.2K | 3K | 24K | 1.72G | 1.72G | 2M |
| utoo | 17.4K | 11.4K | 2K | 9K | 1.72G | 1.72G | 2M |
npmmirror.com: no output captured.
Summary
Hypothesis
p3 cold install has an existing lockfile but an empty cache. The previous lockfile prefetch can enqueue deep tarballs ahead of shallow packages, so early
ensure_clonecalls may wait behind non-critical downloads. Shallow-first prefetch should preserve p4 behavior and improve p3 critical-path latency.Validation
Benchmark
GHA linux/npmjs, run 26020954979:
Conclusion: reject as standalone. Depth-order prefetch does not improve p3 and p4 only matches the #2974 baseline. Keep #2975 as the balanced candidate; #2976 remains a p3-only idea with p4 regression.