Skip to content

perf(pm): run install extract workers on rayon#2971

Draft
elrrrrrrr wants to merge 1 commit into
perf/pm-resolver-demand-bfsfrom
exp/pm-install-extract-rayon-b33d922
Draft

perf(pm): run install extract workers on rayon#2971
elrrrrrrr wants to merge 1 commit into
perf/pm-resolver-demand-bfsfrom
exp/pm-install-extract-rayon-b33d922

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

Summary

AB experiment for p3 cold install scheduling.

The install scheduler currently spawns a tokio task for extraction, and that task immediately awaits the extractor rayon job. This change removes that outer tokio task for extract work:

  • add synchronous extract_and_write_sync / extract_to_cache_sync
  • run install extract jobs directly on rayon workers
  • return OpDone::Extract to the scheduler through an internal completion channel
  • keep download and clone scheduling unchanged in this experiment

Hypothesis

p3 cold install spends real time in tarball decompression and package-cache writes. The tokio wrapper around rayon extraction does not do useful async work; it only adds task scheduling and wakeups. Moving extract workers directly onto rayon should reduce ctx and possibly wall in p3 without affecting p1.

Validation

Passed:

  • cargo fmt
  • cargo test -p utoo-pm service::install_scheduler::tests
  • cargo test -p utoo-pm util::downloader::tests
  • cargo test -p utoo-pm test_extract_and_write_sync
  • cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Full workspace clippy is not repeated here; it is currently blocked in this worktree by the known pack-core/next.js API mismatch observed on #2969 validation.

Benchmark Plan

Label-trigger GHA bench. Main read is p3 wall and vCtx/iCtx vs same-run utoo-next, utoo-npm, and bun; p4 should be neutral unless extract cache probing changes show up.

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 18, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the package extraction process to use synchronous operations executed on a dedicated thread pool via rayon, replacing the previous asynchronous tokio tasks. Key changes include the introduction of extract_to_cache_sync and extract_and_write_sync, along with updates to the InstallScheduler to handle task completion through a new internal channel. Feedback indicates that error handling in extract_to_cache_sync should be improved by propagating I/O errors from try_exists() rather than swallowing them.

pub fn extract_to_cache_sync(name: &str, version: &str, bytes: Bytes) -> Result<PathBuf> {
let cache_path = registry_cache_path(name, version);

if cache_path.join("_resolved").try_exists().unwrap_or(false) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of .unwrap_or(false) on the result of try_exists() swallows potential I/O errors. If try_exists() fails for a reason other than the file not existing (e.g., permission issues), the error is ignored, and the code proceeds as if the file doesn't exist. This can lead to redundant work and obscure the root cause of the failure.

It's better to propagate the error using the ? operator, which is consistent with how errors are handled elsewhere in this function and in extract_and_write_sync.

Suggested change
if cache_path.join("_resolved").try_exists().unwrap_or(false) {
if cache_path.join("_resolved").try_exists()? {

@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · 5d274c7 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 9.31s 0.31s 10.66s 10.29s 671M 329.3K
utoo-next 8.02s 0.24s 10.55s 12.19s 940M 123.1K
utoo-npm 8.73s 0.97s 10.96s 12.47s 966M 121.6K
utoo 7.42s 0.17s 11.26s 11.89s 946M 148.4K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 17.7K 19.3K 1.20G 7M 1.90G 1.77G 1M
utoo-next 126.1K 89.0K 1.17G 5M 1.73G 1.72G 2M
utoo-npm 138.1K 98.5K 1.17G 5M 1.73G 1.72G 2M
utoo 100.4K 66.1K 1.17G 6M 1.72G 1.72G 3M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.10s 0.08s 4.08s 1.15s 518M 173.5K
utoo-next 3.03s 0.06s 5.38s 2.28s 624M 88.0K
utoo-npm 3.08s 0.05s 5.47s 2.21s 618M 79.0K
utoo 2.45s 0.08s 6.04s 1.79s 656M 127.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 9.5K 4.7K 203M 3M 108M - 1M
utoo-next 76.7K 96.9K 201M 2M 7M 3M 2M
utoo-npm 76.2K 98.3K 201M 3M 7M 3M 2M
utoo 16.1K 21.8K 204M 3M 7M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 6.75s 0.13s 6.44s 9.71s 637M 205.5K
utoo-next 6.97s 1.40s 5.18s 10.69s 490M 64.1K
utoo-npm 6.16s 0.28s 5.32s 10.62s 484M 65.7K
utoo 6.55s 1.61s 5.18s 10.68s 518M 64.0K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 5.7K 7.2K 1.00G 4M 1.77G 1.77G 1M
utoo-next 113.0K 50.2K 1000M 3M 1.72G 1.72G 2M
utoo-npm 111.3K 49.7K 1000M 3M 1.72G 1.72G 2M
utoo 101.3K 56.5K 1000M 3M 1.72G 1.72G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.57s 0.04s 0.17s 2.51s 135M 31.6K
utoo-next 2.41s 0.28s 0.53s 3.88s 78M 18.3K
utoo-npm 2.42s 0.12s 0.51s 3.82s 80M 18.9K
utoo 2.33s 0.05s 0.54s 3.90s 62M 14.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 251 17 47K 18K 1.88G 1.77G 1M
utoo-next 43.4K 19.6K 6K 26K 1.72G 1.72G 2M
utoo-npm 42.2K 20.3K 2K 6K 1.72G 1.72G 2M
utoo 53.7K 24.8K 2K 4K 1.73G 1.72G 2M

npmmirror.com: no output captured.

@elrrrrrrr
Copy link
Copy Markdown
Contributor Author

GHA run 1 read

Run: https://github.com/utooland/utoo/actions/runs/26017369914

phase utoo wall utoo ctx same-run utoo-next same-run utoo-npm same-run bun read
p3_cold_install 6.55s ±1.61 101.3K / 56.5K 6.97s, 113.0K / 50.2K 6.16s, 111.3K / 49.7K 6.75s, 5.7K / 7.2K mixed; vCtx improves, but iCtx still worse and wall does not beat utoo-npm
p4_warm_link 2.33s ±0.05 53.7K / 24.8K 2.41s, 43.4K / 19.6K 2.42s, 42.2K / 20.3K 3.57s, 251 / 17 p4 ctx regresses; not acceptable

Conclusion: reject as a standalone fold. Removing the outer tokio task around extract work helps part of p3 vCtx, but it does not reduce iCtx enough and hurts p4 ctx. Keep #2965 as the proven scheduling boundary; this extract change needs a different design before it is worth combining.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant