Skip to content

perf(pm): serialize scheduler extract writes#2984

Draft
elrrrrrrr wants to merge 1 commit into
exp/pm-gate-install-progress-non-ttyfrom
exp/pm-serial-scheduler-extract-writes
Draft

perf(pm): serialize scheduler extract writes#2984
elrrrrrrr wants to merge 1 commit into
exp/pm-gate-install-progress-non-ttyfrom
exp/pm-serial-scheduler-extract-writes

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

@elrrrrrrr elrrrrrrr commented May 18, 2026

Summary

  • keep async standalone tarball extraction on the existing parallel file-write path
  • make the install scheduler's sync extraction write files serially inside each tarball
  • avoid nested rayon fan-out when package-level extract workers are already running concurrently

Hypothesis

p3 cold install already has package-level extraction concurrency from the install scheduler. Each package extract worker currently enters extract_tarball_sync, which then fans out per-file writes through rayon again. That nested parallelism can oversubscribe the filesystem path: many package workers all enqueue file-write chunks at once.

Serial file writes inside one tarball should reduce scheduler pressure and sys time while preserving cross-package extraction parallelism.

Validation

  • cargo fmt
  • CARGO_NET_OFFLINE=true cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps
  • CARGO_NET_OFFLINE=true cargo test -p utoo-pm util::downloader -- --nocapture

GHA Benchmark

Runs:

linux/npmjs phase bench, this PR column only:

phase run 1 run 2
p0 wall 9.91s 9.95s
p0 vCtx/iCtx 90.0K / 56.8K 90.8K / 51.1K
p1 wall 2.56s 2.82s
p1 vCtx/iCtx 14.9K / 18.4K 16.5K / 19.5K
p3 wall 6.34s 5.83s
p3 vCtx/iCtx 69.1K / 46.5K 67.6K / 42.5K
p4 wall 2.14s 2.16s
p4 vCtx/iCtx 13.4K / 9.7K 13.1K / 9.3K

N=2 comparison points:

  • p4 is neutral and stable, as expected, because warm-link does not extract tarballs.
  • p3 remains healthy and beats the same-run utoo-next / utoo-npm baselines on wall and vCtx.
  • The isolated contribution of serial per-tarball writes is still modest/uncertain because this PR is stacked on the install scheduler changes; compared with the previous stacked PR's historical GHA samples, p3 is in the same band rather than a clear step-change.

Benchmark Plan

Use the normal GHA linux/npmjs phase bench. The expected signal is p3 sys/ctx improvement. p4 should be neutral because warm-link does not extract tarballs.

This PR is stacked on #2982 so it can be reviewed and benchmarked independently from the progress fast path.

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 18, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a FileWriteMode enum to crates/pm/src/util/extractor.rs, allowing for both parallel and serial file writing during tarball extraction. This change is intended to prevent nested Rayon fan-out when the package installer is already parallelizing at the package level. Reviewers suggested deriving Debug for the new enum, updating the documentation for extract_tarball_sync to reflect the new modes, and refactoring the file writing logic using try_for_each for improved conciseness.

mode: u32,
}

#[derive(Clone, Copy)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It is good practice to derive Debug for internal enums to facilitate logging and debugging in the future.

Suggested change
#[derive(Clone, Copy)]
#[derive(Clone, Copy, Debug)]

rx.await.with_context(|| "Extract task panicked")?
}

/// Synchronous extraction: decompress + parse + parallel write, all on rayon.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The documentation should be updated to reflect that the extraction process can now be either parallel or serial depending on the provided mode.

Suggested change
/// Synchronous extraction: decompress + parse + parallel write, all on rayon.
/// Synchronous extraction: decompress + parse + write (parallel or serial), all on rayon.

Comment on lines +191 to +207
match file_write_mode {
FileWriteMode::ParallelChunks => {
entries
.par_chunks(WRITE_CHUNK_SIZE)
.try_for_each(|chunk| -> Result<()> {
for entry in chunk {
write_entry(entry)?;
}
Ok(())
})?;
}
FileWriteMode::Serial => {
for entry in &entries {
write_entry(entry)?;
}
Ok(())
})?;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The file writing logic can be simplified by using try_for_each on the iterators. This is more concise and consistent with the functional style used in the ParallelChunks arm.

    match file_write_mode {
        FileWriteMode::ParallelChunks => {
            entries
                .par_chunks(WRITE_CHUNK_SIZE)
                .try_for_each(|chunk| chunk.iter().try_for_each(write_entry))?;
        }
        FileWriteMode::Serial => {
            entries.iter().try_for_each(write_entry)?;
        }
    }

@elrrrrrrr elrrrrrrr added benchmark Run pm-bench on PR and removed benchmark Run pm-bench on PR labels May 18, 2026
@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · 3c0c21e · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 9.14s 0.20s 10.44s 10.36s 705M 340.1K
utoo-next 8.31s 0.03s 10.71s 12.86s 1.02G 130.1K
utoo-npm 8.42s 0.10s 10.81s 12.54s 1.02G 123.0K
utoo 7.84s 0.18s 11.07s 11.71s 960M 143.7K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 15.6K 18.7K 1.20G 6M 1.88G 1.76G 1M
utoo-next 151.4K 106.4K 1.17G 6M 1.72G 1.72G 2M
utoo-npm 132.2K 101.0K 1.17G 5M 1.72G 1.72G 2M
utoo 77.4K 48.7K 1.09G 6M 1.72G 1.72G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.20s 0.05s 3.95s 1.07s 487M 186.0K
utoo-next 3.18s 0.07s 5.46s 2.13s 622M 84.2K
utoo-npm 4.05s 1.53s 5.48s 2.13s 618M 81.3K
utoo 2.38s 0.05s 5.96s 1.65s 646M 119.6K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 10.9K 3.7K 203M 3M 108M - 1M
utoo-next 75.2K 91.8K 201M 2M 7M 3M 2M
utoo-npm 76.9K 90.8K 201M 3M 7M 3M 2M
utoo 14.5K 18.6K 204M 3M 7M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 7.05s 0.20s 6.37s 9.98s 580M 198.2K
utoo-next 9.77s 5.14s 5.04s 11.21s 448M 64.9K
utoo-npm 8.92s 3.54s 5.24s 11.42s 541M 68.0K
utoo 5.84s 0.46s 4.81s 10.28s 475M 56.7K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 5.8K 8.1K 1.00G 4M 1.77G 1.77G 1M
utoo-next 124.3K 50.2K 998M 3M 1.71G 1.71G 2M
utoo-npm 134.4K 52.7K 999M 4M 1.71G 1.71G 2M
utoo 70.7K 43.6K 998M 3M 1.71G 1.71G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.42s 0.00s 0.21s 2.35s 134M 32.7K
utoo-next 2.39s 0.10s 0.51s 3.80s 79M 18.6K
utoo-npm 2.31s 0.04s 0.49s 3.77s 79M 18.3K
utoo 2.15s 0.11s 0.37s 3.42s 50M 10.7K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 452 21 5M 21K 1.93G 1.76G 1M
utoo-next 41.4K 18.4K 2K 22K 1.71G 1.71G 2M
utoo-npm 40.9K 18.6K 4K 10K 1.71G 1.71G 2M
utoo 13.7K 9.5K 1K 4K 1.72G 1.71G 2M

npmmirror.com: no output captured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant