perf(pm): serialize scheduler extract writes#2984
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a FileWriteMode enum to crates/pm/src/util/extractor.rs, allowing for both parallel and serial file writing during tarball extraction. This change is intended to prevent nested Rayon fan-out when the package installer is already parallelizing at the package level. Reviewers suggested deriving Debug for the new enum, updating the documentation for extract_tarball_sync to reflect the new modes, and refactoring the file writing logic using try_for_each for improved conciseness.
| mode: u32, | ||
| } | ||
|
|
||
| #[derive(Clone, Copy)] |
| rx.await.with_context(|| "Extract task panicked")? | ||
| } | ||
|
|
||
| /// Synchronous extraction: decompress + parse + parallel write, all on rayon. |
There was a problem hiding this comment.
The documentation should be updated to reflect that the extraction process can now be either parallel or serial depending on the provided mode.
| /// Synchronous extraction: decompress + parse + parallel write, all on rayon. | |
| /// Synchronous extraction: decompress + parse + write (parallel or serial), all on rayon. |
| match file_write_mode { | ||
| FileWriteMode::ParallelChunks => { | ||
| entries | ||
| .par_chunks(WRITE_CHUNK_SIZE) | ||
| .try_for_each(|chunk| -> Result<()> { | ||
| for entry in chunk { | ||
| write_entry(entry)?; | ||
| } | ||
| Ok(()) | ||
| })?; | ||
| } | ||
| FileWriteMode::Serial => { | ||
| for entry in &entries { | ||
| write_entry(entry)?; | ||
| } | ||
| Ok(()) | ||
| })?; | ||
| } | ||
| } |
There was a problem hiding this comment.
The file writing logic can be simplified by using try_for_each on the iterators. This is more concise and consistent with the functional style used in the ParallelChunks arm.
match file_write_mode {
FileWriteMode::ParallelChunks => {
entries
.par_chunks(WRITE_CHUNK_SIZE)
.try_for_each(|chunk| chunk.iter().try_for_each(write_entry))?;
}
FileWriteMode::Serial => {
entries.iter().try_for_each(write_entry)?;
}
}
📊 pm-bench-phases ·
|
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.14s | 0.20s | 10.44s | 10.36s | 705M | 340.1K |
| utoo-next | 8.31s | 0.03s | 10.71s | 12.86s | 1.02G | 130.1K |
| utoo-npm | 8.42s | 0.10s | 10.81s | 12.54s | 1.02G | 123.0K |
| utoo | 7.84s | 0.18s | 11.07s | 11.71s | 960M | 143.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 15.6K | 18.7K | 1.20G | 6M | 1.88G | 1.76G | 1M |
| utoo-next | 151.4K | 106.4K | 1.17G | 6M | 1.72G | 1.72G | 2M |
| utoo-npm | 132.2K | 101.0K | 1.17G | 5M | 1.72G | 1.72G | 2M |
| utoo | 77.4K | 48.7K | 1.09G | 6M | 1.72G | 1.72G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.20s | 0.05s | 3.95s | 1.07s | 487M | 186.0K |
| utoo-next | 3.18s | 0.07s | 5.46s | 2.13s | 622M | 84.2K |
| utoo-npm | 4.05s | 1.53s | 5.48s | 2.13s | 618M | 81.3K |
| utoo | 2.38s | 0.05s | 5.96s | 1.65s | 646M | 119.6K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 10.9K | 3.7K | 203M | 3M | 108M | - | 1M |
| utoo-next | 75.2K | 91.8K | 201M | 2M | 7M | 3M | 2M |
| utoo-npm | 76.9K | 90.8K | 201M | 3M | 7M | 3M | 2M |
| utoo | 14.5K | 18.6K | 204M | 3M | 7M | 3M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 7.05s | 0.20s | 6.37s | 9.98s | 580M | 198.2K |
| utoo-next | 9.77s | 5.14s | 5.04s | 11.21s | 448M | 64.9K |
| utoo-npm | 8.92s | 3.54s | 5.24s | 11.42s | 541M | 68.0K |
| utoo | 5.84s | 0.46s | 4.81s | 10.28s | 475M | 56.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 5.8K | 8.1K | 1.00G | 4M | 1.77G | 1.77G | 1M |
| utoo-next | 124.3K | 50.2K | 998M | 3M | 1.71G | 1.71G | 2M |
| utoo-npm | 134.4K | 52.7K | 999M | 4M | 1.71G | 1.71G | 2M |
| utoo | 70.7K | 43.6K | 998M | 3M | 1.71G | 1.71G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.42s | 0.00s | 0.21s | 2.35s | 134M | 32.7K |
| utoo-next | 2.39s | 0.10s | 0.51s | 3.80s | 79M | 18.6K |
| utoo-npm | 2.31s | 0.04s | 0.49s | 3.77s | 79M | 18.3K |
| utoo | 2.15s | 0.11s | 0.37s | 3.42s | 50M | 10.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 452 | 21 | 5M | 21K | 1.93G | 1.76G | 1M |
| utoo-next | 41.4K | 18.4K | 2K | 22K | 1.71G | 1.71G | 2M |
| utoo-npm | 40.9K | 18.6K | 4K | 10K | 1.71G | 1.71G | 2M |
| utoo | 13.7K | 9.5K | 1K | 4K | 1.72G | 1.71G | 2M |
npmmirror.com: no output captured.
Summary
Hypothesis
p3 cold install already has package-level extraction concurrency from the install scheduler. Each package extract worker currently enters
extract_tarball_sync, which then fans out per-file writes through rayon again. That nested parallelism can oversubscribe the filesystem path: many package workers all enqueue file-write chunks at once.Serial file writes inside one tarball should reduce scheduler pressure and sys time while preserving cross-package extraction parallelism.
Validation
cargo fmtCARGO_NET_OFFLINE=true cargo clippy -p utoo-pm --all-targets -- -D warnings --no-depsCARGO_NET_OFFLINE=true cargo test -p utoo-pm util::downloader -- --nocaptureGHA Benchmark
Runs:
linux/npmjs phase bench, this PR column only:
N=2 comparison points:
utoo-next/utoo-npmbaselines on wall and vCtx.Benchmark Plan
Use the normal GHA linux/npmjs phase bench. The expected signal is p3 sys/ctx improvement. p4 should be neutral because warm-link does not extract tarballs.
This PR is stacked on #2982 so it can be reviewed and benchmarked independently from the progress fast path.