perf(pm): run install clone workers on rayon#2965
Conversation
There was a problem hiding this comment.
Code Review
This pull request transitions the package cloning process from an asynchronous model using tokio::spawn to a synchronous one utilizing rayon::spawn and a worker pool. It introduces several synchronous utility functions in cloner.rs and updates the install_scheduler to manage task completion through an internal MPSC channel. Feedback includes identifying redundant cleanup and validation logic in clone_package_sync, improving error context for metadata retrieval, and simplifying directory creation and JSON parsing logic to remove unnecessary checks.
| if dst.try_exists()? { | ||
| if validate_name_version_sync(dst, name, version) { | ||
| return Ok(false); | ||
| } | ||
| if let Err(e) = std::fs::remove_dir_all(dst) { | ||
| tracing::warn!("Failed to clean target directory {}: {}", dst.display(), e); | ||
| } | ||
| } |
There was a problem hiding this comment.
There is redundant cleanup logic here. clone_sync (called on line 492) already contains logic to check for the existence of dst and perform remove_dir_all if it exists. It's cleaner to let the internal helper handle the destructive phase. Additionally, per repository rules for linking-like operations, only the package name should be validated, not the version.
if dst.try_exists()? && validate_name_sync(dst, name) {
return Ok(false);
}References
- For package linking operations (similar to 'npm link'), only the package name should be validated as required. Do not enforce a version check if the version is not used in the linking logic.
| pub fn clone_dir_sync(src: &Path, dst: &Path) -> Result<()> { | ||
| let err_msg = format!("Failed to clone {} to {}", src.display(), dst.display()); | ||
|
|
||
| if !fs::metadata(src)?.is_dir() { |
There was a problem hiding this comment.
The error returned here when fs::metadata fails (e.g., if the source directory does not exist) will not have the err_msg context applied, unlike other errors in this function. This makes debugging harder as the specific source/destination paths won't be included in the error message.
| if !fs::metadata(src)?.is_dir() { | |
| if !fs::metadata(src).with_context(|| err_msg.clone())?.is_dir() { |
| let mut created_dirs = HashSet::new(); | ||
| for dir in &dirs { | ||
| if created_dirs.insert(dir.clone()) | ||
| && let Err(e) = fs::create_dir_all(dir) | ||
| && e.kind() != io::ErrorKind::AlreadyExists | ||
| { | ||
| return Err(e).with_context(|| err_msg.clone()); | ||
| } | ||
| } |
There was a problem hiding this comment.
The HashSet and the AlreadyExists check are redundant here. dirs is populated by collect_entries, which performs a recursive walk of a file system tree, ensuring each destination path is unique. Furthermore, std::fs::create_dir_all is idempotent and returns Ok if the directory already exists. If it returns an error because the path exists but is not a directory, that error should be propagated rather than ignored.
// Phase 2: Create all directories
for dir in &dirs {
if let Err(e) = fs::create_dir_all(dir) {
return Err(e).with_context(|| err_msg.clone());
}
}| match serde_json::from_str(&content) { | ||
| Ok(v) => Ok(v), | ||
| Err(original_err) => match serde_json::from_str::<serde_json::Value>(&content) { | ||
| Ok(value) => serde_json::from_value(value) | ||
| .with_context(|| format!("Failed to deserialize {pkg_path:?}")), | ||
| Err(_) => { | ||
| Err(original_err).with_context(|| format!("Failed to parse JSON from {pkg_path:?}")) | ||
| } | ||
| }, | ||
| } |
There was a problem hiding this comment.
The fallback logic of parsing into serde_json::Value and then using from_value is redundant. serde_json::from_str::<T> already handles the parsing and deserialization in one step. If it fails, it's either a syntax error or a schema mismatch, both of which are captured by the original error. Simplifying this improves readability and slightly reduces overhead.
serde_json::from_str(&content)
.with_context(|| format!("Failed to parse JSON from {pkg_path:?}"))
GHA run 1 readRun: https://github.com/utooland/utoo/actions/runs/26015669415
Conclusion: keep this candidate. This is the first p4 result that is both faster than same-run utoo-next/utoo-npm and materially lower in ctx. Next step is to combine only with other positive single-factor results after #2967 finishes. |
📊 pm-bench-phases ·
|
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 8.93s | 0.73s | 10.49s | 6.27s | 596M | 305.3K |
| utoo-next | 8.88s | 1.10s | 11.01s | 8.38s | 1019M | 124.1K |
| utoo-npm | 10.82s | 3.91s | 11.36s | 8.57s | 1003M | 112.2K |
| utoo | 7.25s | 0.27s | 11.64s | 8.11s | 929M | 145.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 24.7K | 15.7K | 1.20G | 7M | 1.89G | 1.76G | 1M |
| utoo-next | 170.5K | 126.6K | 1.17G | 6M | 1.73G | 1.72G | 2M |
| utoo-npm | 175.7K | 125.3K | 1.17G | 6M | 1.73G | 1.72G | 2M |
| utoo | 125.7K | 79.5K | 1.18G | 7M | 1.73G | 1.72G | 3M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.83s | 0.16s | 4.41s | 0.87s | 484M | 188.8K |
| utoo-next | 3.26s | 0.01s | 5.86s | 1.57s | 628M | 81.9K |
| utoo-npm | 5.27s | 3.19s | 5.84s | 1.59s | 613M | 79.8K |
| utoo | 0.00s | 0.00s | 6.11s | 1.17s | 643M | 122.2K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 14.8K | 3.6K | 205M | 3M | 108M | - | 1M |
| utoo-next | 87.5K | 107.3K | 202M | 3M | 7M | 3M | 2M |
| utoo-npm | 88.3K | 112.1K | 202M | 3M | 7M | 3M | 2M |
| utoo | 19.3K | 19.8K | 204M | 3M | 6M | - | - |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.99s | 2.70s | 6.03s | 6.31s | 574M | 207.7K |
| utoo-next | 8.67s | 2.91s | 5.22s | 7.36s | 497M | 59.3K |
| utoo-npm | 8.44s | 3.13s | 5.31s | 7.30s | 503M | 62.1K |
| utoo | 9.26s | 4.26s | 5.14s | 7.14s | 464M | 58.0K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 10.6K | 7.4K | 1.00G | 4M | 1.77G | 1.77G | 1M |
| utoo-next | 125.8K | 57.2K | 1000M | 3M | 1.72G | 1.72G | 2M |
| utoo-npm | 130.5K | 56.3K | 1000M | 3M | 1.72G | 1.72G | 2M |
| utoo | 123.2K | 63.8K | 1001M | 4M | 1.72G | 1.72G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.13s | 0.04s | 0.14s | 1.20s | 137M | 33.5K |
| utoo-next | 1.78s | 0.03s | 0.53s | 2.40s | 80M | 18.8K |
| utoo-npm | 1.84s | 0.01s | 0.52s | 2.43s | 80M | 18.5K |
| utoo | 1.73s | 0.07s | 0.39s | 2.15s | 52M | 11.6K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 241 | 15 | 5M | 35K | 1.93G | 1.74G | 1M |
| utoo-next | 43.3K | 18.5K | 573B | 2K | 1.72G | 1.72G | 2M |
| utoo-npm | 44.1K | 18.9K | 2K | 971B | 1.72G | 1.72G | 2M |
| utoo | 22.1K | 13.8K | 2K | 5K | 1.73G | 1.72G | 2M |
npmmirror.com: no output captured.
GHA run 2 readRun: https://github.com/utooland/utoo/actions/runs/26016347609
Conclusion: keep as a p4 candidate. The useful boundary is still clone worker scheduling: run filesystem clone/hardlink work on rayon workers and let the scheduler receive completion events. It should not be used as evidence for p3 yet; p3 remains noisy and needs a separate lever. |
What
AB experiment for p3/p4 install/link ctx: move scheduler-owned clone workers from the nested
tokio::spawn -> spawn_blockingpath to a single rayon worker that reports completion back to the install scheduler main loop.Why
Current p4 is mostly warm-cache clone/link work. Each package clone currently enters a tokio task and then hops again into the blocking pool for sync hardlink/copy. This tests whether removing that extra async/blocking-pool hop reduces context switches without changing scheduler ownership, clone ordering, cache lookup, or package semantics.
Notes
OpDone::Cloneresult back.clone_packageremains for non-scheduler call sites.Validation
cargo fmtcargo test -p utoo-pmcargo clippy -p utoo-pm --all-targets -- -D warnings --no-depsBench plan
Trigger linux/npmjs phase bench and compare mainly:
7.62s, ctx142.2K/68.3K4.83s, ctx54.6K/23.5K; recent ABs around2.2s, ctx~52K/~24K