Skip to content

perf(pm): prioritize clone unblockers#2991

Draft
elrrrrrrr wants to merge 1 commit into
exp/pm-install-clone-batch-completionsfrom
exp/pm-install-prioritize-clone-unblockers
Draft

perf(pm): prioritize clone unblockers#2991
elrrrrrrr wants to merge 1 commit into
exp/pm-install-clone-batch-completionsfrom
exp/pm-install-prioritize-clone-unblockers

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

@elrrrrrrr elrrrrrrr commented May 19, 2026

Summary

This is a p3/p4 scheduler experiment on top of #2989.

#2989 batches clone completions to reduce scheduler wakeups, but that can delay parent package completion inside a batch. A parent clone completion is on the materialize critical path because it releases blocked_by_parent children. This PR keeps ordinary leaf clones batched, while prioritizing ready parent unblockers and sending them as a single-job batch.

Hypothesis

  • p4 warm link should improve or stay neutral if nested placements enter the queue earlier.
  • p3 cold install may improve when extract/download work and clone materialization overlap, because clone critical-path parents no longer wait behind unrelated hardlink work in the same batch.
  • ctx may rise slightly if many parent clones become single-item batches; this is the tradeoff to validate.

Validation

  • cargo fmt
  • cargo test -p utoo-pm install_scheduler
  • cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Bench Result

GHA Linux npmjs run 26093544293:

phase utoo wall vCtx iCtx
p3 cold install 5.21s 48.4K 34.3K
p4 warm link 1.86s 7.6K 4.4K

Compared with #2989 latest npmjs reference (p3 5.29s, 53.0K/32.9K; p4 2.23s, 7.8K/4.5K), this is directionally positive. p4 is the strongest signal because this PR directly changes warm materialize scheduling.

Conclusion: keep as a candidate; the hypothesis that batched parent clone completions delay nested placement is supported.

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 19, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the clone batching logic in install_scheduler.rs to prioritize 'unblocker' jobs—parent clones that unblock nested dependencies. It introduces take_clone_batch and pop_next_clone_job to handle this prioritization, ensuring that unblockers are processed individually to allow their children to enter the queue sooner. Unit tests were added to verify the prioritization and batching behavior. Review feedback highlights a potential performance bottleneck in pop_next_clone_job due to its O(N) complexity and suggests using separate queues for unblocker and leaf jobs to achieve O(1) performance.

Comment on lines +551 to +563
fn pop_next_clone_job(&mut self) -> Option<ReadyClone> {
let unblocker_index = if self.blocked_by_parent.is_empty() {
None
} else {
self.clone_queue
.iter()
.position(|job| self.blocked_by_parent.contains_key(&job.spec.target))
};
match unblocker_index {
Some(index) => self.clone_queue.remove(index),
None => self.clone_queue.pop_front(),
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of pop_next_clone_job iterates through the clone_queue to find a priority job. This is an O(N) operation, and VecDeque::remove() is also O(N) on average. This could introduce a performance bottleneck if the queue becomes large.

For a more efficient implementation, consider using two separate queues: one for high-priority "unblocker" jobs and one for regular "leaf" jobs.

  1. In SchedulerState, replace clone_queue: VecDeque<ReadyClone> with:

    unblocker_clone_queue: VecDeque<ReadyClone>,
    leaf_clone_queue: VecDeque<ReadyClone>,
  2. When a job is ready to be cloned (e.g., in handle_done for SeededCache and in complete_download), classify it and push it to the appropriate queue:

    let ready_clone = ReadyClone { spec, cache };
    if self.blocked_by_parent.contains_key(&ready_clone.spec.target) {
        self.unblocker_clone_queue.push_back(ready_clone);
    } else {
        self.leaf_clone_queue.push_back(ready_clone);
    }
  3. Then, pop_next_clone_job can be simplified to an O(1) operation. It could also return whether the job is an unblocker to avoid a second lookup in take_clone_batch:

    fn pop_next_clone_job(&mut self) -> Option<(ReadyClone, bool)> {
        if let Some(job) = self.unblocker_clone_queue.pop_front() {
            return Some((job, true));
        }
        if let Some(job) = self.leaf_clone_queue.pop_front() {
            return Some((job, false));
        }
        None
    }

    This would require adjusting take_clone_batch to handle the new return type.

This approach would eliminate the linear scan and make the scheduling logic more efficient, which could be important for the performance experiment you are running.

@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · 39262ac · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 8.99s 0.25s 10.36s 10.29s 753M 337.1K
utoo-next 8.95s 0.88s 10.47s 12.57s 1007M 125.2K
utoo-npm 7.92s 0.06s 10.51s 12.07s 980M 117.5K
utoo 8.52s 1.65s 11.20s 11.20s 924M 143.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 14.5K 18.4K 1.19G 6M 1.86G 1.75G 1M
utoo-next 141.1K 98.9K 1.16G 5M 1.71G 1.70G 2M
utoo-npm 118.3K 79.6K 1.16G 5M 1.71G 1.70G 2M
utoo 81.1K 53.9K 1.16G 7M 1.71G 1.70G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 1.88s 0.03s 4.03s 1.04s 507M 163.7K
utoo-next 3.00s 0.01s 5.37s 2.09s 612M 88.7K
utoo-npm 2.99s 0.01s 5.53s 2.03s 604M 78.3K
utoo 2.37s 0.03s 5.95s 1.63s 644M 118.4K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 8.2K 4.6K 203M 3M 107M - 1M
utoo-next 70.8K 87.2K 200M 2M 7M 3M 2M
utoo-npm 71.0K 91.1K 200M 2M 7M 3M 2M
utoo 14.0K 17.0K 202M 3M 7M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 6.90s 0.30s 6.28s 9.95s 609M 203.8K
utoo-next 7.22s 2.10s 4.95s 10.93s 501M 63.8K
utoo-npm 6.65s 1.45s 5.08s 11.12s 501M 65.0K
utoo 5.12s 0.18s 4.88s 9.44s 517M 58.1K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 4.1K 7.0K 1018M 3M 1.76G 1.76G 1M
utoo-next 110.2K 48.4K 989M 3M 1.70G 1.70G 2M
utoo-npm 115.3K 53.0K 989M 3M 1.70G 1.70G 2M
utoo 50.3K 33.1K 989M 2M 1.70G 1.70G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.28s 0.03s 0.21s 2.35s 134M 32.9K
utoo-next 2.33s 0.12s 0.48s 3.76s 78M 18.4K
utoo-npm 2.26s 0.01s 0.49s 3.79s 80M 18.7K
utoo 2.08s 0.04s 0.33s 3.30s 50M 11.1K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 320 29 5M 33K 1.91G 1.75G 1M
utoo-next 40.0K 18.6K 2K 8K 1.70G 1.70G 2M
utoo-npm 44.5K 20.2K 1K 5K 1.70G 1.70G 2M
utoo 7.7K 4.6K 3K 24K 1.71G 1.70G 2M

npmmirror.com: no output captured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant