Skip to content

perf(pm): batch install clone completions#2989

Draft
elrrrrrrr wants to merge 2 commits into
exp/pm-install-inline-download-futuresfrom
exp/pm-install-clone-batch-completions
Draft

perf(pm): batch install clone completions#2989
elrrrrrrr wants to merge 2 commits into
exp/pm-install-inline-download-futuresfrom
exp/pm-install-clone-batch-completions

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

@elrrrrrrr elrrrrrrr commented May 19, 2026

Summary

  • Batch install clone completions so the scheduler receives one completion message for a small group of clone jobs instead of one per package.
  • Keep clone materialization on rayon workers; the scheduler only dispatches work and records completion state.
  • Use a bounded clone worker count derived from the existing clone concurrency limit, with a small fixed batch size to avoid long serialized hardlink runs.
  • Integrate perf(pm): prioritize clone unblockers #2991: prioritize parent clone unblockers and send them as single-job batches so nested dependency placement is not delayed behind unrelated hardlink work in the same batch.

Why

p3/p4 are dominated by materialization pressure after the earlier install scheduler changes. Batching clone completions reduces scheduler wakeups, but parent packages are on the nested placement critical path. If a parent clone completes early inside a batch but its completion is reported only after other leaf clones finish, its children cannot enter the queue. The #2991 follow-up fixes that without disabling batching for ordinary leaf clones.

Validation

  • cargo fmt
  • cargo test -p utoo-pm install_scheduler
  • cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Note: full workspace cargo clippy --all-targets -- -D warnings --no-deps is currently blocked in this worktree by pack-core/next.js submodule API mismatch, unrelated to the PM scheduler change.

Benchmark

Latest #2989 reference before #2991 integration, GHA Linux npmjs run 26089858682:

phase utoo wall utoo vCtx/iCtx
p0_full_cold 7.48s 66.5K / 43.2K
p1_resolve 2.37s 15.2K / 18.2K
p3_cold_install 5.29s 53.0K / 32.9K
p4_warm_link 2.23s 7.8K / 4.5K

#2991 AB before integration, GHA Linux npmjs run 26093544293:

phase utoo wall utoo vCtx/iCtx
p3_cold_install 5.21s 48.4K / 34.3K
p4_warm_link 1.86s 7.6K / 4.4K

Integrated #2989 head (fd38cfff), GHA Linux npmjs run 26094747554:

phase utoo wall utoo vCtx/iCtx
p0_full_cold 7.28s 67.4K / 44.2K
p1_resolve 2.39s 14.5K / 19.7K
p3_cold_install 5.30s 52.1K / 35.0K
p4_warm_link 2.07s 7.5K / 4.5K

Latest full green integrated run 26095851789:

phase utoo wall utoo vCtx/iCtx
p0_full_cold 10.51s 89.5K / 57.4K
p1_resolve 2.43s 15.0K / 21.2K
p3_cold_install 5.66s 54.7K / 37.6K
p4_warm_link 2.16s 7.8K / 4.6K

Conclusion: #2991 remains directionally positive for p4/warm materialize (2.23s -> 2.07s/2.16s, ctx stable). The latest full-green run has p0/p3 noise, so p4 is the clearer signal for this change. p3 is neutral to slightly noisy, which is acceptable because this change targets clone/materialize ordering rather than cold extract/cache population.

Rejected / Inconclusive Follow-ups

PR idea result
#2992 Pump clones before extracts p3 regressed (6.22s, 56.8K/36.5K); reject for now.
#2993 Halve extract slots while clone work exists p3 wall improved (4.84s) but ctx regressed (59.0K/39.2K) and p4 no-op control moved heavily; inconclusive.

Additional GHA rerun: p0 noise check

Label-triggered Linux npmjs rerun 26102023925:

phase bun utoo-next utoo-npm utoo utoo ctx
p0 full cold 9.15s 11.98s 8.32s 7.55s 69.8K / 45.2K
p1 resolve 2.09s 3.12s 3.10s 2.41s 14.1K / 18.5K
p3 cold install 6.68s 6.46s 6.13s 8.07s 71.9K / 42.5K
p4 warm link 3.38s 2.44s 2.35s 2.08s 7.6K / 4.4K

This confirms the previous p0 10.51s result was runner noise; p0 returns to the expected fast range. p3 was noisy in the opposite direction on this rerun, so p3 should continue to be judged from multiple runs rather than this single sample.

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 19, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces batching for clone operations in the installation scheduler to improve performance by reducing scheduler wakeups. It replaces individual clone tasks with batches of up to three operations and introduces a worker-based concurrency limit. Feedback indicates that the heuristic used to calculate the worker limit causes total concurrency to scale non-linearly and potentially exceed intended limits. It is suggested to either refine the formula to account for the batch size or document the rationale for the current implementation.

Comment on lines +634 to +639
fn clone_worker_limit(clone_limit: usize) -> usize {
clone_limit
.saturating_div(2)
.saturating_add(2)
.clamp(1, clone_limit.max(1))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The clone_worker_limit calculation uses a heuristic (limit / 2 + 2) that doesn't explicitly account for CLONE_BATCH_LIMIT. This results in a total potential concurrency (workers * batch size) that scales non-linearly with the original clone_limit. For example, a clone_limit of 4 results in 4 workers (up to 12 concurrent clones), while a clone_limit of 16 results in 10 workers (up to 30 concurrent clones).

If the intention is to maintain a total concurrency close to the original clone_limit while batching, consider a formula like (clone_limit / CLONE_BATCH_LIMIT).max(1). If the increased concurrency is intentional to saturate the Rayon pool, documenting the rationale for this specific heuristic would improve maintainability.

@elrrrrrrr elrrrrrrr added benchmark Run pm-bench on PR and removed benchmark Run pm-bench on PR labels May 19, 2026
@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · e4bbe06 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 9.30s 0.27s 10.44s 10.07s 751M 345.7K
utoo-next 8.06s 0.17s 10.73s 12.23s 973M 121.7K
utoo-npm 8.32s 0.10s 11.00s 12.48s 1008M 123.8K
utoo 7.34s 0.18s 11.50s 11.04s 976M 143.6K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 13.9K 17.1K 1.19G 6M 1.86G 1.75G 1M
utoo-next 115.4K 81.4K 1.16G 4M 1.71G 1.70G 2M
utoo-npm 130.4K 94.8K 1.16G 5M 1.71G 1.70G 2M
utoo 65.1K 45.7K 1.16G 6M 1.71G 1.70G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 1.97s 0.10s 4.10s 1.01s 521M 166.2K
utoo-next 3.05s 0.04s 5.53s 1.99s 620M 83.6K
utoo-npm 3.04s 0.03s 5.60s 2.00s 610M 83.5K
utoo 2.39s 0.01s 6.14s 1.64s 631M 122.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 7.8K 4.5K 202M 3M 108M - 1M
utoo-next 68.3K 85.8K 199M 2M 7M 3M 2M
utoo-npm 68.7K 87.2K 199M 2M 7M 3M 2M
utoo 14.9K 19.3K 202M 3M 7M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 6.68s 0.19s 6.29s 9.85s 606M 206.3K
utoo-next 9.25s 2.74s 5.11s 11.25s 458M 63.1K
utoo-npm 6.95s 1.69s 5.24s 11.13s 512M 63.0K
utoo 5.30s 0.25s 4.99s 9.62s 564M 60.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 4.1K 6.4K 1018M 3M 1.76G 1.76G 1M
utoo-next 110.4K 47.3K 988M 3M 1.70G 1.70G 3M
utoo-npm 106.8K 50.2K 988M 3M 1.70G 1.70G 3M
utoo 52.4K 33.8K 988M 2M 1.70G 1.70G 3M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.34s 0.09s 0.21s 2.39s 134M 32.8K
utoo-next 2.37s 0.02s 0.56s 3.93s 79M 18.2K
utoo-npm 2.37s 0.05s 0.54s 3.91s 79M 18.7K
utoo 2.20s 0.04s 0.36s 3.38s 50M 11.0K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 278 21 5M 22K 1.91G 1.73G 1M
utoo-next 44.8K 21.1K 3K 25K 1.70G 1.70G 2M
utoo-npm 44.3K 20.3K 3K 23K 1.70G 1.70G 2M
utoo 7.9K 4.6K 2K 6K 1.71G 1.70G 2M

npmmirror.com: no output captured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant