Commit 45ed802
feat(supervisor): add node affinity rules for large machine worker pool scheduling
**Background**
Runs with `large-1x` or `large-2x` machine presets are disproportionately
affected by scheduling delays during peak times. This is partly because
the worker pool is shared by all runs, so large runs compete with smaller
runs for available capacity. Because large runs require significantly more
CPU and memory, they are harder for the scheduler to bin-pack onto existing
nodes: they often need a node with a large amount of free resources, or must
wait for a new node to spin up. This effect is amplified at peak times, when
nodes are already densely packed with smaller workloads and no single node
has enough free CPU and memory left for a large run.
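To make the bin-packing problem concrete, here is a simplified sketch. Node names, free capacities and request sizes are all hypothetical, and the fit check is plain first-fit rather than the actual kube-scheduler logic. The cluster has more free CPU and memory in total than the large run requests, yet no single node can host it.

```typescript
// Free capacity remaining on one node (hypothetical units: vCPU and GiB).
interface NodeFree {
  name: string;
  cpu: number;
  memoryGiB: number;
}

// Resources a run asks for (hypothetical sizes, for illustration only).
interface ResourceRequest {
  cpu: number;
  memoryGiB: number;
}

// A run fits only if ONE node has enough free CPU and memory;
// free capacity spread across several nodes cannot be combined.
function findNode(nodes: NodeFree[], req: ResourceRequest): NodeFree | undefined {
  return nodes.find((n) => n.cpu >= req.cpu && n.memoryGiB >= req.memoryGiB);
}

// Sum of free capacity across the whole pool.
function totalFree(nodes: NodeFree[]): ResourceRequest {
  return nodes.reduce(
    (acc, n) => ({ cpu: acc.cpu + n.cpu, memoryGiB: acc.memoryGiB + n.memoryGiB }),
    { cpu: 0, memoryGiB: 0 },
  );
}

// Hypothetical, densely packed pool during peak load.
const nodes: NodeFree[] = [
  { name: "node-a", cpu: 1.5, memoryGiB: 3 },
  { name: "node-b", cpu: 2, memoryGiB: 4 },
  { name: "node-c", cpu: 1, memoryGiB: 2 },
];

const smallRun: ResourceRequest = { cpu: 0.5, memoryGiB: 0.5 }; // assumed size
const largeRun: ResourceRequest = { cpu: 4, memoryGiB: 8 }; // assumed size

console.log(findNode(nodes, smallRun)?.name); // node-a
console.log(findNode(nodes, largeRun)); // undefined
console.log(JSON.stringify(totalFree(nodes))); // {"cpu":4.5,"memoryGiB":9}
```

The large run stays pending even though the pool as a whole has 4.5 vCPU and 9 GiB free, which is why it ends up waiting for a fresh node.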
**Changes**
This PR adds Kubernetes node affinity settings to separate large and standard machine workloads across node pools.
- Controlled via the `KUBERNETES_LARGE_MACHINE_POOL_LABEL` env var (disabled when not set)
- Large machine presets (`large-*`) get a soft preference to schedule on the large pool, with fallback to standard nodes
- Non-large machines are excluded from the large pool via required anti-affinity
- This reserves the large machine pool for large workloads while still allowing large workloads to spill over to standard nodes when needed.

1 parent 839d5e8
2 files changed (+53, -0)
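The scheduling rules listed under **Changes** could be expressed roughly as follows. This is a hedged sketch, not the code from this commit: `buildNodeAffinity` and the local types are hypothetical, the weight of 100 is an assumption, and it assumes `KUBERNETES_LARGE_MACHINE_POOL_LABEL` holds a node label key that is set only on large-pool nodes. Kubernetes has no separate node anti-affinity field, so the exclusion is written as a required node affinity term using the `DoesNotExist` operator. Field names follow the Kubernetes core/v1 `Affinity` schema.

```typescript
// Minimal local types mirroring the Kubernetes core/v1 Affinity schema
// (only the fields this sketch uses).
interface NodeSelectorRequirement {
  key: string;
  operator: "In" | "NotIn" | "Exists" | "DoesNotExist";
  values?: string[];
}

interface NodeSelectorTerm {
  matchExpressions: NodeSelectorRequirement[];
}

interface PreferredSchedulingTerm {
  weight: number; // Kubernetes accepts 1-100
  preference: NodeSelectorTerm;
}

interface NodeAffinity {
  requiredDuringSchedulingIgnoredDuringExecution?: { nodeSelectorTerms: NodeSelectorTerm[] };
  preferredDuringSchedulingIgnoredDuringExecution?: PreferredSchedulingTerm[];
}

interface Affinity {
  nodeAffinity?: NodeAffinity;
}

// Hypothetical helper. Assumes `largePoolLabel` is a node label KEY that
// exists only on nodes in the large machine pool.
function buildNodeAffinity(machinePreset: string, largePoolLabel: string | undefined): Affinity | undefined {
  // Feature is disabled when the env var is unset or empty.
  if (!largePoolLabel) {
    return undefined;
  }

  if (machinePreset.startsWith("large-")) {
    // Soft preference: favour large-pool nodes, but allow standard nodes.
    return {
      nodeAffinity: {
        preferredDuringSchedulingIgnoredDuringExecution: [
          {
            weight: 100, // assumed weight
            preference: { matchExpressions: [{ key: largePoolLabel, operator: "Exists" }] },
          },
        ],
      },
    };
  }

  // Hard requirement: non-large runs may only use nodes WITHOUT the label.
  return {
    nodeAffinity: {
      requiredDuringSchedulingIgnoredDuringExecution: {
        nodeSelectorTerms: [{ matchExpressions: [{ key: largePoolLabel, operator: "DoesNotExist" }] }],
      },
    },
  };
}

// Usage with a hypothetical label key.
const poolLabel = "example.com/large-machine-pool";
console.log(JSON.stringify(buildNodeAffinity("large-2x", poolLabel)));
console.log(JSON.stringify(buildNodeAffinity("small-1x", poolLabel)));
```

Because the large-pool rule is only preferred, a large run that finds no large-pool capacity can still be scheduled on a standard node, while the required term keeps smaller runs off the large pool entirely.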