
feature: Opportunistic drain balancing#68

Merged: seanjnkns merged 1 commit into main from sjenkins/features/drain on May 1, 2026

@seanjnkns seanjnkns commented Apr 28, 2026

Introduces opportunistic balancing of PGs during drain.

In testing, with 124 PGs already remapped because the reweight value of osd.23 had been temporarily adjusted, we get the following projected PG -> OSD spread:

Projected PG load spread (drain off OSD 0 → OSDs 23–46)

Comparison

| Metric | Reservation-only (prior behavior) | PG count + reservations + tie-break (current) |
| --- | --- | --- |
| Final min / max | 158 / 193 | 185 / 188 |
| Final spread | 35 | 3 |

Per-OSD projected Final (new algorithm)

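| OSD | Baseline | +Upmap | Final |
| --- | --- | --- | --- |
| 23 | 158 | 27 | 185 |
| 24 | 164 | 23 | 187 |
| 25 | 171 | 14 | 185 |
| 26 | 173 | 13 | 186 |
| 27 | 174 | 13 | 187 |
| 28 | 173 | 13 | 186 |
| 29 | 173 | 13 | 186 |
| 30 | 172 | 13 | 185 |
| 31 | 172 | 15 | 187 |
| 32 | 175 | 12 | 187 |
| 33 | 173 | 13 | 186 |
| 34 | 173 | 13 | 186 |
| 35 | 173 | 13 | 186 |
| 36 | 174 | 12 | 186 |
| 37 | 174 | 12 | 186 |
| 38 | 173 | 13 | 186 |
| 39 | 173 | 13 | 186 |
| 40 | 173 | 14 | 187 |
| 41 | 173 | 13 | 186 |
| 42 | 173 | 13 | 186 |
| 43 | 173 | 13 | 186 |
| 44 | 173 | 13 | 186 |
| 45 | 173 | 15 | 188 |
| 46 | 174 | 13 | 187 |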

Tie-breaking can swap a single PG between targets with identical rank (e.g. OSDs 32 vs 35); spread stays in the same band.
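As a rough illustration of the "PG count + reservations + tie-break" ranking named in the comparison, here is a minimal Python sketch. It is not the code from this PR: `pick_target`, `plan_drain`, and the specific tie-break order (fewer reservations first, then lower OSD id) are assumptions made for illustration, and the toy baseline uses only the first four OSDs from the table. CRUSH constraints on which OSDs are valid for a given PG are ignored.

```python
def pick_target(candidates, pg_count, reserved):
    """Return the candidate OSD with the lowest projected PG load.

    Projected load = current PG count + PGs already reserved for the OSD.
    Ties are broken by fewer reservations, then by lower OSD id
    (a hypothetical tie-break order, chosen only for this sketch).
    """
    return min(
        candidates,
        key=lambda osd: (pg_count[osd] + reserved[osd], reserved[osd], osd),
    )


def plan_drain(num_pgs, pg_count):
    """Greedily reserve a target for each of num_pgs PGs leaving the source."""
    reserved = {osd: 0 for osd in pg_count}
    for _ in range(num_pgs):
        target = pick_target(pg_count.keys(), pg_count, reserved)
        reserved[target] += 1
    return reserved


# Toy input: baselines of OSDs 23-26 from the table, 30 PGs to place.
baseline = {23: 158, 24: 164, 25: 171, 26: 173}
incoming = plan_drain(30, baseline)
final = {osd: baseline[osd] + incoming[osd] for osd in baseline}
print(incoming)
print(final)  # every target ends at 174 in this toy input
```

Because each PG goes to whichever target is currently emptiest, the under-filled OSD (osd.23 here) absorbs most of the incoming PGs until it catches up, which matches the shape of the +Upmap column above.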


Prior behavior — per-OSD Final (reservation-only, from old algorithm)

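| OSD | Baseline | +Upmap | Final |
| --- | --- | --- | --- |
| 23 | 158 | 0 | 158 |
| 24 | 164 | 13 | 177 |
| 25 | 171 | 17 | 188 |
| 26 | 173 | 18 | 191 |
| 27 | 174 | 13 | 187 |
| 28 | 173 | 17 | 190 |
| 29 | 173 | 18 | 191 |
| 30 | 172 | 13 | 185 |
| 31 | 172 | 17 | 189 |
| 32 | 175 | 18 | 193 |
| 33 | 173 | 13 | 186 |
| 34 | 173 | 17 | 190 |
| 35 | 173 | 18 | 191 |
| 36 | 174 | 13 | 187 |
| 37 | 174 | 17 | 191 |
| 38 | 173 | 18 | 191 |
| 39 | 173 | 13 | 186 |
| 40 | 173 | 17 | 190 |
| 41 | 173 | 18 | 191 |
| 42 | 173 | 13 | 186 |
| 43 | 173 | 17 | 190 |
| 44 | 173 | 18 | 191 |
| 45 | 173 | 13 | 186 |
| 46 | 174 | 17 | 191 |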
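The min / max / spread figures in the comparison can be re-derived from the Final columns of the two per-OSD tables. The short check below copies those columns verbatim (OSDs 23 through 46, in order); the helper name `spread` is only illustrative.

```python
def spread(finals):
    """Return (min, max, max - min) for a list of per-OSD PG counts."""
    lo, hi = min(finals), max(finals)
    return lo, hi, hi - lo


# Final column, new algorithm (OSDs 23..46).
new_final = [185, 187, 185, 186, 187, 186, 186, 185, 187, 187, 186, 186,
             186, 186, 186, 186, 186, 187, 186, 186, 186, 186, 188, 187]

# Final column, prior reservation-only behavior (OSDs 23..46).
old_final = [158, 177, 188, 191, 187, 190, 191, 185, 189, 193, 186, 190,
             191, 187, 191, 191, 186, 190, 191, 186, 190, 191, 186, 191]

print(spread(new_final))  # (185, 188, 3)
print(spread(old_final))  # (158, 193, 35)
```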

@seanjnkns seanjnkns linked an issue Apr 28, 2026 that may be closed by this pull request
@seanjnkns seanjnkns (Contributor, Author) commented

This algorithm change will also benefit undo-upmaps, which should achieve a more balanced distribution as well.

@seanjnkns seanjnkns force-pushed the sjenkins/features/drain branch from 403dc58 to ed41293 Compare April 30, 2026 23:28

Matt1360 commented May 1, 2026

To be clear, this does change the priority of the drain command from a concurrency focus to a balance focus. I don't think that tradeoff matters for our use case today, since we can control that better if we need to. This effectively removes the need for data to move twice, albeit more slowly, which is likely a win for most.
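One rough way to see the "albeit slower" side of that tradeoff is to look at the busiest target in each plan. The sketch below assumes a hypothetical per-target limit on concurrent backfills (`max_backfills`), assumes every backfill takes the same time, and ignores source-side limits entirely, so the result is only a relative indicator and not a prediction. The +Upmap columns are copied from the two tables in the PR description.

```python
import math


def backfill_rounds(upmaps_per_target, max_backfills=1):
    """Rounds needed by the busiest target under a per-target backfill limit."""
    return max(math.ceil(n / max_backfills) for n in upmaps_per_target)


# +Upmap column, new algorithm (OSDs 23..46).
new_upmaps = [27, 23, 14, 13, 13, 13, 13, 13, 15, 12, 13, 13,
              13, 12, 12, 13, 13, 14, 13, 13, 13, 13, 15, 13]

# +Upmap column, prior reservation-only behavior (OSDs 23..46).
old_upmaps = [0, 13, 17, 18, 13, 17, 18, 13, 17, 18, 13, 17,
              18, 13, 17, 18, 13, 17, 18, 13, 17, 18, 13, 17]

print(backfill_rounds(new_upmaps))  # 27: osd.23 takes the most PGs
print(backfill_rounds(old_upmaps))  # 18
```

Under these assumptions the balanced plan finishes later because osd.23 receives the largest share, but the final layout needs no follow-up rebalancing pass.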

@seanjnkns seanjnkns merged commit 3a87b03 into main May 1, 2026
4 checks passed
@seanjnkns seanjnkns deleted the sjenkins/features/drain branch May 3, 2026 23:22
Development

Successfully merging this pull request may close these issues.

drain host is not filling target OSDs evenly