A113: pick_first: Weighted Random Shuffling #535
Merged
+107
−0
9 commits
62f5028 A113: pick_first: Weighted Random Shuffling (apolcyn)
88e4950 update status (apolcyn)
ac00a0a respond comments (apolcyn)
b3ae299 comments (apolcyn)
d7e4a89 correction (apolcyn)
58f255c reviewer comments (apolcyn)
f293726 add tag back in (apolcyn)
102ff25 reworded again (apolcyn)
12bcad5 change citation wording (apolcyn)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| A113: pick_first: Weighted Random Shuffling | ||
| ---- | ||
| * Author(s): Alex Polcyn (@apolcyn), Eric Anderson (@ejona86) | ||
| * Approver: Mark Roth (@markdroth), Eric Anderson (@ejona86), Doug Fawley (@dfawley), Easwar Swaminathan (@easwars) | ||
| * Status: In Review | ||
| * Implemented in: <language, ...> | ||
| * Last updated: Jan 26, 2026 | ||
| * Discussion at: https://groups.google.com/g/grpc-io/c/iCsweGDmUU4 | ||
|
|
||
| ## Abstract | ||
|
|
||
| Support weighted random shuffling in the pick first LB policy. | ||
|
|
||
| ## Background | ||
|
|
||
| The pick first LB policy currently supports random shuffling. A primary intention of that feature | ||
| is load balancing, but it does not take locality or endpoint weights (which may be present) | ||
| into account. This can lead to skewed load distribution and hotspots when the load | ||
| balancing control plane delivers varied weights and expects them to be followed. | ||
|
|
||
|
|
||
| ### Related Proposals: | ||
| * [A62](https://github.com/grpc/proposal/blob/master/A62-pick-first.md): pick_first: sticky TRANSIENT_FAILURE and address order randomization | ||
| * [A42](https://github.com/grpc/proposal/blob/master/A42-xds-ring-hash-lb-policy.md): xDS Ring Hash LB Policy | ||
|
|
||
| ## Proposal | ||
|
|
||
| ### Changes within Pick First | ||
|
|
||
| When the `shuffle_address_list` option is set, modify pick_first to | ||
| perform a weighted random sort *based on per-endpoint weights*. To do this, we will | ||
| use the [Weighted Random Sampling](https://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf) algorithm | ||
| proposed by Efraimidis and Spirakis: | ||
|
|
||
| 1) Assign a key to each endpoint, `u ^ (1 / weight)`, where `u` is a uniform random number in `(0, 1)` and `weight` | ||
| is the weight of the endpoint (as present in a weight attribute). Default `weight` to 1 if no weight attribute is | ||
| present. | ||
|
|
||
| 2) Sort endpoints by key in *descending* order. | ||
|
|
||
| Note: the paper suggests `u` be in `(0, 1)` *exclusive*. A `u` of exactly zero or one yields a key of zero or one | ||
| regardless of weight, so the endpoint's weight is effectively dropped. Also, technically zero will not transform to the exponential distribution that we are trying | ||
| to create. However, load balancing skew introduced by such edge cases is unlikely to be noticeable, so | ||
| implementations are free to include these bounds as long as doing so does not cause other problems | ||
| (e.g. crashes). | ||
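
The two steps above can be sketched as follows. This is a minimal illustration, not the implementation in any gRPC language: the `Endpoint` struct and the `WeightedShuffle` function are hypothetical names, and the sketch assumes every weight is positive. It draws `u` from `[0, 1)`, which the note above permits.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical endpoint record, for illustration only.
struct Endpoint {
  std::string address;
  uint32_t weight;  // callers default this to 1 when no weight attribute exists
};

// Returns the endpoints in weighted random order.
template <typename Rng>
std::vector<Endpoint> WeightedShuffle(const std::vector<Endpoint>& endpoints,
                                      Rng& rng) {
  // Draws u from [0, 1); u == 0 is tolerated and simply sorts last.
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  std::vector<std::pair<double, std::size_t>> keyed;
  keyed.reserve(endpoints.size());
  for (std::size_t i = 0; i < endpoints.size(); ++i) {
    assert(endpoints[i].weight > 0);  // assumption of this sketch
    // Step 1: key = u ^ (1 / weight).
    const double key = std::pow(uniform(rng), 1.0 / endpoints[i].weight);
    keyed.push_back(std::make_pair(key, i));
  }
  // Step 2: sort by key in descending order.
  std::sort(keyed.begin(), keyed.end(),
            [](const std::pair<double, std::size_t>& a,
               const std::pair<double, std::size_t>& b) {
              return a.first > b.first;
            });
  std::vector<Endpoint> shuffled;
  shuffled.reserve(endpoints.size());
  for (std::size_t i = 0; i < keyed.size(); ++i) {
    shuffled.push_back(endpoints[keyed[i].second]);
  }
  return shuffled;
}
```

With this scheme, an endpoint lands first in the list with probability equal to its weight divided by the sum of all weights. For example, an endpoint of weight 3 next to one of weight 1 should come first roughly 75% of the time.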
|
|
||
|
|
||
| ### CDS LB Policy changes: Computing Endpoint Weights | ||
|
|
||
| In xDS, we have a notion of both locality and endpoint weights. The load balancing | ||
| control plane expects a client to *first* pick a locality and *second* pick an endpoint within it. The overall probability distribution | ||
| expressed by per-endpoint weights must reflect this. As such, we need to normalize locality weights within | ||
| each priority and endpoint weights within each locality; the final weight provided to `pick_first` should be the | ||
| product of the two normalized weights (i.e. a logical AND of the two selection events). | ||
|
|
||
| The CDS LB policy currently calculates per-endpoint weight attributes, and it will continue to do so. | ||
| However, we need to fix the mechanics: an endpoint's final weight should be the product of its *normalized* | ||
| locality weight and *normalized* endpoint weight, rather than the product of the two raw weights. | ||
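
To see why normalization matters, consider the following floating-point sketch. It is for illustration only: the proposal itself uses fixed-point integers, and the `Locality` struct and `NormalizedEndpointWeights` function are hypothetical names. It assumes all weights are positive.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical locality record, for illustration only.
struct Locality {
  uint32_t weight;                         // raw locality weight
  std::vector<uint32_t> endpoint_weights;  // raw endpoint weights
};

// Returns one probability per endpoint, flattened in locality order:
// (locality weight / sum of locality weights) *
// (endpoint weight / sum of endpoint weights in that locality).
std::vector<double> NormalizedEndpointWeights(
    const std::vector<Locality>& localities) {
  uint64_t locality_sum = 0;
  for (const Locality& locality : localities) locality_sum += locality.weight;
  assert(locality_sum > 0);  // assumption of this sketch
  std::vector<double> result;
  for (const Locality& locality : localities) {
    uint64_t endpoint_sum = 0;
    for (uint32_t w : locality.endpoint_weights) endpoint_sum += w;
    const double locality_share =
        static_cast<double>(locality.weight) / locality_sum;
    for (uint32_t w : locality.endpoint_weights) {
      result.push_back(locality_share * (static_cast<double>(w) / endpoint_sum));
    }
  }
  return result;
}
```

Take two localities of weight 50 each, the first with a single endpoint of weight 1 and the second with two endpoints of weight 100 each. The normalized products are 0.5, 0.25 and 0.25, matching the "pick locality, then pick endpoint" expectation. The raw products would be 50, 5000 and 5000, which would give the first endpoint only about 0.5% of the load.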
|
|
||
| Note: as a side effect this will fix per-endpoint weights in Ring Hash LB, which | ||
| [currently](https://github.com/grpc/proposal/blob/master/A42-xds-ring-hash-lb-policy.md#change-child-policy-config-generation-in-xds_cluster_resolver-policy) are a product of the initial *raw* locality and endpoint weights. | ||
| This "fix" will not require any changes within Ring Hash LB itself. | ||
|
|
||
| We can continue to represent weights as integers if we represent their normalized values in | ||
| fixed point UQ1.31 format. The math is as follows: | ||
|
|
||
| ``` | ||
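| // To normalize a raw weight against the sum of its group (ONE is 1.0 in UQ1.31): | ||
| const uint32_t ONE = 1u << 31; | ||
| uint32_t normalized_weight = (uint64_t) raw_weight * ONE / weight_sum; | ||
|
|
||
| // To multiply the two normalized weights for an endpoint: | ||
| uint32_t weight = ((uint64_t) normalized_locality_weight * normalized_endpoint_weight) >> 31; | ||
| if (weight == 0) weight = 1;  // never emit a zero weight | ||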
| ``` | ||
|
|
||
| Note: currently we round down, and then bump the result up to 1 if it is zero. | ||
| We *could* use more accurate rounding schemes. However, rounding down | ||
| is simple and should provide enough precision for load balancing | ||
| purposes. For example, the result only rounds down to zero if the product of | ||
| two normalized weight probabilities is less than `2 ^ -31`; this kind | ||
| of error is unlikely to cause noticeable skew in load balancing. | ||
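
The fixed-point arithmetic in this section can also be written as self-contained functions. This is a sketch under stated assumptions rather than the code any gRPC implementation uses: `kOne`, `NormalizeWeight` and `MultiplyNormalized` are hypothetical names, and the sketch assumes the group sum is non-zero and at least as large as the weight being normalized.

```cpp
#include <cassert>
#include <cstdint>

// 1.0 in UQ1.31 fixed point.
constexpr uint32_t kOne = uint32_t{1} << 31;

// Normalizes a raw weight against the sum of its group (rounding down).
// The result lies in [0, kOne].
uint32_t NormalizeWeight(uint32_t weight, uint64_t weight_sum) {
  assert(weight_sum > 0 && weight <= weight_sum);  // assumptions of this sketch
  // weight < 2^32 and kOne == 2^31, so the product fits in 64 bits.
  return static_cast<uint32_t>(static_cast<uint64_t>(weight) * kOne /
                               weight_sum);
}

// Multiplies two UQ1.31 values (rounding down), bumping a zero result to 1
// so that no endpoint ends up with a zero weight.
uint32_t MultiplyNormalized(uint32_t locality_weight,
                            uint32_t endpoint_weight) {
  const uint32_t product = static_cast<uint32_t>(
      (static_cast<uint64_t>(locality_weight) * endpoint_weight) >> 31);
  return product == 0 ? 1 : product;
}
```

For instance, a locality holding half of the total locality weight normalizes to `1 << 30`; an endpoint holding half of that locality's endpoint weight also normalizes to `1 << 30`; and their product is `1 << 29`, i.e. 0.25 in UQ1.31.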
|
|
||
| ### Temporary environment variable protection | ||
|
|
||
| CDS LB policy and Pick First LB policy behavior changes will be guarded by `GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING`. | ||
|
|
||
|
|
||
| This should be enabled by default, after testing. | ||
|
|
||
| ## Rationale | ||
|
|
||
| CDS LB policy changes are needed to generate correct weight distributions, not only for Pick First but | ||
| also for Ring Hash. | ||
|
|
||
| Reasons for UQ1.31 fixed point integers: | ||
|
|
||
| - Predictable and acceptable bounds on precision. | ||
| - Allows us to continue representing weights as integers internally. | ||
| - Avoids risk of overflow bugs by preserving the (XDS) property that the sum of all weights within | ||
| a "grouping" does not exceed max uint32. For example, if we used UQ32, then *after* | ||
| normalization and multiplication, a subsequent summation of endpoint weights in a locality could | ||
| overflow uint32 because of accumulated rounding errors. | ||
|
|
||
| ## Implementation | ||
|
|
||
| TBD | ||
|
|
||