Early yield on 429 throttling on barrier requests #48914
Open
mbhaskar wants to merge 6 commits into
Pull request overview
This PR updates Cosmos direct connectivity quorum/barrier logic to “yield early” when replica reads are uniformly throttled (HTTP 429), allowing the existing ResourceThrottleRetryPolicy to apply appropriate backoff instead of progressing into additional quorum/primary/barrier attempts.
Changes:
- Add `StoreResult.isThrottledException` to cheaply detect 429 responses.
- In `QuorumReader`, propagate 429 immediately when all collected replica results are throttled (including barrier paths).
- In `ConsistencyWriter`, track throttling during write barriers and, when retries are exhausted and the last attempt was fully throttled, throw a `RequestTimeoutException` with a new substatus code; add unit tests for the new behaviors.
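The read-side check described in the list above can be sketched as a small predicate. This is a minimal illustration, not the SDK's code: `ReplicaResult` and `EarlyYieldSketch` are hypothetical stand-ins for `StoreResult` and the `QuorumReader` logic, and treating an empty result list as "not throttled" is an assumption.

```java
import java.util.List;

/** Hypothetical stand-in for a replica result; not the SDK's StoreResult. */
final class ReplicaResult {
    final int statusCode;

    ReplicaResult(int statusCode) {
        this.statusCode = statusCode;
    }

    /** Cheap 429 check, mirroring the idea of a precomputed throttling flag. */
    boolean isThrottled() {
        return statusCode == 429;
    }
}

final class EarlyYieldSketch {
    /**
     * True only when at least one replica answered and every answer was a 429.
     * Assumption: an empty list does not count as throttled.
     */
    static boolean allContactedReplicasThrottled(List<ReplicaResult> results) {
        return !results.isEmpty()
            && results.stream().allMatch(ReplicaResult::isThrottled);
    }
}
```

When the predicate holds, the caller would rethrow the 429 so that the throttle retry policy can apply its backoff instead of starting further quorum, primary, or barrier attempts.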
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/StoreResult.java | Adds a computed flag to identify throttling (429) on replica results. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/QuorumReader.java | Early-yields on replica-wide throttling to let throttle retry policy handle backoff. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriter.java | Tracks throttling during write barriers and surfaces a distinct timeout substatus when retries are exhausted. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/HttpConstants.java | Introduces a new substatus code for write-barrier throttling exhaustion. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/QuorumReaderTest.java | Adds unit tests covering 429 propagation and Gone+429 interactions. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriterTest.java | Adds unit tests for write-barrier behavior under sustained throttling and mixed outcomes. |
xinlian12 reviewed Apr 23, 2026
Member
@sdkReviewAgent
Member
✅ Review complete (49:16). Posted 1 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage
Member, Author
/azp run java - cosmos - tests
No pipelines are associated with this pull request.
Member, Author
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
Port of .NET PR #1667829: When receiving repeated 429 (Too Many Requests) responses with strong consistency, QuorumReader and ConsistencyWriter now handle throttling more efficiently.

QuorumReader (reads):
- waitForReadBarrierAsync: yield early when all replicas return 429 in both single-region and multi-region barrier loops
- ensureQuorumSelectedStoreResponse: yield early when all replicas throttled during initial quorum read
- All cases throw the 429 exception to let ResourceThrottleRetryPolicy handle retry with appropriate backoff

ConsistencyWriter (writes):
- waitForWriteBarrierAsync: track lastAttemptWasThrottled flag per iteration
- Do NOT yield early (preserves idempotency guarantees)
- When all retries exhausted due to consistent throttling, throw RequestTimeoutException (408) with substatus SERVER_WRITE_BARRIER_THROTTLED (21013) instead of returning barrier-not-met

Other changes:
- Added isThrottledException field to StoreResult
- Added SERVER_WRITE_BARRIER_THROTTLED (21013) substatus code
- Unit tests for all throttling scenarios

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
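The write-side behavior in this commit message can be sketched as a plain retry loop. This is a simplified, synchronous illustration under stated assumptions: `WriteBarrierSketch`, `Reply`, and `BarrierTimeoutException` are hypothetical names, the barrier-met condition (any non-429 reply at or past the target LSN) is a simplification, and only the 408 status and 21013 substatus values come from the commit message.

```java
import java.util.List;
import java.util.function.IntFunction;

final class WriteBarrierSketch {
    static final int TOO_MANY_REQUESTS = 429;
    static final int REQUEST_TIMEOUT = 408;
    static final int SERVER_WRITE_BARRIER_THROTTLED = 21013; // value from the commit message

    /** Hypothetical replica reply: status code plus the LSN it reported. */
    static final class Reply {
        final int statusCode;
        final long lsn;

        Reply(int statusCode, long lsn) {
            this.statusCode = statusCode;
            this.lsn = lsn;
        }
    }

    /** Hypothetical exception carrying a status and substatus code. */
    static final class BarrierTimeoutException extends RuntimeException {
        final int statusCode;
        final int subStatusCode;

        BarrierTimeoutException(int statusCode, int subStatusCode) {
            super("write barrier not met: last attempt was throttled");
            this.statusCode = statusCode;
            this.subStatusCode = subStatusCode;
        }
    }

    /**
     * Runs up to maxAttempts barrier attempts without yielding early.
     * Returns true if the barrier is met, false if it is not met and the
     * last attempt was not fully throttled; throws 408/21013 otherwise.
     */
    static boolean waitForWriteBarrier(IntFunction<List<Reply>> sendBarrier,
                                       long targetLsn, int maxAttempts) {
        boolean lastAttemptWasThrottled = false;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<Reply> replies = sendBarrier.apply(attempt);
            boolean barrierMet = replies.stream().anyMatch(
                r -> r.statusCode != TOO_MANY_REQUESTS && r.lsn >= targetLsn);
            if (barrierMet) {
                return true;
            }
            // The flag is overwritten every iteration, so it reflects only the last attempt.
            lastAttemptWasThrottled = !replies.isEmpty()
                && replies.stream().allMatch(r -> r.statusCode == TOO_MANY_REQUESTS);
        }
        if (lastAttemptWasThrottled) {
            throw new BarrierTimeoutException(REQUEST_TIMEOUT, SERVER_WRITE_BARRIER_THROTTLED);
        }
        return false; // barrier-not-met
    }
}
```

Unlike the read path, the loop keeps retrying through throttled attempts and only changes how exhaustion is reported, which is how the sketch reflects the "do NOT yield early" point.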
…ed replica yields early

Port of .NET test ValidatesReadMultipleReplicaAsyncExcludesGoneReplicas. Validates that when replicas return a mix of 410 (Gone) and 429 (TooManyRequests):
- Gone replicas are excluded from results by StoreReader (isValid=false for GONE)
- The 429 replica with valid LSN headers is kept (isValid=true for non-GONE with lsn>=0)
- Since all remaining replicas are throttled, early yield triggers
- The 429 exception propagates to ResourceThrottleRetryPolicy

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
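The Gone-plus-429 interaction validated by this test can be sketched as a filter followed by the throttling check. This is an illustration only: `GoneFilterSketch` and `Reply` are hypothetical, and the validity rule is reduced to the two conditions quoted in the commit message (GONE is invalid; non-GONE with lsn >= 0 is valid).

```java
import java.util.List;
import java.util.stream.Collectors;

final class GoneFilterSketch {
    static final int GONE = 410;
    static final int TOO_MANY_REQUESTS = 429;

    /** Hypothetical replica reply: status code plus reported LSN (-1 if absent). */
    static final class Reply {
        final int statusCode;
        final long lsn;

        Reply(int statusCode, long lsn) {
            this.statusCode = statusCode;
            this.lsn = lsn;
        }
    }

    /** Simplified validity rule: GONE is dropped; others need an LSN >= 0. */
    static boolean isValid(Reply reply) {
        return reply.statusCode != GONE && reply.lsn >= 0;
    }

    /** Keeps only the replies that pass the validity rule. */
    static List<Reply> validReplies(List<Reply> all) {
        return all.stream().filter(GoneFilterSketch::isValid).collect(Collectors.toList());
    }

    /** Early yield triggers when every reply that survives filtering is a 429. */
    static boolean shouldYieldEarly(List<Reply> all) {
        List<Reply> valid = validReplies(all);
        return !valid.isEmpty()
            && valid.stream().allMatch(r -> r.statusCode == TOO_MANY_REQUESTS);
    }
}
```

In this sketch a 429 reply without an LSN is filtered out along with the Gone replies, so it cannot trigger the early yield on its own.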
Fixes from xinlian12 (blocking):
- Fix lastAttemptWasThrottled stale state: reset flag before avoidQuorumSelection early return to prevent incorrect 408 when prior iteration was throttled but current iteration hits 410
- Fix readStrong_AllReplicasThrottled_Returns429 false positive: set LSN on exception so StoreReader marks isValid=true, ensuring the early yield path is actually exercised. Add transport invocation count assertions to verify primary read is NOT attempted.
- Add readStrong_BarrierRequestsThrottled_Returns429 test covering the waitForReadBarrierAsync barrier path (quorum succeeds, then barrier HEAD requests return 429)

Fixes from Copilot review:
- Fix checkstyle: add missing spaces around = operator (2 places)
- Fix log wording: 'All replicas' -> 'All contacted replicas' (more accurate since not all replicas may be contacted per attempt)
- Fix ConsistencyWriter log: 'consistent throttling' -> 'last attempt was throttled' (flag only tracks last attempt, not all attempts)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
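The stale-state fix can be illustrated with a toy loop that runs with and without the reset. The control flow here is an assumption about the shape of the bug, not the actual ConsistencyWriter code: `StaleFlagSketch`, its enums, and the `resetOnGone` switch are hypothetical and exist only to contrast the two behaviors.

```java
import java.util.List;

final class StaleFlagSketch {
    /** Hypothetical per-attempt outcomes of a write barrier iteration. */
    enum Outcome { ALL_THROTTLED, GONE, LAGGING, MET }

    /** Hypothetical final results of the barrier loop. */
    enum Result { BARRIER_MET, BARRIER_NOT_MET, TIMEOUT_408_THROTTLED }

    /**
     * Replays a sequence of attempt outcomes. When resetOnGone is false the
     * flag keeps its value from the previous iteration on the 410 path,
     * which models the stale state described in the fix.
     */
    static Result run(List<Outcome> attempts, boolean resetOnGone) {
        boolean lastAttemptWasThrottled = false;
        for (Outcome outcome : attempts) {
            if (outcome == Outcome.GONE) {
                if (resetOnGone) {
                    lastAttemptWasThrottled = false; // reset before leaving this iteration
                }
                continue; // early exit from the iteration on 410
            }
            if (outcome == Outcome.MET) {
                return Result.BARRIER_MET;
            }
            lastAttemptWasThrottled = (outcome == Outcome.ALL_THROTTLED);
        }
        return lastAttemptWasThrottled
            ? Result.TIMEOUT_408_THROTTLED
            : Result.BARRIER_NOT_MET;
    }
}
```

With the sequence throttled-then-Gone, the variant without the reset reports a throttling timeout even though the last attempt was not throttled, while the variant with the reset reports barrier-not-met.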
…tling

Unit tests (4 new, 10 total throttling tests):
- writeBarrier_AvoidQuorumSelectionAfterThrottling_NoFalse408: validates lastAttemptWasThrottled reset on avoidQuorumSelection path (stale state fix)
- writeBarrier_NRegionCommit_AllReplicasThrottled_Returns408: N-region synchronous commit barrier throttling produces 408/21013
- readStrong_QuorumNotSelected_PrimaryThrottled_Returns429: primary 429 propagates correctly through QuorumNotSelected → readPrimary path
- readStrong_BarrierPartialThrottle_StillSucceeds: barrier succeeds when one replica is throttled but other meets LSN (no false-negative yield)

Fault injection E2E tests (3 new, require strong consistency account):
- faultInjection_readBarrierThrottled_yieldsEarly: inject 429 on HEAD_COLLECTION + GCLSN interceptor → verify early yield on reads
- faultInjection_writeBarrierThrottled_returns408: inject 429 on HEAD_COLLECTION + GCLSN interceptor → verify 408 on writes
- faultInjection_readBarrierThrottled_thenRecovers: inject 429 with hitLimit(2) → verify read succeeds after throttle clears

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
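The idea behind the hit-limited recovery test can be sketched without the SDK's fault injection API. Everything below is hypothetical: `ThrottleRule` stands in for a fault rule with a hit limit, and the retry loop omits backoff for brevity.

```java
import java.util.function.IntSupplier;

final class HitLimitedFaultSketch {
    /** Hypothetical rule that injects 429 for the first hitLimit calls only. */
    static final class ThrottleRule {
        private final int hitLimit;
        private int hits;

        ThrottleRule(int hitLimit) {
            this.hitLimit = hitLimit;
        }

        /** Returns an injected 429 while under the limit, else the real status. */
        int intercept(IntSupplier realCall) {
            if (hits < hitLimit) {
                hits++;
                return 429;
            }
            return realCall.getAsInt();
        }

        int hitCount() {
            return hits;
        }
    }

    /** Retries while the status is 429, up to maxAttempts; returns the last status. */
    static int readWithRetries(ThrottleRule rule, int maxAttempts) {
        int status = 429;
        for (int attempt = 0; attempt < maxAttempts && status == 429; attempt++) {
            status = rule.intercept(() -> 200); // the real call is assumed to succeed
        }
        return status;
    }
}
```

With a hit limit of 2 and enough retries, the third attempt reaches the real call and succeeds, which is the recovery the E2E test checks for.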
Read and write barrier requests are only triggered on multi-region strong consistency accounts (numberOfReadRegions > 0). The emulator is single-region, so the GCLSN interceptor never triggers barriers and the tests fail with empty supplementalResponseStatisticsList. Added accountLevelReadRegions.size() > 1 skip check to all three E2E fault injection tests so they correctly skip on single-region environments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
28b1fae to 3cd6163
Description
This PR introduces early yield on 429s during barrier requests.
When receiving 429s with strong consistency, the quorum reader/writer code does not yield early enough, which creates multiple stack traces and leads to resource constraints on the client side.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines