cosmos: Enable endToEndTimeout for queryDocumentChangeFeed operation #48144

Open
mbhaskar wants to merge 2 commits into Azure:main from mbhaskar:e2etimeout-on-changefeed
Conversation

@mbhaskar
Member

Description

Fixes #40507

endToEndTimeout is supported for read and query operations via CosmosEndToEndOperationLatencyPolicyConfig, but the queryDocumentChangeFeed operation previously ignored this configuration entirely.

Changes

  • CosmosChangeFeedRequestOptionsImpl: Added endToEndOperationLatencyPolicyConfig field and implemented the previously stubbed getCosmosEndToEndLatencyPolicyConfig() getter. Added corresponding setter.
  • CosmosChangeFeedRequestOptions (public API): Added setCosmosEndToEndOperationLatencyPolicyConfig() method, mirroring the same API on CosmosQueryRequestOptions.
  • RxDocumentClientImpl: Added getChangeFeedResponseFluxWithTimeout() helper and wired it into queryDocumentChangeFeed() - extracting the e2e policy config from request options and wrapping the change feed flux with timeout when enabled.
  • EndToEndTimeOutValidationTests: Added queryChangeFeedWithEndToEndTimeoutPolicyInOptionsShouldTimeout() test using READ_FEED_ITEM fault injection, verifying OperationCancelledException with CLIENT_OPERATION_TIMEOUT substatus is thrown.
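
As a rough illustration of the wrapping step described above, here is a minimal, dependency-free sketch. It is not the SDK code: `E2ePolicySketch`, `ChangeFeedTimeoutSketch`, `withTimeout` and `timesOut` are hypothetical names, and `CompletableFuture.orTimeout` stands in for the Reactor-based timeout applied to the change feed flux. The sketch surfaces a plain `TimeoutException`, whereas the PR's test expects an OperationCancelledException with the CLIENT_OPERATION_TIMEOUT substatus.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical stand-in for the SDK's latency policy config; not the real class.
final class E2ePolicySketch {
    final boolean enabled;
    final Duration timeout;

    E2ePolicySketch(boolean enabled, Duration timeout) {
        this.enabled = enabled;
        this.timeout = timeout;
    }
}

final class ChangeFeedTimeoutSketch {
    // Wrap the page fetch only when a policy is present and enabled;
    // otherwise hand back the original future, as before the change.
    static <T> CompletableFuture<T> withTimeout(
            Supplier<CompletableFuture<T>> fetchPage, E2ePolicySketch policy) {
        CompletableFuture<T> page = fetchPage.get();
        if (policy == null || !policy.enabled) {
            return page;
        }
        return page.orTimeout(policy.timeout.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Simulates a server that answers after serverDelayMillis and reports
    // whether the wrapped call failed with a TimeoutException.
    static boolean timesOut(E2ePolicySketch policy, long serverDelayMillis) {
        Supplier<CompletableFuture<String>> slowFetch = () -> CompletableFuture.supplyAsync(
                () -> "page",
                CompletableFuture.delayedExecutor(serverDelayMillis, TimeUnit.MILLISECONDS));
        try {
            withTimeout(slowFetch, policy).get();
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof TimeoutException;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With an enabled 50 ms policy and a simulated one-second delay, `timesOut` reports a timeout; with a null or disabled policy the call simply waits for the page, which corresponds to the behavior before this change.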

- Add endToEndOperationLatencyPolicyConfig field to CosmosChangeFeedRequestOptionsImpl
  and implement the previously stubbed getCosmosEndToEndLatencyPolicyConfig() method
- Add public setCosmosEndToEndOperationLatencyPolicyConfig() API to
  CosmosChangeFeedRequestOptions mirroring CosmosQueryRequestOptions
- Add getChangeFeedResponseFluxWithTimeout() helper in RxDocumentClientImpl
  for wrapping change feed flux with e2e timeout logic
- Apply e2e timeout in queryDocumentChangeFeed() by extracting the config
  from request options and wrapping the flux when enabled
- Add queryChangeFeedWithEndToEndTimeoutPolicyInOptionsShouldTimeout test
  to EndToEndTimeOutValidationTests using READ_FEED_ITEM fault injection

Fixes Azure#40507

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 26, 2026 20:25
@mbhaskar mbhaskar requested review from a team and kirankumarkolli as code owners February 26, 2026 20:25
Contributor

Copilot AI left a comment

Pull request overview

This PR enables end-to-end timeout support for change feed query operations by implementing the previously stubbed getCosmosEndToEndLatencyPolicyConfig() method in CosmosChangeFeedRequestOptionsImpl and wiring the timeout logic into the queryDocumentChangeFeed() method in RxDocumentClientImpl.

Changes:

  • Added endToEndOperationLatencyPolicyConfig field and public setter to enable end-to-end timeout configuration on change feed requests
  • Implemented timeout wrapping for change feed response flux using a new helper method that mirrors the pattern used for query operations
  • Added test coverage for timeout behavior using fault injection to verify OperationCancelledException with CLIENT_OPERATION_TIMEOUT substatus

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File: Description

  • CosmosChangeFeedRequestOptions.java: Added public API method setCosmosEndToEndOperationLatencyPolicyConfig() to allow configuring end-to-end timeout policy on a per-request basis
  • CosmosChangeFeedRequestOptionsImpl.java: Added private field, getter, setter, and copy constructor support for endToEndOperationLatencyPolicyConfig
  • RxDocumentClientImpl.java: Added getChangeFeedResponseFluxWithTimeout() helper method and integrated it into queryDocumentChangeFeed() to apply timeout wrapping when enabled
  • EndToEndTimeOutValidationTests.java: Added queryChangeFeedWithEndToEndTimeoutPolicyInOptionsShouldTimeout() test using READ_FEED_ITEM fault injection to verify timeout behavior

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM - Thx

Member

@kushagraThapar kushagraThapar left a comment

missing changelog

jeet1995

This comment was marked as outdated.

@xinlian12
Member

@sdkReviewAgent-2

@xinlian12
Member

PR Review Agent — Starting review...

.getImpl(requestOptions);

CosmosEndToEndOperationLatencyPolicyConfig endToEndPolicyConfig =
this.getEffectiveEndToEndOperationLatencyPolicyConfig(
Member

🟡 Recommendation — Behavioral Change: PPAF/client-level e2e timeout now applies to Change Feed Processor

With getEffectiveEndToEndOperationLatencyPolicyConfig resolving the effective policy, change feed operations are now subject to:

  1. Client-level cosmosEndToEndOperationLatencyPolicyConfig (if set on the builder)
  2. PPAF-enforced defaults (ppafEnforcedE2ELatencyPolicyConfigForReads) — since OperationType.ReadFeed returns true for isReadOnlyOperation()

Previously, CosmosChangeFeedRequestOptionsImpl.getCosmosEndToEndLatencyPolicyConfig() returned null, so queryDocumentChangeFeed never applied timeout. Now, even when the user doesn't explicitly set a timeout on CosmosChangeFeedRequestOptions, the effective config resolution can produce a non-null timeout.

Why this matters: The Change Feed Processor (CFP) creates its own CosmosChangeFeedRequestOptions without setting e2e timeout. PartitionedByIdCollectionRequestOptionsFactory explicitly disables e2e timeout for lease operations (CosmosItemRequestOptions, CosmosQueryRequestOptions) but does not create a disabled config for the data-path change feed queries. If a user configures client-level e2e timeout (e.g., 5s for point reads) and also uses the Change Feed Processor, CFP's change feed queries could now receive unexpected OperationCancelledException.

Suggested action: Consider whether CFP should explicitly set a disabled/null e2e config on its change feed options, or document this behavioral change so users can adjust their client-level config accordingly.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.
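
To make the concern above concrete, here is a simplified, self-contained model of the resolution order the comment describes. It is an assumption-laden sketch, not the SDK's implementation: `EffectivePolicySketch`, `Policy`, `resolve` and `timeoutApplies` are hypothetical names, and the precedence (request-level, then client-level, then the PPAF read default for read-only operations) is inferred from the review text rather than confirmed against RxDocumentClientImpl.

```java
import java.time.Duration;

final class EffectivePolicySketch {
    // Hypothetical, simplified policy: an enabled flag plus a timeout.
    static final class Policy {
        final boolean enabled;
        final Duration timeout;

        Policy(boolean enabled, Duration timeout) {
            this.enabled = enabled;
            this.timeout = timeout;
        }

        // An explicit opt-out that a caller such as the CFP could set on its options.
        static Policy disabled() {
            return new Policy(false, Duration.ZERO);
        }
    }

    // Assumed precedence: request-level, then client-level, then the
    // PPAF-enforced default, which only applies to read-only operations.
    static Policy resolve(Policy requestLevel, Policy clientLevel,
                          Policy ppafReadDefault, boolean readOnlyOperation) {
        if (requestLevel != null) {
            return requestLevel;
        }
        if (clientLevel != null) {
            return clientLevel;
        }
        return readOnlyOperation ? ppafReadDefault : null;
    }

    // A timeout is applied only when some enabled policy was resolved.
    static boolean timeoutApplies(Policy effective) {
        return effective != null && effective.enabled;
    }
}
```

Under this model, a change feed request that sets nothing inherits a 5s client-level policy, while a request carrying `Policy.disabled()` resolves to the disabled policy and is never wrapped. That is the shape of the opt-out suggested for the Change Feed Processor.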

- Add CHANGELOG entry for end-to-end timeout on queryChangeFeed.
- Propagate endToEndOperationLatencyPolicyConfig in
  CosmosChangeFeedRequestOptionsImpl.override(CosmosRequestOptions) so
  CosmosOperationPolicy-set timeouts are honored for change feed
  operations (mirrors CosmosQueryRequestOptionsBase.override()).
- Generalize getFeedResponseFluxWithTimeout to accept nullable
  CosmosQueryRequestOptions / isQueryCancelledOnTimeout and delete the
  duplicate getChangeFeedResponseFluxWithTimeout helper. Guard
  applyExceptionToMergedDiagnosticsForQuery against null requestOptions.
- Add queryChangeFeedWithEndToEndTimeoutPolicyAndAvailabilityStrategyShouldTimeout
  test to cover change feed with a ThresholdBasedAvailabilityStrategy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

[FEATURE REQ] Enable e2e timeout for changeFeed query

6 participants