[GOBBLIN-2247] Tune RPC retry policy to tolerate 1-2 min throttling on Temporal gRPC calls #4176
Open
DaisyModi wants to merge 1 commit into apache:master from
… calls

Add configurable RpcRetryOptions to WorkflowServiceStubsOptions in TemporalWorkflowClientFactory. The defaults (initialInterval=500ms, backoffCoefficient=2.0, maximumInterval=30s, maximumAttempts=10) provide ~151s of cumulative retry budget, enough to ride out a 2-minute throttle burst without failing worker status reporting.

New config keys under gobblin.temporal.rpc.retry.options.* allow per-environment tuning without code changes.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Worker status reporting uses gRPC calls to the Temporal service.
TemporalWorkflowClientFactory.createServiceInstance() previously built WorkflowServiceStubsOptions without setRpcRetryOptions(), leaving the SDK's DefaultStubServiceOperationRpcRetryOptions in effect (~10s expiry). This is insufficient to survive 1-2 minute throttle bursts.

Changes:
GobblinTemporalConfigurationKeys: Added 4 config keys under gobblin.temporal.rpc.retry.options.*, following the same pattern as the existing gobblin.temporal.activity.retry.options.* keys.
TemporalWorkflowClientFactory: Added buildRpcRetryOptions(Config) helper wired into WorkflowServiceStubsOptions via .setRpcRetryOptions().

Default values provide ~151.5s of cumulative retry budget (initialInterval=500ms, coefficient=2.0, maximumInterval=30s, maximumAttempts=10), covering a 2-minute throttle burst with buffer. All values are configurable via Typesafe Config.
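For reviewers who want to check the retry-budget arithmetic, here is a minimal sketch that sums capped exponential backoff intervals for the stated defaults. The class and method names (RetryBudget, totalBackoff) are hypothetical and not part of this PR. The sketch ignores jitter and per-call latency and does not call the Temporal SDK. It assumes each interval is the previous one multiplied by the coefficient and capped at maximumInterval. The ~151.5s figure corresponds to ten such intervals. If the SDK sleeps only between attempts (nine sleeps for maximumAttempts=10, which is an assumption this sketch does not settle), the total would be 121.5s instead.

```java
import java.time.Duration;

public class RetryBudget {
    /**
     * Sums the first `sleeps` backoff intervals, each capped at `max`.
     * Hypothetical helper for illustration only; jitter is ignored.
     */
    static Duration totalBackoff(Duration initial, double coefficient, Duration max, int sleeps) {
        double intervalMs = initial.toMillis();
        long totalMs = 0;
        for (int i = 0; i < sleeps; i++) {
            // Cap the current interval at the configured maximum before adding it.
            totalMs += (long) Math.min(intervalMs, max.toMillis());
            intervalMs *= coefficient;
        }
        return Duration.ofMillis(totalMs);
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofMillis(500);
        Duration max = Duration.ofSeconds(30);
        // Ten intervals: 0.5 + 1 + 2 + 4 + 8 + 16 + 30 + 30 + 30 + 30 seconds
        System.out.println(totalBackoff(initial, 2.0, max, 10).toMillis()); // 151500
        // Nine intervals, if sleeps happen only between the ten attempts
        System.out.println(totalBackoff(initial, 2.0, max, 9).toMillis());  // 121500
    }
}
```

Either total exceeds the 2-minute (120s) burst named in the description, though the nine-sleep case leaves only about 1.5s of margin before per-call time is counted.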
Tests
buildRpcRetryOptions is a straightforward config-to-SDK-object mapping with no branching logic to unit test. Integration-level throttling behavior is exercised by existing Temporal end-to-end tests.
Commits