[GOBBLIN-2247] Tune RPC retry policy to tolerate 1-2 min throttling on Temporal gRPC calls #4176

Open
DaisyModi wants to merge 1 commit into apache:master from DaisyModi:dmodi/tune-rpc-retry-policy-for-throttling-resilience

@DaisyModi

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Worker status reporting uses gRPC calls to the Temporal service. TemporalWorkflowClientFactory.createServiceInstance() previously built WorkflowServiceStubsOptions without setRpcRetryOptions(), leaving the SDK's DefaultStubServiceOperationRpcRetryOptions in effect (~10s expiry). This is insufficient to survive 1-2 minute throttle bursts.

Changes:

  • GobblinTemporalConfigurationKeys: Added 4 config keys under gobblin.temporal.rpc.retry.options.*, following the same pattern as the existing gobblin.temporal.activity.retry.options.* keys.
  • TemporalWorkflowClientFactory: Added buildRpcRetryOptions(Config) helper wired into WorkflowServiceStubsOptions via .setRpcRetryOptions().
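
The config-to-object mapping described above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the PR's actual code: `RetrySettings` and `RpcRetryConfigSketch` are hypothetical stand-ins for the Temporal SDK's retry options object and the `buildRpcRetryOptions(Config)` helper, a plain `Map` stands in for Typesafe Config, and the four key suffixes are assumed names (only the `gobblin.temporal.rpc.retry.options.` prefix comes from the description).

```java
import java.time.Duration;
import java.util.Map;

/** Plain stand-in for the SDK's retry options object (hypothetical, for illustration only). */
final class RetrySettings {
    final Duration initialInterval;
    final double backoffCoefficient;
    final Duration maximumInterval;
    final int maximumAttempts;

    RetrySettings(Duration initialInterval, double backoffCoefficient,
                  Duration maximumInterval, int maximumAttempts) {
        this.initialInterval = initialInterval;
        this.backoffCoefficient = backoffCoefficient;
        this.maximumInterval = maximumInterval;
        this.maximumAttempts = maximumAttempts;
    }
}

final class RpcRetryConfigSketch {
    // The prefix is taken from the PR description; the four suffixes are assumed names.
    static final String PREFIX = "gobblin.temporal.rpc.retry.options.";
    static final String INITIAL_INTERVAL_MILLIS = PREFIX + "initial.interval.millis";
    static final String BACKOFF_COEFFICIENT = PREFIX + "backoff.coefficient";
    static final String MAXIMUM_INTERVAL_MILLIS = PREFIX + "maximum.interval.millis";
    static final String MAXIMUM_ATTEMPTS = PREFIX + "maximum.attempts";

    /** Reads each key, falling back to the default values listed in the PR description. */
    static RetrySettings fromConfig(Map<String, String> config) {
        long initialMillis = Long.parseLong(config.getOrDefault(INITIAL_INTERVAL_MILLIS, "500"));
        double coefficient = Double.parseDouble(config.getOrDefault(BACKOFF_COEFFICIENT, "2.0"));
        long maximumMillis = Long.parseLong(config.getOrDefault(MAXIMUM_INTERVAL_MILLIS, "30000"));
        int attempts = Integer.parseInt(config.getOrDefault(MAXIMUM_ATTEMPTS, "10"));
        return new RetrySettings(Duration.ofMillis(initialMillis), coefficient,
                Duration.ofMillis(maximumMillis), attempts);
    }

    public static void main(String[] args) {
        // No overrides: every field takes its default.
        RetrySettings defaults = fromConfig(Map.of());
        // A per-environment override of a single key.
        RetrySettings tuned = fromConfig(Map.of(MAXIMUM_ATTEMPTS, "15"));
        System.out.println(defaults.maximumInterval + " / " + defaults.maximumAttempts);
        System.out.println(tuned.maximumAttempts);
    }
}
```

In the real helper, the resolved values would instead be handed to the SDK's retry options builder and attached through `.setRpcRetryOptions()`, as the change list states.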

Default values (initialInterval=500ms, backoffCoefficient=2.0, maximumInterval=30s, maximumAttempts=10) provide ~151.5s of cumulative retry budget, counting ten backoff waits (0.5 + 1 + 2 + 4 + 8 + 16 + 4 × 30 s) and ignoring jitter and per-call latency. That covers a 2-minute throttle burst with buffer. All values are configurable via Typesafe Config.
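
The budget figure can be checked with a few lines of arithmetic. The sketch below sums exponentially growing waits capped at the maximum interval; `RetryBudget` and `cumulativeBackoffMillis` are hypothetical names, and jitter and per-call latency are ignored. One assumption matters here: the sum depends on how many waits actually occur. Ten waits give 151.5s, while nine waits (if the first call counts as one of the ten attempts) give 121.5s, which leaves very little margin over a 2-minute burst. The description does not say which counting the SDK applies, so it may be worth confirming.

```java
/** Back-of-the-envelope retry budget calculator (illustrative, not SDK code). */
final class RetryBudget {

    /**
     * Sums `waits` backoff intervals, starting at initialMillis, multiplying by
     * coefficient after each wait, and capping every interval at maxMillis.
     */
    static long cumulativeBackoffMillis(long initialMillis, double coefficient,
                                        long maxMillis, int waits) {
        long total = 0;
        double interval = initialMillis;
        for (int i = 0; i < waits; i++) {
            total += Math.min((long) interval, maxMillis);
            // Grow the next interval, never past the cap.
            interval = Math.min(interval * coefficient, maxMillis);
        }
        return total;
    }

    public static void main(String[] args) {
        // Defaults from the PR: 500ms initial, coefficient 2.0, 30s cap.
        System.out.println(cumulativeBackoffMillis(500, 2.0, 30_000, 10)); // 151500
        System.out.println(cumulativeBackoffMillis(500, 2.0, 30_000, 9));  // 121500
    }
}
```

Raising maximumAttempts by one or two in environments with longer throttle windows adds 30s of budget per extra attempt, since every interval past the seventh is already at the cap.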

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

buildRpcRetryOptions is a straightforward config-to-SDK-object mapping with no branching logic to unit test. Integration-level throttling behavior is exercised by existing Temporal end-to-end tests.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message".

… calls

Add configurable RpcRetryOptions to WorkflowServiceStubsOptions in
TemporalWorkflowClientFactory. The defaults (initialInterval=500ms,
backoffCoefficient=2.0, maximumInterval=30s, maximumAttempts=10)
provide ~151s of cumulative retry budget, enough to ride out a 2-minute
throttle burst without failing worker status reporting.

New config keys under gobblin.temporal.rpc.retry.options.* allow
per-environment tuning without code changes.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>