Skip to content

[GOBBLIN-2254] Add timeout in gobblintaskrunner shutdown and move metrics reporting …#4173

Merged
Blazer-007 merged 2 commits intoapache:masterfrom
Blazer-007:virai_reorder_shutdown_statements_gobblin_cluster
Mar 11, 2026
Merged

[GOBBLIN-2254] Add timeout in gobblintaskrunner shutdown and move metrics reporting …#4173
Blazer-007 merged 2 commits intoapache:masterfrom
Blazer-007:virai_reorder_shutdown_statements_gobblin_cluster

Conversation

@Blazer-007
Copy link
Member

@Blazer-007 Blazer-007 commented Mar 10, 2026

…shutdown after stopping helix task closing

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    This pull request introduces a configurable shutdown timeout for the TaskStateModelFactory in the Gobblin cluster, enhancing the shutdown behavior of the GobblinTaskRunner. The main goal is to allow the system to wait for a graceful termination of the task state model factory, with a configurable timeout, before shutting down metrics and other services.

Key changes:

Configuration Enhancements:

  • Added new configuration key taskStateModelFactory.shutdownTimeoutSeconds with a default value of 300 seconds to GobblinClusterConfigurationKeys, allowing users to specify the shutdown timeout for the TaskStateModelFactory.

GobblinTaskRunner Improvements:

  • Introduced a new field taskStateModelFactoryShutdownTimeoutSeconds in GobblinTaskRunner and initialized it from the configuration, enabling the use of the new shutdown timeout setting. [1] [2]
  • Refactored the shutdown sequence in the stop() method to:
    • Move metrics shutdown after task completion to ensure all final events are emitted.
    • Replace direct shutdown of taskStateModelFactory with a new shutdownTaskStateModelFactory() method that waits up to the configured timeout for graceful termination, logging a warning if the timeout is exceeded.
  • Added the shutdownTaskStateModelFactory() helper method, which implements the timeout logic and handles thread interruption gracefully.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Test already exist GobblinTaskRunnerTest::testSendReceiveShutdownMessage

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a configurable timeout for TaskStateModelFactory shutdown in GobblinTaskRunner and reorders the shutdown sequence so that metrics reporting is stopped after tasks have completed and emitted their final Gobblin Tracking Events (GTEs).

Changes:

  • Added a new configuration key gobblin.cluster.taskStateModelFactory.shutdownTimeoutSeconds (default: 300s) in GobblinClusterConfigurationKeys.
  • Refactored GobblinTaskRunner.stop() to move metrics shutdown after task state model factory shutdown and Helix disconnection, and introduced a shutdownTaskStateModelFactory() method that polls for graceful termination with a configurable timeout.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinClusterConfigurationKeys.java Added new configuration constants for task state model factory shutdown timeout
gobblin-cluster/src/main/java/org/apache/gobblin/cluster/GobblinTaskRunner.java Added timeout field, reordered shutdown sequence to delay metrics stop, and added shutdownTaskStateModelFactory() helper

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Blazer-007 Blazer-007 merged commit b594188 into apache:master Mar 11, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants