Skip to content

[Fix-17817][Master]Add workflow timeout event and handle#18063

Open
njnu-seafish wants to merge 9 commits intoapache:devfrom
njnu-seafish:Fix-17817-2
Open

[Fix-17817][Master]Add workflow timeout event and handle#18063
njnu-seafish wants to merge 9 commits intoapache:devfrom
njnu-seafish:Fix-17817-2

Conversation

@njnu-seafish
Copy link
Contributor

Was this PR generated or assisted by AI?

NO

Purpose of the pull request

close #17817

This bug involves a critical feature and has been stagnant for several months. Since I actually resolved the task timeout and workflow timeout issue quite some time ago, I would like to finalize the process and get it closed.
image

Brief change log

Add workflow timeout event and handle

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

Pull Request Notice

Pull Request Notice

If your pull request contains incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md

@SbloodyS SbloodyS added the bug Something isn't working label Mar 13, 2026
@SbloodyS SbloodyS added this to the 3.4.2 milestone Mar 13, 2026
@AllArgsConstructor(access = AccessLevel.PRIVATE)
public class WorkflowTimeoutLifecycleEvent extends AbstractWorkflowLifecycleLifecycleEvent {

private IWorkflowExecutionRunnable workflowExecutionRunnable;

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
AbstractWorkflowLifecycleLifecycleEvent.getWorkflowExecutionRunnable
; it is advisable to add an Override annotation.
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
23.2% Coverage on New Code (required ≥ 60%)

See analysis details on SonarQube Cloud

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses issue #17817 by introducing a workflow-level timeout lifecycle event/handler so that configured workflow timeout alerts are actually emitted by the master engine.

Changes:

  • Add a new WorkflowTimeoutLifecycleEvent + WorkflowTimeoutLifecycleEventHandler, and wire publishing from WorkflowStartLifecycleEventHandler.
  • Add TIMEOUT to WorkflowLifecycleEventType to support routing the new event.
  • Add service/DAO plumbing for sending workflow-timeout alerts and refactor alert persistence helper naming; tighten task-timeout validation.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
dolphinscheduler-service/src/main/java/org/apache/dolphinscheduler/service/alert/WorkflowAlertManager.java Adds sendWorkflowTimeoutAlert entrypoint to reach DAO alert creation.
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/engine/workflow/lifecycle/handler/WorkflowTimeoutLifecycleEventHandler.java New handler to send workflow timeout alerts when the timeout event fires.
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/engine/workflow/lifecycle/handler/WorkflowStartLifecycleEventHandler.java Publishes workflow timeout event at workflow start (if timeout configured).
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/engine/workflow/lifecycle/event/WorkflowTimeoutLifecycleEvent.java New delay event representing workflow timeout.
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/engine/workflow/lifecycle/WorkflowLifecycleEventType.java Adds TIMEOUT event type.
dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/engine/task/lifecycle/event/TaskTimeoutLifecycleEvent.java Tightens validation to require timeout minutes > 0.
dolphinscheduler-dao/src/main/java/org/apache/dolphinscheduler/dao/AlertDao.java Adds validation for workflow timeout alert inputs and consolidates timeout-alert saving helper.
dolphinscheduler-common/src/main/java/org/apache/dolphinscheduler/common/enums/AlertType.java Updates comment to reflect actual enum values.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +53 to +54
long delayTime = System.currentTimeMillis() - workflowInstance.getStartTime().getTime()
+ TimeUnit.MINUTES.toMillis(timeout);

public static WorkflowTimeoutLifecycleEvent of(IWorkflowExecutionRunnable workflowExecutionRunnable) {
final WorkflowInstance workflowInstance = workflowExecutionRunnable.getWorkflowInstance();
checkState(workflowInstance != null, "The workflow instance must be initialized before retrying.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

private void workflowTimeoutMonitor(final IWorkflowExecutionRunnable workflowExecutionRunnable) {
final WorkflowInstance workflowInstance = workflowExecutionRunnable.getWorkflowInstance();
if (workflowInstance.getTimeout() <= 0) {
log.debug("The workflow {} timeout {} is invalided, so the timeout monitor will not be started.",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okok

Copy link
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review your PR first, too many of these comments have been raised repeatedly in previous PRs.

final int timeout = workflowInstance.getTimeout();
checkState(timeout > 0, "The workflow timeout: %s must > 0 minutes", timeout);

long delayTime = System.currentTimeMillis() - workflowInstance.getStartTime().getTime()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the workflow instance is rerun the startTime will not change. You should use restartTime .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the workflow instance is rerun the startTime will not change. You should use restartTime .

good

private void workflowTimeoutMonitor(final IWorkflowExecutionRunnable workflowExecutionRunnable) {
final WorkflowInstance workflowInstance = workflowExecutionRunnable.getWorkflowInstance();
if (workflowInstance.getTimeout() <= 0) {
log.debug("The workflow {} timeout {} is invalided, so the timeout monitor will not be started.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log is incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log is incorrect.

fix it

Copy link
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add IT case.

* Pause the workflow instance
*/
PAUSE,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change the unrelated code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change the unrelated code.

ok

@github-actions github-actions bot added the test label Mar 20, 2026
@njnu-seafish
Copy link
Contributor Author

You should add IT case.

added

@njnu-seafish njnu-seafish requested a review from ruanwenjun March 20, 2026 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend bug Something isn't working test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] [Master] Missing workflow timeout alerts

4 participants