Skip to content

Comments

[KYUUBI #7323] Support timeout at PlanOnlyMode#7324

Open
ruanwenjun wants to merge 2 commits intoapache:masterfrom
ruanwenjun:dev-wenjun_fix7323
Open

[KYUUBI #7323] Support timeout at PlanOnlyMode#7324
ruanwenjun wants to merge 2 commits intoapache:masterfrom
ruanwenjun:dev-wenjun_fix7323

Conversation

@ruanwenjun
Copy link
Member

@ruanwenjun ruanwenjun commented Feb 11, 2026

Why are the changes needed?

close #7323

Support to cancel the PlanOnlyStatement when timeout.

How was this patch tested?

Test by manual, the job group has been canceled, and the driver memory stop increase.

26/02/12 10:54:11 INFO PlanOnlyStatement: Processing hive's query[bc48bfb9-2294-4987-aa81-c5ae3f3b39b5]: RUNNING_STATE -> TIMEDOUT_STATE, time taken: 10.007 seconds
26/02/12 10:54:11 INFO PlanOnlyStatement: statementId=bc48bfb9-2294-4987-aa81-c5ae3f3b39b5, operationRunTime=0 ms, operationCpuTime=0 ms
26/02/12 10:54:11 INFO DAGScheduler: Asked to cancel job group bc48bfb9-2294-4987-aa81-c5ae3f3b39b5
26/02/12 10:54:11 INFO DAGScheduler: Asked to cancel job group bc48bfb9-2294-4987-aa81-c5ae3f3b39b5

Was this patch authored or co-authored using generative AI tooling?

NO

@codecov-commenter
Copy link

codecov-commenter commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 0% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (69e8e95) to head (fde836a).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...ubi/engine/spark/operation/PlanOnlyStatement.scala 0.00% 19 Missing ⚠️
...org/apache/kyuubi/operation/OperationManager.scala 0.00% 4 Missing ⚠️
...ine/spark/operation/SparkSQLOperationManager.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #7324   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         698     698           
  Lines       43649   43667   +18     
  Branches     5894    5896    +2     
======================================
- Misses      43649   43667   +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ruanwenjun ruanwenjun marked this pull request as draft February 11, 2026 11:28
}
if (queryTimeout > 0) {
val timeoutExecutor =
ThreadUtils.newDaemonSingleThreadScheduledExecutor("query-timeout-thread", false)
Copy link
Member

@pan3793 pan3793 Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this creates a new thread for each operation, it's too expensive. let's follow the idea of #7121 to create a global ThreadScheduledExecutor shared by all operations.

Q: why plan-only stmt was not covered by that? I suppose it should be, though I didn't take a deeper look yet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tes, this is expensive. I will add a new method

def runWithTimeoutFallback(task: Runnable,
                             onTimeout: Runnable,
                             timeoutSeconds: Long): Unit = {
    val future = timeoutScheduler.submit(task)
    try {
      future.get(timeoutSeconds, TimeUnit.SECONDS)
    } catch {
      case _: TimeoutException =>
        future.cancel(true)
        onTimeout.run()
    }
  }

at OperationManager to deal with this.

And I had also assumed that plan-only statements could respond to timeouts, but I discovered that they cannot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our production environment, we discovered that in the current 1.9.x version, when setting a timeout in plan-only mode, although the session in the Kyuubi server has timed out and closed, the driver in the Spark engine continues to parse and execute the plan. This causes memory to remain unavailable.

}
}

test("KYUUBI-7323: Support timeout at PlanOnlyMode") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kyuubi uses GitHub Issues instead of JIRA, use KYUUBI #XXXX instead of KYUUBI-XXXX for issue/PR reference.

Also, we don't require creating an issue before sending a PR, GitHub Issues and PRs share the sequence number, and the link can jump to each other, so XXXX can either be an issue number or a PR number. What really matters here, clear description of the context of your change - especially WHY, and TEST. WHAT is also important in most cases, but for cases where the code itself is self-explanatory, WHAT is optional.

Copy link
Member Author

@ruanwenjun ruanwenjun Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, please forgive me for mistakenly copying the template from another test method above.

And I removed this test case here, since I find that the added test method actually fails to verify whether the engine has truly cancel operation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has added the solution at issue.

However, I don't fully understand why interrupting the thread can stop the analysis. There might be thread state detection logic during plan analysis, but I haven't found the specific code.
During testing, I discovered that canceling the jobgroup via cleanup didn't halt the plan analy.

@ruanwenjun ruanwenjun marked this pull request as ready for review February 11, 2026 12:10
@ruanwenjun ruanwenjun marked this pull request as draft February 12, 2026 02:10
@ruanwenjun ruanwenjun marked this pull request as ready for review February 12, 2026 02:59
@ruanwenjun ruanwenjun requested a review from pan3793 February 12, 2026 03:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support timeout at PlanOnlyMode

3 participants