[KYUUBI #7323] Support timeout at PlanOnlyMode#7324
[KYUUBI #7323] Support timeout at PlanOnlyMode#7324ruanwenjun wants to merge 2 commits intoapache:masterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #7324 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 698 698
Lines 43649 43667 +18
Branches 5894 5896 +2
======================================
- Misses 43649 43667 +18 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| } | ||
| if (queryTimeout > 0) { | ||
| val timeoutExecutor = | ||
| ThreadUtils.newDaemonSingleThreadScheduledExecutor("query-timeout-thread", false) |
There was a problem hiding this comment.
this creates a new thread for each operation, it's too expensive. let's follow the idea of #7121 to create a global ThreadScheduledExecutor shared by all operations.
Q: why plan-only stmt was not covered by that? I suppose it should be, though I didn't take a deeper look yet
There was a problem hiding this comment.
Tes, this is expensive. I will add a new method
def runWithTimeoutFallback(task: Runnable,
onTimeout: Runnable,
timeoutSeconds: Long): Unit = {
val future = timeoutScheduler.submit(task)
try {
future.get(timeoutSeconds, TimeUnit.SECONDS)
} catch {
case _: TimeoutException =>
future.cancel(true)
onTimeout.run()
}
}at OperationManager to deal with this.
And I had also assumed that plan-only statements could respond to timeouts, but I discovered that they cannot.
There was a problem hiding this comment.
In our production environment, we discovered that in the current 1.9.x version, when setting a timeout in plan-only mode, although the session in the Kyuubi server has timed out and closed, the driver in the Spark engine continues to parse and execute the plan. This causes memory to remain unavailable.
| } | ||
| } | ||
|
|
||
| test("KYUUBI-7323: Support timeout at PlanOnlyMode") { |
There was a problem hiding this comment.
Kyuubi uses GitHub Issues instead of JIRA, use KYUUBI #XXXX instead of KYUUBI-XXXX for issue/PR reference.
Also, we don't require creating an issue before sending a PR, GitHub Issues and PRs share the sequence number, and the link can jump to each other, so XXXX can either be an issue number or a PR number. What really matters here, clear description of the context of your change - especially WHY, and TEST. WHAT is also important in most cases, but for cases where the code itself is self-explanatory, WHAT is optional.
There was a problem hiding this comment.
Thanks, please forgive me for mistakenly copying the template from another test method above.
And I removed this test case here, since I find that the added test method actually fails to verify whether the engine has truly cancel operation.
There was a problem hiding this comment.
Has added the solution at issue.
However, I don't fully understand why interrupting the thread can stop the analysis. There might be thread state detection logic during plan analysis, but I haven't found the specific code.
During testing, I discovered that canceling the jobgroup via cleanup didn't halt the plan analy.
8cfd423 to
fde836a
Compare
Why are the changes needed?
close #7323
Support to cancel the
PlanOnlyStatementwhen timeout.How was this patch tested?
Test by manual, the job group has been canceled, and the driver memory stop increase.
Was this patch authored or co-authored using generative AI tooling?
NO