[CELEBORN-1577][FOLLOWUP] Fix backward compatiblity issue with interrupt shuffle#3675
[CELEBORN-1577][FOLLOWUP] Fix backward compatiblity issue with interrupt shuffle#3675s0nskar wants to merge 1 commit into
Conversation
|
Using Since If we do need robust backward compatibility, a better approach would be to add an explicit capability field (e.g., — Reviewed by Claude Code |
There was a problem hiding this comment.
Pull request overview
Updates the client-side quota interrupt handling in ApplicationHeartbeater to be backward compatible with older masters that don’t populate checkQuotaResponse in HeartbeatFromApplicationResponse, preventing accidental job cancellation due to protobuf default values.
Changes:
- Gate shuffle interruption on
CheckQuotaResponse.reason.nonEmptyin addition to!isAvailable. - Avoid treating protobuf default
isAvailable=false, reason=""(from missing message field) as a real quota-exceeded signal.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| private def checkQuotaExceeds(response: CheckQuotaResponse): Unit = { | ||
| if (conf.quotaInterruptShuffleEnabled && !response.isAvailable) { | ||
| if (conf.quotaInterruptShuffleEnabled && !response.isAvailable && response.reason.nonEmpty) { |
There was a problem hiding this comment.
It's better to introduce new field like quotaCheckSupported to fix backward compatiblity.
|
@s0nskar, please rebase the latest main branch for Grafana Dashboard CI. |
What changes were proposed in this pull request?
Fix backward compatibility issue with interrupt shuffle by checking if the reason is nonEmpty.
Why are the changes needed?
If someone uses a new client with old server which is not sending
CheckQuotaResponseinHeartbeatFromApplicationResponsethen proto uses default value to build CheckQuotaResponse with isAvailable=false and reason="". In this case the job will always fails without breaching the quota, we should not fail the job if the reason is empty to make it backward compatible.Does this PR resolve a correctness bug?
No
Does this PR introduce any user-facing change?
No
How was this patch tested?
Tested in local setup.