fix(scheduler): guard grouped resume reapply#199
Draft
aparajon wants to merge 19 commits into
Draft
Conversation
There was a problem hiding this comment.
Pull request overview
This PR guards grouped resume so SchemaBot re-plans immediately before reapplying and completes the apply without invoking the engine when no schema changes remain.
Changes:
- Adds a final grouped-resume schema check before engine apply.
- Short-circuits grouped resume to mark the apply completed when all tasks are already done.
- Extends local tests and fake engine behavior to cover the no-remaining-work path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
pkg/tern/local_control_resume.go |
Adds final grouped resume re-plan, engine drain, and completed short-circuit path. |
pkg/tern/local_client_test.go |
Updates fake control engine and adds coverage for the new short-circuit behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ecc622a to
9516df2
Compare
9516df2 to
a775594
Compare
Co-authored-by: Amp <amp@ampcode.com> Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d
a775594 to
b47f8a8
Compare
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019e7456-b8ee-77aa-89bc-178cfab1140d Co-authored-by: Amp <amp@ampcode.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Grouped resume can be retried after the target schema has already advanced, and remote stop→start can race with stale stop reconciliation. SchemaBot should verify there is still work left before reapplying and keep an accepted start request from being undone by an older stopped remote progress report.
What
Grouped resume final-check
flowchart LR A[Grouped resume] --> B[Final schema check] B --> C{Changes remain?} C -- No --> D[Complete apply without reapplying] C -- Yes --> E[Call engine Apply]gRPC stop/start reconciliation
/startto proceed when a pending remote stop request has already reached stopped remotely/startis a separate operator/caller request after the stop path has recorded stopped state; SchemaBot does not automatically start after completing a stop request.Common setup: an operator requests
/stop; SchemaBot records the stop request, the scheduler stops remote Tern, and stopped progress is stored./startis a later, separate operator request.Start succeeds
/startstart request acceptedStartEnd state: apply is running or completed; the start request is completed.
Start RPC fails or times out
/startstart request acceptedStart/startafter investigatingEnd state: apply remains stopped; the error is visible; the start request is failed, so the scheduler does not retry forever.
Start accepted, but remote still reports stopped
/startstart request acceptedStartEnd state: either running/completed with the start request completed, or stopped with a visible error and the start request failed, so the scheduler does not retry forever.
Risk Assessment
Medium — scoped to grouped resume dispatch and immediate remote stop→start reconciliation. The unchanged path still calls the engine when the re-plan finds active table changes, stop priority remains intact, and start only proceeds after stored or remote state proves the stop has resolved.
Generated with Amp