feat: Add MigrateSchedule RPC for V1 to V2 scheduler migration #9261
Open
chaptersix wants to merge 8 commits into temporalio:main from
Add the infrastructure for migrating schedules from the workflow-backed scheduler (V1) to the CHASM-backed scheduler (V2):
- Add MigrateSchedule RPC to CHASM scheduler service proto
- Add MigrateScheduleRequest/Response messages with migration state
- Implement AdminHandler.MigrateSchedule to signal V1 workflow
- Add migrate signal handler in V1 scheduler workflow
- Add MigrateSchedule activity to call CHASM scheduler service
- Update migration function to accept proto types directly
- Wire up SchedulerServiceClient in worker service fx module
chaptersix
commented
Feb 9, 2026
Add handler and logic in chasm/lib/scheduler to create a CHASM scheduler from migrated V1 state:
- CreateSchedulerFromMigration initializes scheduler with migrated state
- MigrateSchedule handler uses StartExecution with reject duplicate policy
- Tests for migration functionality
chaptersix
commented
Feb 9, 2026
    if errors.As(err, &alreadyStartedErr) {
        return nil, serviceerror.NewWorkflowExecutionAlreadyStarted(
            "CHASM schedule already exists",
            "",
Contributor
Author
Should we include this info in the request?
Contributor
Author
Will circle back to this.
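The already-exists branch discussed in this thread can be illustrated with a small, self-contained sketch of how a caller might treat a duplicate create as an idempotent success. This is plain Go, not the PR's code: `alreadyStartedError`, `createSchedule`, and `migrateIdempotent` are hypothetical names, and the in-memory map stands in for the CHASM scheduler store.

```go
package main

import (
	"errors"
	"fmt"
)

// alreadyStartedError is a hypothetical stand-in for the error returned
// when a reject-duplicate start finds that the target already exists.
type alreadyStartedError struct {
	scheduleID string
}

func (e *alreadyStartedError) Error() string {
	return "schedule already exists: " + e.scheduleID
}

// createSchedule is a hypothetical create call backed by an in-memory set.
// It rejects duplicates by returning a wrapped alreadyStartedError.
func createSchedule(existing map[string]bool, scheduleID string) error {
	if existing[scheduleID] {
		return fmt.Errorf("create failed: %w", &alreadyStartedError{scheduleID: scheduleID})
	}
	existing[scheduleID] = true
	return nil
}

// migrateIdempotent treats "already exists" as success, so a repeated
// migration call does not fail. Any other error is returned unchanged.
func migrateIdempotent(existing map[string]bool, scheduleID string) error {
	err := createSchedule(existing, scheduleID)
	var already *alreadyStartedError
	if errors.As(err, &already) {
		return nil
	}
	return err
}

func main() {
	existing := map[string]bool{}
	fmt.Println(migrateIdempotent(existing, "sched-1")) // first call creates the schedule
	fmt.Println(migrateIdempotent(existing, "sched-1")) // second call finds it and still succeeds
}
```

`errors.As` unwraps the error chain, so the check still matches when an intermediate layer has wrapped the error with `%w`.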
…migration test

LegacyToSchedulerMigrationState was returning *SchedulerMigrationState but the MigrateSchedule activity expects *MigrateScheduleRequest. Rename to LegacyToMigrateScheduleRequest and return the full request with NamespaceId populated.

Also fix the migrate signal channel (was incorrectly using SignalNameForceCAN instead of SignalNameMigrate), add TestScheduleMigrationV1ToV2 integration test, expose SchedulerClient from test cluster, and fix staticcheck SA4006 lint errors in scheduler_test.go.
chaptersix
commented
Feb 11, 2026
service/worker/scheduler/workflow.go (outdated), comment on lines 1005 to 1008
    // inc the sequence number to invalidate signals sent to
    // this workflow after the migration has started.
    // they should target the chasm scheduler after this point
    s.incSeqNo()
Contributor
Author
The conflict token is not checked until a signal is processed, so this likely has no benefit.
chaptersix
commented
Feb 11, 2026
        "namespace", s.State.Namespace,
        "schedule-id", s.State.ScheduleId,
    )
    return nil
Contributor
Author
Anything else that should be done before closing the workflow?
Contributor
Author
We will drop any signals received during the migration.
Contributor
> we will drop any signals received during the migration.

Yep, regrettably true, though updates are already accepted (or rejected) asynchronously, so it's not new behavior.
Add activity-level tests for MigrateSchedule covering success, already-exists (idempotent), and error paths. Add workflow-level tests for migrate signal handling: success terminates workflow, failure continues, and signals are still processed after a failed migration.

Cap migration local activity to 1 attempt with 60s schedule-to-close timeout instead of inheriting the default 1h with unlimited retries.

Remove unnecessary incSeqNo() before migration: the conflict token change is never visible externally since it's in-memory only, and queued signals are dropped on workflow termination regardless.
Resolved merge conflict in service/worker/fx.go by including both:
- dummy.Module from upstream
- schedulerpb.NewSchedulerServiceLayeredClient from current branch
What changed
- MigrateSchedule RPC added to the admin handler and the CHASM scheduler service
- On the migrate signal, the V1 scheduler workflow runs a local activity that calls MigrateSchedule to create the CHASM schedule from V1 state
- CreateSchedulerFromMigration initializes a full CHASM scheduler tree (generator, invoker, backfillers, visibility) from the migrated V1 state, preserving the conflict token for client compatibility
- LegacyToMigrateScheduleRequest converts V1 InternalState + ScheduleInfo into the migration request format, including running/completed workflows as buffered starts and ongoing backfills

Why
Support migrating from workflow-backed (V1) schedulers to CHASM (V2) schedulers. The admin API (MigrateSchedule) signals the V1 workflow, which snapshots its state and creates the V2 schedule in a single local activity.

Signals during migration
Signals received while the migration local activity is executing are dropped if the migration succeeds (the workflow terminates without consuming them).
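The two outcomes can be modeled with a simplified signal loop. This is an illustrative sketch in plain Go, not Temporal workflow code: `runWorkflow`, `okMigration`, and `failingMigration` are hypothetical names, and position in the slice stands in for "received while the migration was executing".

```go
package main

import (
	"errors"
	"fmt"
)

// okMigration and failingMigration are hypothetical migration outcomes.
func okMigration() error      { return nil }
func failingMigration() error { return errors.New("migration failed") }

// runWorkflow is a simplified model of a signal loop. It returns the
// signals it handled and the signals that were still queued when the
// workflow ended after a successful migration.
func runWorkflow(signals []string, migrate func() error) (handled, dropped []string) {
	for i, sig := range signals {
		if sig != "migrate" {
			handled = append(handled, sig)
			continue
		}
		if err := migrate(); err != nil {
			// Failed migration: the workflow keeps running and
			// continues to process later signals.
			continue
		}
		// Successful migration: the workflow ends here, so anything
		// still queued is never consumed.
		return handled, signals[i+1:]
	}
	return handled, nil
}

func main() {
	signals := []string{"pause", "migrate", "patch"}

	handled, dropped := runWorkflow(signals, okMigration)
	fmt.Println("success:", handled, dropped)

	handled, dropped = runWorkflow(signals, failingMigration)
	fmt.Println("failure:", handled, dropped)
}
```

In the success case the trailing `patch` signal ends up in `dropped`; in the failure case it is handled normally, which mirrors the workflow-level tests described in the commit above.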
Migration activity retry policy
The migration local activity uses a restricted retry policy (1 attempt, 60s schedule-to-close) rather than the default (unlimited retries, 1h). A persistent failure should fail fast and let the workflow continue, rather than blocking for up to an hour.
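The difference between the two policies can be sketched with a small retry loop. This is a plain-Go model under stated assumptions, not the Temporal SDK: `retryPolicy`, `runWithPolicy`, and `alwaysFail` are hypothetical, and the unbounded policy uses a 20ms budget instead of 1h so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryPolicy is a hypothetical, simplified policy (not the SDK type).
type retryPolicy struct {
	maxAttempts int           // 0 means unlimited attempts
	deadline    time.Duration // overall budget across all attempts
}

// migrationPolicy mirrors the restricted policy described above.
var migrationPolicy = retryPolicy{maxAttempts: 1, deadline: 60 * time.Second}

// unboundedPolicy models "unlimited retries" with a short budget for the demo.
var unboundedPolicy = retryPolicy{maxAttempts: 0, deadline: 20 * time.Millisecond}

// alwaysFail simulates a persistent failure that takes a little time.
func alwaysFail(ctx context.Context) error {
	time.Sleep(time.Millisecond)
	return errors.New("persistent failure")
}

// runWithPolicy calls fn until it succeeds, the attempts run out, or the
// overall deadline passes. It reports how many attempts were made.
func runWithPolicy(p retryPolicy, fn func(context.Context) error) (attempts int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), p.deadline)
	defer cancel()
	for p.maxAttempts == 0 || attempts < p.maxAttempts {
		if ctx.Err() != nil {
			return attempts, ctx.Err()
		}
		attempts++
		if err = fn(ctx); err == nil {
			return attempts, nil
		}
	}
	return attempts, err
}

func main() {
	n, err := runWithPolicy(migrationPolicy, alwaysFail)
	fmt.Println("restricted:", n, err)

	n, err = runWithPolicy(unboundedPolicy, alwaysFail)
	fmt.Println("unbounded attempts:", n, "error:", err)
}
```

With one attempt, a persistent failure surfaces after a single call and the caller can move on. With unlimited attempts, the same failure keeps retrying until the whole time budget is spent.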
Follow-up PRs
- tdbg command for triggering migration