Skip to content

Replace SuggestContinueAsNew + SuggestContinueAsNewReasonTargetVersionChanged with new TargetVersionChanged flag#709

Merged
carlydf merged 10 commits intomasterfrom
target-version-changed
Feb 12, 2026
Merged

Replace SuggestContinueAsNew + SuggestContinueAsNewReasonTargetVersionChanged with new TargetVersionChanged flag#709
carlydf merged 10 commits intomasterfrom
target-version-changed

Conversation

@carlydf
Copy link
Contributor

@carlydf carlydf commented Feb 5, 2026

READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.

  • Replace SuggestContinueAsNew + SuggestContinueAsNewReasonTargetVersionChanged with new TargetVersionChanged flag
  • Add new target_worker_deployment_version_changed flag to be sent in WorkflowTaskStartedEvent and exposed next to continue-as-new-suggested in the workflow info.

Why?
Setting SuggestContinueAsNew=true for Pinned workflows whenever their is a new Target Version available for that workflow causes Pinned workflows to hit that condition much more frequently than they expect. Users who are currently doing: if workflow_info.suggestContinueAsNew{ do continue-as-new } in their Pinned workflow code would need to change that code to protect themselves from running into an infinite-CaN-loop, because the default CaN behavior for a Pinned workflow is to stay Pinned.

We should not force users to protect themselves from such a situation.

Because upgrading on continue-as-new is opt-in, receiving the suggestion to continue-as-new-onto-new-target-version should be opt-in as well. If people are forced to check the new suggest-continue-as-new-reasons field to "opt out," that is unsafe, because inevitably some people will forget to do so or misunderstand, and then get hit by this unexpected footgun.

Much safer and still ergonomical to let upgrade-on-can be opt-in on both fronts, as proposed here. With this change, the people who are currently doing if workflow_info.suggestContinueAsNew{ do continue-as-new } won't see any change in semantics, regardless of their versioning behavior.

People who consciously know that they want to do upgrade-on-can / Trampolining will have to change their CaN options anyway, so it's easy enough to teach them to pay attention to this new TargetWorkerDeploymentVersionChanged flag.

I am proposing to completely remove the "new target version" reason from the can-suggested reasons list. Some SDK PRs use it, and they should stop. We could also simply deprecate the reason and just tell people that it's not used anymore, but I'd prefer to flat out delete it. Server is already patched to not send that flag by default.

Server PR
temporalio/temporal#9239

@carlydf carlydf changed the title proposal Replace SuggestContinueAsNew + SuggestContinueAsNewReasonTargetVersionChanged with new TargetVersionChanged flag Feb 5, 2026
@carlydf carlydf changed the title Replace SuggestContinueAsNew + SuggestContinueAsNewReasonTargetVersionChanged with new TargetVersionChanged flag Replace SuggestContinueAsNew + SuggestContinueAsNewReasonTargetVersionChanged with new TargetVersionChanged flag Feb 5, 2026
@carlydf carlydf marked this pull request as ready for review February 5, 2026 23:50
@carlydf carlydf requested review from a team as code owners February 5, 2026 23:50
// suggest_continue_as_new is not true when the Target Version Changed.
// See target_worker_deployment_version_changed to find out if Target Version Changed.
reserved 4;
reserved "SUGGEST_CONTINUE_AS_NEW_REASON_TARGET_WORKER_DEPLOYMENT_VERSION_CHANGED";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also deprecate instead of deleting, either works

Copy link
Member

@cretz cretz Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(was just asked to weigh on this, I was attempting to defer, sorry for late review)

I was originally going to strongly suggest deprecate because this is a compile-time break, but looking at temporalio/sdk-go#2171 it is not. But IMO we should still deprecate instead of remove because it is tagged API that has consequences for our other code-generating users, and should be harmless to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cretz - if we do deprecate this, and not remove it completely, it would also mean that the server would have to keep on offering support for Trampolining via this manner, right?

Following the same line of thought, I was wondering if reserved would make the above idea come to life without completely just removing the field (as just removing it right now would be breaking).

Copy link
Member

@cretz cretz Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do deprecate this, and not remove it completely, it would also mean that the server would have to keep on offering support for Trampolining via this manner, right?

I think in this case, since y'all got lucky no SDK code was merged referencing this enum, can probably be removed. The deprecation I meant was more not removing the enum just the functionality, not deprecate-but-working. But here, we're lucky the SDKs didn't hurry up and implement/merge the original form.

Following the same line of thought, I was wondering if reserved would make the above idea come to life without completely just removing the field (as just removing it right now would be breaking).

Yeah, it'd all be about compile time breaks. But we didn't ever use this, so doesn't really matter tbh, can leave reserved, I'm ok with that, so long as no SDK starts referencing this enum at all before their protos are updated referencing this newer one (which for Go means a tagged API release).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just remove it, since we know it wasn't used. No reason to keep it around and possibly confuse someone later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already removed, but ok removing the reserving of the enum name, though would like to preserve the reserving of the enum number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make-ci-build prevents us from removing the field without reserving the name. I think it's fine to leave the name reserved, I can make the comment more clear if that helps

// suggest_continue_as_new is not true when the Target Version Changed.
// See target_worker_deployment_version_changed to find out if Target Version Changed.
reserved 4;
reserved "SUGGEST_CONTINUE_AS_NEW_REASON_TARGET_WORKER_DEPLOYMENT_VERSION_CHANGED";
Copy link
Member

@cretz cretz Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(was just asked to weigh on this, I was attempting to defer, sorry for late review)

I was originally going to strongly suggest deprecate because this is a compile-time break, but looking at temporalio/sdk-go#2171 it is not. But IMO we should still deprecate instead of remove because it is tagged API that has consequences for our other code-generating users, and should be harmless to do so.

// True if Workflow's Target Worker Deployment Version is different from its Pinned Version and
// the workflow is Pinned.
// Experimental.
bool target_worker_deployment_version_changed = 9;
Copy link
Member

@cretz cretz Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it's easy enough to teach them to pay attention to this new TargetWorkerDeploymentVersionChanged flag.

To confirm, this is asking every SDK to add a new workflow getter? And once seen will this field need to always stay true or does it go back and forth?

I tried reading the description, but can you confirm why this is not acceptable as a CAN reason? Specifically, can you confirm will we not set suggest_continue_as_new to true once this is set? Because I would expect if suggest_continue_as_new is true, the reason would be present. And I would expect the whole reason for suggest_continue_as_new_reasons is so people can decide if the reasons represent ones they consider to be for continue as new.

Overall, the concept of suggest_continue_as_new == len(suggest_continue_as_new_reasons) > 0 is best from a comprehension standpoint, and users can choose to not adopt the suggestion. Had the reasons come before the bool, we probably would have never had the bool.

Copy link
Member

@Shivs11 Shivs11 Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, this is asking every SDK to add a new workflow getter?

yes

And once seen will this field need to always stay true or does it go back and forth?

I believe this field would have to be set to true, once set, for the entire duration of a workflow run. However, once it does CAN, this value would be initialized to a false. Folks could have multiple deployments of worker-versions and the idea is that on every new worker-version becoming the new "current", this getter could be set to true.

I tried reading the description, but can you confirm why this is not acceptable as a CAN reason

I think the main problem with this is simply that people were not looking at the CAN reasons when deciding if their Pinned workflows should CAN or not. When we rolled out this change, folks who had if workflow_info.suggestContinueAsNew{ do continue-as-new } in their Pinned workflows experienced non-required CAN'ing on a new deployment.

Specifically, can you confirm will we not set suggest_continue_as_new to true once this is set

Seeing this in the PR description: "I am proposing to completely remove the "new target version" reason from the can-suggested reasons list." which makes me believe that the main intention was to get rid of this reason from the list itself. I think Carly's idea is to just not use this and deviate away from this initial idea of using suggest_continue_as_new being true to using the new getter on TargetWorkerDeploymentVersionChanged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yes, looking for confirmation that this new getter for target version deployment changed will now be completely divorced from CAN. If it is completely separate, do want to confirm with SDK team in general that we're ok with this new workflow surface area and confirmation that once set true, it remains true for lifetime of workflow from user POV.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirmation that once set true, it remains true for lifetime of workflow from user POV.

Is that a requirement from SDK to work well? In principle, it does not have to remain true for ever because user might manually move the workflow to the new build after which the "target version changed" event becomes out of date.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also thinking what if we just sent the wfInfo.get_target_deployment_version and user could compare it with the wfInfo.get_current_deployment_version to make CaN decisions?

Copy link
Member

@cretz cretz Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a requirement from SDK to work well?

Not for the SDK to work well, but for user understanding. If it's a string that gets changed on successive tasks, no problem. But if it's a bool as proposed here, I would recommend the concept of changed being sticky for the rest of the workflow without having to set it to true on each successive task (meaning we memoize it as true first time we see it as true). But a bool on every successive task is probably fine too, we just need to know.

If using strings instead, feel free to only set those when changed and ask SDK to memoize first seen (we don't want duplicate data continually resent on successive history tasks).

Copy link
Contributor Author

@carlydf carlydf Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also thinking what if we just sent the wfInfo.get_target_deployment_version and user could compare it with the wfInfo.get_current_deployment_version to make CaN decisions?

On first thought, I like this. But, thinking about it more, doing this would make it harder to protect people from the infinite CaN loop problem with our "only tell a workflow that its Target Version changed if the Target Version changed after the workflow started" solution.

So I'm going to go ahead with the current proposal.

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with the understanding that this new boolean is wholly unrelated to suggest_continue_as_new

Copy link
Member

@Sushisource Sushisource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach works for me

@carlydf carlydf merged commit f1dd4cd into master Feb 12, 2026
4 checks passed
@carlydf carlydf deleted the target-version-changed branch February 12, 2026 00:08
carlydf added a commit to temporalio/temporal that referenced this pull request Feb 12, 2026
…ged (#9239)

## What changed?
Set TargetVersionChanged instead of SuggestCaN when TargetVersionChanged
temporalio/api#709

## Why?
Setting SuggestContinueAsNew=true for Pinned workflows whenever their is
a new Target Version available for that workflow causes Pinned workflows
to hit that condition much more frequently than they expect. Users who
are currently doing: if workflow_info.suggestContinueAsNew{ do
continue-as-new } in their Pinned workflow code would need to change
that code to protect themselves from running into an infinite-CaN-loop,
because the default CaN behavior for a Pinned workflow is to stay
Pinned.

We should not force users to protect themselves from such a situation.

Because upgrading on continue-as-new is opt-in, receiving the suggestion
to continue-as-new-onto-new-target-version should be opt-in as well. If
people are forced to check the new suggest-continue-as-new-reasons field
to "opt out," that is unsafe, because inevitably some people will forget
to do so or misunderstand, and then get hit by this unexpected footgun.

Much safer and still ergonomical to let upgrade-on-can be opt-in on both
fronts, as proposed here. With this change, the people who are currently
doing if workflow_info.suggestContinueAsNew{ do continue-as-new } won't
see any change in semantics, regardless of their versioning behavior.

People who consciously know that they want to do upgrade-on-can /
Trampolining will have to change their CaN options anyway, so it's easy
enough to teach them to pay attention to this new
TargetWorkerDeploymentVersionChanged flag.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [x] added new functional test(s)

## Potential risks
None

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches workflow task started event generation/persistence and
versioning-related signaling, which can affect worker behavior and
history compatibility; changes are gated by dynamic config and covered
by tests.
> 
> **Overview**
> Stops using `SuggestContinueAsNew` (and its reason tags) to signal
pinned workflows that a newer target worker deployment version exists,
and instead introduces an explicit
`TargetWorkerDeploymentVersionChanged` boolean on `WorkflowTaskStarted`
events and persisted `WorkflowExecutionInfo`.
> 
> Adds namespace dynamic config `EnableSendTargetVersionChanged`
(default on) and a new metric `workflow_target_version_changed_count`
emitted when this flag is set; updates the workflow task state machine,
mutable state plumbing/mocks, proto/pb persistence, and functional tests
accordingly. Also bumps `go.temporal.io/api` to pick up the new event
attribute.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
3886491. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants