Conversation
SuggestContinueAsNew + SuggestContinueAsNewReasonTargetVersionChanged with new TargetVersionChanged flag
| // suggest_continue_as_new is not true when the Target Version Changed. | ||
| // See target_worker_deployment_version_changed to find out if Target Version Changed. | ||
| reserved 4; | ||
| reserved "SUGGEST_CONTINUE_AS_NEW_REASON_TARGET_WORKER_DEPLOYMENT_VERSION_CHANGED"; |
There was a problem hiding this comment.
could also deprecate instead of deleting, either works
There was a problem hiding this comment.
(was just asked to weigh on this, I was attempting to defer, sorry for late review)
I was originally going to strongly suggest deprecate because this is a compile-time break, but looking at temporalio/sdk-go#2171 it is not. But IMO we should still deprecate instead of remove because it is tagged API that has consequences for our other code-generating users, and should be harmless to do so.
There was a problem hiding this comment.
@cretz - if we do deprecate this, and not remove it completely, it would also mean that the server would have to keep on offering support for Trampolining via this manner, right?
Following the same line of thought, I was wondering if reserved would make the above idea come to life without completely just removing the field (as just removing it right now would be breaking).
There was a problem hiding this comment.
if we do deprecate this, and not remove it completely, it would also mean that the server would have to keep on offering support for Trampolining via this manner, right?
I think in this case, since y'all got lucky no SDK code was merged referencing this enum, can probably be removed. The deprecation I meant was more not removing the enum just the functionality, not deprecate-but-working. But here, we're lucky the SDKs didn't hurry up and implement/merge the original form.
Following the same line of thought, I was wondering if reserved would make the above idea come to life without completely just removing the field (as just removing it right now would be breaking).
Yeah, it'd all be about compile time breaks. But we didn't ever use this, so doesn't really matter tbh, can leave reserved, I'm ok with that, so long as no SDK starts referencing this enum at all before their protos are updated referencing this newer one (which for Go means a tagged API release).
There was a problem hiding this comment.
I would just remove it, since we know it wasn't used. No reason to keep it around and possibly confuse someone later
There was a problem hiding this comment.
Already removed, but ok removing the reserving of the enum name, though would like to preserve the reserving of the enum number
There was a problem hiding this comment.
make-ci-build prevents us from removing the field without reserving the name. I think it's fine to leave the name reserved, I can make the comment more clear if that helps
| // suggest_continue_as_new is not true when the Target Version Changed. | ||
| // See target_worker_deployment_version_changed to find out if Target Version Changed. | ||
| reserved 4; | ||
| reserved "SUGGEST_CONTINUE_AS_NEW_REASON_TARGET_WORKER_DEPLOYMENT_VERSION_CHANGED"; |
There was a problem hiding this comment.
(was just asked to weigh on this, I was attempting to defer, sorry for late review)
I was originally going to strongly suggest deprecate because this is a compile-time break, but looking at temporalio/sdk-go#2171 it is not. But IMO we should still deprecate instead of remove because it is tagged API that has consequences for our other code-generating users, and should be harmless to do so.
| // True if Workflow's Target Worker Deployment Version is different from its Pinned Version and | ||
| // the workflow is Pinned. | ||
| // Experimental. | ||
| bool target_worker_deployment_version_changed = 9; |
There was a problem hiding this comment.
so it's easy enough to teach them to pay attention to this new TargetWorkerDeploymentVersionChanged flag.
To confirm, this is asking every SDK to add a new workflow getter? And once seen will this field need to always stay true or does it go back and forth?
I tried reading the description, but can you confirm why this is not acceptable as a CAN reason? Specifically, can you confirm will we not set suggest_continue_as_new to true once this is set? Because I would expect if suggest_continue_as_new is true, the reason would be present. And I would expect the whole reason for suggest_continue_as_new_reasons is so people can decide if the reasons represent ones they consider to be for continue as new.
Overall, the concept of suggest_continue_as_new == len(suggest_continue_as_new_reasons) > 0 is best from a comprehension standpoint, and users can choose to not adopt the suggestion. Had the reasons come before the bool, we probably would have never had the bool.
There was a problem hiding this comment.
To confirm, this is asking every SDK to add a new workflow getter?
yes
And once seen will this field need to always stay true or does it go back and forth?
I believe this field would have to be set to true, once set, for the entire duration of a workflow run. However, once it does CAN, this value would be initialized to a false. Folks could have multiple deployments of worker-versions and the idea is that on every new worker-version becoming the new "current", this getter could be set to true.
I tried reading the description, but can you confirm why this is not acceptable as a CAN reason
I think the main problem with this is simply that people were not looking at the CAN reasons when deciding if their Pinned workflows should CAN or not. When we rolled out this change, folks who had if workflow_info.suggestContinueAsNew{ do continue-as-new } in their Pinned workflows experienced non-required CAN'ing on a new deployment.
Specifically, can you confirm will we not set suggest_continue_as_new to true once this is set
Seeing this in the PR description: "I am proposing to completely remove the "new target version" reason from the can-suggested reasons list." which makes me believe that the main intention was to get rid of this reason from the list itself. I think Carly's idea is to just not use this and deviate away from this initial idea of using suggest_continue_as_new being true to using the new getter on TargetWorkerDeploymentVersionChanged.
There was a problem hiding this comment.
Ok, yes, looking for confirmation that this new getter for target version deployment changed will now be completely divorced from CAN. If it is completely separate, do want to confirm with SDK team in general that we're ok with this new workflow surface area and confirmation that once set true, it remains true for lifetime of workflow from user POV.
There was a problem hiding this comment.
confirmation that once set true, it remains true for lifetime of workflow from user POV.
Is that a requirement from SDK to work well? In principle, it does not have to remain true for ever because user might manually move the workflow to the new build after which the "target version changed" event becomes out of date.
There was a problem hiding this comment.
I'm also thinking what if we just sent the wfInfo.get_target_deployment_version and user could compare it with the wfInfo.get_current_deployment_version to make CaN decisions?
There was a problem hiding this comment.
Is that a requirement from SDK to work well?
Not for the SDK to work well, but for user understanding. If it's a string that gets changed on successive tasks, no problem. But if it's a bool as proposed here, I would recommend the concept of changed being sticky for the rest of the workflow without having to set it to true on each successive task (meaning we memoize it as true first time we see it as true). But a bool on every successive task is probably fine too, we just need to know.
If using strings instead, feel free to only set those when changed and ask SDK to memoize first seen (we don't want duplicate data continually resent on successive history tasks).
There was a problem hiding this comment.
I'm also thinking what if we just sent the wfInfo.get_target_deployment_version and user could compare it with the wfInfo.get_current_deployment_version to make CaN decisions?
On first thought, I like this. But, thinking about it more, doing this would make it harder to protect people from the infinite CaN loop problem with our "only tell a workflow that its Target Version changed if the Target Version changed after the workflow started" solution.
So I'm going to go ahead with the current proposal.
cretz
left a comment
There was a problem hiding this comment.
Approved with the understanding that this new boolean is wholly unrelated to suggest_continue_as_new
Sushisource
left a comment
There was a problem hiding this comment.
This approach works for me
…ged (#9239) ## What changed? Set TargetVersionChanged instead of SuggestCaN when TargetVersionChanged temporalio/api#709 ## Why? Setting SuggestContinueAsNew=true for Pinned workflows whenever their is a new Target Version available for that workflow causes Pinned workflows to hit that condition much more frequently than they expect. Users who are currently doing: if workflow_info.suggestContinueAsNew{ do continue-as-new } in their Pinned workflow code would need to change that code to protect themselves from running into an infinite-CaN-loop, because the default CaN behavior for a Pinned workflow is to stay Pinned. We should not force users to protect themselves from such a situation. Because upgrading on continue-as-new is opt-in, receiving the suggestion to continue-as-new-onto-new-target-version should be opt-in as well. If people are forced to check the new suggest-continue-as-new-reasons field to "opt out," that is unsafe, because inevitably some people will forget to do so or misunderstand, and then get hit by this unexpected footgun. Much safer and still ergonomical to let upgrade-on-can be opt-in on both fronts, as proposed here. With this change, the people who are currently doing if workflow_info.suggestContinueAsNew{ do continue-as-new } won't see any change in semantics, regardless of their versioning behavior. People who consciously know that they want to do upgrade-on-can / Trampolining will have to change their CaN options anyway, so it's easy enough to teach them to pay attention to this new TargetWorkerDeploymentVersionChanged flag. ## How did you test it? - [ ] built - [ ] run locally and tested manually - [x] covered by existing tests - [ ] added new unit test(s) - [x] added new functional test(s) ## Potential risks None <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Touches workflow task started event generation/persistence and versioning-related signaling, which can affect worker behavior and history compatibility; changes are gated by dynamic config and covered by tests. > > **Overview** > Stops using `SuggestContinueAsNew` (and its reason tags) to signal pinned workflows that a newer target worker deployment version exists, and instead introduces an explicit `TargetWorkerDeploymentVersionChanged` boolean on `WorkflowTaskStarted` events and persisted `WorkflowExecutionInfo`. > > Adds namespace dynamic config `EnableSendTargetVersionChanged` (default on) and a new metric `workflow_target_version_changed_count` emitted when this flag is set; updates the workflow task state machine, mutable state plumbing/mocks, proto/pb persistence, and functional tests accordingly. Also bumps `go.temporal.io/api` to pick up the new event attribute. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 3886491. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->
READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.
target_worker_deployment_version_changedflag to be sent in WorkflowTaskStartedEvent and exposed next tocontinue-as-new-suggestedin the workflow info.Why?
Setting
SuggestContinueAsNew=truefor Pinned workflows whenever their is a new Target Version available for that workflow causes Pinned workflows to hit that condition much more frequently than they expect. Users who are currently doing:if workflow_info.suggestContinueAsNew{ do continue-as-new }in their Pinned workflow code would need to change that code to protect themselves from running into an infinite-CaN-loop, because the default CaN behavior for a Pinned workflow is to stay Pinned.We should not force users to protect themselves from such a situation.
Because upgrading on continue-as-new is opt-in, receiving the suggestion to continue-as-new-onto-new-target-version should be opt-in as well. If people are forced to check the new
suggest-continue-as-new-reasonsfield to "opt out," that is unsafe, because inevitably some people will forget to do so or misunderstand, and then get hit by this unexpected footgun.Much safer and still ergonomical to let upgrade-on-can be opt-in on both fronts, as proposed here. With this change, the people who are currently doing
if workflow_info.suggestContinueAsNew{ do continue-as-new }won't see any change in semantics, regardless of their versioning behavior.People who consciously know that they want to do upgrade-on-can / Trampolining will have to change their CaN options anyway, so it's easy enough to teach them to pay attention to this new
TargetWorkerDeploymentVersionChangedflag.I am proposing to completely remove the "new target version" reason from the can-suggested reasons list. Some SDK PRs use it, and they should stop. We could also simply deprecate the reason and just tell people that it's not used anymore, but I'd prefer to flat out delete it. Server is already patched to not send that flag by default.
Server PR
temporalio/temporal#9239