Skip to content

Updated proto with API changes#710

Open
saad1998 wants to merge 1 commit intotemporalio:masterfrom
saad1998:feature/issue-778-schedule-to-start-timeout
Open

Updated proto with API changes#710
saad1998 wants to merge 1 commit intotemporalio:masterfrom
saad1998:feature/issue-778-schedule-to-start-timeout

Conversation

@saad1998
Copy link

@saad1998 saad1998 commented Feb 9, 2026

READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.

What changed?
Added optional schedule_to_start_timeout (google.protobuf.Duration, field number 4) to WorkflowTaskScheduledEventAttributes in temporal/api/history/v1/message.proto.
Regenerated Go and OpenAPI in this repo (via make proto / project tooling).

Why?

Fixes temporalio/temporal#778: the server applies a schedule-to-start timeout (e.g. 5s for speculative workflow tasks, or the sticky timeout) but the event did not record it, so UIs could show a different value (e.g. 10s) and users were confused when the task timed out earlier. Storing the timeout on the event makes the server’s behavior visible and consistent for clients and UIs.

Breaking changes

None. New field is optional; existing events and clients remain valid.

Server PR

Server PR (to temporalio/temporal) that populates and uses this field: temporalio/temporal#9257

@CLAassistant
Copy link

CLAassistant commented Feb 9, 2026

CLA assistant check
All committers have signed the CLA.

@saad1998 saad1998 marked this pull request as ready for review February 9, 2026 10:20
@saad1998 saad1998 requested review from a team as code owners February 9, 2026 10:20
//
// (-- api-linter: core::0140::prepositions=disabled
// aip.dev/not-precedent: "to" is used to indicate interval. --)
google.protobuf.Duration schedule_to_start_timeout = 4;
Copy link
Member

@cretz cretz Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify whether this has a specific recent need, or was this just something found on the backlog at temporalio/temporal#778? History fields are pretty scrutinized, so would want to confirm there is a pressing reason for this.

Also, the PR description is missing the accompanying server PR showing how this will be implemented.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change came from the backlog.
There isn’t a new production incident driving it.
The original issue was observability: the UI could show a user-configured schedule-to-start timeout (e.g. 10s) while the server used a different (e.g. hardcoded) value, so users couldn’t see what was actually in effect from history. The goal is to make history reflect the schedule-to-start timeout when the server enforces one, so there’s no pressing reason beyond that observability/consistency improvement. If you’d rather not add history fields for that alone, I’m happy to close or repurpose the PR.

Also thanks for pointing out the missing server PR I thought I added it in the description.

//
// (-- api-linter: core::0140::prepositions=disabled
// aip.dev/not-precedent: "to" is used to indicate interval. --)
google.protobuf.Duration schedule_to_start_timeout = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to understand details of how this value is present and/or updated or not on successive task failures or timeouts. I wouldn't expect speculative task failure is ever recorded, or is it? And as for sticky, I suppose it can only occur once because rescheduled task does not have a schedule to start?

Any reason not to add this field to WorkflowTaskTimedOutEventAttributes instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I think is happening so far the server only sets schedule_to_start_timeout when it actually enforces a schedule-to-start timeout i.e. for sticky workflow tasks and for speculative workflow tasks on the normal queue. For a normal task on the normal queue the field stays unset, so it isn’t “updated” across successive timeouts in the usual retry sense; it’s only present on the scheduled events where that timeout applies.

For speculative tasks, the schedule-to-start timeout is recorded: when the timer fires we create the WORKFLOW_TASK_SCHEDULED event at that moment (in AddWorkflowTaskScheduleToStartTimeoutEvent) and then add the WORKFLOW_TASK_TIMED_OUT event, so the scheduled event and its timeout are written together.

For sticky, schedule-to-start only applies to the sticky task; after it times out we reschedule on the normal queue, and that new scheduled event doesn’t have a schedule-to-start timeout, so the field wouldn’t be set there.

I’m happy to put the duration on WorkflowTaskTimedOutEventAttributes instead it would make the timeout event self-contained (you’d see both the timeout type and the duration that fired), avoid ambiguity about when the field is set on the scheduled event, and better match “this is what timed out.” I can switch the change to add the field there (and only populate it when timeout_type is SCHEDULE_TO_START) if that’s the direction you prefer; otherwise I can keep it on the scheduled event and add a short comment documenting this lifecycle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ScheduleToStart timeout to WorkflowTaskScheduledEvent

3 participants