Include transient and speculative WFT events in GetWorkflowExecutionHistoryResponse#9325
Merged
…-premature-end-stream
spkane31
commented
Feb 18, 2026
FirstEventId: firstEventID,
NextEventId: nextEventID,
PersistenceToken: persistenceToken,
TransientWorkflowTask: response.GetTransientWorkflowTask(),
Contributor
Author
The TransientWorkflowTask here is the main change from #9138
Contributor
Author
The same change is made in respondworkflowtaskcompleted. The reason for the change is that the continuation token only fetches the transient tasks on the first page; if there are multiple pages and the workflow updates during pagination, the continuation token will hold outdated information. GetWorkflowExecutionHistory now handles transient events by querying mutable state, so this is unnecessary.
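To illustrate the staleness described in the comment above, here is a minimal sketch. The `pageToken` and `mutableState` types and all function names are hypothetical stand-ins, not the server's actual types; the sketch only shows why a snapshot taken on the first page can disagree with state read later.

```go
package main

import "fmt"

// pageToken is a hypothetical stand-in for a history continuation token.
type pageToken struct {
	NextEventID     int64
	TransientEvents []int64 // event IDs captured when the first page was served
}

// mutableState is a hypothetical stand-in for the server-side workflow state.
type mutableState struct {
	TransientEvents []int64
}

// firstPage snapshots the transient events into the token (the older approach).
func firstPage(ms *mutableState) pageToken {
	snapshot := append([]int64(nil), ms.TransientEvents...)
	return pageToken{NextEventID: 1, TransientEvents: snapshot}
}

// lastPageFromToken returns whatever the token captured on the first page.
func lastPageFromToken(tok pageToken) []int64 { return tok.TransientEvents }

// lastPageFromState reads the transient events from the current state instead.
func lastPageFromState(ms *mutableState) []int64 { return ms.TransientEvents }

func main() {
	ms := &mutableState{TransientEvents: []int64{9, 10}}
	tok := firstPage(ms)

	// The workflow makes progress while the caller is still paginating.
	ms.TransientEvents = []int64{12, 13}

	fmt.Println("from token:", lastPageFromToken(tok)) // stale snapshot
	fmt.Println("from state:", lastPageFromState(ms))  // current events
}
```

Reading from state at the end avoids carrying the snapshot at all, which appears to be why the token field is no longer populated.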
stephanos
approved these changes
Feb 19, 2026
for range 3 {
	events := s.GetHistory(s.Namespace().String(), workflowExecution)
	if len(events) == 8 {
	if len(events) >= 8 {
Contributor
nit: should this be == 9?
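The loop under review polls history a fixed number of times until enough events are present. A rough sketch of that pattern follows; `waitForEvents` and the simulated `fetch` function are hypothetical and stand in for the test suite's `s.GetHistory` call. A lower bound tolerates extra trailing events, while an exact count would make the test fail if the history grows by one.

```go
package main

import "fmt"

// waitForEvents is a hypothetical helper: it calls fetch up to attempts times
// and returns the first result that holds at least minEvents entries.
func waitForEvents(fetch func() []string, minEvents, attempts int) ([]string, bool) {
	for i := 0; i < attempts; i++ {
		events := fetch()
		if len(events) >= minEvents {
			return events, true
		}
	}
	return nil, false
}

func main() {
	calls := 0
	// Simulated history that grows by one event per call: 7, 8, 9 events.
	fetch := func() []string {
		calls++
		return make([]string, 6+calls)
	}

	events, ok := waitForEvents(fetch, 8, 3)
	fmt.Println(len(events), ok)
}
```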
02strich
added a commit
that referenced
this pull request
Feb 23, 2026
* origin/main:
  CHASM: improve support for implementing Terminate method (#9351)
  Add testhooks package documentation (#9373)
  Improve re-usability of ringpop membership & PerNamespaceWorker (#9321)
  Fairness counter: fix heap bug in map counter (#9370)
  Avoid finalGC when ack level is zero (#9371)
  Fairness counter: persist top K keys (#9188)
  Flake Fix: In Reactivation Cache tests, wait for appropriate delays when confirming expected drainage status (#9352)
  Include transient and speculative WFT events in GetWorkflowExecutionHistoryResponse (#9325)
  Fix flaky test TestTransitionDuringTransientTask (#9356)
  Add per-workflow scheduler for history task processing (#9141)
  Populate currentAttemptScheduledTime on PollActivityTaskQueueResponse for standalone activities (#9333)
  Standalone activity heartbeating bug fix (#9354)
  Revert "Last part of making Nexus work OOTB" (#9343)
  Convert flake report from Python to Go (#9334)
  Do not enforce payload limits for system nexus endpoint (#9344)
stephanos
pushed a commit
to stephanos/temporal
that referenced
this pull request
Feb 23, 2026
…istoryResponse (temporalio#9325)

## What changed?
Re-does temporalio#9138, which was accidentally merged.

Include transient and speculative WFT events in the `GetWorkflowExecutionHistoryResponse`, unless the UI or CLI made the request.

* Adds `transient_or_speculative_events` back to `GetMutableStateResponse`
* Reserves `transient_workflow_task` in the `HistoryContinuation` token
* Adds validation helpers
* Adds query-compare-query for transient events at request start and end

Re-implements temporalio#7732

## Why?
Fix "premature end of stream" errors when workers request history after cache eviction with transient/speculative workflow tasks present. This adds transient and speculative WFT events to `GetWorkflowExecutionHistory` (they are already in `PollWorkflowTask`). Worker cache eviction with speculative workflow tasks causes the expected and actual event counts to differ. temporalio#7732 passed transient events through continuation tokens, which could become stale during pagination. This PR queries mutable state at both the start and end of pagination and compares transient event IDs to detect whether WFT state changed during pagination; if it did, it returns a retryable error.

## How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

## Potential risks
Same risks as temporalio#7732
dandavison
added a commit
to dandavison/temporalio-temporal
that referenced
this pull request
Feb 25, 2026
Three test categories:

1. TestTransientWFTEventsInGetHistory: TaskPoller-based diagnostic that validates GetWorkflowExecutionHistory includes transient events for a pending transient WFT (confirms PR temporalio#9325 works for the simple case).
2. TestPrematureEndOfStreamStress: parameterized stress test using real SDK workers with speculative WFTs (Updates) + sticky cache miss + gRPC interceptor + concurrent mutations. Explores parameter space (cache eviction, concurrent signals, history fetch delays, WFT timeouts, workflow counts).
3. TestPrematureEndOfStreamShardClosure: reliably reproduces the bug by closing the shard between the SDK receiving the speculative WFT poll response and calling GetWorkflowExecutionHistory. The shard reopens with fresh mutable state that has lost the in-memory speculative events, causing the 2-event gap.

Key finding: transient WFTs (from failures) cannot trigger this bug because the server clears sticky on WFT failure (failWorkflowTask in workflow_task_state_machine.go), routing the retry to the normal queue where full history is in the poll response. The bug requires speculative WFTs (Updates) on the sticky queue with a cache miss.
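The 2-event gap mentioned above can be pictured with a small sketch. It assumes the worker compares the last event ID it fetched against the started event ID from the poll response; that comparison, the function `checkHistoryComplete`, and the error value are hypothetical and only illustrate the arithmetic, not the SDK's actual logic.

```go
package main

import (
	"errors"
	"fmt"
)

// errPrematureEnd is a hypothetical error for this sketch; the real SDK
// error type is not shown in this thread.
var errPrematureEnd = errors.New("premature end of stream")

// checkHistoryComplete compares the last event ID returned by a history fetch
// with the started event ID the worker was given in the poll response.
func checkHistoryComplete(lastFetchedID, expectedStartedID int64) error {
	if lastFetchedID < expectedStartedID {
		return fmt.Errorf("%w: expected event %d, history ends at %d (gap of %d)",
			errPrematureEnd, expectedStartedID, lastFetchedID,
			expectedStartedID-lastFetchedID)
	}
	return nil
}

func main() {
	// Assumed scenario: the poll response referenced a speculative WFT with
	// scheduled event 9 and started event 10, but after the shard reloads
	// only persisted events 1..8 are returned.
	fmt.Println(checkHistoryComplete(8, 10))

	// With the speculative events included, the fetch reaches event 10.
	fmt.Println(checkHistoryComplete(10, 10))
}
```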
iw
added a commit
to iw/temporal
that referenced
this pull request
Mar 19, 2026
Merges temporalio/temporal main branch up to df2e384.

Key upstream changes:
- Customizable serialization (temporalio#8426): EncodingTypeFromEnv(), EncodingType() on Encoder
- Per-workflow scheduler for history task processing (temporalio#9141)
- Ringpop membership & PerNamespaceWorker reusability (temporalio#9321)
- Transient/speculative WFT events in history response (temporalio#9325)
- Per-check diagnostics in DeepHealthCheck API (temporalio#9350)
- Fairness counter heap bug fix (temporalio#9370)
- System nexus endpoint (temporalio#9002)
- Mixed brain non-blocking (temporalio#9406)
- interface{} → any across persistence layer
- Various CI, test, and chasm improvements

DSQL fork code preserved:
- TxRetryPolicy/TxRetryMetrics/TxRetryPolicyProvider (common.go)
- ExecutionStoreCreator interface and factory wrapping (factory.go)
- OCC-aware lockShard bypass (shard.go)
- PoolSizeHint ephemeral pool sizing (version_checker.go)
- DSQL-safe InitializeSystemNamespaces (metadata_manager.go)
- Snowflake ID generator (idgenerator.go)
- RegisterPluginAlias (store.go)
- Full DSQL plugin (sqlplugin/dsql/)
birme
pushed a commit
to eyevinn-osaas/temporal
that referenced
this pull request
Mar 23, 2026
…istoryResponse (temporalio#9325)
What changed?
Re-does #9138, which was accidentally merged.

Include transient and speculative WFT events in the GetWorkflowExecutionHistoryResponse, unless the UI or CLI made the request.

* Adds transient_or_speculative_events back to GetMutableStateResponse
* Reserves transient_workflow_task in the HistoryContinuation token
* Adds validation helpers
* Adds query-compare-query for transient events at request start and end

Re-implements #7732

Why?
Fix "premature end of stream" errors when workers request history after cache eviction with transient/speculative workflow tasks present. This adds transient and speculative WFT events to GetWorkflowExecutionHistory (they are already in PollWorkflowTask). Worker cache eviction with speculative workflow tasks causes the expected and actual event counts to differ. #7732 passed transient events through continuation tokens, which could become stale during pagination. This PR queries mutable state at both the start and end of pagination and compares transient event IDs to detect whether WFT state changed during pagination; if it did, it returns a retryable error.

How did you test it?
- [X] built
- [X] run locally and tested manually
- [X] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

Potential risks
Same risks as #7732
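
The query-compare-query idea from the description can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the PR's implementation: it collapses pagination into a single call, and `fetchHistory`, `sameIDs`, `errTransientChanged`, and the callback signatures are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransientChanged is a hypothetical retryable error for this sketch.
var errTransientChanged = errors.New("transient events changed during pagination, retry")

// sameIDs reports whether two event ID slices are identical.
func sameIDs(a, b []int64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// fetchHistory queries transient event IDs before and after reading the
// persisted events, and appends them only if the two queries agree.
func fetchHistory(queryTransient, readPersisted func() []int64) ([]int64, error) {
	before := queryTransient()
	persisted := readPersisted()
	after := queryTransient()
	if !sameIDs(before, after) {
		return nil, errTransientChanged
	}
	out := append([]int64(nil), persisted...)
	return append(out, after...), nil
}

func main() {
	persisted := func() []int64 { return []int64{1, 2, 3} }

	// Transient events stay the same across both queries.
	stable := func() []int64 { return []int64{4, 5} }
	fmt.Println(fetchHistory(stable, persisted))

	// Transient events change between the first and second query.
	calls := 0
	changing := func() []int64 {
		calls++
		if calls == 1 {
			return []int64{4, 5}
		}
		return []int64{6, 7}
	}
	fmt.Println(fetchHistory(changing, persisted))
}
```

Returning an error instead of a partial result leaves the retry decision to the caller, which matches the "retryable error" wording in the description.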