Add workflow spec-level retry policies#279
Conversation
jamescmartinez
commented
Feb 11, 2026
- Clean up backoff/retry impl to be consistent across backends
- Change backoff policy to use duration strings for intervals
- Add user-configurable retries
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR standardizes exponential backoff/retry behavior across OpenWorkflow backends by introducing a shared backoff utility, switching interval configuration to duration strings, and adding per-workflow configurable retry policies (including maximum attempts).
Changes:
- Added
core/backoff.tswithBackoffPolicy+computeBackoffDelayMs()and migrated worker poll backoff to use it. - Introduced workflow-level
RetryPolicy(duration-string intervals +maximumAttempts) and centralized failure-state computation viacomputeFailedWorkflowRunUpdate(). - Updated Postgres/SQLite backends and tests/docs to support passing and respecting a retry policy.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/openworkflow/workflow.ts | Adds RetryPolicy, default policy, and shared failure-update computation. |
| packages/openworkflow/workflow.test.ts | Adds tests for computeFailedWorkflowRunUpdate. |
| packages/openworkflow/core/backoff.ts | New shared backoff computation using duration strings. |
| packages/openworkflow/core/backoff.test.ts | Tests for backoff computation behavior. |
| packages/openworkflow/worker.ts | Migrates poll backoff to shared utility; adds resolveRetryPolicy(). |
| packages/openworkflow/worker.test.ts | Adds tests for retry policy resolution and spec override behavior. |
| packages/openworkflow/backend.ts | Extends FailWorkflowRunParams to include retryPolicy. |
| packages/openworkflow/sqlite/backend.ts | Uses shared workflow failure-update computation instead of inline backoff logic. |
| packages/openworkflow/postgres/backend.ts | Uses shared workflow failure-update computation instead of SQL backoff calculation. |
| packages/openworkflow/execution.ts | Threads retryPolicy through execution to backend failure handling. |
| packages/openworkflow/index.ts | Exposes RetryPolicy type from the public entrypoint. |
| packages/openworkflow/core/retry.ts | Removes old default retry policy constant. |
| packages/docs/docs/workflows.mdx | Documents per-workflow retryPolicy usage. |
| packages/docs/docs/retries.mdx | Updates retry documentation to include maximumAttempts and policy fields. |
| packages/docs/docs/advanced-patterns.mdx | Updates guidance to point to workflow-level retryPolicy. |
| packages/docs/docs/roadmap.mdx | Marks configurable retry policies as completed. |
| ARCHITECTURE.md | Documents retry policy behavior and how it affects failure/finalization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * Compute capped exponential backoff for a 1-based attempt number. | ||
| * @param policy - Backoff policy | ||
| * @param attempt - Attempt number (attempt 1 uses initial interval) |
There was a problem hiding this comment.
The JSDoc says attempt is 1-based, but the implementation explicitly treats attempt = 0 as attempt 1 (and tests rely on that). Update the documentation to match the actual accepted inputs (e.g., "attempt >= 0" or "attempt 0 is treated as 1") to avoid misuse by callers.
| * Compute capped exponential backoff for a 1-based attempt number. | |
| * @param policy - Backoff policy | |
| * @param attempt - Attempt number (attempt 1 uses initial interval) | |
| * Compute capped exponential backoff for an attempt number (1-based; attempt 0 is treated as 1). | |
| * @param policy - Backoff policy | |
| * @param attempt - Attempt number (attempt >= 0; attempt 0 and 1 use the initial interval) |
| if (!partial) return DEFAULT_WORKFLOW_RETRY_POLICY; | ||
| return { ...DEFAULT_WORKFLOW_RETRY_POLICY, ...partial }; |
There was a problem hiding this comment.
resolveRetryPolicy() returns the DEFAULT_WORKFLOW_RETRY_POLICY object by reference when no partial is provided. Because the returned RetryPolicy is mutable, external callers can accidentally mutate the shared default and affect all future runs. Prefer always returning a new object (e.g., spread/structuredClone) and/or freezing the default to prevent accidental mutation.
| if (!partial) return DEFAULT_WORKFLOW_RETRY_POLICY; | |
| return { ...DEFAULT_WORKFLOW_RETRY_POLICY, ...partial }; | |
| const base: RetryPolicy = { ...DEFAULT_WORKFLOW_RETRY_POLICY }; | |
| if (!partial) return base; | |
| return { ...base, ...partial }; |
| partial?: Partial<RetryPolicy>, | ||
| ): RetryPolicy { | ||
| if (!partial) return DEFAULT_WORKFLOW_RETRY_POLICY; | ||
| return { ...DEFAULT_WORKFLOW_RETRY_POLICY, ...partial }; |
There was a problem hiding this comment.
resolveRetryPolicy() currently spreads partial over the defaults, so a caller can unintentionally override defaults with undefined fields (e.g., { maximumAttempts: undefined as any }), leading to surprising behavior like attempts >= maximumAttempts becoming false due to NaN. Consider normalizing the partial by ignoring undefined values and validating the resulting policy (non-negative durations, finite numbers) before returning it.
| return { ...DEFAULT_WORKFLOW_RETRY_POLICY, ...partial }; | |
| // Normalize the partial policy by ignoring undefined values so they do not | |
| // override defaults. | |
| const normalized: Partial<RetryPolicy> = {}; | |
| for (const [key, value] of Object.entries(partial)) { | |
| if (value !== undefined) { | |
| (normalized as any)[key] = value; | |
| } | |
| } | |
| const policy: RetryPolicy = { | |
| ...DEFAULT_WORKFLOW_RETRY_POLICY, | |
| ...normalized, | |
| }; | |
| // Validate numeric fields to catch invalid configurations early. | |
| for (const [key, value] of Object.entries(policy)) { | |
| if (typeof value === "number") { | |
| if (!Number.isFinite(value) || value < 0) { | |
| throw new Error( | |
| `Invalid retry policy: field "${key}" must be a non-negative, finite number`, | |
| ); | |
| } | |
| } | |
| } | |
| return policy; |
| workflowRunId: string; | ||
| workerId: string; | ||
| error: SerializedError; | ||
| retryPolicy: RetryPolicy; |
There was a problem hiding this comment.
Adding retryPolicy as a required field on FailWorkflowRunParams is a breaking change for third-party Backend implementations (the Backend interface is part of the public surface via OpenWorkflowOptions). If backward compatibility is desired, consider making retryPolicy optional and defaulting it internally (or bumping the package version/changelog accordingly).
| retryPolicy: RetryPolicy; | |
| retryPolicy?: RetryPolicy; |
| test("returns default policy when partial is undefined", () => { | ||
| const result = resolveRetryPolicy(); | ||
| expect(result).toEqual(DEFAULT_WORKFLOW_RETRY_POLICY); | ||
| }); |
There was a problem hiding this comment.
This test case claims to cover "partial is undefined" but calls resolveRetryPolicy() without actually passing undefined. To ensure the intended behavior is covered, call resolveRetryPolicy(undefined) in this test (or remove it if it's redundant with the previous case).
| defineWorkflow, | ||
| defineWorkflowSpec, | ||
| isWorkflow, | ||
| RetryPolicy, |
There was a problem hiding this comment.
RetryPolicy is a type-only export (export type RetryPolicy = ...). Importing it as a value (import { RetryPolicy } ...) can cause a runtime ESM error ("module does not provide an export") in Vitest/Vite transpilation. Change the import to a type-only import (e.g., import { ..., type RetryPolicy } or a separate import type).
| RetryPolicy, | |
| type RetryPolicy, |