Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 29 additions & 6 deletions ARCHITECTURE.md
Original file line number Diff line number Diff line change
Expand Up @@ -233,18 +233,41 @@ execution, the replay will reach the failed step and re-execute its function.
### 4.2. Workflow Failures & Retries

If an error is unhandled by the workflow code, the entire workflow run fails.
The workflow run is rescheduled with backoff. By default, retries continue
until canceled or until a configured deadline is reached. If the run can no
longer be retried (for example, because the next retry would exceed
`deadlineAt`), its status is set to `failed` permanently.
The workflow run is rescheduled with backoff according to its **retry policy**.
By default, retries continue until canceled or until a configured deadline is
reached. If the run can no longer be retried (for example, because the next
retry would exceed `deadlineAt` or `maximumAttempts` has been reached), its
status is set to `failed` permanently.

### 4.3. Workflow Deadlines
### 4.3. Retry Policy

A `RetryPolicy` controls the backoff and retry limits for a workflow run. They
are defined at the workflow spec level and apply to all runs of that workflow:

```ts
const workflow = ow.defineWorkflow(
{
name: "charge-customer",
retryPolicy: {
initialInterval: "1s",
backoffCoefficient: 2,
maximumInterval: "100s",
maximumAttempts: Infinity, // unlimited
},
},
async ({ step }) => {
// workflow implementation
},
);
```

### 4.4. Workflow Deadlines

Workflow runs can include an optional `deadlineAt` timestamp, specifying the
time by which the workflow must complete. Steps and retries are skipped if they
would exceed the deadline, making the run permanently `failed`.

### 4.4. Workflow Cancelation
### 4.5. Workflow Cancelation

Workflows can be explicitly canceled at any time via the Client API:

Expand Down
4 changes: 2 additions & 2 deletions packages/docs/docs/advanced-patterns.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,8 @@ const data = await step.run({ name: "fetch-external-api" }, async () => {
```

The default retry behavior works well for most cases. Configure retry behavior
at the workflow or step level (coming soon) or handle errors explicitly in your
step functions for custom logic.
at the workflow level with `retryPolicy`, or handle errors explicitly in your
step functions for custom logic. See [Retries](/docs/retries) for details.

## Sleeping (Pausing) Workflows

Expand Down
44 changes: 39 additions & 5 deletions packages/docs/docs/retries.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ In any application, things fail sometimes - a third-party API returns a
These are transient failures: they go away on their own if you try again.

OpenWorkflow handles this automatically. When a step throws an error, the
workflow is rescheduled with a exponential backoff (increasing delays between
workflow is rescheduled with an exponential backoff (increasing delays between
retries). Previously completed steps aren't re-run - only the failed step is
retried.

Expand Down Expand Up @@ -60,8 +60,42 @@ Failed workflows are rescheduled with increasing delays:

This prevents overwhelming external services during outages.

By default, retries continue indefinitely. A workflow is marked `failed` when
its configured `deadlineAt` is reached (or the next retry would pass it).
By default, retries continue until canceled or until `deadlineAt` is reached (or
the next retry would pass it). If `maximumAttempts` is configured, the workflow
is also marked `failed` once that limit is reached.

## Retry Policy

Retry behavior is configured per workflow using `retryPolicy` in the workflow
spec:

```ts
import { defineWorkflow } from "openworkflow";

export const chargeCustomer = defineWorkflow(
{
name: "charge-customer",
retryPolicy: {
initialInterval: "1s",
backoffCoefficient: 2,
maximumInterval: "100s",
maximumAttempts: Infinity,
},
},
async ({ step }) => {
// workflow implementation
},
);
```

`retryPolicy` is optional. Any omitted fields use defaults.

| Field | Description |
| -------------------- | ----------------------------------------------------- |
| `initialInterval` | Delay before the first retry after a failed attempt |
| `backoffCoefficient` | Multiplier applied to each subsequent retry delay |
| `maximumInterval` | Upper bound for retry delay |
| `maximumAttempts` | Maximum total attempts, including the initial attempt |

## What Triggers a Retry

Expand Down Expand Up @@ -107,8 +141,8 @@ defineWorkflow({ name: "with-error-handling" }, async ({ input, step }) => {
## Permanent Failures

A workflow is marked as `failed` permanently when it can no longer be retried
(for example, because `deadlineAt` is reached, or the next retry would exceed
that deadline):
(for example, because `deadlineAt` is reached, the next retry would exceed that
deadline, or `maximumAttempts` has been reached):

- The error is stored in the workflow run record
- No more automatic retries occur
Expand Down
2 changes: 1 addition & 1 deletion packages/docs/docs/roadmap.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -15,12 +15,12 @@ description: What's coming next for OpenWorkflow
- ✅ Sleeping (pausing) workflows
- ✅ Workflow versioning
- ✅ Workflow cancelation
- ✅ Configurable retry policies

## Coming Soon

- Idempotency keys
- Rollback / compensation functions
- Configurable retry policies
- Signals for external events
- Native OpenTelemetry integration
- Additional backends (Redis)
Expand Down
24 changes: 24 additions & 0 deletions packages/docs/docs/workflows.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,30 @@ defineWorkflow(
);
```

### Retry Policy (Optional)

Control backoff and retry limits for this workflow:

```ts
defineWorkflow(
{
name: "process-order",
retryPolicy: {
initialInterval: "1s",
backoffCoefficient: 2,
maximumInterval: "30s",
maximumAttempts: 5,
},
},
async ({ input, step }) => {
// ...
},
);
```

Any `retryPolicy` fields you omit fall back to defaults. See
[Retries](/docs/retries) for the full behavior and defaults.

## Workflow Function Parameters

The workflow function receives an object with three properties:
Expand Down
8 changes: 8 additions & 0 deletions packages/openworkflow/backend.testsuite.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import type { Backend } from "./backend.js";
import type { StepAttempt } from "./core/step.js";
import type { WorkflowRun } from "./core/workflow.js";
import { DEFAULT_WORKFLOW_RETRY_POLICY } from "./workflow.js";
import { randomUUID } from "node:crypto";
import { afterAll, beforeAll, describe, expect, test } from "vitest";

Expand Down Expand Up @@ -390,6 +391,7 @@ export function testBackend(options: TestBackendOptions): void {
workflowRunId: claimed.id,
workerId: claimed.workerId ?? "",
error: { message: "failed" },
retryPolicy: DEFAULT_WORKFLOW_RETRY_POLICY,
});
await expect(
backend.sleepWorkflowRun({
Expand Down Expand Up @@ -460,6 +462,7 @@ export function testBackend(options: TestBackendOptions): void {
workflowRunId: claimed.id,
workerId,
error,
retryPolicy: DEFAULT_WORKFLOW_RETRY_POLICY,
});

// rescheduled, not permanently failed
Expand Down Expand Up @@ -496,6 +499,7 @@ export function testBackend(options: TestBackendOptions): void {
workflowRunId: claimed.id,
workerId,
error: { message: "first failure" },
retryPolicy: DEFAULT_WORKFLOW_RETRY_POLICY,
});

expect(firstFailed.status).toBe("pending");
Expand All @@ -516,6 +520,7 @@ export function testBackend(options: TestBackendOptions): void {
workflowRunId: claimed.id,
workerId,
error: { message: "second failure" },
retryPolicy: DEFAULT_WORKFLOW_RETRY_POLICY,
});

expect(secondFailed.status).toBe("pending");
Expand Down Expand Up @@ -1017,6 +1022,7 @@ export function testBackend(options: TestBackendOptions): void {
workflowRunId: created.id,
workerId,
error: { message: "test error" },
retryPolicy: DEFAULT_WORKFLOW_RETRY_POLICY,
});

expect(failed.status).toBe("failed");
Expand Down Expand Up @@ -1054,6 +1060,7 @@ export function testBackend(options: TestBackendOptions): void {
workflowRunId: created.id,
workerId,
error: { message: "test error" },
retryPolicy: DEFAULT_WORKFLOW_RETRY_POLICY,
});

expect(failed.status).toBe("pending");
Expand Down Expand Up @@ -1179,6 +1186,7 @@ export function testBackend(options: TestBackendOptions): void {
workflowRunId: claimed.id,
workerId: claimed.workerId,
error: { message: "test error" },
retryPolicy: DEFAULT_WORKFLOW_RETRY_POLICY,
});
}

Expand Down
2 changes: 2 additions & 0 deletions packages/openworkflow/backend.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ import type { SerializedError } from "./core/error.js";
import { JsonValue } from "./core/json.js";
import type { StepAttempt, StepAttemptContext, StepKind } from "./core/step.js";
import type { WorkflowRun } from "./core/workflow.js";
import type { RetryPolicy } from "./workflow.js";

export const DEFAULT_NAMESPACE_ID = "default";

Expand Down Expand Up @@ -103,6 +104,7 @@ export interface FailWorkflowRunParams {
workflowRunId: string;
workerId: string;
error: SerializedError;
retryPolicy: RetryPolicy;
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding retryPolicy as a required field on FailWorkflowRunParams is a breaking change for third-party Backend implementations (the Backend interface is part of the public surface via OpenWorkflowOptions). If backward compatibility is desired, consider making retryPolicy optional and defaulting it internally (or bumping the package version/changelog accordingly).

Suggested change
retryPolicy: RetryPolicy;
retryPolicy?: RetryPolicy;

Copilot uses AI. Check for mistakes.
}

export interface CancelWorkflowRunParams {
Expand Down
6 changes: 5 additions & 1 deletion packages/openworkflow/client.test.ts
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
import { OpenWorkflow } from "./client.js";
import { BackendPostgres } from "./postgres.js";
import { DEFAULT_POSTGRES_URL } from "./postgres/postgres.js";
import { defineWorkflowSpec } from "./workflow.js";
import {
DEFAULT_WORKFLOW_RETRY_POLICY,
defineWorkflowSpec,
} from "./workflow.js";
import { type as arkType } from "arktype";
import { randomUUID } from "node:crypto";
import * as v from "valibot";
Expand Down Expand Up @@ -224,6 +227,7 @@ describe("OpenWorkflow", () => {
workflowRunId: claimed.id,
workerId,
error: { message: "boom" },
retryPolicy: DEFAULT_WORKFLOW_RETRY_POLICY,
});

const rescheduled = await backend.getWorkflowRun({
Expand Down
70 changes: 70 additions & 0 deletions packages/openworkflow/core/backoff.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
import { computeBackoffDelayMs } from "./backoff.js";
import { describe, expect, test } from "vitest";

describe("computeBackoffDelayMs", () => {
test("treats attempt 0 like attempt 1", () => {
const delayMs = computeBackoffDelayMs(
{
initialInterval: "1s",
backoffCoefficient: 2,
maximumInterval: "10s",
},
0,
);

expect(delayMs).toBe(1000);
});

test("uses initial interval on attempt 1", () => {
const delayMs = computeBackoffDelayMs(
{
initialInterval: "250ms",
backoffCoefficient: 3,
maximumInterval: "10s",
},
1,
);

expect(delayMs).toBe(250);
});

test("stays constant when coefficient is 1", () => {
const delayMs = computeBackoffDelayMs(
{
initialInterval: "750ms",
backoffCoefficient: 1,
maximumInterval: "10s",
},
9,
);

expect(delayMs).toBe(750);
});

test("caps delay at maximum interval", () => {
const delayMs = computeBackoffDelayMs(
{
initialInterval: "1s",
backoffCoefficient: 3,
maximumInterval: "5s",
},
4,
);

expect(delayMs).toBe(5000);
});

test("returns finite capped values for very large attempts", () => {
const delayMs = computeBackoffDelayMs(
{
initialInterval: "100ms",
backoffCoefficient: 2,
maximumInterval: "60s",
},
10_000,
);

expect(Number.isFinite(delayMs)).toBe(true);
expect(delayMs).toBe(60_000);
});
});
42 changes: 42 additions & 0 deletions packages/openworkflow/core/backoff.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
import type { DurationString } from "./duration.js";
import { parseDuration } from "./duration.js";

/**
* Shared exponential backoff configuration.
*/
export interface BackoffPolicy {
readonly initialInterval: DurationString;
readonly backoffCoefficient: number;
readonly maximumInterval: DurationString;
}

/**
* Compute capped exponential backoff for a 1-based attempt number.
* @param policy - Backoff policy
* @param attempt - Attempt number (attempt 1 uses initial interval)
Comment on lines +14 to +16
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSDoc says attempt is 1-based, but the implementation explicitly treats attempt = 0 as attempt 1 (and tests rely on that). Update the documentation to match the actual accepted inputs (e.g., "attempt >= 0" or "attempt 0 is treated as 1") to avoid misuse by callers.

Suggested change
* Compute capped exponential backoff for a 1-based attempt number.
* @param policy - Backoff policy
* @param attempt - Attempt number (attempt 1 uses initial interval)
* Compute capped exponential backoff for an attempt number (1-based; attempt 0 is treated as 1).
* @param policy - Backoff policy
* @param attempt - Attempt number (attempt >= 0; attempt 0 and 1 use the initial interval)

Copilot uses AI. Check for mistakes.
* @returns Delay in milliseconds
*/
export function computeBackoffDelayMs(
policy: BackoffPolicy,
attempt: number,
): number {
const initialIntervalMs = parseBackoffIntervalMs(policy.initialInterval);
const maximumIntervalMs = parseBackoffIntervalMs(policy.maximumInterval);

const exponentialBackoffMs =
initialIntervalMs *
Math.pow(policy.backoffCoefficient, Math.max(0, attempt - 1));

return Math.min(exponentialBackoffMs, maximumIntervalMs);
Comment thread
jamescmartinez marked this conversation as resolved.
}

/**
* Parse a backoff interval duration string into milliseconds.
* Invalid runtime values default to 0ms.
* @param interval - Duration string
* @returns Interval in milliseconds
*/
function parseBackoffIntervalMs(interval: DurationString): number {
const parsedInterval = parseDuration(interval);
return parsedInterval.ok ? parsedInterval.value : 0;
}
6 changes: 0 additions & 6 deletions packages/openworkflow/core/retry.ts

This file was deleted.

Loading
Loading