Add idempotency key support for task deduplication #88

Open
AntoineToussaint wants to merge 3 commits into main from feat/idempotency-key

AntoineToussaint commented Apr 9, 2026

Summary

See https://www.notion.so/tensorzerodotcom/Fanout-Only-once-Durable-Executions-33d7520bbad380c480a9c015c66cbd62?source=copy_link

Two new fields on SpawnOptions for controlling task deduplication:

  • only_once: bool — auto-derives idempotency key from hash(task_name, params). First spawn creates the task, duplicates are no-ops.
  • idempotency_key: Option<String> — explicit key for when params differ but the operation is the same. Takes precedence over only_once.

Default behavior (only_once: false, no key) is unchanged — every spawn creates a new task.
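To make the precedence rule concrete, here is a minimal sketch of how the two fields could resolve to a single effective key, as originally proposed in this description. The `effective_key` helper and the string-typed `params` are assumptions made for illustration, not the crate's actual API, and `DefaultHasher` is used only to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical mirror of the two dedup fields described above.
#[derive(Default)]
pub struct SpawnOptions {
    pub only_once: bool,
    pub idempotency_key: Option<String>,
}

/// Resolve the key a spawn would dedup on (hypothetical helper).
/// An explicit key wins; otherwise `only_once` derives one from
/// (task_name, params); otherwise there is no key and no dedup.
pub fn effective_key(task_name: &str, params: &str, opts: &SpawnOptions) -> Option<String> {
    if let Some(key) = &opts.idempotency_key {
        return Some(key.clone());
    }
    if opts.only_once {
        // DefaultHasher is deterministic within one build, but its algorithm
        // is not guaranteed stable across Rust releases. A real multi-instance
        // deployment would need a stable hash shared by every client.
        let mut hasher = DefaultHasher::new();
        task_name.hash(&mut hasher);
        params.hash(&mut hasher);
        return Some(format!("{:016x}", hasher.finish()));
    }
    None
}
```

Note that nothing besides `task_name` and `params` feeds the derived key, so two spawns that differ only in other options would resolve to the same key.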

Motivation

In multi-instance deployments, multiple clients may observe the same event and independently try to spawn the same logical task. Without dedup, N instances create N tasks that all execute. This is wasteful for expensive operations and incorrect for non-idempotent ones.

DB Changes

  • New nullable idempotency_key TEXT column on task tables
  • Partial unique index WHERE idempotency_key IS NOT NULL
  • spawn_task checks for existing non-terminal task with same key before inserting
  • Migration adds column/index to all existing queues
  • ensure_queue_tables updated for new queues
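The lookup that `spawn_task` performs can be modeled in memory. The sketch below is an assumption-laden stand-in (the `Queue`, `Task`, and `TaskState` types are hypothetical, and the real logic lives in SQL); it models only the "existing non-terminal task with the same key" check described in this list, not the unique index itself.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TaskState {
    Pending,
    Running,
    Completed,
    Failed,
    Cancelled,
}

impl TaskState {
    fn is_terminal(self) -> bool {
        matches!(self, TaskState::Completed | TaskState::Failed | TaskState::Cancelled)
    }
}

pub struct Task {
    pub id: u64,
    pub state: TaskState,
    pub idempotency_key: Option<String>,
}

/// Hypothetical in-memory queue standing in for one task table.
#[derive(Default)]
pub struct Queue {
    pub tasks: Vec<Task>,
}

impl Queue {
    /// Returns (task_id, created). With a key, an existing non-terminal
    /// task carrying the same key is returned instead of inserting.
    pub fn spawn(&mut self, key: Option<&str>) -> (u64, bool) {
        if let Some(k) = key {
            let existing = self
                .tasks
                .iter()
                .find(|t| !t.state.is_terminal() && t.idempotency_key.as_deref() == Some(k));
            if let Some(t) = existing {
                return (t.id, false);
            }
        }
        // No key, or no live task with this key: insert a new task.
        let id = self.tasks.len() as u64 + 1;
        self.tasks.push(Task {
            id,
            state: TaskState::Pending,
            idempotency_key: key.map(str::to_string),
        });
        (id, true)
    }
}
```

Spawning without a key always inserts, which corresponds to the nullable column and the `IS NOT NULL` condition on the index.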

Design doc

See docs/design/durable-idempotency-key.md in the autopilot repo.

Test plan

  • Spawn with only_once: true — second spawn returns existing task
  • Spawn with explicit idempotency_key — same behavior
  • Spawn without key — creates new task (backward compatible)
  • Completed/failed/cancelled task with same key — new spawn succeeds
  • Existing tests pass

🤖 Generated with Claude Code

AntoineToussaint force-pushed the feat/idempotency-key branch 3 times, most recently from 4b21848 to 42e29c1 on April 9, 2026 21:45
Two new fields on SpawnOptions:
- `only_once: bool` — auto-derives key from hash(task_name, params)
- `idempotency_key: Option<String>` — explicit key (takes precedence)

When a key is set, spawn_task checks for an existing non-terminal task
with the same key. If found, returns the existing task instead of
creating a duplicate. This enables first-served semantics where multiple
clients can safely try to spawn the same logical task.

DB changes:
- New nullable `idempotency_key` column on task tables
- Partial unique index on idempotency_key WHERE NOT NULL
- Migration adds column/index to all existing queues
- ensure_queue_tables updated for new queues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AntoineToussaint added a commit to tensorzero/tensorzero that referenced this pull request Apr 10, 2026
Deduplication will be handled at the durable crate level via idempotency
keys (tensorzero/durable#88), not per-instance caching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per review on #88: auto-deriving a key from `hash(task_name, params)`
makes it ambiguous what happens when a task is spawned multiple times
with different settings (e.g. a different `max_attempts` would still
hash to the same key and be treated as a duplicate). Callers should
decide what "the same task" means for their use case and pass an
explicit `idempotency_key`.

Drops `SpawnOptions.only_once` and the corresponding SQL fallback in
`spawn_task`. `idempotency_key` is now the single mechanism for
deduplication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
end if;
end if;
v_cancellation := p_options->'cancellation';
-- Extract parent_task_id for subtask tracking
Member

Can you re-add this comment?

Member Author

Done in 447f2a6 — restored in both sql/schema.sql and the migration.

@@ -0,0 +1,295 @@
-- Add idempotency key support for task deduplication.
-- When set, only the first spawn with a given key creates a task.
-- Subsequent spawns with the same key (for non-terminal tasks) are no-ops.
Member

I think the non-terminal check is somewhat surprising. In the autopilot use case, consider this scenario:

  1. Gateway 1 streams tool calls, spawns a task, which then fails
  2. Gateway 2 starts up, and does the same thing (spawning a new task).

I'm not sure if this is the behavior that we want, especially if the task is something long-running like GEPA.

Member Author

Gateway 1 wouldn't deal with the error and retry?

Member Author

Agreed — switched to strict semantics. The check is now unconditional (where idempotency_key = $1), so a key is permanently bound to its first task regardless of state (running/completed/failed/cancelled). Updated the column comment, migration header, and SpawnOptions::idempotency_key docstring to make this explicit. Caller owns retries — pick a new key if you want to retry after failure.
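For readers following the thread, the strict semantics can be sketched as a key-to-owner map in which task state is never consulted. `StrictKeys` and `retry_key` are hypothetical names; in particular, the attempt-suffix convention is just one way a caller might "pick a new key" and is an assumption, not something this PR prescribes.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the unconditional lookup
/// `where idempotency_key = $1`.
#[derive(Default)]
pub struct StrictKeys {
    /// Key -> id of the first task ever spawned with that key.
    owner: HashMap<String, u64>,
    next_id: u64,
}

impl StrictKeys {
    /// Returns (task_id, created). The owning task's state is deliberately
    /// ignored: a failed or cancelled task still holds its key.
    pub fn spawn(&mut self, key: &str) -> (u64, bool) {
        if let Some(&id) = self.owner.get(key) {
            return (id, false);
        }
        self.next_id += 1;
        self.owner.insert(key.to_string(), self.next_id);
        (self.next_id, true)
    }
}

/// One possible caller-side retry convention (an assumption): derive a
/// fresh key per attempt so that a retry is an explicit decision.
pub fn retry_key(base: &str, attempt: u32) -> String {
    format!("{}:attempt-{}", base, attempt)
}
```

Under this model a second gateway that reuses the original key always gets the first task back, even if it has failed; only a caller that deliberately changes the key creates a new task.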

Member Author

I didn't understand the comment (and the PR), but yes, I think an idempotency key should definitely be burnt after use.

Member Author

The code was wrong.

Address review feedback: drop the `state not in (terminal)` filter so
that a key is bound to its first task forever, regardless of state. This
matches the principle that an idempotency key has exactly one owner —
if you want to retry after a failure, pick a new key.

- Spawn check is now an unconditional `where idempotency_key = ?`
- Column comment, migration header comment, and SpawnOptions docstring
  all updated to make the strict semantics explicit
- Restored the `-- Extract parent_task_id for subtask tracking` comment
  that was accidentally removed (per Aaron's review)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>