Parent: #163
Goal
Persist task saga lifecycle, leases, heartbeats, stale-worker recovery decisions, and compensation requirements.
This ticket is about making "stale workers" a durable control-plane state instead of an operator surprise. A crashed or expired worker should be recoverable because the task saga says who owns it, when the lease expires, and what cleanup/compensation is required.
Scope
Expected owned surface:
- src/forge_loop/tasks/saga.py;
- a new task saga store module under src/forge_loop/tasks/, if needed;
- focused tests under tests/test_task_saga_store.py or equivalent.
This can be a standalone store first. Runner integration can be a later ticket unless a narrow hook is cheap and well-tested.
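As a standalone store, the saga table could start as a single SQLite file. The sketch below uses only Python's built-in sqlite3 module. It shows persistence across a reopen (the last acceptance test) and a store-level scan for stale leases. The table name, the column layout, and the helpers open_store, save, load, and stale_tasks are hypothetical, not the project's API. Storing lease expiry as Unix seconds (REAL) and compensations as a JSON list are also assumptions.

```python
from __future__ import annotations

import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: one row per task saga, keyed by issue number.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_sagas (
    issue_number     INTEGER PRIMARY KEY,
    branch           TEXT NOT NULL,
    worktree         TEXT NOT NULL,
    state            TEXT NOT NULL,
    owner_id         TEXT,
    lease_expires_at REAL,
    compensations    TEXT NOT NULL
)
"""


def open_store(path) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows convert cleanly with dict(row)
    conn.execute(SCHEMA)
    return conn


def save(conn: sqlite3.Connection, saga: dict) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO task_sagas VALUES (:issue_number, :branch, "
            ":worktree, :state, :owner_id, :lease_expires_at, :compensations)",
            {**saga, "compensations": json.dumps(saga["compensations"])},
        )


def load(conn: sqlite3.Connection, issue_number: int) -> dict | None:
    row = conn.execute(
        "SELECT * FROM task_sagas WHERE issue_number = ?", (issue_number,)
    ).fetchone()
    if row is None:
        return None
    saga = dict(row)
    saga["compensations"] = json.loads(saga["compensations"])
    return saga


def stale_tasks(conn: sqlite3.Connection, now: float) -> list[int]:
    # Running tasks whose lease has lapsed are candidates for recovery.
    rows = conn.execute(
        "SELECT issue_number FROM task_sagas "
        "WHERE state = 'running' AND lease_expires_at <= ? ORDER BY issue_number",
        (now,),
    )
    return [row["issue_number"] for row in rows]


saga = {
    "issue_number": 42,  # hypothetical values throughout
    "branch": "forge/issue-42",
    "worktree": "/tmp/wt-42",
    "state": "running",
    "owner_id": "worker-a",
    "lease_expires_at": 1_000.0,
    "compensations": [{"action": "remove_worktree", "path": "/tmp/wt-42"}],
}
with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "task_sagas.db"
    first = open_store(db_path)
    save(first, saga)
    first.close()
    second = open_store(db_path)  # simulates a control-plane restart
    reloaded = load(second, 42)
    stale = stale_tasks(second, now=1_001.0)
    second.close()

print(reloaded == saga, stale)  # True [42]
```

Keeping stale detection as a query means a recovering control plane can find orphaned tasks without loading every saga into memory.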
Required Behavior
- Create a planned/dispatched/running task saga with issue number, branch, worktree, and registered compensations.
- Acquire a lease with owner ID and expiry time.
- Extend a heartbeat before expiry.
- Detect stale/expired leases.
- Mark terminal states: completed, failed, compensated, quarantined.
- Prevent leasing or mutating terminal tasks except for explicitly allowed audit metadata.
- Record compensation decisions with enough data to clean a worktree/branch later.
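The lifecycle rules above can be sketched in memory before any persistence is involved. This sketch assumes a caller-supplied clock (now, in seconds), so tests control expiry deterministically. The names Saga, LeaseError, and TerminalTaskError are hypothetical. The rule that lets a new owner take over a stale running task is also an assumed recovery policy, not one the ticket prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LEASEABLE = {"planned", "dispatched"}
TERMINAL = {"completed", "failed", "compensated", "quarantined"}


class TerminalTaskError(RuntimeError):
    """Raised instead of silently ignoring a mutation of a terminal task."""


class LeaseError(RuntimeError):
    """Raised when a lease cannot be acquired or extended."""


@dataclass
class Saga:
    issue_number: int
    branch: str
    worktree: str
    state: str = "planned"
    owner_id: str | None = None
    lease_expires_at: float | None = None
    compensations: list[dict] = field(default_factory=list)

    def _require_live(self) -> None:
        if self.state in TERMINAL:
            raise TerminalTaskError(f"task {self.issue_number} is {self.state}")

    def is_stale(self, now: float) -> bool:
        # Stale = a running task whose owner missed its lease deadline.
        return (
            self.state == "running"
            and self.lease_expires_at is not None
            and now >= self.lease_expires_at
        )

    def acquire_lease(self, owner_id: str, now: float, ttl: float) -> None:
        self._require_live()
        # New tasks are leaseable; a stale running task may be taken over.
        if self.state not in LEASEABLE and not self.is_stale(now):
            raise LeaseError(f"task {self.issue_number} is {self.state} and leased")
        self.state, self.owner_id = "running", owner_id
        self.lease_expires_at = now + ttl

    def heartbeat(self, owner_id: str, now: float, ttl: float) -> None:
        self._require_live()
        if owner_id != self.owner_id or self.is_stale(now):
            raise LeaseError("heartbeat from a non-owner or after lease expiry")
        self.lease_expires_at = now + ttl

    def finish(self, state: str, compensation: dict | None = None) -> None:
        self._require_live()
        if state not in TERMINAL:
            raise ValueError(f"{state!r} is not a terminal state")
        # Validate before mutating so a rejected call leaves the saga unchanged.
        if state == "failed" and compensation is None and not self.compensations:
            raise ValueError("a failed task must carry a compensation record")
        if compensation is not None:
            self.compensations.append(compensation)
        self.state = state


saga = Saga(42, "forge/issue-42", "/tmp/wt-42")  # hypothetical values
saga.acquire_lease("worker-a", now=0.0, ttl=30.0)
saga.heartbeat("worker-a", now=20.0, ttl=30.0)  # lease now ends at 50.0
print(saga.state, saga.lease_expires_at, saga.is_stale(49.0), saga.is_stale(50.0))
# running 50.0 False True
```

Raising on every terminal-state mutation, rather than returning a flag, keeps the "do not silently ignore" non-goal enforceable in tests.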
Acceptance Tests
- New task starts leaseable and becomes running after lease acquisition.
- Heartbeat extends the lease.
- Expired task is reported stale.
- Completed task cannot be leased again.
- Failed task requires or preserves a compensation record.
- Reopening the store preserves task state, lease expiry, and compensations.
Non-goals
- Do not implement microVM isolation here.
- Do not delete real worktrees in tests.
- Do not change worker execution behavior until the store contract is green.
- Do not silently ignore terminal-state mutations.
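Two of these non-goals shape the store's API. Tests must not delete real worktrees, and terminal-state mutations must fail loudly. One way to honor both is to record compensations as data and to route post-terminal updates through an allowlist. In the sketch below, CompensationRecord, cleanup_commands, update_terminal_task, and the AUDIT_FIELDS names are all hypothetical. The git commands are standard git CLI (git worktree remove, git branch -D), but whether the real executor should force-remove is an open assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical allowlist of audit fields that may change after a task is terminal.
AUDIT_FIELDS = frozenset({"audit_note", "reviewed_by"})


class TerminalTaskError(RuntimeError):
    """Raised instead of silently ignoring a terminal-state mutation."""


@dataclass(frozen=True)
class CompensationRecord:
    """Cleanup a failed task still owes: recorded now, executed later."""

    issue_number: int
    worktree: str
    branch: str
    reason: str

    def cleanup_commands(self) -> list[list[str]]:
        # Only builds argv lists for a later executor; nothing is run here,
        # so tests can assert on the plan without touching real worktrees.
        return [
            ["git", "worktree", "remove", "--force", self.worktree],
            ["git", "branch", "-D", self.branch],
        ]


def update_terminal_task(task: dict, changes: dict) -> None:
    """Apply only allowlisted audit metadata to a terminal task; reject the rest."""
    blocked = sorted(set(changes) - AUDIT_FIELDS)
    if blocked:
        raise TerminalTaskError(f"refusing to mutate terminal task fields: {blocked}")
    task.update(changes)


record = CompensationRecord(42, "/tmp/wt-42", "forge/issue-42", "lease expired")
print(record.cleanup_commands()[0])
# ['git', 'worktree', 'remove', '--force', '/tmp/wt-42']
```

Because the record carries the worktree path and branch name, a later recovery step can perform the cleanup even if the original worker never comes back.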
Verification
Run at minimum:
- New task saga store tests.
- env -u VIRTUAL_ENV uv run --extra dev pytest tests/test_eventlog_sqlite.py -q (if the store writes events)
- env -u VIRTUAL_ENV uv run --extra dev ruff check <changed files>
- env -u VIRTUAL_ENV uv run --extra dev ruff format --check <changed files>
Customer Story
An operator who dispatches workers on a real repository, without trusting their context or code, benefits because stale or crashed workers become explicit, recoverable task states.
Source
Expanded during Forge self-dogfood sprint planning on 2026-06-02.