[DX] GAP-18: MemorySaver + --state-file accumulates state across runs causing silent data duplication #1507

@AlexBizon

Summary

When the same --state-file is reused across runs (e.g. during development testing), MemorySaver restores state from the previous run's checkpoints. A results list initialized with state.get("results") or [] therefore starts from the previous run's entries instead of an empty list, so each subsequent run appends to stale data and produces duplicate entries.

Root Cause

MemorySaver + --state-file persists the full graph state between process invocations. If a LangGraph state field (like results: list) is initialized using state.get("results") or [], it inherits the accumulated list from the previous run's checkpoint rather than starting fresh.

The runtime does not inject a fresh thread_id per invocation, so all runs share the same checkpoint namespace when the same --state-file is used.

Observed Behaviour

  1. Run agent against 3 invoices → results list has 3 entries
  2. Run agent again against 3 new invoices (same state file) → results list has 6 entries (3 old + 3 new)
  3. Each --resume adds another copy
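
The accumulation can be reproduced without the runtime. The following is a minimal sketch in plain Python, not the runtime's actual implementation: a JSON file stands in for --state-file, a fixed key stands in for the shared thread_id, and `THREAD_ID`, `collect`, and `run_agent` are hypothetical names introduced only for illustration.

```python
import json
import tempfile
from pathlib import Path

THREAD_ID = "default"  # hypothetical: stands in for the thread_id shared by every run


def collect(state: dict, invoices: list) -> dict:
    # The problematic pattern: inherit whatever the checkpoint already holds.
    results = state.get("results") or []
    return {"results": results + [f"processed:{inv}" for inv in invoices]}


def run_agent(state_file: Path, invoices: list) -> list:
    # Load the previous checkpoint for the shared thread, if the file exists.
    checkpoints = json.loads(state_file.read_text()) if state_file.exists() else {}
    state = checkpoints.get(THREAD_ID, {})
    state = {**state, **collect(state, invoices)}
    # Persist the full state again, as a file-backed checkpointer would.
    checkpoints[THREAD_ID] = state
    state_file.write_text(json.dumps(checkpoints))
    return state["results"]


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    first = run_agent(state_file, ["inv-1", "inv-2", "inv-3"])
    second = run_agent(state_file, ["inv-4", "inv-5", "inv-6"])
    print(len(first), len(second))  # prints "3 6"
```

The second run never sees an empty list because the state is keyed by the same thread and restored before the node executes.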

Workaround

Explicitly reset batch-scoped fields at the start of each logical run instead of initializing them with state.get("field") or []:

def init_batch(state: InvoiceState) -> dict:
    return {
        "results": [],          # Always start fresh, ignore checkpoint
        "pending_invoices": state["pending_invoices"],
    }

Also delete the state file between independent runs during local development.
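
Both parts of the workaround can be sketched as follows. This is an illustration under stated assumptions: the field types in `InvoiceState` are guessed (the issue only names the fields), `reset_state_file` is a hypothetical development helper, and the dictionary merge assumes `results` uses plain overwrite semantics (no reducer that would combine the old and new lists).

```python
import tempfile
from pathlib import Path
from typing import TypedDict


class InvoiceState(TypedDict, total=False):
    # Assumed field types; the issue only names the fields.
    pending_invoices: list
    results: list


def init_batch(state: InvoiceState) -> dict:
    # Overwrite batch-scoped fields instead of reading them back from the checkpoint.
    return {"results": [], "pending_invoices": state["pending_invoices"]}


def reset_state_file(path: Path) -> None:
    # Hypothetical dev helper: remove the state file so the next run starts clean.
    path.unlink(missing_ok=True)


# State as it might look after restoring a checkpoint from an earlier run.
restored: InvoiceState = {
    "results": ["old-1", "old-2", "old-3"],
    "pending_invoices": ["inv-4"],
}
fresh = {**restored, **init_batch(restored)}
print(fresh["results"])  # prints "[]"

with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    state_file.write_text("{}")
    reset_state_file(state_file)
    reset_state_file(state_file)  # safe to call again; a missing file is ignored
    file_exists = state_file.exists()
```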

Suggested Fix

  1. Document that MemorySaver + --state-file accumulates state across process invocations and that batch-scoped fields must be explicitly reset.
  2. Consider injecting a fresh thread_id per invocation (unless --resume is explicitly passed), so checkpoints from different runs don't share state by default.
  3. Add a CLI flag --new-run or --reset-state that starts a fresh thread_id even when --state-file is present.
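
Items 2 and 3 could look roughly like the sketch below. It is not the runtime's code: the `agent` program name, `build_parser`, `resolve_thread_id`, and the `stored_thread_id` argument are hypothetical, and --new-run is the flag proposed above rather than an existing one. The only assumption about LangGraph is that the thread is selected through `{"configurable": {"thread_id": ...}}` in the run config.

```python
import argparse
import uuid
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agent")  # hypothetical CLI name
    parser.add_argument("--state-file")
    parser.add_argument("--resume", action="store_true")
    parser.add_argument("--new-run", action="store_true")  # proposed flag
    return parser


def resolve_thread_id(args: argparse.Namespace, stored_thread_id: Optional[str]) -> str:
    # Reuse the stored thread only when the caller explicitly asks to resume.
    if args.resume and not args.new_run and stored_thread_id:
        return stored_thread_id
    # Otherwise every invocation gets its own checkpoint namespace.
    return str(uuid.uuid4())


parser = build_parser()
resumed = resolve_thread_id(
    parser.parse_args(["--state-file", "s.json", "--resume"]), "thread-a"
)
default = resolve_thread_id(parser.parse_args(["--state-file", "s.json"]), "thread-a")
forced = resolve_thread_id(
    parser.parse_args(["--state-file", "s.json", "--resume", "--new-run"]), "thread-a"
)

# The chosen thread_id would then be passed in the run config.
config = {"configurable": {"thread_id": default}}
```

With this default, reusing a state file no longer implies reusing a thread, and resuming becomes an explicit opt-in.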

Impact

  • Severity: Medium
  • Silent data duplication — very hard to diagnose
  • Particularly insidious during development when the same state file is reused across iterations
