Description
Summary
When --state-file is reused across runs (e.g. during development testing), MemorySaver accumulates state from previous checkpoints. A results list initialized with state.get("results") or [] carries over from the previous run, causing duplicate entries in subsequent runs.
Root Cause
MemorySaver + --state-file persists the full graph state between process invocations. If a LangGraph state field (like results: list) is initialized using state.get("results") or [], it inherits the accumulated list from the previous run's checkpoint rather than starting fresh.
The runtime does not inject a fresh thread_id per invocation, so all runs share the same checkpoint namespace when the same --state-file is used.
Observed Behaviour
- Run agent against 3 invoices → results list has 3 entries
- Run agent again against 3 new invoices (same state file) → results list has 6 entries (3 old + 3 new)
- Each --resume adds another copy
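The 3 → 6 pattern can be reproduced without LangGraph at all. The sketch below is a simulation only: a plain dict stands in for the persisted checkpoint store, and `run_once`, `init_batch_buggy`, and the `"default"` thread id are hypothetical names chosen for illustration, not part of the runtime.

```python
from typing import Dict, List

# Stand-in for a persisted checkpoint store: thread_id -> last saved state.
# This is a plain dict, NOT LangGraph's MemorySaver.
CheckpointStore = Dict[str, dict]


def init_batch_buggy(state: dict) -> dict:
    # Inherits whatever the previous checkpoint left in "results".
    return {"results": state.get("results") or []}


def run_once(store: CheckpointStore, thread_id: str, invoices: List[str]) -> dict:
    # Restore the last checkpoint for this thread_id, if there is one.
    state = dict(store.get(thread_id, {}))
    state.update(init_batch_buggy(state))
    # "Process" each invoice by appending one result per invoice.
    state["results"] = state["results"] + [f"processed:{inv}" for inv in invoices]
    # Persist the new state under the same thread_id.
    store[thread_id] = state
    return state


store: CheckpointStore = {}
first = run_once(store, "default", ["a", "b", "c"])
second = run_once(store, "default", ["d", "e", "f"])
print(len(first["results"]), len(second["results"]))  # prints "3 6"
```

Because both runs read and write the same key in the store, the second run starts from the first run's list rather than from an empty one.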
Workaround
Explicitly reset batch-scoped fields at the start of each logical run — do not use state.get("field") or []:
def init_batch(state: InvoiceState) -> dict:
    return {
        "results": [],  # Always start fresh, ignore checkpoint
        "pending_invoices": state["pending_invoices"],
    }

Also delete the state file between independent runs during local development.
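For local development, removing the state file can be scripted so it is not forgotten between iterations. This is a minimal sketch using only the standard library; `reset_state_file` and the `state.json` file name are hypothetical, and the actual path is whatever is passed to --state-file.

```python
import tempfile
from pathlib import Path


def reset_state_file(path: str) -> bool:
    """Delete the state file if it exists; return True if a file was removed."""
    state_path = Path(path)
    existed = state_path.is_file()
    # missing_ok=True (Python 3.8+) makes this a no-op when the file is absent.
    state_path.unlink(missing_ok=True)
    return existed


# Demonstration against a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "state.json"
    demo.write_text("{}")
    removed_first = reset_state_file(str(demo))   # file existed and was removed
    removed_second = reset_state_file(str(demo))  # nothing left to remove
```

Calling such a helper before each independent run gives the same effect as deleting the file by hand.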
Suggested Fix
- Document that MemorySaver + --state-file accumulates state across process invocations and that batch-scoped fields must be explicitly reset.
- Consider injecting a fresh thread_id per invocation (unless --resume is explicitly passed), so checkpoints from different runs don't share state by default.
- Add a CLI flag --new-run or --reset-state that starts a fresh thread_id even when --state-file is present.
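The second suggestion could look roughly like the sketch below. `resolve_thread_id` is a hypothetical helper, and how the runtime would store and reload the previous thread_id is an assumption. The only LangGraph-specific detail is the `{"configurable": {"thread_id": ...}}` config shape that checkpointers read.

```python
import uuid
from typing import Optional


def resolve_thread_id(resume: bool, stored_thread_id: Optional[str]) -> str:
    """Pick the thread_id for this invocation (hypothetical helper).

    Reuse the stored id only when the caller explicitly asked to resume
    and a previous id is available; otherwise mint a fresh one.
    """
    if resume and stored_thread_id:
        return stored_thread_id
    return uuid.uuid4().hex


# A normal invocation gets its own checkpoint namespace.
fresh_config = {"configurable": {"thread_id": resolve_thread_id(False, "run-1")}}

# An explicit resume continues the earlier thread.
resume_config = {"configurable": {"thread_id": resolve_thread_id(True, "run-1")}}
```

With this default, reusing the same state file no longer implies sharing checkpoints. Only an explicit resume would pick up earlier state.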
Impact
- Severity: Medium
- Silent data duplication — very hard to diagnose
- Particularly insidious during development when the same state file is reused across iterations