Skip to content

fix: prevent session message loss in LocalManagedAgent._persist_state#1727

Merged
MervinPraison merged 2 commits into
mainfrom
cursor/critical-correctness-bugs-05fc
May 23, 2026
Merged

fix: prevent session message loss in LocalManagedAgent._persist_state#1727
MervinPraison merged 2 commits into
mainfrom
cursor/critical-correctness-bugs-05fc

Conversation

@cursor
Copy link
Copy Markdown
Contributor

@cursor cursor Bot commented May 23, 2026

Bug and impact

LocalManagedAgent._persist_state() could silently delete chat messages from the JSON session store when metadata was persisted while another writer (or another DefaultSessionStore instance) had added messages via add_message().

This affects managed/local agent runs that call _persist_state() after each turn while the gateway, TUI, or another process also writes to the same session file.

Root cause

The DefaultSessionStore path used the same unsafe pattern fixed in #1724 for other mutators:

  1. get_session() β†’ stale in-memory copy
  2. Update metadata
  3. _save_session() β†’ full session overwrite on disk

If a concurrent add_message() ran between steps 1 and 3, those messages were lost.

Fix

Use update_session_metadata() (locked read-modify-write from disk) instead of get_session() + _save_session().

Validation

  • Added test_local_managed_persist_state_preserves_messages reproducing the two-instance race from session store tests.
  • Test passed locally.
Open in WebΒ View AutomationΒ 

Summary by CodeRabbit

  • Bug Fixes

    • Improved session state persistence to prevent accidental loss of user messages added concurrently to the session store.
  • Tests

    • Added test coverage verifying that session chat history and metadata are correctly preserved during concurrent session operations.

Review Change Stack

Use update_session_metadata (locked read-modify-write) instead of
get_session + _save_session, which could overwrite messages written
concurrently via add_message on another store instance.

Adds regression test mirroring session store race coverage from #1724.

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
@MervinPraison
Copy link
Copy Markdown
Owner

@coderabbitai review

@MervinPraison
Copy link
Copy Markdown
Owner

/review

@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more β†’

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account β†’

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us β†’

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 23, 2026

βœ… Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 23, 2026

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

βš™οΈ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 724e0dc6-31be-45bb-b713-7a56c6cabdf1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • πŸ” Trigger review
πŸ“ Walkthrough

Walkthrough

This PR fixes session state persistence in LocalManagedAgent to prevent concurrent message overwrites. The implementation now prefers update_session_metadata() when available, otherwise loads and merges into the existing session instead of creating new records. A test validates messages added concurrently to disk are preserved after state persistence.

Changes

Session Metadata Persistence Fix

Layer / File(s) Summary
Session metadata persistence refactor and validation
src/praisonai/integrations/managed_local.py, src/praisonai/tests/unit/integrations/test_managed_agents.py
LocalManagedAgent._persist_state() now calls update_session_metadata() when available; otherwise loads the existing session and merges metadata in-place before saving, removing the prior behavior that created new SessionData records. New test test_local_managed_persist_state_preserves_messages() verifies that concurrently added user messages are retained when agent state is persisted.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • MervinPraison/PraisonAI#1709: Fixes update_session_metadata() to prevent stale-cache overwrites and message loss, which is the underlying store method that the main PR now prefers to call.
  • MervinPraison/PraisonAI#1724: Modifies DefaultSessionStore atomic/locked mutation mechanisms to prevent overwriting concurrent message writes, complementing the main PR's changes to how metadata is persisted.

Poem

🐰 Concurrency's dance, now in perfect stride,
Messages merge, none left behind,
Metadata flows where session state glides,
A safer persist, peace of mindβ€”aye! 🌟

πŸš₯ Pre-merge checks | βœ… 5
βœ… Passed checks (5 passed)
Check name Status Explanation
Description Check βœ… Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check βœ… Passed The title clearly and specifically summarizes the main fix: preventing message loss in LocalManagedAgent._persist_state, which is the central change across both modified files.
Docstring Coverage βœ… Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check βœ… Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check βœ… Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
πŸ§ͺ Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cursor/critical-correctness-bugs-05fc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❀️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@MervinPraison
Copy link
Copy Markdown
Owner

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first β€” incorporate their findings.

Review areas:

  1. Bloat check: Are changes minimal and focused? Any unnecessary code or scope creep?
  2. Security: Any hardcoded secrets, unsafe eval/exec, missing input validation?
  3. Performance: Any module-level heavy imports? Hot-path regressions?
  4. Tests: Are tests included? Do they cover the changes adequately?
  5. Backward compat: Any public API changes without deprecation?
  6. Code quality: DRY violations, naming conventions, error handling?
  7. Address reviewer feedback: If Qodo, Coderabbit, or Gemini flagged valid issues, include them in your review
  8. Suggest specific improvements with code examples where possible

Copy link
Copy Markdown
Contributor

Copilot AI commented May 23, 2026

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first β€” incorporate their findings.

Review areas:...

Completed a full review and incorporated all available reviewer signals.

Action taken (fd788e9): I fixed a backward-compatibility gap in _persist_state fallback logic (managed_local.py) so stores without update_session_metadata still handle get_session(...) is None by creating SessionData (restores prior behavior while keeping the race-safe path).

Findings:

  • Bloat/scope: changes remain minimal and focused to session metadata persistence.
  • Security: no secrets/unsafe eval/exec introduced; no new input-handling surface.
  • Performance: no heavy module-level imports added; hot path unchanged except metadata write method choice.
  • Tests: regression test for message-preservation race is in place and passes locally (test_local_managed_persist_state_preserves_messages).
  • Backward compatibility: addressed by fd788e9 fallback restoration.
  • Code quality: no DRY or naming regressions found.
  • Reviewer feedback: Qodo had no review (paused), Coderabbit had no actionable code issue besides a style suggestion; Gemini had no posted review comments.

CI note: multiple workflow runs reported failure with zero jobs created (workflow-level failure/skip behavior), so there were no failed job logs to act on in this PR scope.

@MervinPraison
Copy link
Copy Markdown
Owner

@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding.

Phase 1: Review per AGENTS.md

  1. Protocol-driven: check heavy implementations vs core SDK
  2. Backward compatible: ensure zero feature regressions
  3. Performance: no hot-path regressions

Phase 2: FIX Valid Issues
4. For any VALID bugs or architectural flaws found by Gemini, CodeRabbit, Qodo, Copilot, or any other reviewer: implement the fix
5. Push all code fixes directly to THIS branch (do NOT create a new PR)
6. Comment a summary of exact files modified and what you skipped

Phase 3: Final Verdict
7. If all issues are resolved, approve the PR / close the Issue
8. If blocking issues remain, request changes / leave clear action items

@praisonai-triage-agent
Copy link
Copy Markdown
Contributor

praisonai-triage-agent Bot commented May 23, 2026

Claude finished @MervinPraison's task β€”β€” View job


Final Architecture Review βœ…

Phase 1: Architecture Review per AGENTS.md

  • Read AGENTS.md architecture guidelines
  • Review protocol-driven design compliance
  • Check backward compatibility
  • Verify performance impact
  • Analyze previous reviewer feedback

Phase 2: Fix Valid Issues

  • Validate Copilot's backward compatibility fix
  • Check for any remaining architectural flaws
  • Implement fixes if needed (none found)

Phase 3: Final Verdict

  • Provide approval or request changes
  • Document findings and recommendations

Comprehensive Architecture Review

βœ… Protocol-Driven Design Compliance

  • Changes are correctly implemented in wrapper layer (praisonai package)
  • Uses existing session store protocols (DefaultSessionStore, SessionData)
  • No core SDK modifications - follows architecture boundaries

βœ… Backward Compatibility Verified

  • Main fix: Uses update_session_metadata() for race-safe metadata updates
  • Copilot's fix (fd788e9): Handles legacy case where get_session() returns None by creating new SessionData
  • Zero breaking changes: All existing behavior preserved

βœ… Performance Impact Assessment

  • No hot-path regressions: _persist_state() only called after agent turns
  • No new dependencies: Uses existing session store infrastructure
  • Minimal overhead: Metadata-only updates when possible

βœ… Multi-Agent Safety & Async Safety

  • Each agent manages its own session state independently
  • No shared mutable global state introduced
  • Race condition properly addressed via locked read-modify-write operations

βœ… Previous Reviewer Findings Addressed

  • Qodo: Reviews paused (no actionable issues)
  • CodeRabbit: No actionable comments, all pre-merge checks passed
  • Copilot: Implemented backward compatibility fix (validated)

βœ… Test Coverage Adequate

  • Race condition test (test_local_managed_persist_state_preserves_messages) reproduces exact issue
  • Validates both metadata persistence AND message preservation
  • No real agentic test needed for session persistence logic

FINAL VERDICT: βœ… APPROVED

This PR successfully fixes a critical race condition in session message persistence while maintaining full architectural compliance and backward compatibility. All reviewer feedback has been addressed, and the implementation follows PraisonAI's protocol-driven design principles.

Recommendation: Merge this PR - the fix is minimal, focused, and addresses a real correctness issue without introducing any architectural debt.


| Branch: cursor/critical-correctness-bugs-05fc

Copilot AI requested a review from MervinPraison May 23, 2026 09:08
@MervinPraison MervinPraison marked this pull request as ready for review May 23, 2026 09:28
Copilot AI review requested due to automatic review settings May 23, 2026 09:28
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more β†’

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account β†’

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us β†’

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 23, 2026

Greptile Summary

This PR fixes a race condition in LocalManagedAgent._persist_state() where the old get_session + _save_session pattern could silently drop messages written concurrently by another DefaultSessionStore instance. The fix replaces it with update_session_metadata(), which performs a file-locked read-modify-write.

  • managed_local.py: _persist_state now calls store.update_session_metadata(session_id, **state) when available, falling back to the original (unsafe) path only for stores lacking that method.
  • test_managed_agents.py: Adds test_local_managed_persist_state_preserves_messages, a two-instance test that verifies messages added between a stale _load_session and _persist_state are not lost.

Confidence Score: 3/5

The core race-condition fix is correct, but the chosen helper skips None-valued fields, meaning a cleared compute instance ID is not written back to disk and will be incorrectly restored on the next session resume.

The switch to update_session_metadata correctly solves the message-loss race. However, that helper's if value is None: continue guard introduces a new defect: when _compute_instance_id is set to None after a shutdown and _persist_state is called again, the old instance ID stays in metadata. _restore_state then re-hydrates the freed ID, leaving the agent pointing at a resource that no longer exists. The test added does not cover this path.

src/praisonai/praisonai/integrations/managed_local.py β€” specifically the interaction between None-valued state fields and the update_session_metadata helper's skip logic.

Important Files Changed

Filename Overview
src/praisonai/praisonai/integrations/managed_local.py Replaces racy get+save with locked update_session_metadata, but the new helper silently skips None-valued fields, introducing a regression where clearing compute_instance_id is not persisted.
src/praisonai/tests/unit/integrations/test_managed_agents.py Adds a well-structured two-instance race-condition test that correctly validates both message preservation and metadata correctness after _persist_state.

Sequence Diagram

sequenceDiagram
    participant A as LocalManagedAgent
    participant S as DefaultSessionStore (reader)
    participant D as Disk (session.json)
    participant W as Concurrent Writer

    Note over A,W: Old (racy) path
    A->>S: get_session(id) β†’ stale copy
    W->>D: add_message("second") [atomic write]
    A->>D: _save_session(stale copy) [overwrites "second"]

    Note over A,W: New (fixed) path
    A->>S: "update_session_metadata(id, **state)"
    S->>D: FileLock acquired
    S->>D: _load_session_from_disk β†’ fresh copy (has "second")
    S->>D: mutate metadata only
    S->>D: atomic write (messages preserved)
    S->>D: FileLock released
Loading

Reviews (1): Last reviewed commit: "fix: preserve legacy _persist_state fall..." | Re-trigger Greptile

Comment on lines +491 to +492
if hasattr(store, "update_session_metadata"):
store.update_session_metadata(self._session_id, **state)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Stale compute_instance_id after instance shutdown

DefaultSessionStore.update_session_metadata silently skips any field whose value is None (see if value is None: continue in store.py:541). This means if a compute instance is freed mid-session β€” setting self._compute_instance_id = None at line 1043 β€” the subsequent _persist_state() call won't overwrite the old instance ID that is already in the metadata. On the next _restore_state(), line 547 (if meta.get("compute_instance_id"): self._compute_instance_id = ...) will restore the stale ID, leaving the agent referencing a freed compute resource.

The same applies to compute_provider when self._compute is None. Previously the raw session.metadata.update(state) explicitly stored None, clearing the stale value. The new path no longer does so.

@cursor cursor Bot review requested due to automatic review settings May 23, 2026 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants