fix: prevent session message loss in set_agent_info and clear_session by cursor[bot] · Pull Request #1724 · MervinPraison/PraisonAI

cursor · 2026-05-22T09:05:31Z

Bug and impact

DefaultSessionStore.set_agent_info, clear_session, and set_gateway_info still used stale in-memory session objects and could overwrite on-disk chat history written by add_message on another store instance (same class of bug fixed for update_session_metadata in PR #1709).

Trigger: Agent initialisation calls set_agent_info while another worker/process persists messages; or bot session cleanup races with concurrent message writes.

Impact: Silent loss of user/assistant messages in JSON session files (data loss).

Root cause

These mutators called _load_session() (cache-first), modified the object, then _save_session() without reloading from disk inside FileLock — so a stale copy missing recent messages could be written atomically over fresher data.

Fix

Added _modify_session_locked() helper: reload from disk under FileLock, apply mutator, atomic write, update cache.
Routed set_agent_info, clear_session, set_gateway_info, and update_session_metadata through it.
Regression tests for set_agent_info and clear_session two-store races.

Validation

pytest tests/unit/session/test_session_store.py (metadata, set_agent_info, clear_session race tests) — 4 passed.

Summary by CodeRabbit

Tests
- Added regression tests to verify session data persistence and consistency across concurrent access scenarios.
Improvements
- Enhanced session update reliability to better preserve existing data during modifications.

Apply locked read-modify-write (same pattern as update_session_metadata) to set_agent_info, clear_session, and set_gateway_info so stale in-memory copies cannot overwrite messages written by add_message on another instance. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>

MervinPraison · 2026-05-22T09:05:41Z

@coderabbitai review

MervinPraison · 2026-05-22T09:05:42Z

/review

qodo-code-review · 2026-05-22T09:05:46Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

coderabbitai · 2026-05-22T09:05:48Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-22T09:05:58Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3a391353-0076-407d-aaea-b1f07d632d8a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

The PR refactors session mutation in store.py to centralize duplicated read-modify-write logic into a new _modify_session_locked() helper. This helper uses file locking, disk reload, closure-based mutation, message trimming, atomic writes, and cache updates. Four session setter methods (set_agent_info, clear_session, update_session_metadata, set_gateway_info) are refactored to delegate persistence to this helper, and two regression tests verify message preservation across multi-instance access patterns.

Changes

Session mutation centralization

Layer / File(s)	Summary
Session mutation helper and imports `src/praisonai-agents/praisonaiagents/session/store.py`	`Callable` is added to typing imports. New `_load_session_from_disk()` and `_modify_session_locked()` helpers implement locked session mutation: acquire `FileLock`, reload from disk, apply a provided mutator closure, update `updated_at`, enforce `max_messages` trimming, write atomically via tempfile + `os.replace`, update in-memory cache, and clean up on errors.
Session setter refactoring `src/praisonai-agents/praisonaiagents/session/store.py`	`set_agent_info`, `clear_session`, `update_session_metadata`, and `set_gateway_info` now define closure-based mutators and delegate to `_modify_session_locked()` instead of inlining read-modify-write and atomic-write logic, eliminating duplication and standardizing failure handling.
Message preservation regression tests `src/praisonai-agents/tests/unit/session/test_session_store.py`	New test `test_set_agent_info_preserves_messages` verifies agent info updates do not discard messages written by a different store instance (two-instance write/read workflow with cache invalidation). New test `test_clear_session_preserves_new_messages` verifies that clearing session does not erase messages added by another instance after the first instance loaded the session.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

docs: update for PraisonAI #1709 — session metadata multi-worker safety + update_session_metadata() API reference PraisonAIDocs#408: Documentation updates describing _modify_session_locked, cross-process file locking, reload-inside-lock, and atomic write semantics for the refactored session mutation methods.

Possibly related PRs

MervinPraison/PraisonAI#1709: Prior fix for stale in-memory overwrites in update_session_metadata persistence; this PR refactors the same code path into the centralized locked mutation helper and updates the corresponding regression tests.

Poem

🐰 A rabbit locks the warren door,
Reloads what was there before,
Mutates each burrow with care,
Atoms write—oh, how fair!
Cache refreshed, no messages lost,
No duplication cost!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: preventing session message loss in two key methods through refactoring session mutation logic.
Docstring Coverage	✅ Passed	Docstring coverage is 82.35% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch cursor/critical-correctness-bugs-0514

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

MervinPraison · 2026-05-22T09:06:07Z

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:

Bloat check: Are changes minimal and focused? Any unnecessary code or scope creep?
Security: Any hardcoded secrets, unsafe eval/exec, missing input validation?
Performance: Any module-level heavy imports? Hot-path regressions?
Tests: Are tests included? Do they cover the changes adequately?
Backward compat: Any public API changes without deprecation?
Code quality: DRY violations, naming conventions, error handling?
Address reviewer feedback: If Qodo, Coderabbit, or Gemini flagged valid issues, include them in your review
Suggest specific improvements with code examples where possible

Copilot · 2026-05-22T09:09:17Z

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:...

Reviewed thoroughly at commit e3edac9.

Findings:

Bloat/scope: Minimal and focused (2 files, 1 helper + 4 call sites + 2 regression tests). No scope creep.
Security: No secrets, no eval/exec, no unsafe input paths introduced.
Performance: No heavy imports. One extra disk read only on mutation paths (not read hot-path), appropriate for correctness under race conditions.
Tests: Regression coverage added for set_agent_info and clear_session stale-cache races; existing metadata race test still covered.
Backward compatibility: No public API signature changes.
Code quality: Good DRY improvement by consolidating locked read-modify-write logic.
Reviewer feedback: Qodo had no review (paused), Coderabbit has not posted actionable findings yet (processing note only), and no Gemini review comments were present.

One small improvement I suggest (optional) is adding a set_gateway_info race regression test mirroring the two new tests, since that method was also migrated to the locked helper.

MervinPraison · 2026-05-22T09:09:19Z

@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding.

Phase 1: Review per AGENTS.md

Protocol-driven: check heavy implementations vs core SDK
Backward compatible: ensure zero feature regressions
Performance: no hot-path regressions

Phase 2: FIX Valid Issues
4. For any VALID bugs or architectural flaws found by Gemini, CodeRabbit, Qodo, Copilot, or any other reviewer: implement the fix
5. Push all code fixes directly to THIS branch (do NOT create a new PR)
6. Comment a summary of exact files modified and what you skipped

Phase 3: Final Verdict
7. If all issues are resolved, approve the PR / close the Issue
8. If blocking issues remain, request changes / leave clear action items

praisonai-triage-agent · 2026-05-22T09:09:43Z

Addresses Copilot's suggestion to ensure set_gateway_info also preserves messages when racing with concurrent message writes from another instance. Completes test coverage for all four methods migrated to _modify_session_locked. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>

qodo-code-review · 2026-05-22T09:45:24Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

greptile-apps · 2026-05-22T09:48:11Z

Greptile Summary

This PR extends the read-under-FileLock-then-write pattern (first applied to update_session_metadata in #1709) to set_agent_info, clear_session, and set_gateway_info by introducing a _modify_session_locked helper that centralises the atomic read-modify-write cycle.

Introduces _modify_session_locked and _load_session_from_disk helpers so all session mutators reload from disk inside a FileLock before writing, preventing a stale in-memory copy from overwriting newer messages.
Refactors set_agent_info, clear_session, set_gateway_info, and the already-fixed update_session_metadata to delegate through the new helper.
Adds regression tests for the two-store race scenario across all three newly-fixed methods.

Confidence Score: 4/5

The change is safe to merge; the core read-under-lock pattern is sound and matches the already-proven approach used by add_message.

The refactor correctly consolidates four mutator methods behind a single atomic helper, eliminating the stale-cache overwrite bug. A silent error swallow in _load_session_from_disk and a misleading test name are the only issues found — both are minor and don't affect correctness.

store.py: the _load_session_from_disk helper should log on parse/IO error as the existing _load_session does.

Important Files Changed

Filename	Overview
src/praisonai-agents/praisonaiagents/session/store.py	Core fix: new _modify_session_locked helper consolidates atomic read-modify-write; all four mutator methods now route through it. One silent error-swallow in _load_session_from_disk worth logging.
src/praisonai-agents/tests/unit/session/test_session_store.py	Adds three regression tests covering set_agent_info, clear_session, and set_gateway_info two-store races; test name for clear_session test is misleading but logic is correct.

Sequence Diagram

sequenceDiagram
    participant W as Writer (store A)
    participant F as FileLock (disk)
    participant R as Reader (store B)

    W->>F: add_message("second") — acquires lock, reloads, appends, writes
    F-->>W: lock released, disk has [first, second]

    R->>R: _load_session() — stale cache has [first] only (OLD BUG)

    Note over R,F: NEW FIX: _modify_session_locked()
    R->>F: set_agent_info / clear_session / set_gateway_info — acquires lock
    F->>F: _load_session_from_disk() — reads [first, second] from disk
    R->>R: mutator(session) applied on fresh data
    R->>F: atomic write (temp → replace)
    F-->>R: lock released, cache updated

_{Reviews (1): Last reviewed commit: "test: add regression test for set_gatewa..." | Re-trigger Greptile}

greptile-apps · 2026-05-22T09:48:15Z

+    def _load_session_from_disk(self, session_id: str, filepath: str) -> SessionData:
+        """Load session JSON from disk (caller must hold FileLock)."""
+        if os.path.exists(filepath):
+            try:
+                with open(filepath, "r", encoding="utf-8") as f:
+                    data = json.load(f)
+                return SessionData.from_dict(data)
+            except (json.JSONDecodeError, IOError):
+                pass
+        return SessionData(session_id=session_id)


_load_session_from_disk silently swallows json.JSONDecodeError and IOError without any log output, whereas the original _load_session logs a warning. When a session file is corrupt or unreadable, the error will be invisible, making it very hard to diagnose data-loss incidents in production.

Suggested change

def _load_session_from_disk(self, session_id: str, filepath: str) -> SessionData:

"""Load session JSON from disk (caller must hold FileLock)."""

if os.path.exists(filepath):

try:

with open(filepath, "r", encoding="utf-8") as f:

data = json.load(f)

return SessionData.from_dict(data)

except (json.JSONDecodeError, IOError):

pass

return SessionData(session_id=session_id)

def _load_session_from_disk(self, session_id: str, filepath: str) -> SessionData:

"""Load session JSON from disk (caller must hold FileLock)."""

if os.path.exists(filepath):

try:

with open(filepath, "r", encoding="utf-8") as f:

data = json.load(f)

return SessionData.from_dict(data)

except (json.JSONDecodeError, IOError) as e:

logger.warning(f"Failed to load session {session_id}: {e}")

return SessionData(session_id=session_id)

greptile-apps · 2026-05-22T09:48:16Z

+    def test_clear_session_preserves_new_messages(self, temp_store):
+        """Clear must reload from disk so concurrent adds are not lost."""


The test name and docstring say "preserves new messages," but the assertion checks len(history) == 0 — i.e., that all messages were cleared. The test is validating correct clear_session behaviour (reload-then-clear), not message preservation. The misleading name could cause a future reader to misinterpret the intent and write a wrong expectation.

Suggested change

def test_clear_session_preserves_new_messages(self, temp_store):

"""Clear must reload from disk so concurrent adds are not lost."""

def test_clear_session_reads_latest_disk_state(self, temp_store):

"""Clear must reload from disk so it operates on the latest persisted state."""

…#1727) * fix: prevent session message loss in LocalManagedAgent._persist_state Use update_session_metadata (locked read-modify-write) instead of get_session + _save_session, which could overwrite messages written concurrently via add_message on another store instance. Adds regression test mirroring session store race coverage from #1724. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com> * fix: preserve legacy _persist_state fallback when session is absent Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/4d08ce0a-1b7b-4e50-b841-6a08416f2f53 Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>

Copilot started work on behalf of MervinPraison May 22, 2026 09:06 View session

Copilot AI requested a review from MervinPraison May 22, 2026 09:09

Copilot finished work on behalf of MervinPraison May 22, 2026 09:09

MervinPraison marked this pull request as ready for review May 22, 2026 09:45

Copilot AI review requested due to automatic review settings May 22, 2026 09:45

Copilot started reviewing on behalf of MervinPraison May 22, 2026 09:45 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

greptile-apps Bot reviewed May 22, 2026

View reviewed changes

MervinPraison merged commit 321b6b2 into main May 22, 2026
19 of 20 checks passed

MervinPraison mentioned this pull request May 22, 2026

docs: update session-persistence concurrency notes + API reference for PR #1724 (DefaultSessionStore reload-under-lock fix) MervinPraison/PraisonAIDocs#417

Open

8 tasks

cursor Bot mentioned this pull request May 23, 2026

fix: prevent session message loss in LocalManagedAgent._persist_state #1727

Merged

MervinPraison mentioned this pull request May 23, 2026

docs: document session-store concurrency guarantees + LocalManagedAgent persist-state safety (PR #1727) MervinPraison/PraisonAIDocs#423

Open

6 tasks

		def test_clear_session_preserves_new_messages(self, temp_store):
		"""Clear must reload from disk so concurrent adds are not lost."""

Uh oh!

Conversation

cursor Bot commented May 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug and impact

Root cause

Fix

Validation

Summary by CodeRabbit

Uh oh!

MervinPraison commented May 22, 2026

Uh oh!

MervinPraison commented May 22, 2026

Uh oh!

qodo-code-review Bot commented May 22, 2026

Qodo reviews are paused for this user.

Uh oh!

coderabbitai Bot commented May 22, 2026

Uh oh!

coderabbitai Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

Uh oh!

MervinPraison commented May 22, 2026

Uh oh!

Copilot AI commented May 22, 2026

Uh oh!

MervinPraison commented May 22, 2026

Uh oh!

praisonai-triage-agent Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Architecture Review Complete ✅

✅ FINAL VERDICT: APPROVED

Architecture Compliance

Code Quality Review

Reviewer Feedback Addressed

Technical Assessment

Uh oh!

qodo-code-review Bot commented May 22, 2026

Qodo reviews are paused for this user.

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented May 22, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

cursor Bot commented May 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 22, 2026 •

edited

Loading

praisonai-triage-agent Bot commented May 22, 2026 •

edited

Loading