Skip to content

fix: prevent session message loss in set_agent_info and clear_session#1724

Merged
MervinPraison merged 2 commits into
mainfrom
cursor/critical-correctness-bugs-0514
May 22, 2026
Merged

fix: prevent session message loss in set_agent_info and clear_session#1724
MervinPraison merged 2 commits into
mainfrom
cursor/critical-correctness-bugs-0514

Conversation

@cursor
Copy link
Copy Markdown
Contributor

@cursor cursor Bot commented May 22, 2026

Bug and impact

DefaultSessionStore.set_agent_info, clear_session, and set_gateway_info still used stale in-memory session objects and could overwrite on-disk chat history written by add_message on another store instance (same class of bug fixed for update_session_metadata in PR #1709).

Trigger: Agent initialisation calls set_agent_info while another worker/process persists messages; or bot session cleanup races with concurrent message writes.

Impact: Silent loss of user/assistant messages in JSON session files (data loss).

Root cause

These mutators called _load_session() (cache-first), modified the object, then _save_session() without reloading from disk inside FileLock — so a stale copy missing recent messages could be written atomically over fresher data.

Fix

  • Added _modify_session_locked() helper: reload from disk under FileLock, apply mutator, atomic write, update cache.
  • Routed set_agent_info, clear_session, set_gateway_info, and update_session_metadata through it.
  • Regression tests for set_agent_info and clear_session two-store races.

Validation

  • pytest tests/unit/session/test_session_store.py (metadata, set_agent_info, clear_session race tests) — 4 passed.
Open in Web View Automation 

Summary by CodeRabbit

  • Tests

    • Added regression tests to verify session data persistence and consistency across concurrent access scenarios.
  • Improvements

    • Enhanced session update reliability to better preserve existing data during modifications.

Review Change Stack

Apply locked read-modify-write (same pattern as update_session_metadata)
to set_agent_info, clear_session, and set_gateway_info so stale in-memory
copies cannot overwrite messages written by add_message on another instance.

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
@MervinPraison
Copy link
Copy Markdown
Owner

@coderabbitai review

@MervinPraison
Copy link
Copy Markdown
Owner

/review

@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 22, 2026

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3a391353-0076-407d-aaea-b1f07d632d8a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR refactors session mutation in store.py to centralize duplicated read-modify-write logic into a new _modify_session_locked() helper. This helper uses file locking, disk reload, closure-based mutation, message trimming, atomic writes, and cache updates. Four session setter methods (set_agent_info, clear_session, update_session_metadata, set_gateway_info) are refactored to delegate persistence to this helper, and two regression tests verify message preservation across multi-instance access patterns.

Changes

Session mutation centralization

Layer / File(s) Summary
Session mutation helper and imports
src/praisonai-agents/praisonaiagents/session/store.py
Callable is added to typing imports. New _load_session_from_disk() and _modify_session_locked() helpers implement locked session mutation: acquire FileLock, reload from disk, apply a provided mutator closure, update updated_at, enforce max_messages trimming, write atomically via tempfile + os.replace, update in-memory cache, and clean up on errors.
Session setter refactoring
src/praisonai-agents/praisonaiagents/session/store.py
set_agent_info, clear_session, update_session_metadata, and set_gateway_info now define closure-based mutators and delegate to _modify_session_locked() instead of inlining read-modify-write and atomic-write logic, eliminating duplication and standardizing failure handling.
Message preservation regression tests
src/praisonai-agents/tests/unit/session/test_session_store.py
New test test_set_agent_info_preserves_messages verifies agent info updates do not discard messages written by a different store instance (two-instance write/read workflow with cache invalidation). New test test_clear_session_preserves_new_messages verifies that clearing session does not erase messages added by another instance after the first instance loaded the session.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

  • MervinPraison/PraisonAI#1709: Prior fix for stale in-memory overwrites in update_session_metadata persistence; this PR refactors the same code path into the centralized locked mutation helper and updates the corresponding regression tests.

Poem

🐰 A rabbit locks the warren door,
Reloads what was there before,
Mutates each burrow with care,
Atoms write—oh, how fair!
Cache refreshed, no messages lost,
No duplication cost!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: preventing session message loss in two key methods through refactoring session mutation logic.
Docstring Coverage ✅ Passed Docstring coverage is 82.35% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cursor/critical-correctness-bugs-0514

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@MervinPraison
Copy link
Copy Markdown
Owner

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:

  1. Bloat check: Are changes minimal and focused? Any unnecessary code or scope creep?
  2. Security: Any hardcoded secrets, unsafe eval/exec, missing input validation?
  3. Performance: Any module-level heavy imports? Hot-path regressions?
  4. Tests: Are tests included? Do they cover the changes adequately?
  5. Backward compat: Any public API changes without deprecation?
  6. Code quality: DRY violations, naming conventions, error handling?
  7. Address reviewer feedback: If Qodo, Coderabbit, or Gemini flagged valid issues, include them in your review
  8. Suggest specific improvements with code examples where possible

Copy link
Copy Markdown
Contributor

Copilot AI commented May 22, 2026

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:...

Reviewed thoroughly at commit e3edac9.

Findings:

  1. Bloat/scope: Minimal and focused (2 files, 1 helper + 4 call sites + 2 regression tests). No scope creep.
  2. Security: No secrets, no eval/exec, no unsafe input paths introduced.
  3. Performance: No heavy imports. One extra disk read only on mutation paths (not read hot-path), appropriate for correctness under race conditions.
  4. Tests: Regression coverage added for set_agent_info and clear_session stale-cache races; existing metadata race test still covered.
  5. Backward compatibility: No public API signature changes.
  6. Code quality: Good DRY improvement by consolidating locked read-modify-write logic.
  7. Reviewer feedback: Qodo had no review (paused), Coderabbit has not posted actionable findings yet (processing note only), and no Gemini review comments were present.

One small improvement I suggest (optional) is adding a set_gateway_info race regression test mirroring the two new tests, since that method was also migrated to the locked helper.

@MervinPraison
Copy link
Copy Markdown
Owner

@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding.

Phase 1: Review per AGENTS.md

  1. Protocol-driven: check heavy implementations vs core SDK
  2. Backward compatible: ensure zero feature regressions
  3. Performance: no hot-path regressions

Phase 2: FIX Valid Issues
4. For any VALID bugs or architectural flaws found by Gemini, CodeRabbit, Qodo, Copilot, or any other reviewer: implement the fix
5. Push all code fixes directly to THIS branch (do NOT create a new PR)
6. Comment a summary of exact files modified and what you skipped

Phase 3: Final Verdict
7. If all issues are resolved, approve the PR / close the Issue
8. If blocking issues remain, request changes / leave clear action items

@praisonai-triage-agent
Copy link
Copy Markdown
Contributor

praisonai-triage-agent Bot commented May 22, 2026

Claude finished @MervinPraison's task —— View job


Architecture Review Complete ✅

Phase 1: Review per AGENTS.md

  • Read AGENTS.md for architecture guidelines
  • Review changed files for protocol-driven design
  • Check backward compatibility
  • Assess performance impact

Phase 2: Address Reviewer Feedback

  • Analyze Copilot's suggestion for set_gateway_info test
  • Implement additional regression test (f6e856a)
  • Push fixes to current branch

Phase 3: Final Validation & Verdict

  • Validate code compiles and imports correctly
  • Provide comprehensive review verdict
  • Final architecture approval

✅ FINAL VERDICT: APPROVED

Architecture Compliance

PASS - All AGENTS.md principles followed:

  • ✅ Protocol-driven core: Session management belongs in Core SDK
  • ✅ DRY principle: Consolidated 4 mutation methods into _modify_session_locked helper
  • ✅ Agent-centric: Fixes session persistence for multi-agent workflows
  • ✅ Multi-agent safe: File locking prevents race conditions between instances
  • ✅ Performance-first: No hot-path regressions, only affects mutation paths

Code Quality Review

  1. Bloat check: ✅ Minimal and focused changes (centralized session mutation logic)
  2. Security: ✅ No security issues - proper file locking and atomic writes
  3. Performance: ✅ One extra disk read per mutation under race conditions (appropriate)
  4. Tests: ✅ Comprehensive regression coverage for all migrated methods
  5. Backward compatibility: ✅ Zero breaking changes to public APIs
  6. Code quality: ✅ Excellent DRY improvement consolidating duplicated logic

Reviewer Feedback Addressed

  • Copilot: Added missing test_set_gateway_info_preserves_messages regression test
  • CodeRabbit: No actionable issues identified
  • Qodo: Reviews paused (no issues to address)

Technical Assessment

The PR fixes a critical data loss bug where stale in-memory session objects could overwrite fresh on-disk chat history. The solution follows excellent engineering practices:

  • Root cause: Methods used cache-first loading, missing concurrent updates
  • Fix: Centralized locked mutation pattern with disk reload inside FileLock
  • Validation: Race condition tests verify concurrent message writes are preserved

Files Modified:

  • store.py (+78/-62): Added _modify_session_locked helper, refactored 4 methods
  • test_session_store.py (+37/-0): Regression tests for race conditions
  • Additional test commit: f6e856a

This is exemplary defensive programming that prevents silent data loss in production multi-agent systems. Ready for merge.


Job Run | Branch

Addresses Copilot's suggestion to ensure set_gateway_info also preserves
messages when racing with concurrent message writes from another instance.
Completes test coverage for all four methods migrated to _modify_session_locked.

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
@MervinPraison MervinPraison marked this pull request as ready for review May 22, 2026 09:45
Copilot AI review requested due to automatic review settings May 22, 2026 09:45
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR extends the read-under-FileLock-then-write pattern (first applied to update_session_metadata in #1709) to set_agent_info, clear_session, and set_gateway_info by introducing a _modify_session_locked helper that centralises the atomic read-modify-write cycle.

  • Introduces _modify_session_locked and _load_session_from_disk helpers so all session mutators reload from disk inside a FileLock before writing, preventing a stale in-memory copy from overwriting newer messages.
  • Refactors set_agent_info, clear_session, set_gateway_info, and the already-fixed update_session_metadata to delegate through the new helper.
  • Adds regression tests for the two-store race scenario across all three newly-fixed methods.

Confidence Score: 4/5

The change is safe to merge; the core read-under-lock pattern is sound and matches the already-proven approach used by add_message.

The refactor correctly consolidates four mutator methods behind a single atomic helper, eliminating the stale-cache overwrite bug. A silent error swallow in _load_session_from_disk and a misleading test name are the only issues found — both are minor and don't affect correctness.

store.py: the _load_session_from_disk helper should log on parse/IO error as the existing _load_session does.

Important Files Changed

Filename Overview
src/praisonai-agents/praisonaiagents/session/store.py Core fix: new _modify_session_locked helper consolidates atomic read-modify-write; all four mutator methods now route through it. One silent error-swallow in _load_session_from_disk worth logging.
src/praisonai-agents/tests/unit/session/test_session_store.py Adds three regression tests covering set_agent_info, clear_session, and set_gateway_info two-store races; test name for clear_session test is misleading but logic is correct.

Sequence Diagram

sequenceDiagram
    participant W as Writer (store A)
    participant F as FileLock (disk)
    participant R as Reader (store B)

    W->>F: add_message("second") — acquires lock, reloads, appends, writes
    F-->>W: lock released, disk has [first, second]

    R->>R: _load_session() — stale cache has [first] only (OLD BUG)

    Note over R,F: NEW FIX: _modify_session_locked()
    R->>F: set_agent_info / clear_session / set_gateway_info — acquires lock
    F->>F: _load_session_from_disk() — reads [first, second] from disk
    R->>R: mutator(session) applied on fresh data
    R->>F: atomic write (temp → replace)
    F-->>R: lock released, cache updated
Loading

Reviews (1): Last reviewed commit: "test: add regression test for set_gatewa..." | Re-trigger Greptile

Comment on lines +295 to +304
def _load_session_from_disk(self, session_id: str, filepath: str) -> SessionData:
"""Load session JSON from disk (caller must hold FileLock)."""
if os.path.exists(filepath):
try:
with open(filepath, "r", encoding="utf-8") as f:
data = json.load(f)
return SessionData.from_dict(data)
except (json.JSONDecodeError, IOError):
pass
return SessionData(session_id=session_id)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 _load_session_from_disk silently swallows json.JSONDecodeError and IOError without any log output, whereas the original _load_session logs a warning. When a session file is corrupt or unreadable, the error will be invisible, making it very hard to diagnose data-loss incidents in production.

Suggested change
def _load_session_from_disk(self, session_id: str, filepath: str) -> SessionData:
"""Load session JSON from disk (caller must hold FileLock)."""
if os.path.exists(filepath):
try:
with open(filepath, "r", encoding="utf-8") as f:
data = json.load(f)
return SessionData.from_dict(data)
except (json.JSONDecodeError, IOError):
pass
return SessionData(session_id=session_id)
def _load_session_from_disk(self, session_id: str, filepath: str) -> SessionData:
"""Load session JSON from disk (caller must hold FileLock)."""
if os.path.exists(filepath):
try:
with open(filepath, "r", encoding="utf-8") as f:
data = json.load(f)
return SessionData.from_dict(data)
except (json.JSONDecodeError, IOError) as e:
logger.warning(f"Failed to load session {session_id}: {e}")
return SessionData(session_id=session_id)

Comment on lines +330 to +331
def test_clear_session_preserves_new_messages(self, temp_store):
"""Clear must reload from disk so concurrent adds are not lost."""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The test name and docstring say "preserves new messages," but the assertion checks len(history) == 0 — i.e., that all messages were cleared. The test is validating correct clear_session behaviour (reload-then-clear), not message preservation. The misleading name could cause a future reader to misinterpret the intent and write a wrong expectation.

Suggested change
def test_clear_session_preserves_new_messages(self, temp_store):
"""Clear must reload from disk so concurrent adds are not lost."""
def test_clear_session_reads_latest_disk_state(self, temp_store):
"""Clear must reload from disk so it operates on the latest persisted state."""

@MervinPraison MervinPraison merged commit 321b6b2 into main May 22, 2026
19 of 20 checks passed
MervinPraison added a commit that referenced this pull request May 23, 2026
…#1727)

* fix: prevent session message loss in LocalManagedAgent._persist_state

Use update_session_metadata (locked read-modify-write) instead of
get_session + _save_session, which could overwrite messages written
concurrently via add_message on another store instance.

Adds regression test mirroring session store race coverage from #1724.

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>

* fix: preserve legacy _persist_state fallback when session is absent

Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/4d08ce0a-1b7b-4e50-b841-6a08416f2f53

Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants