fix: prevent session message loss in set_agent_info and clear_session#1724
Conversation
Apply locked read-modify-write (same pattern as update_session_metadata) to set_agent_info, clear_session, and set_gateway_info so stale in-memory copies cannot overwrite messages written by add_message on another instance. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
|
@coderabbitai review |
|
/review |
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
✅ Actions performedReview triggered.
|
|
Important Review skippedBot user detected. To trigger a single review, invoke the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe PR refactors session mutation in ChangesSession mutation centralization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings. Review areas:
|
Reviewed thoroughly at commit Findings:
One small improvement I suggest (optional) is adding a |
|
@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding. Phase 1: Review per AGENTS.md
Phase 2: FIX Valid Issues Phase 3: Final Verdict |
|
Claude finished @MervinPraison's task —— View job Architecture Review Complete ✅Phase 1: Review per AGENTS.md
Phase 2: Address Reviewer Feedback
Phase 3: Final Validation & Verdict
✅ FINAL VERDICT: APPROVEDArchitecture CompliancePASS - All AGENTS.md principles followed:
Code Quality Review
Reviewer Feedback Addressed
Technical AssessmentThe PR fixes a critical data loss bug where stale in-memory session objects could overwrite fresh on-disk chat history. The solution follows excellent engineering practices:
Files Modified:
This is exemplary defensive programming that prevents silent data loss in production multi-agent systems. Ready for merge. |
Addresses Copilot's suggestion to ensure set_gateway_info also preserves messages when racing with concurrent message writes from another instance. Completes test coverage for all four methods migrated to _modify_session_locked. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
Greptile SummaryThis PR extends the read-under-
Confidence Score: 4/5The change is safe to merge; the core read-under-lock pattern is sound and matches the already-proven approach used by add_message. The refactor correctly consolidates four mutator methods behind a single atomic helper, eliminating the stale-cache overwrite bug. A silent error swallow in _load_session_from_disk and a misleading test name are the only issues found — both are minor and don't affect correctness. store.py: the _load_session_from_disk helper should log on parse/IO error as the existing _load_session does. Important Files Changed
Sequence DiagramsequenceDiagram
participant W as Writer (store A)
participant F as FileLock (disk)
participant R as Reader (store B)
W->>F: add_message("second") — acquires lock, reloads, appends, writes
F-->>W: lock released, disk has [first, second]
R->>R: _load_session() — stale cache has [first] only (OLD BUG)
Note over R,F: NEW FIX: _modify_session_locked()
R->>F: set_agent_info / clear_session / set_gateway_info — acquires lock
F->>F: _load_session_from_disk() — reads [first, second] from disk
R->>R: mutator(session) applied on fresh data
R->>F: atomic write (temp → replace)
F-->>R: lock released, cache updated
Reviews (1): Last reviewed commit: "test: add regression test for set_gatewa..." | Re-trigger Greptile |
| def _load_session_from_disk(self, session_id: str, filepath: str) -> SessionData: | ||
| """Load session JSON from disk (caller must hold FileLock).""" | ||
| if os.path.exists(filepath): | ||
| try: | ||
| with open(filepath, "r", encoding="utf-8") as f: | ||
| data = json.load(f) | ||
| return SessionData.from_dict(data) | ||
| except (json.JSONDecodeError, IOError): | ||
| pass | ||
| return SessionData(session_id=session_id) |
There was a problem hiding this comment.
_load_session_from_disk silently swallows json.JSONDecodeError and IOError without any log output, whereas the original _load_session logs a warning. When a session file is corrupt or unreadable, the error will be invisible, making it very hard to diagnose data-loss incidents in production.
| def _load_session_from_disk(self, session_id: str, filepath: str) -> SessionData: | |
| """Load session JSON from disk (caller must hold FileLock).""" | |
| if os.path.exists(filepath): | |
| try: | |
| with open(filepath, "r", encoding="utf-8") as f: | |
| data = json.load(f) | |
| return SessionData.from_dict(data) | |
| except (json.JSONDecodeError, IOError): | |
| pass | |
| return SessionData(session_id=session_id) | |
| def _load_session_from_disk(self, session_id: str, filepath: str) -> SessionData: | |
| """Load session JSON from disk (caller must hold FileLock).""" | |
| if os.path.exists(filepath): | |
| try: | |
| with open(filepath, "r", encoding="utf-8") as f: | |
| data = json.load(f) | |
| return SessionData.from_dict(data) | |
| except (json.JSONDecodeError, IOError) as e: | |
| logger.warning(f"Failed to load session {session_id}: {e}") | |
| return SessionData(session_id=session_id) |
| def test_clear_session_preserves_new_messages(self, temp_store): | ||
| """Clear must reload from disk so concurrent adds are not lost.""" |
There was a problem hiding this comment.
The test name and docstring say "preserves new messages," but the assertion checks
len(history) == 0 — i.e., that all messages were cleared. The test is validating correct clear_session behaviour (reload-then-clear), not message preservation. The misleading name could cause a future reader to misinterpret the intent and write a wrong expectation.
| def test_clear_session_preserves_new_messages(self, temp_store): | |
| """Clear must reload from disk so concurrent adds are not lost.""" | |
| def test_clear_session_reads_latest_disk_state(self, temp_store): | |
| """Clear must reload from disk so it operates on the latest persisted state.""" |
…#1727) * fix: prevent session message loss in LocalManagedAgent._persist_state Use update_session_metadata (locked read-modify-write) instead of get_session + _save_session, which could overwrite messages written concurrently via add_message on another store instance. Adds regression test mirroring session store race coverage from #1724. Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com> * fix: preserve legacy _persist_state fallback when session is absent Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/4d08ce0a-1b7b-4e50-b841-6a08416f2f53 Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>
Bug and impact
DefaultSessionStore.set_agent_info,clear_session, andset_gateway_infostill used stale in-memory session objects and could overwrite on-disk chat history written byadd_messageon another store instance (same class of bug fixed forupdate_session_metadatain PR #1709).Trigger: Agent initialisation calls
set_agent_infowhile another worker/process persists messages; or bot session cleanup races with concurrent message writes.Impact: Silent loss of user/assistant messages in JSON session files (data loss).
Root cause
These mutators called
_load_session()(cache-first), modified the object, then_save_session()without reloading from disk insideFileLock— so a stale copy missing recent messages could be written atomically over fresher data.Fix
_modify_session_locked()helper: reload from disk underFileLock, apply mutator, atomic write, update cache.set_agent_info,clear_session,set_gateway_info, andupdate_session_metadatathrough it.set_agent_infoandclear_sessiontwo-store races.Validation
pytest tests/unit/session/test_session_store.py(metadata, set_agent_info, clear_session race tests) — 4 passed.Summary by CodeRabbit
Tests
Improvements