Skip to content

fix(agent-server): lazy-load persisted conversations on access#3220

Open
nehaaprasaad wants to merge 4 commits into
OpenHands:mainfrom
nehaaprasaad:fix/perf-agnt-conv-strp
Open

fix(agent-server): lazy-load persisted conversations on access#3220
nehaaprasaad wants to merge 4 commits into
OpenHands:mainfrom
nehaaprasaad:fix/perf-agnt-conv-strp

Conversation

@nehaaprasaad
Copy link
Copy Markdown

@nehaaprasaad nehaaprasaad commented May 12, 2026

Why

Agent server startup eagerly hydrated every persisted conversation, creating full EventService and LocalConversation objects for all conversations on disk.

Summary

  • Lazy-load persisted conversations only when runtime access is needed.
  • Keep list/get/count paths lightweight by reading persisted metadata/state.
  • Add regression coverage for lazy startup and single hydration.

Issue Number

fix : #3140

How to Test

  • make build
  • uv run pytest tests/agent_server/test_conversation_service.py -k "startup_does_not_eagerly or hydrates_persisted or stale_owner_cannot_append or restart_resumes"
  • uv run pre-commit run --files openhands-agent-server/openhands/agent_server/conversation_service.py tests/agent_server/test_conversation_service.py

@csmith49 csmith49 requested a review from all-hands-bot May 12, 2026 16:45
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Needs improvement - Critical runtime error (missing import) and concurrency issues that need addressing before merge.

[CRITICAL ISSUES]

  • Missing Import: importlib is used but not imported - will cause NameError at runtime
  • Race Condition: Hydration lock creation in get_event_service() is not thread-safe
  • Suboptimal Delete Path: delete_conversation() now hydrates before deleting, adding unnecessary overhead

[IMPROVEMENT OPPORTUNITIES]

  • Memory Leak: _hydration_locks dict grows unbounded and is never cleaned up
  • Cache Invalidation: _stored_conversations cache is never invalidated if metadata changes externally
  • Test Naming: test_restart_resumes_conversations_after_non_graceful_shutdown name doesn't match its new assertion

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

Performance-critical path (startup, conversation access) with good test coverage, but the missing import is a production blocker. The race condition in lock management could cause subtle bugs under concurrent load. The lazy-loading approach is sound and valuable - just needs these issues fixed before merge.

VERDICT:
Needs rework: Fix the missing import and race condition before merging

KEY INSIGHT:
The dual-state model (hydrated EventService vs metadata-only StoredConversation) is elegant, but the synchronization primitives need attention to prevent race conditions under concurrent access.

Comment thread openhands-agent-server/openhands/agent_server/conversation_service.py Outdated
Comment thread tests/agent_server/test_conversation_service.py
@csmith49
Copy link
Copy Markdown
Collaborator

Hey Neha, this is a great start! Do you mind addressing (or responding to) the review comments and checking on the failing CI?

@nehaaprasaad nehaaprasaad force-pushed the fix/perf-agnt-conv-strp branch from 33356f5 to c74129e Compare May 13, 2026 06:53
@nehaaprasaad
Copy link
Copy Markdown
Author

@csmith49 updated and addressed the comments please check thankyou.

@csmith49 csmith49 requested a review from all-hands-bot May 13, 2026 20:23
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Needs rework - Critical implementation bugs that will cause runtime failures.

Multiple methods are missing critical implementation code (return statements, loops, helper methods). These issues will cause immediate failures at runtime and must be fixed before merging.

Comment thread openhands-agent-server/openhands/agent_server/conversation_service.py Outdated
@csmith49
Copy link
Copy Markdown
Collaborator

Looks like we had a bad merge or something? Suddenly lots of weird formatting issues and missing chunks.

@nehaaprasaad nehaaprasaad force-pushed the fix/perf-agnt-conv-strp branch from 4075744 to 3405a2a Compare May 14, 2026 02:20
@nehaaprasaad
Copy link
Copy Markdown
Author

@csmith49 please check the updates!

@csmith49
Copy link
Copy Markdown
Collaborator

Hey Neha, you might want to look at the CI results. There's still some malformed code that is stopping our test suite from running properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: agent server loads all conversations eagerly at startup

3 participants