perf(agent-server): add idle conversation eviction to reduce memory#3256
csmith49 wants to merge 1 commit into
Add configurable eviction for finished conversations to prevent unbounded memory growth in long-running servers. This addresses issue #3141.
New configuration options:
- idle_timeout_seconds: Time after which a finished conversation will be evicted from memory (min 60s, default None/disabled)
- max_loaded_conversations: Maximum conversations to keep in memory; when exceeded, the oldest finished conversations are evicted first (default None)
Implementation:
- Background eviction task runs every 60 seconds when either policy is enabled
- Only terminal-state conversations (FINISHED, ERROR, STUCK) are eligible
- Evicted conversations are saved to disk and can be rehydrated on next access
- Running/idle conversations are never evicted
Co-authored-by: openhands <openhands@all-hands.dev>
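The opt-in background task described above can be sketched with asyncio. This is a minimal sketch under stated assumptions, not the PR's code: EvictionLoop, run_cycle, and interval_seconds are hypothetical names, and the eviction cycle itself is passed in as a callback. It shows the two properties the description calls out: no task is created while both policies are None, and the loop wakes on a fixed interval (60 seconds by default here) and keeps running if a single cycle fails.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


class EvictionLoop:
    """Hypothetical periodic eviction task; names are illustrative, not the PR's API."""

    def __init__(
        self,
        run_cycle: Callable[[], Awaitable[None]],
        idle_timeout_seconds: Optional[int] = None,
        max_loaded_conversations: Optional[int] = None,
        interval_seconds: float = 60.0,  # the PR describes a 60-second cadence
    ) -> None:
        self._run_cycle = run_cycle
        self._idle_timeout_seconds = idle_timeout_seconds
        self._max_loaded_conversations = max_loaded_conversations
        self._interval_seconds = interval_seconds
        self._task: Optional[asyncio.Task] = None

    @property
    def enabled(self) -> bool:
        # Opt-in: the task is only needed when at least one policy is configured.
        return (
            self._idle_timeout_seconds is not None
            or self._max_loaded_conversations is not None
        )

    def start(self) -> None:
        # Must be called from inside a running event loop.
        if self.enabled and self._task is None:
            self._task = asyncio.create_task(self._loop())

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(self._interval_seconds)
            try:
                await self._run_cycle()
            except Exception:
                # A failed cycle is logged; the loop keeps running.
                logger.exception("Eviction cycle failed")

    async def stop(self) -> None:
        if self._task is None:
            return
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass
        self._task = None
```

Catching Exception (but not CancelledError, which derives from BaseException on Python 3.8+) lets stop() cancel the task cleanly, while one failing cycle only produces a log entry.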
REST API breakage checks (OpenAPI): ✅ PASSED
Python API breakage checks: ✅ PASSED
all-hands-bot
left a comment
🟢 Good taste - Clean, pragmatic solution to unbounded memory growth
This PR addresses a real production issue with a well-designed implementation. The eviction logic is straightforward: a background task periodically removes idle finished conversations from memory while preserving active ones.
Strengths:
- Opt-in with sensible defaults (both policies disabled by default)
- Only evicts terminal-state conversations (FINISHED, ERROR, STUCK)
- Defensive programming (None checks, safe dict operations)
- Comprehensive test coverage (10 tests covering all scenarios)
- Clean, readable code with appropriate logging
- Re-hydration support for evicted conversations
[RISK ASSESSMENT]
Changes the core conversation service lifecycle by adding background state mutation. Risk is mitigated by:
- Feature is opt-in (defaults to None, no behavior change unless explicitly enabled)
- Only affects terminal conversations, never running/active ones
- Comprehensive test coverage including edge cases
- Defensive programming patterns throughout
- No impact on agent decision-making, prompts, or benchmark behavior
The implementation accepts minor inaccuracies in count tracking, caused by races between the idle-timeout and max-loaded eviction phases, as a pragmatic trade-off for simplicity. This is acceptable for memory management, where approximate enforcement is sufficient.
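The "safe dict operations" and approximate counts mentioned in this review can be illustrated with a small helper. This is a hypothetical sketch (evict_matching is not a name from the PR): it iterates over a snapshot of the dict and removes entries with pop(key, default), so an entry that something else has already removed is skipped instead of raising, and any count derived afterwards is only a point-in-time figure.

```python
from typing import Callable, Dict, List, TypeVar

V = TypeVar("V")

# Sentinel so that a stored value of None is not mistaken for "already removed".
_MISSING = object()


def evict_matching(loaded: Dict[str, V], should_evict: Callable[[V], bool]) -> List[str]:
    """Hypothetical helper: drop entries matching a predicate without raising on concurrent removals."""
    evicted: List[str] = []
    # Iterate over a snapshot; removing keys while iterating the dict itself
    # would raise "RuntimeError: dictionary changed size during iteration".
    for key, value in list(loaded.items()):
        if not should_evict(value):
            continue
        # pop(key, default) never raises KeyError: if the key was already
        # removed elsewhere, it is simply not counted as evicted here.
        if loaded.pop(key, _MISSING) is not _MISSING:
            evicted.append(key)
    return evicted
```

If another code path deletes an entry after the snapshot is taken, that entry still looks evictable to the loop, but pop finds nothing and it is not counted, which is exactly the kind of small count drift the review accepts.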
VERDICT:
✅ Worth merging - Solves a real problem (monotonic memory growth in long-running servers) with a simple, testable solution.
KEY INSIGHT:
Good example of pragmatic systems programming - prioritizes simplicity and observability (counts, debug logging) over perfect accuracy in non-critical timing windows.
all-hands-bot
left a comment
✅ QA Report: PASS
All eviction features work as designed. Conversations are correctly evicted based on idle timeout and max loaded limits, running conversations are preserved, and evicted conversations can be re-hydrated from disk.
Does this PR achieve its stated goal?
Yes. The PR set out to "add configurable eviction for finished conversations to prevent unbounded memory growth in long-running agent servers," and it delivers exactly that. I verified the implementation by:
- Creating multiple conversations with different states and idle times
- Manually triggering eviction cycles
- Confirming that only terminal-state conversations (FINISHED, ERROR, STUCK) are evicted
- Verifying that running/idle conversations are never evicted
- Testing re-hydration from disk after eviction
- Confirming environment variable configuration works correctly
The feature works end-to-end as documented.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv sync --dev succeeded |
| CI Status | ✅ All checks passing (build, pre-commit, API compatibility) |
| Functional Verification | ✅ All 4 functional scenarios verified + 10 unit tests passed |
Functional Verification
Test 1: Idle Timeout Eviction
Setup:
ConversationService(
idle_timeout_seconds=60,
max_loaded_conversations=None
)
Test execution:
- Created a conversation and marked it FINISHED
- Set updated_at to 2 minutes ago (beyond 60s timeout)
- Manually triggered _run_eviction_cycle()
Result:
✅ SUCCESS: Conversation was evicted from memory
Log: "Evicted 1 idle conversation(s) from memory; 0 remaining"
This confirms that conversations idle for longer than idle_timeout_seconds are correctly evicted.
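The idle-timeout check exercised in Test 1 reduces to comparing updated_at against a cutoff. A minimal sketch, assuming timezone-aware UTC timestamps and a hypothetical helper named is_idle_expired; whether the real code uses > or >= at the boundary is not stated in the PR, and the terminal-state check is deliberately left out so the time logic stands alone.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_idle_expired(
    updated_at: datetime,
    idle_timeout_seconds: Optional[int],
    now: Optional[datetime] = None,
) -> bool:
    """Hypothetical check: has a conversation been idle longer than the timeout?"""
    if idle_timeout_seconds is None:
        return False  # idle-timeout policy disabled
    now = now or datetime.now(timezone.utc)
    # Strictly longer than the timeout counts as expired in this sketch.
    return now - updated_at > timedelta(seconds=idle_timeout_seconds)
```

Passing now explicitly, as the test does, keeps the check deterministic; a service would normally let it default to the current time.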
Test 2: Max Loaded Conversations
Setup:
ConversationService(
idle_timeout_seconds=None,
max_loaded_conversations=2
)
Test execution:
- Created 3 conversations (exceeds max of 2)
- Marked all as FINISHED with different idle times:
- Conv 1: 30 minutes idle (oldest)
- Conv 2: 20 minutes idle
- Conv 3: 10 minutes idle (newest)
- Triggered eviction cycle
Result:
✅ SUCCESS: Oldest conversation was evicted, 2 remain
Log: "Evicted 1 idle conversation(s) from memory; 2 remaining"
This confirms that when the max is exceeded, the oldest (most idle) finished conversation is evicted first.
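Test 2's "oldest finished first" rule can be sketched as a selection step: compute how far the loaded count exceeds the cap, then take that many terminal-state conversations with the oldest updated_at. The function name, the (status, updated_at) tuple layout, and the use of heapq.nsmallest are assumptions for illustration, not details taken from the PR.

```python
import heapq
from datetime import datetime
from typing import Dict, List, Optional, Tuple

TERMINAL_STATUSES = frozenset({"FINISHED", "ERROR", "STUCK"})


def select_for_max_loaded(
    loaded: Dict[str, Tuple[str, datetime]],
    max_loaded_conversations: Optional[int],
) -> List[str]:
    """Hypothetical: pick the oldest finished conversations to evict.

    `loaded` maps conversation id -> (status, updated_at).
    """
    if max_loaded_conversations is None:
        return []  # max-loaded policy disabled
    excess = len(loaded) - max_loaded_conversations
    if excess <= 0:
        return []
    # Only terminal conversations are candidates.
    finished = [
        (updated_at, cid)
        for cid, (status, updated_at) in loaded.items()
        if status in TERMINAL_STATUSES
    ]
    # Oldest updated_at first; may return fewer than `excess` ids.
    return [cid for _, cid in heapq.nsmallest(excess, finished)]
```

Because non-terminal conversations are never candidates, the cap behaves as a soft limit: if most loaded conversations are still active, fewer ids than the excess are returned and the count stays above max_loaded_conversations.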
Test 3: Running Conversations Not Evicted
Setup:
ConversationService(idle_timeout_seconds=60)
Test execution:
- Created a conversation in IDLE state (non-terminal)
- Set updated_at to 2 minutes ago (beyond timeout)
- Triggered eviction cycle
Result:
✅ SUCCESS: Non-terminal (IDLE) conversation was preserved
This confirms that non-terminal conversations (RUNNING, IDLE, PAUSED, WAITING_FOR_CONFIRMATION) are never evicted, regardless of how long they've been idle.
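The eligibility rule behind Test 3 is an allow-list of terminal states. In this sketch the enum name ExecutionStatus and its string values are hypothetical; only the member names come from the PR text.

```python
from enum import Enum


class ExecutionStatus(str, Enum):
    """Hypothetical status enum; member names taken from the PR text."""

    IDLE = "idle"
    RUNNING = "running"
    PAUSED = "paused"
    WAITING_FOR_CONFIRMATION = "waiting_for_confirmation"
    FINISHED = "finished"
    ERROR = "error"
    STUCK = "stuck"


# Only terminal states are eviction candidates; everything else stays loaded.
TERMINAL_STATUSES = frozenset(
    {ExecutionStatus.FINISHED, ExecutionStatus.ERROR, ExecutionStatus.STUCK}
)


def is_evictable(status: ExecutionStatus) -> bool:
    return status in TERMINAL_STATUSES
```

Listing the terminal states explicitly, rather than the non-terminal ones, means any status added later is treated as non-evictable until someone decides otherwise, which errs on the side of keeping conversations in memory.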
Test 4: Conversation Re-hydration
Test execution:
- Session 1: Created conversation, marked FINISHED, evicted it
- Session 2: Restarted ConversationService
Result:
✅ SUCCESS: Conversation was re-hydrated from disk
Log: "Resumed conversation 4d9ab504-756e-4f57-bd85-936c25cd6b6e from persistent storage"
This confirms that evicted conversations remain on disk and are automatically re-loaded when the service restarts.
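The persist-then-drop and re-hydrate-on-miss behavior behind Test 4 can be sketched with one JSON file per conversation. ConversationStore, its methods, and the file layout are hypothetical; the PR does not describe its on-disk format. In this sketch the state is written before the in-memory entry is removed, so a failed write leaves the conversation loaded rather than lost.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


class ConversationStore:
    """Hypothetical in-memory cache backed by one JSON file per conversation."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)
        self._loaded: Dict[str, Dict[str, Any]] = {}

    def _path(self, conversation_id: str) -> Path:
        return self._root / f"{conversation_id}.json"

    def put(self, conversation_id: str, state: Dict[str, Any]) -> None:
        self._loaded[conversation_id] = state
        self._path(conversation_id).write_text(json.dumps(state))

    def evict(self, conversation_id: str) -> bool:
        state = self._loaded.get(conversation_id)
        if state is None:
            return False
        # Persist first, then drop from memory: a failed write raises here
        # and leaves the conversation loaded instead of losing it.
        self._path(conversation_id).write_text(json.dumps(state))
        self._loaded.pop(conversation_id, None)
        return True

    def is_loaded(self, conversation_id: str) -> bool:
        return conversation_id in self._loaded

    def get(self, conversation_id: str) -> Optional[Dict[str, Any]]:
        state = self._loaded.get(conversation_id)
        if state is not None:
            return state
        path = self._path(conversation_id)
        if not path.exists():
            return None
        # Cache miss: re-hydrate from disk and keep it loaded again.
        state = json.loads(path.read_text())
        self._loaded[conversation_id] = state
        return state
```

A fresh ConversationStore pointed at the same directory behaves like the restarted service in Test 4: nothing is loaded until get() is called, and then the conversation comes back from disk.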
Test 5: Environment Variable Configuration
Test execution:
export OH_IDLE_TIMEOUT_SECONDS=300
export OH_MAX_LOADED_CONVERSATIONS=100
Result:
✅ idle_timeout_seconds loaded correctly from env: 300
✅ max_loaded_conversations loaded correctly from env: 100
Environment variables work as documented in the PR description.
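Test 5 shows the OH_-prefixed variables mapping onto the two fields. A stdlib sketch of that mapping follows; EvictionSettings and optional_int_from_env are hypothetical names, and the server's actual settings loader is not shown on this PR page. An unset or empty variable maps to None, which keeps the corresponding policy disabled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def optional_int_from_env(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Hypothetical helper: unset or empty means None (policy disabled)."""
    source = os.environ if env is None else env
    raw = source.get(name, "").strip()
    return int(raw) if raw else None


@dataclass(frozen=True)
class EvictionSettings:
    """Hypothetical settings holder; field names follow the PR, the class does not."""

    idle_timeout_seconds: Optional[int] = None
    max_loaded_conversations: Optional[int] = None

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "EvictionSettings":
        return cls(
            idle_timeout_seconds=optional_int_from_env("OH_IDLE_TIMEOUT_SECONDS", env),
            max_loaded_conversations=optional_int_from_env("OH_MAX_LOADED_CONVERSATIONS", env),
        )
```

Accepting an explicit mapping makes the loader testable without touching the real process environment; calling from_env() with no argument reads os.environ.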
Test 6: Validation
Test execution:
Attempted to set OH_IDLE_TIMEOUT_SECONDS=30 (below minimum of 60)
Result:
✅ Validation correctly rejected idle_timeout_seconds=30
Error: "greater than or equal to 60"
The minimum 60-second constraint is properly enforced.
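The 60-second floor checked in Test 6 can be expressed as a small validator. This is a hedged sketch: validate_idle_timeout and MIN_IDLE_TIMEOUT_SECONDS are hypothetical names, and the error wording only mirrors the phrase reported in the QA run. None passes through unchanged so the policy can stay disabled.

```python
from typing import Optional

MIN_IDLE_TIMEOUT_SECONDS = 60  # minimum stated in the PR description


def validate_idle_timeout(value: Optional[int]) -> Optional[int]:
    """Hypothetical validator for the 'min 60s' rule; None keeps the policy disabled."""
    if value is not None and value < MIN_IDLE_TIMEOUT_SECONDS:
        raise ValueError(
            "idle_timeout_seconds must be greater than or equal to "
            f"{MIN_IDLE_TIMEOUT_SECONDS}, got {value}"
        )
    return value
```

Since the eviction cycle runs every 60 seconds, a finished conversation may stay loaded for roughly idle_timeout_seconds plus up to one cycle interval before it is actually evicted.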
Unit Tests
Ran the 10 new unit tests in tests/agent_server/test_conversation_eviction.py:
test_eviction_task_not_started_when_disabled PASSED
test_eviction_task_started_with_idle_timeout PASSED
test_eviction_task_started_with_max_loaded PASSED
test_idle_timeout_evicts_finished_conversation PASSED
test_idle_timeout_does_not_evict_running_conversation PASSED
test_max_loaded_evicts_oldest_finished_first PASSED
test_eviction_preserves_non_terminal_conversations PASSED
test_eviction_loop_runs_periodically PASSED
test_evicted_conversation_can_be_rehydrated PASSED
test_combined_idle_timeout_and_max_loaded PASSED
10 passed in 0.53s
All tests pass, covering task lifecycle, eviction policies, preservation logic, and re-hydration.
Issues Found
None.
Summary
Addresses #3141 - Adds configurable eviction for finished conversations to prevent unbounded memory growth in long-running agent servers.
Problem
The _event_services dict holds every active conversation with no TTL, idle timeout, max size, or background GC task. A conversation that finishes its work remains fully loaded in memory until it is explicitly deleted via the API or the server restarts. This causes memory to grow monotonically in long-running servers.
Solution
Add a background eviction task that periodically checks and removes idle finished conversations from memory. Evicted conversations can be re-hydrated from disk on next access.
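New Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| idle_timeout_seconds | int \| None | None | Time after which a finished conversation is evicted from memory (minimum 60s). Set to None to disable. |
| max_loaded_conversations | int \| None | None | Maximum number of conversations to keep in memory; when exceeded, the oldest finished conversations are evicted first. Set to None to disable. |
Implementation Details
- Only terminal-state conversations (FINISHED, ERROR, STUCK) are candidates for eviction
- When the number of loaded conversations exceeds max_loaded_conversations, the most idle (oldest updated_at) finished conversations are evicted first
- Non-terminal (RUNNING, IDLE, PAUSED, WAITING_FOR_CONFIRMATION) conversations are never evicted
Example Usage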
Testing
Added 10 unit tests covering:
All existing test_conversation_service.py tests pass.
This PR was created by an AI agent (OpenHands) on behalf of the user.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:5a85df3-python
Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g., 5a85df3-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., 5a85df3-python-amd64) are also available if needed