perf(agent-server): cut conversation search/count to page-slice work#3260
perf(agent-server): cut conversation search/count to page-slice work#3260csmith49 wants to merge 1 commit into
Conversation
Both ``_search_conversations`` and ``_count_conversations`` previously iterated every entry in ``_event_services``, awaited ``get_state()`` on each, and composed a full ``ConversationInfo`` (which serializes the whole state via ``model_dump``) before applying pagination. That made ``GET /api/conversations`` and ``count`` O(N) in the total number of conversations regardless of ``limit``. - ``_search_conversations`` now sorts by ``stored.created_at`` / ``stored.updated_at`` (which already live on ``StoredConversation`` metadata), so we no longer need a state load to order results. Only the actual page slice composes ``ConversationInfo``. - ``_count_conversations`` short-circuits to ``len(_event_services)`` when no ``execution_status`` filter is supplied, skipping the state fan-out entirely. This is the minimum change needed to make these endpoints cheap at scale without introducing a separate sorted metadata index (the bigger refactor suggested in the issue can build on top of this). Refs #3142 Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
Coverage Report •
|
||||||||||||||||||||
all-hands-bot
left a comment
There was a problem hiding this comment.
Clean performance optimization that decouples API cost from total conversation count. The key insight - sorting by StoredConversation metadata before loading full state - is well-executed.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This is a backend API performance optimization with comprehensive regression tests. The optimization avoids loading unnecessary conversation state while preserving all existing behavior: sort orders, pagination, and filtering semantics are unchanged. All 142 existing tests pass, and two new tests guard against performance regressions.
VERDICT:
✅ Worth merging: Solves a real O(N) performance issue with minimal, well-tested changes.
KEY INSIGHT:
Good data structure analysis: sorting metadata is already on StoredConversation, so full state deserialization can be deferred to the page slice.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified that conversation search and count operations use the optimized implementation and preserve all existing behavior contracts.
Does this PR achieve its stated goal?
Yes. This PR successfully reduces the computational cost of conversation search and count operations from O(N) to O(page_size) and O(1) respectively. The optimization eliminates unnecessary state loads by sorting on metadata fields (created_at/updated_at) that are already available on StoredConversation, and only composing full ConversationInfo for the actual page slice returned. The count endpoint with no filter now short-circuits to len(_event_services) without touching any conversation state. All existing behavior (sort orders, pagination, status filtering) is preserved.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies installed, tests runnable |
| CI Status | ✅ All checks passing (77 agent-server tests, 62 router tests, all other CI green) |
| Functional Verification | ✅ All tests pass, optimization verified by regression tests |
Functional Verification
Test 1: Regression tests verify the optimization
Step 1 — Run the new performance regression tests:
Ran uv run pytest tests/agent_server/test_conversation_service.py::TestConversationServiceSearchCountPerf -v:
tests/agent_server/test_conversation_service.py::TestConversationServiceSearchCountPerf::test_count_no_filter_skips_state_loads PASSED [ 50%]
tests/agent_server/test_conversation_service.py::TestConversationServiceSearchCountPerf::test_search_skips_state_for_non_page_items PASSED [100%]
======================== 2 passed, 5 warnings in 0.60s =========================
Interpretation: The two new regression tests explicitly verify:
test_count_no_filter_skips_state_loads— confirms that counting without a filter never callsget_state()on any conversationtest_search_skips_state_for_non_page_items— confirms that searching only callsget_state()on the conversations in the returned page slice, not all conversations
These tests use mocks to assert that get_state() is only called when necessary, proving the optimization is in place.
Test 2: Existing behavior is preserved
Step 1 — Run all conversation service tests:
Ran uv run pytest tests/agent_server/test_conversation_service.py -v:
======================== 77 passed, 5 warnings in 1.60s =========================
All 77 tests pass, including:
- Sort order tests (CREATED_AT, CREATED_AT_DESC, UPDATED_AT, UPDATED_AT_DESC)
- Pagination tests (page_id handling, next_page_id generation)
- Status filtering tests (execution_status filtering)
- Combined filter and sort tests
Step 2 — Run all conversation router tests (REST API):
Ran uv run pytest tests/agent_server/test_conversation_router.py -v:
======================== 62 passed, 5 warnings in 2.22s =========================
All 62 router tests pass, including:
test_search_conversations_default_paramstest_search_conversations_with_all_paramstest_count_conversations_no_filtertest_count_conversations_with_status_filter
Interpretation: The comprehensive test suite confirms that all API contracts are preserved. Sort orders, pagination, and filtering all work exactly as before. The optimization is a pure internal refactor with no externally observable behavior changes.
Test 3: Code inspection confirms the implementation
Reviewed the implementation in conversation_service.py:
_search_conversations changes:
- Sorts
candidatesbystored.created_atorstored.updated_atdirectly from metadata (lines 421-425) - Only loads state and composes
ConversationInfofor items in the page slice (lines 437-440) - Still loads state for status filtering when
execution_statusis provided (lines 415-418)
_count_conversations changes:
- Fast path: returns
len(self._event_services)immediately when no filter (lines 463-464) - Only iterates and loads state when
execution_statusfilter is provided (lines 467-470)
Interpretation: The implementation matches the PR description exactly. Sorting uses metadata, state loading is deferred until needed, and count has a fast path.
Issues Found
None.
Summary
Addresses #3142 (one of the unresolved sub-issues under the perf tracking issue #3153 that had no PR attached).
ConversationService._search_conversationsand_count_conversationspreviously iterated every entry in_event_services, awaitedget_state()on each, and composed a fullConversationInfo(which serializes the whole state viamodel_dump) before applying pagination. That madeGET /api/conversationsand the count endpoint O(N) in the total number of conversations regardless oflimit.This PR is the minimum change that decouples the per-request cost from total conversation count, without yet introducing the larger sorted-metadata index suggested in the issue (that refactor can build on top of this).
Changes
_search_conversationsnow sorts candidates bystored.created_at/stored.updated_at, which already live onStoredConversationmetadata. We no longer need a state load (or_compose_conversation_info/model_dump) for ordering. Only the actual page slice composesConversationInfo._count_conversationsshort-circuits tolen(_event_services)when noexecution_statusfilter is supplied, skipping the per-conversation state fan-out entirely.next_page_idsemantics are unchanged.Tests
test_conversation_service.pytests continue to pass, as do the 67test_conversation_router{,_acp}.pytests.test_count_no_filter_skips_state_loads— verifiesget_state()is never awaited when counting without a filter.test_search_skips_state_for_non_page_items— verifiesget_state()is awaited only for conversations actually returned in the page slice.Notes for reviewers
execution_statusfiltering in_search_conversations, sinceStoredConversationdoes not currently track that. A future improvement (the "sorted metadata index" idea from the issue) could push that field onto stored metadata and remove the remaining O(N) walk; this PR intentionally keeps the surface area small.StoredConversation.created_at/updated_at, which are the same values previously copied intoConversationInfoby_compose_conversation_info, so observable ordering is unchanged.This PR was created by an AI agent (OpenHands) on behalf of the user.
@csmith49 can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:ea32c95-pythonRun
All tags pushed for this build
About Multi-Architecture Support
ea32c95-python) is a multi-arch manifest supporting both amd64 and arm64ea32c95-python-amd64) are also available if needed