Skip to content

perf(agent-server): cut conversation search/count to page-slice work#3260

Open
csmith49 wants to merge 1 commit into
mainfrom
fix/3142-conversation-search-count-perf
Open

perf(agent-server): cut conversation search/count to page-slice work#3260
csmith49 wants to merge 1 commit into
mainfrom
fix/3142-conversation-search-count-perf

Conversation

@csmith49
Copy link
Copy Markdown
Collaborator

@csmith49 csmith49 commented May 14, 2026

Summary

Addresses #3142 (one of the unresolved sub-issues under the perf tracking issue #3153 that had no PR attached).

ConversationService._search_conversations and _count_conversations previously iterated every entry in _event_services, awaited get_state() on each, and composed a full ConversationInfo (which serializes the whole state via model_dump) before applying pagination. That made GET /api/conversations and the count endpoint O(N) in the total number of conversations regardless of limit.

This PR is the minimum change that decouples the per-request cost from total conversation count, without yet introducing the larger sorted-metadata index suggested in the issue (that refactor can build on top of this).

Changes

  • _search_conversations now sorts candidates by stored.created_at / stored.updated_at, which already live on StoredConversation metadata. We no longer need a state load (or _compose_conversation_info / model_dump) for ordering. Only the actual page slice composes ConversationInfo.
  • _count_conversations short-circuits to len(_event_services) when no execution_status filter is supplied, skipping the per-conversation state fan-out entirely.
  • Behavior contracts are preserved: sort orders, status filtering, pagination, and next_page_id semantics are unchanged.

Tests

  • All 75 existing test_conversation_service.py tests continue to pass, as do the 67 test_conversation_router{,_acp}.py tests.
  • Added two regression tests guarding the new behavior:
    • test_count_no_filter_skips_state_loads — verifies get_state() is never awaited when counting without a filter.
    • test_search_skips_state_for_non_page_items — verifies get_state() is awaited only for conversations actually returned in the page slice.

Notes for reviewers

  • The state load is still required for execution_status filtering in _search_conversations, since StoredConversation does not currently track that. A future improvement (the "sorted metadata index" idea from the issue) could push that field onto stored metadata and remove the remaining O(N) walk; this PR intentionally keeps the surface area small.
  • Sort keys come from StoredConversation.created_at/updated_at, which are the same values previously copied into ConversationInfo by _compose_conversation_info, so observable ordering is unchanged.

This PR was created by an AI agent (OpenHands) on behalf of the user.

@csmith49 can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:ea32c95-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-ea32c95-python \
  ghcr.io/openhands/agent-server:ea32c95-python

All tags pushed for this build

ghcr.io/openhands/agent-server:ea32c95-golang-amd64
ghcr.io/openhands/agent-server:ea32c95df98989c3d9ada0f72770cab7dec2e1ef-golang-amd64
ghcr.io/openhands/agent-server:fix-3142-conversation-search-count-perf-golang-amd64
ghcr.io/openhands/agent-server:ea32c95-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:ea32c95-golang-arm64
ghcr.io/openhands/agent-server:ea32c95df98989c3d9ada0f72770cab7dec2e1ef-golang-arm64
ghcr.io/openhands/agent-server:fix-3142-conversation-search-count-perf-golang-arm64
ghcr.io/openhands/agent-server:ea32c95-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:ea32c95-java-amd64
ghcr.io/openhands/agent-server:ea32c95df98989c3d9ada0f72770cab7dec2e1ef-java-amd64
ghcr.io/openhands/agent-server:fix-3142-conversation-search-count-perf-java-amd64
ghcr.io/openhands/agent-server:ea32c95-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:ea32c95-java-arm64
ghcr.io/openhands/agent-server:ea32c95df98989c3d9ada0f72770cab7dec2e1ef-java-arm64
ghcr.io/openhands/agent-server:fix-3142-conversation-search-count-perf-java-arm64
ghcr.io/openhands/agent-server:ea32c95-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:ea32c95-python-amd64
ghcr.io/openhands/agent-server:ea32c95df98989c3d9ada0f72770cab7dec2e1ef-python-amd64
ghcr.io/openhands/agent-server:fix-3142-conversation-search-count-perf-python-amd64
ghcr.io/openhands/agent-server:ea32c95-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:ea32c95-python-arm64
ghcr.io/openhands/agent-server:ea32c95df98989c3d9ada0f72770cab7dec2e1ef-python-arm64
ghcr.io/openhands/agent-server:fix-3142-conversation-search-count-perf-python-arm64
ghcr.io/openhands/agent-server:ea32c95-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:ea32c95-golang
ghcr.io/openhands/agent-server:ea32c95df98989c3d9ada0f72770cab7dec2e1ef-golang
ghcr.io/openhands/agent-server:fix-3142-conversation-search-count-perf-golang
ghcr.io/openhands/agent-server:ea32c95-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:ea32c95-java
ghcr.io/openhands/agent-server:ea32c95df98989c3d9ada0f72770cab7dec2e1ef-java
ghcr.io/openhands/agent-server:fix-3142-conversation-search-count-perf-java
ghcr.io/openhands/agent-server:ea32c95-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:ea32c95-python
ghcr.io/openhands/agent-server:ea32c95df98989c3d9ada0f72770cab7dec2e1ef-python
ghcr.io/openhands/agent-server:fix-3142-conversation-search-count-perf-python
ghcr.io/openhands/agent-server:ea32c95-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., ea32c95-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., ea32c95-python-amd64) are also available if needed

Both ``_search_conversations`` and ``_count_conversations`` previously
iterated every entry in ``_event_services``, awaited ``get_state()`` on
each, and composed a full ``ConversationInfo`` (which serializes the
whole state via ``model_dump``) before applying pagination. That made
``GET /api/conversations`` and ``count`` O(N) in the total number of
conversations regardless of ``limit``.

- ``_search_conversations`` now sorts by ``stored.created_at`` /
  ``stored.updated_at`` (which already live on ``StoredConversation``
  metadata), so we no longer need a state load to order results. Only
  the actual page slice composes ``ConversationInfo``.
- ``_count_conversations`` short-circuits to ``len(_event_services)``
  when no ``execution_status`` filter is supplied, skipping the state
  fan-out entirely.

This is the minimum change needed to make these endpoints cheap at
scale without introducing a separate sorted metadata index (the bigger
refactor suggested in the issue can build on top of this).

Refs #3142

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-agent-server/openhands/agent_server
   conversation_service.py58611680%144–145, 154, 180–181, 185–186, 191, 303–304, 335, 338, 345–351, 378, 384, 478, 484, 489, 495, 503–504, 513–516, 525, 537, 545, 568–569, 607–608, 612, 637–641, 643–644, 647–648, 651–656, 754, 761–765, 768–769, 773–777, 780–781, 785–789, 792–793, 815–816, 820–821, 823–825, 827, 830, 838–842, 845, 852–857, 859–860, 874, 884, 888, 890–891, 896–897, 903–904, 912, 927–928, 958, 973, 1001, 1295, 1298
TOTAL268621168456% 

@csmith49 csmith49 marked this pull request as ready for review May 14, 2026 18:25
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean performance optimization that decouples API cost from total conversation count. The key insight - sorting by StoredConversation metadata before loading full state - is well-executed.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW

This is a backend API performance optimization with comprehensive regression tests. The optimization avoids loading unnecessary conversation state while preserving all existing behavior: sort orders, pagination, and filtering semantics are unchanged. All 142 existing tests pass, and two new tests guard against performance regressions.

VERDICT:
Worth merging: Solves a real O(N) performance issue with minimal, well-tested changes.

KEY INSIGHT:
Good data structure analysis: sorting metadata is already on StoredConversation, so full state deserialization can be deferred to the page slice.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified that conversation search and count operations use the optimized implementation and preserve all existing behavior contracts.

Does this PR achieve its stated goal?

Yes. This PR successfully reduces the computational cost of conversation search and count operations from O(N) to O(page_size) and O(1) respectively. The optimization eliminates unnecessary state loads by sorting on metadata fields (created_at/updated_at) that are already available on StoredConversation, and only composing full ConversationInfo for the actual page slice returned. The count endpoint with no filter now short-circuits to len(_event_services) without touching any conversation state. All existing behavior (sort orders, pagination, status filtering) is preserved.

Phase Result
Environment Setup ✅ Dependencies installed, tests runnable
CI Status ✅ All checks passing (77 agent-server tests, 62 router tests, all other CI green)
Functional Verification ✅ All tests pass, optimization verified by regression tests
Functional Verification

Test 1: Regression tests verify the optimization

Step 1 — Run the new performance regression tests:

Ran uv run pytest tests/agent_server/test_conversation_service.py::TestConversationServiceSearchCountPerf -v:

tests/agent_server/test_conversation_service.py::TestConversationServiceSearchCountPerf::test_count_no_filter_skips_state_loads PASSED [ 50%]
tests/agent_server/test_conversation_service.py::TestConversationServiceSearchCountPerf::test_search_skips_state_for_non_page_items PASSED [100%]

======================== 2 passed, 5 warnings in 0.60s =========================

Interpretation: The two new regression tests explicitly verify:

  1. test_count_no_filter_skips_state_loads — confirms that counting without a filter never calls get_state() on any conversation
  2. test_search_skips_state_for_non_page_items — confirms that searching only calls get_state() on the conversations in the returned page slice, not all conversations

These tests use mocks to assert that get_state() is only called when necessary, proving the optimization is in place.

Test 2: Existing behavior is preserved

Step 1 — Run all conversation service tests:

Ran uv run pytest tests/agent_server/test_conversation_service.py -v:

======================== 77 passed, 5 warnings in 1.60s =========================

All 77 tests pass, including:

  • Sort order tests (CREATED_AT, CREATED_AT_DESC, UPDATED_AT, UPDATED_AT_DESC)
  • Pagination tests (page_id handling, next_page_id generation)
  • Status filtering tests (execution_status filtering)
  • Combined filter and sort tests

Step 2 — Run all conversation router tests (REST API):

Ran uv run pytest tests/agent_server/test_conversation_router.py -v:

======================== 62 passed, 5 warnings in 2.22s =========================

All 62 router tests pass, including:

  • test_search_conversations_default_params
  • test_search_conversations_with_all_params
  • test_count_conversations_no_filter
  • test_count_conversations_with_status_filter

Interpretation: The comprehensive test suite confirms that all API contracts are preserved. Sort orders, pagination, and filtering all work exactly as before. The optimization is a pure internal refactor with no externally observable behavior changes.

Test 3: Code inspection confirms the implementation

Reviewed the implementation in conversation_service.py:

_search_conversations changes:

  • Sorts candidates by stored.created_at or stored.updated_at directly from metadata (lines 421-425)
  • Only loads state and composes ConversationInfo for items in the page slice (lines 437-440)
  • Still loads state for status filtering when execution_status is provided (lines 415-418)

_count_conversations changes:

  • Fast path: returns len(self._event_services) immediately when no filter (lines 463-464)
  • Only iterates and loads state when execution_status filter is provided (lines 467-470)

Interpretation: The implementation matches the PR description exactly. Sorting uses metadata, state loading is deferred until needed, and count has a fast path.

Issues Found

None.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants