Fix event search performance and kind filter bug #2174

Draft
rbren wants to merge 3 commits into main from fix-event-search-perf

rbren commented Feb 23, 2026

Summary

This PR fixes a bug and a performance problem in the events search endpoint:

  1. Kind filter never matched - The filter compared against fully qualified module paths instead of simple class names
  2. Slow searches - Holding the FIFOLock during a full O(n) disk scan blocked agent execution

Key Changes

Bug Fixes

  • Kind filter comparison - Changed from comparing f"{event.__class__.__module__}.{event.__class__.__name__}" to just event.__class__.__name__. Clients send simple class names like 'MessageEvent', not full module paths.
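
To illustrate why the old comparison could never match, here is a minimal sketch. The MessageEvent class below is a local stand-in rather than the SDK class, and the helper names qualified_name and simple_name are made up for this example.

```python
class MessageEvent:
    """Stand-in for an SDK event class; the real one lives in a nested module."""


def qualified_name(event: object) -> str:
    # Old comparison key: module path plus class name.
    cls = event.__class__
    return f"{cls.__module__}.{cls.__name__}"


def simple_name(event: object) -> str:
    # New comparison key: class name only, which is what clients send.
    return event.__class__.__name__


event = MessageEvent()
requested_kind = "MessageEvent"  # value a client would pass as the kind filter

print(qualified_name(event) == requested_kind)  # False
print(simple_name(event) == requested_kind)     # True
```

The qualified name always carries a module prefix, so it cannot equal a bare class name regardless of which module the class is defined in.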

Performance Improvements

  • Removed FIFOLock during search - Search now operates on a snapshot of event log length (GIL makes int reads atomic), avoiding blocking the agent runner thread
  • Early exit for TIMESTAMP_DESC - DESC queries now scan from end of event log, making "last N events" queries O(limit) instead of O(n)
  • O(1) cursor lookup - Uses EventLog.get_index() for pagination cursor lookup when available
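
The snapshot and early-exit ideas can be sketched roughly as follows. This is not the code from event_service.py: search_desc and its matches parameter are hypothetical names, and a plain list stands in for the event log.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")


def search_desc(
    events: Sequence[T],
    limit: int,
    matches: Callable[[T], bool] = lambda _event: True,
) -> list[T]:
    """Return up to `limit` matching events, newest first."""
    # Snapshot the length once; events appended later are ignored.
    snapshot_len = len(events)
    results: list[T] = []
    # Walk backwards from the newest event in the snapshot.
    for index in range(snapshot_len - 1, -1, -1):
        if len(results) >= limit:
            break  # early exit: the page is full
        event = events[index]
        if matches(event):
            results.append(event)
    return results


log = [f"event-{i}" for i in range(100_000)]
print(search_desc(log, limit=3))
# ['event-99999', 'event-99998', 'event-99997']
```

With no filter, the loop touches only `limit` entries. With a selective filter it may still have to walk much further back, so the worst case remains a full scan.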

Tests Added

New performance tests in test_event_service_perf.py that fail fast for slow queries:

  • test_kind_filter_uses_simple_class_name - Verifies simple class names work
  • test_search_with_limit_completes_quickly - Verifies early exit optimization
  • test_desc_search_last_10_events_fast - Verifies DESC scan from end
  • test_kind_filter_with_many_events_is_fast - Verifies kind filter performance
  • test_pagination_cursor_lookup_is_fast - Verifies O(1) cursor lookup
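
As a rough idea of what such a test can look like, the sketch below times a cursor lookup and fails if it exceeds a budget. Only the method name get_index comes from this PR; the InMemoryLog class, the find_cursor helper, and the 0.1 s budget are assumptions made for illustration and are not taken from test_event_service_perf.py.

```python
import time


class InMemoryLog:
    """Hypothetical stand-in for an event log with an id -> position index."""

    def __init__(self, event_ids: list[str]) -> None:
        self._ids = event_ids
        self._index = {event_id: pos for pos, event_id in enumerate(event_ids)}

    def __len__(self) -> int:
        return len(self._ids)

    def __getitem__(self, pos: int) -> str:
        return self._ids[pos]

    def get_index(self, event_id: str) -> int:
        # Dictionary lookup: constant time on average. Raises KeyError if absent.
        return self._index[event_id]


def find_cursor(log, cursor_id: str) -> int:
    """Return the position of `cursor_id`, preferring an indexed lookup."""
    get_index = getattr(log, "get_index", None)
    if callable(get_index):
        return get_index(cursor_id)
    # Fallback for plain sequences (e.g. test doubles): linear scan.
    for pos in range(len(log)):
        if log[pos] == cursor_id:
            return pos
    raise KeyError(cursor_id)


def test_cursor_lookup_is_fast() -> None:
    ids = [f"evt-{i}" for i in range(100_000)]
    log = InMemoryLog(ids)
    start = time.perf_counter()
    pos = find_cursor(log, "evt-99999")
    elapsed = time.perf_counter() - start
    assert pos == 99_999
    assert elapsed < 0.1, f"cursor lookup took {elapsed:.4f}s"


test_cursor_lookup_is_fast()
```

A wall-clock budget keeps the test simple, but it should be generous enough not to flake on slow CI runners.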

Testing

uv run pytest tests/agent_server/test_event_service.py tests/agent_server/test_event_service_perf.py -v
# 67 passed

Relaxed Constraints

Per the issue, some constraints about the endpoint being perfectly up-to-date have been relaxed:

  • Search operates on a snapshot of event count, so new events appended during search may not be included
  • This is acceptable as it avoids blocking agent execution with the FIFOLock
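
The effect of the relaxed constraint can be seen in a small sketch. The scan function and its during_scan hook are hypothetical; the hook only simulates the agent appending an event while a search is in progress.

```python
from collections.abc import Callable
from typing import Optional


def scan(
    events: list[str],
    during_scan: Optional[Callable[[], None]] = None,
) -> list[str]:
    # Length is read once, before any event is visited.
    snapshot_len = len(events)
    if during_scan is not None:
        during_scan()  # simulates a concurrent append mid-search
    return [events[i] for i in range(snapshot_len)]


log = ["e0", "e1", "e2"]
first = scan(log, during_scan=lambda: log.append("e3"))
second = scan(log)

print(first)   # ['e0', 'e1', 'e2']
print(second)  # ['e0', 'e1', 'e2', 'e3']
```

The event appended during the first search is missed by that search and picked up by the next one.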


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                    Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                        Link
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22    Link
golang    amd64, arm64    golang:1.21-bookworm                          Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:bee4fe1-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-bee4fe1-python \
  ghcr.io/openhands/agent-server:bee4fe1-python

All tags pushed for this build

ghcr.io/openhands/agent-server:bee4fe1-golang-amd64
ghcr.io/openhands/agent-server:bee4fe1-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:bee4fe1-golang-arm64
ghcr.io/openhands/agent-server:bee4fe1-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:bee4fe1-java-amd64
ghcr.io/openhands/agent-server:bee4fe1-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:bee4fe1-java-arm64
ghcr.io/openhands/agent-server:bee4fe1-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:bee4fe1-python-amd64
ghcr.io/openhands/agent-server:bee4fe1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:bee4fe1-python-arm64
ghcr.io/openhands/agent-server:bee4fe1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:bee4fe1-golang
ghcr.io/openhands/agent-server:bee4fe1-java
ghcr.io/openhands/agent-server:bee4fe1-python

About Multi-Architecture Support

  • Each variant tag (e.g., bee4fe1-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., bee4fe1-python-amd64) are also available if needed

Key changes:
- Fix kind filter to use simple class name (e.g., 'MessageEvent') instead of
  fully qualified module path. The old code compared against full paths like
  'openhands.sdk.event.llm_convertible.message.MessageEvent' which never
  matched the simpler class names sent by clients.

- Remove FIFOLock acquisition during event search to avoid blocking the agent
  runner thread during O(n) disk scans. The search now operates on a snapshot
  of the event log length (GIL makes int reads atomic).

- Add early exit optimization for TIMESTAMP_DESC queries that scan from the
  end of the event log, reducing search time for 'last N events' queries.

- Add pagination cursor O(1) lookup using EventLog.get_index() when available,
  with fallback to linear search for test mocks.

- Update _count_events_sync with same kind filter fix.

- Add performance tests that fail fast for slow queries:
  * test_kind_filter_uses_simple_class_name
  * test_search_with_limit_completes_quickly
  * test_desc_search_last_10_events_fast
  * test_kind_filter_with_many_events_is_fast
  * test_pagination_cursor_lookup_is_fast

- Update existing tests to use simple class names in kind filter assertions.

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions bot commented Feb 23, 2026

API breakage checks (Griffe)

Result: Passed

Action log

github-actions bot commented Feb 23, 2026

Coverage

Coverage Report
File                                             Stmts   Miss   Cover   Missing
openhands-agent-server/openhands/agent_server
   event_service.py                              348     99     71%     55–56, 74–76, 84–91, 94–97, 117, 141–142, 165–166, 185, 192, 249, 263–264, 272, 325–326, 330, 338, 341, 389–390, 398–399, 415, 417, 421–423, 427, 436–437, 439, 443, 449, 451, 459–464, 600, 602–603, 607, 621–623, 625, 629–632, 636–639, 647–650, 669–670, 672–679, 681–682, 691–692, 694–695, 702–703, 705–706, 710, 716, 733–734
TOTAL                                            18778   8588   54%

When the agent is actively running, it holds the state lock for potentially
long periods (during LLM calls, tool execution, etc.). This caused WebSocket
subscriptions and event lookups to block, making the UI appear frozen.

Changes:
- subscribe_to_events: Use non-blocking lock acquisition. If lock is held,
  send minimal state update (just execution_status) instead of blocking.

- _get_event_sync: Remove lock acquisition entirely. Event reading is safe
  without lock since events are immutable once appended.

- Add test: test_subscribe_does_not_block_when_lock_held verifies that
  WebSocket subscriptions complete quickly even when lock is held.

Co-authored-by: openhands <openhands@all-hands.dev>
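
The non-blocking acquisition described in this commit can be sketched as below. It uses a plain threading.Lock rather than the project's FIFOLock, and build_state_update together with the shape of the state dictionary are assumptions for illustration only.

```python
import threading


def build_state_update(lock: threading.Lock, full_state: dict) -> dict:
    """Return the full state if the lock is free, else a minimal update."""
    # acquire(blocking=False) returns immediately with True or False.
    if lock.acquire(blocking=False):
        try:
            return dict(full_state)
        finally:
            lock.release()
    # Lock is held (e.g. the agent is mid-step): do not wait for it,
    # send only the execution status.
    return {"execution_status": full_state.get("execution_status")}


state_lock = threading.Lock()
state = {"execution_status": "running", "events_count": 42}

with state_lock:  # simulate the agent holding the lock
    print(build_state_update(state_lock, state))
    # {'execution_status': 'running'}

print(build_state_update(state_lock, state))
# {'execution_status': 'running', 'events_count': 42}
```

The subscriber gets a reply immediately in both cases; the trade-off is that the reply carries less detail while the agent holds the lock.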

rbren commented Feb 23, 2026

@OpenHands fix the failing gh actions

openhands-ai bot commented Feb 23, 2026

I'm on it! rbren can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai bot commented Feb 23, 2026

Summary

I fixed the failing GitHub Actions pre-commit check on PR #2174.

Issue

The Pre-commit checks/pre-commit workflow was failing due to a Ruff formatting violation in event_service.py.

Fix Applied

Changed a multi-line logger.debug() call to a single line:

-                logger.debug(
-                    "Lock held during subscribe, sending minimal state update"
-                )
+                logger.debug("Lock held during subscribe, sending minimal state update")

Verification

  • ✅ Ran pre-commit locally - all checks passed
  • ✅ Change is minimal and directly addresses the formatting issue
  • ✅ Committed with proper co-author attribution
  • ✅ Already pushed to fix-event-search-perf branch

The GitHub Actions should now pass with this formatting fix.
