Fix event search performance and kind filter bug#2174
Draft
Conversation
Key changes: - Fix kind filter to use simple class name (e.g., 'MessageEvent') instead of fully qualified module path. The old code compared against full paths like 'openhands.sdk.event.llm_convertible.message.MessageEvent' which never matched the simpler class names sent by clients. - Remove FIFOLock acquisition during event search to avoid blocking the agent runner thread during O(n) disk scans. The search now operates on a snapshot of the event log length (GIL makes int reads atomic). - Add early exit optimization for TIMESTAMP_DESC queries that scan from the end of the event log, reducing search time for 'last N events' queries. - Add pagination cursor O(1) lookup using EventLog.get_index() when available, with fallback to linear search for test mocks. - Update _count_events_sync with same kind filter fix. - Add performance tests that fail fast for slow queries: * test_kind_filter_uses_simple_class_name * test_search_with_limit_completes_quickly * test_desc_search_last_10_events_fast * test_kind_filter_with_many_events_is_fast * test_pagination_cursor_lookup_is_fast - Update existing tests to use simple class names in kind filter assertions. Co-authored-by: openhands <openhands@all-hands.dev>
Contributor
API breakage checks (Griffe)Result: Passed |
Contributor
Coverage Report •
|
||||||||||||||||||||
When the agent is actively running, it holds the state lock for potentially long periods (during LLM calls, tool execution, etc.). This caused WebSocket subscriptions and event lookups to block, making the UI appear frozen. Changes: - subscribe_to_events: Use non-blocking lock acquisition. If lock is held, send minimal state update (just execution_status) instead of blocking. - _get_event_sync: Remove lock acquisition entirely. Event reading is safe without lock since events are immutable once appended. - Add test: test_subscribe_does_not_block_when_lock_held verifies that WebSocket subscriptions complete quickly even when lock is held. Co-authored-by: openhands <openhands@all-hands.dev>
Contributor
Author
|
@OpenHands fix the failing gh actions |
|
I'm on it! rbren can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
SummaryI fixed the failing GitHub Actions pre-commit check on PR #2174. IssueThe Fix AppliedChanged a multi-line - logger.debug(
- "Lock held during subscribe, sending minimal state update"
- )
+ logger.debug("Lock held during subscribe, sending minimal state update")Verification
The GitHub Actions should now pass with this formatting fix. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes critical performance issues and a bug in the events search endpoint that caused:
Key Changes
Bug Fixes
f"{event.__class__.__module__}.{event.__class__.__name__}"to justevent.__class__.__name__. Clients send simple class names like'MessageEvent', not full module paths.Performance Improvements
EventLog.get_index()for pagination cursor lookup when availableTests Added
New performance tests in
test_event_service_perf.pythat fail fast for slow queries:test_kind_filter_uses_simple_class_name- Verifies simple class names worktest_search_with_limit_completes_quickly- Verifies early exit optimizationtest_desc_search_last_10_events_fast- Verifies DESC scan from endtest_kind_filter_with_many_events_is_fast- Verifies kind filter performancetest_pagination_cursor_lookup_is_fast- Verifies O(1) cursor lookupTesting
uv run pytest tests/agent_server/test_event_service.py tests/agent_server/test_event_service_perf.py -v # 67 passedRelaxed Constraints
Per the issue, some constraints about the endpoint being perfectly up-to-date have been relaxed:
@rbren can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:bee4fe1-pythonRun
All tags pushed for this build
About Multi-Architecture Support
bee4fe1-python) is a multi-arch manifest supporting both amd64 and arm64bee4fe1-python-amd64) are also available if needed