perf: parallelize event reading and minimize lock scope in event_service (#2170)
Conversation
- Add ThreadPoolExecutor to read events in parallel outside the state lock
- Minimize lock scope in _search_events_sync and _count_events_sync to only capture the event_log reference and count, avoiding holding the lock during I/O-bound event deserialization
- Extract _event_passes_filters helper to eliminate duplicated filter logic

Co-authored-by: openhands <openhands@all-hands.dev>
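The pattern the commit describes might look roughly like this (a minimal sketch, not the PR's actual code; `_event_passes_filters`, `event_log`, and `_search_events_sync` follow the names in the description, while the surrounding class, lock name, and event shape are hypothetical):

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class EventService:
    def __init__(self, event_log):
        self._state_lock = threading.Lock()
        self.event_log = event_log  # supports len() and per-index read()

    def _event_passes_filters(self, event, kind=None):
        # Shared filter logic, extracted so search and count agree.
        return kind is None or event.get("kind") == kind

    def _search_events_sync(self, kind=None):
        # Hold the lock only long enough to capture a stable reference
        # and the current length; do NOT hold it during disk I/O.
        with self._state_lock:
            log, count = self.event_log, len(self.event_log)

        # Deserialize events in parallel, outside the lock.
        with ThreadPoolExecutor(max_workers=8) as pool:
            events = list(pool.map(log.read, range(count)))

        return [e for e in events if self._event_passes_filters(e, kind)]
```

The key point is that the `with self._state_lock:` block contains no I/O at all; reads happen afterwards against the captured reference.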
API breakage checks (Griffe): Passed
Question 1. Do a /codereview-roasted on this PR; what is your feedback?

Question 2. Then, when you are done, tell me: were we reading ALL events from disk before this PR? Are we reading ALL events from disk after this PR? Post your review and answer as a comment to the PR. Note that it will be rendered as markdown.

Question 3. After you have posted, take a step back, re-read your own posted review and the PR, and re-analyze each issue with fresh eyes: is it actually minor or major, perhaps more than it seemed before? Post your re-analysis as a comment too.

IMPORTANT: You are under instruction to ANSWER, but not post random comments. These are 3 questions, and for each question you are allowed to answer on GitHub. So do not post other comments, but do answer.
I'm on it! enyst can track my progress at all-hands.dev |
I reviewed PR #2170 (“perf: parallelize event reading and minimize lock scope in event_service”) by inspecting the diff, and I produced two ready-to-paste GitHub PR comments (in Markdown):

No code changes were made, no commits were created, and no PR comments were actually posted from this environment; only the Markdown content to paste was prepared, in line with the instruction to answer rather than modify code.
HUMAN: 😅 😅 So it has the resolver prompt in the system prompt; attempted fix here: As you can see, I didn't convince it yet to bypass it. The level at which GPT-5.x follows instructions is extremely interesting, and different from Claude's. (OpenHands-GPT-5.2 below)

I can’t directly post GitHub comments from here given my current workflow instructions, but below are two ready-to-paste PR comments (markdown) that answer Q1+Q2 and then Q3.

Comment 1 (Q1 + Q2)

/codereview-roasted

Taste rating: Needs improvement (the idea is fine; the execution has a few “this will bite you later” choices).

[CRITICAL ISSUES]
[IMPROVEMENT OPPORTUNITIES]
VERDICT

The motivation is good (shorter lock scope, parallel I/O), but the current approach trades one bottleneck for multiple foot-guns (ordering-semantics drift, thread-explosion risk, and silent data loss). I’d rework the implementation before merging.

Q2 (were we reading ALL events from disk before vs after?):
Comment 2 (Q3 re-analysis, fresh eyes)

Re-reading the PR and my own review with fresh eyes, here’s what I’d now classify as major vs minor:

Major (I’d want fixed before merge)
Minor / situational
Net: the PR is pointed in the right direction, but the above “major” items are real correctness/perf risks, not nits.
@OpenHands fix this PR based on the feedback in the latest comment |
I'm on it! rbren can track my progress at all-hands.dev |
I'm actually still seeing the server lock up on
@OpenHands fix this PR based on the feedback from enyst. DO NOT run the precommit and tests when you're done, just push your changes |
I'm on it! rbren can track my progress at all-hands.dev |
The TIMESTAMP sort-order path was broken: it returned events in index order without sorting by timestamp, and it used the file index to find the page_id position instead of the position in the sorted list. This fix:

- Unifies both TIMESTAMP and TIMESTAMP_DESC paths to read all events in parallel, filter, sort appropriately, then paginate
- Properly finds the page_id position in the sorted (not raw index) list
- Maintains the minimal lock scope and parallel I/O benefits

Co-authored-by: openhands <openhands@all-hands.dev>
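The unified read-filter-sort-paginate flow described above could be sketched like this (a hedged illustration under assumed names: `page_id` is taken to be an event id, and the function signature, field names, and page size are hypothetical, not the PR's API):

```python
from concurrent.futures import ThreadPoolExecutor


def search_page(read_event, count, passes_filters, page_id=None,
                page_size=20, descending=False):
    # Read all events in parallel (index order preserved by map()).
    with ThreadPoolExecutor(max_workers=8) as pool:
        events = list(pool.map(read_event, range(count)))

    # Filter first, then sort by timestamp; reverse for TIMESTAMP_DESC.
    matched = [e for e in events if passes_filters(e)]
    matched.sort(key=lambda e: e["timestamp"], reverse=descending)

    # Find the cursor position in the SORTED list, not the raw index
    # order; using the on-disk file index here was the original bug.
    start = 0
    if page_id is not None:
        start = next(i for i, e in enumerate(matched) if e["id"] == page_id)

    return matched[start:start + page_size]
```

Note that both sort directions share one code path; only the `reverse` flag differs, which is what makes the TIMESTAMP and TIMESTAMP_DESC behaviors consistent.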
Out of curiosity, how many events total did your conversation(s) have? Also, are you testing with what workspace? |
Summary of Changes

I fixed PR #2170 based on enyst's feedback. The changes have already been pushed to the remote branch.

Checklist
What was changed

In
Commit
I'm running the agent server locally, so I guess LocalWorkspace. Generally conversations with 50-100 events, sometimes more.
This seems to work well in my test env!
Summary
[fill in a summary of this PR]
Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
• eclipse-temurin:17-jdk
• nikolaik/python-nodejs:python3.12-nodejs22
• golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d006a2a-python

Run
All tags pushed for this build
About Multi-Architecture Support
• The base tag (e.g. d006a2a-python) is a multi-arch manifest supporting both amd64 and arm64
• Architecture-specific tags (e.g. d006a2a-python-amd64) are also available if needed