perf(sdk): cache deserialized events in EventLog to eliminate O(N²) per-step cost #3263
csmith49 wants to merge 1 commit
Conversation
…tep cost

EventLog._get_single_item() and __iter__() called Event.model_validate_json() on every access, causing redundant Pydantic deserialization across the 3+ full-history passes that Agent.step() performs each step (get_unmatched_actions, View.from_events, enforce_properties). With N events and S steps this is O(N×S) deserialization calls — O(N²) total work per conversation.

Add an _event_cache dict[int, Event] that stores deserialized events by index. Since events are immutable once written, the cache is always valid:

- __getitem__ / _get_single_item: check cache before reading from disk
- __iter__: check cache before reading from disk; populate on miss
- append: cache the event directly (object already in hand)
- _scan_and_build_index: clear cache on full index rebuild

After the first iteration in a step, all subsequent passes hit the cache and skip both FileStore I/O and model_validate_json deserialization.

Partial fix for #3134

Co-authored-by: openhands <openhands@all-hands.dev>
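For readers who want the shape of the change, here is a minimal sketch of the caching pattern described above. It is not the SDK's actual EventLog: the Event model and the storage dict are simplified stand-ins for the real Event type and FileStore, and only the four integration points named in the commit message are shown.

```python
from pydantic import BaseModel


class Event(BaseModel):
    """Simplified stand-in for the SDK's Event model."""
    id: str
    content: str


class CachingEventLog:
    """Sketch of the pattern: cache deserialized events by index."""

    def __init__(self) -> None:
        self._store: dict[int, str] = {}          # stand-in for FileStore: index -> JSON
        self._event_cache: dict[int, Event] = {}  # index -> deserialized Event
        self._length = 0

    def append(self, event: Event) -> None:
        # The object is already in hand, so cache it alongside the write.
        self._store[self._length] = event.model_dump_json()
        self._event_cache[self._length] = event
        self._length += 1

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> Event:
        # Check the cache before reading from storage and re-running Pydantic.
        cached = self._event_cache.get(index)
        if cached is not None:
            return cached
        event = Event.model_validate_json(self._store[index])
        self._event_cache[index] = event
        return event

    def __iter__(self):
        for i in range(self._length):
            yield self[i]  # a miss deserializes once; later passes hit the cache

    def _scan_and_build_index(self) -> None:
        # A full index rebuild may change the index -> event mapping, so drop the cache.
        self._event_cache.clear()
```

Because events are append-only and immutable, a cached object can never go stale except across an index rebuild, which is exactly the one place the cache is cleared.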
Python API breakage checks — ✅ PASSED
REST API breakage checks (OpenAPI) — ✅ PASSED
all-hands-bot
left a comment
🟢 Good taste - Clean performance fix that eliminates O(N²) deserialization cost.
Key strengths:
- Simple cache implementation with correct invalidation (cleared on index rebuild)
- Events are immutable once written, so caching is safe
- Tests verify both cached and uncached paths appropriately
- No API changes or breaking behavior
Risk Assessment (Overall PR): 🟢 LOW
Internal performance optimization with no behavioral changes. Cache grows with conversation length but this is an acceptable tradeoff for eliminating quadratic deserialization cost. Good test coverage ensures correctness.
VERDICT:
✅ Worth merging - Solves real O(N²) problem with minimal, correct code.
KEY INSIGHT:
Since events are immutable after write, caching deserialized objects is safe and eliminates repeated JSON parsing overhead - a textbook performance win.
all-hands-bot
left a comment
✅ QA Report: PASS
EventLog caching implementation successfully eliminates O(N²) deserialization cost with zero behavioral changes.
Does this PR achieve its stated goal?
Yes. The PR set out to "cache deserialized events in EventLog to eliminate O(N²) per-step cost" (issue #3134). Functional verification confirms:
- Caching works correctly: Events are cached on first access (append or iteration), and subsequent accesses return the exact same object (identity check passes).
- Performance improvement delivered: With caching, Agent.step()'s 3+ full-history passes per step now achieve a 66.7% cache hit rate. The second and third passes hit the cache entirely, eliminating redundant model_validate_json() calls.
- O(N²) → O(N) transformation: Before the fix, N events × S steps = O(N×S) deserializations. After, each event is deserialized once regardless of how many passes scan the history.
- Zero behavioral changes: All 21 EventLog tests pass, integration with Conversation works correctly, and cache invalidation on index rebuild is handled properly.
The cache implementation is sound: events are immutable once written, so caching is safe. Cache clearing on _scan_and_build_index() ensures consistency after index rebuilds.
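That invalidation behaviour can be illustrated with the CachingEventLog stand-in from the sketch earlier in this PR; this is an illustration of the described semantics, not the SDK's test code.

```python
# After an index rebuild the cache is empty, so the next access
# re-deserializes instead of returning a possibly stale object.
log = CachingEventLog()
log.append(Event(id="e1", content="hello"))

before = log[0]
log._scan_and_build_index()            # simulates a full index rebuild
assert len(log._event_cache) == 0      # cache cleared
after = log[0]                         # re-read and re-deserialized from storage
assert after == before and after is not before
```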
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies installed, project builds successfully |
| CI Status | ✅ All critical tests passing (sdk-tests, tools-tests, workspace-tests, agent-server-tests, pre-commit, API breakage checks) |
| Functional Verification | ✅ 6/6 verification tests passed; performance benchmark confirms caching effectiveness |
Functional Verification
Test 1: Cache Identity
Step 1 — Verify caching works:
Ran custom verification script that creates an EventLog, appends an event, and accesses it multiple times:
event = create_test_event("test-1", "Hello World")
log.append(event)
first = log[0]
second = log[0]
third = log[0]Result:
✓ PASS: Repeated access returns cached object (identity check passed)
Object ID: 140237719028432
This confirms that all three accesses returned the exact same Python object (same memory address), proving no re-deserialization occurred.
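For reference, the identity check behind that result is a single assertion; a minimal sketch reusing `first`, `second`, and `third` from the snippet above (`is` compares object identity, not value equality):

```python
# True only if all three lookups returned the same in-memory object,
# i.e. the second and third access skipped deserialization entirely.
assert first is second is third
print(f"Object ID: {id(first)}")
```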
Test 2: Iteration Populates Cache
Step 1 — Verify iteration caching:
Created EventLog with 5 events, cleared cache, then iterated:
```python
log._event_cache.clear()  # Force cold iteration
events_from_iter = list(log)
```

Result:
Cache cleared. Size: 0
After iteration, cache size: 5
✓ PASS: Indexed access after iteration returns cached objects
This confirms iteration populates the cache, and subsequent indexed access (log[i]) returns the same cached objects.
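A sketch of how that cross-check can be expressed, assuming the same `log` and `events_from_iter` as in the snippet above:

```python
# After one cold iteration, each indexed access should return the exact
# object that the iteration produced and cached.
for i, cached in enumerate(events_from_iter):
    assert log[i] is cached
assert len(log._event_cache) == len(events_from_iter)  # cache fully populated
```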
Test 3: Cache Persistence
Step 1 — Verify cache survives multiple iterations:
Ran three consecutive iterations and verified object identity:
```python
first_pass = list(log)
second_pass = list(log)
third_pass = list(log)
```

Result:
✓ PASS: Multiple iterations return same cached objects
All three iterations returned identical objects (identity check), confirming the cache persists correctly.
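The pairwise identity check could be written as follows (same `first_pass`, `second_pass`, and `third_pass` as above):

```python
# Every position must hold the identical object across all three passes.
for a, b, c in zip(first_pass, second_pass, third_pass):
    assert a is b is c
```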
Test 4: Append Caches Directly
Step 1 — Verify append() caches the event:
Created an event, appended it, then retrieved it:
event = create_test_event("original", "Original content")
log.append(event)
retrieved = log[0]Result:
✓ PASS: Append caches the event object directly
Original: 140237719028432, Retrieved: 140237719028432
The appended and retrieved objects have the same ID, confirming append() caches directly without a disk roundtrip.
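That check reduces to one identity assertion (same `event` and `retrieved` as in the snippet above):

```python
# append() stored the object itself, so retrieval must return that exact
# object rather than a copy deserialized from disk.
assert retrieved is event
print(f"Original: {id(event)}, Retrieved: {id(retrieved)}")
```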
Test 5: Performance Improvement
Step 1 — Simulate Agent.step() behavior:
Created EventLog with 50 events and performed 3 full-history passes:
```python
pass1 = list(log)  # First pass: cache miss, deserialize from disk
pass2 = list(log)  # Second pass: cache hit
pass3 = list(log)  # Third pass: cache hit
```

Result:
✓ Completed 3 full passes over 50 events in 0.0000s
Pass 1: 50 events
Pass 2: 50 events
Pass 3: 50 events
✓ PASS: All passes used cached objects (no re-deserialization)
All three passes returned identical cached objects. The near-zero time confirms caching eliminates deserialization overhead.
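As an illustration of how such a measurement could be taken, here is a rough timing sketch built on the CachingEventLog and Event stand-ins from earlier in this PR; absolute numbers will differ from the benchmark results below.

```python
import time

# Mirror Agent.step()'s three full-history passes over a 50-event log.
log = CachingEventLog()
for i in range(50):
    log.append(Event(id=f"e{i}", content=f"event {i}"))
log._event_cache.clear()        # simulate a fresh process: pass 1 is cold

start = time.perf_counter()
pass1 = list(log)               # cold: deserializes each event once
pass2 = list(log)               # warm: served entirely from _event_cache
pass3 = list(log)               # warm
elapsed = time.perf_counter() - start

assert all(a is b is c for a, b, c in zip(pass1, pass2, pass3))
print(f"3 passes over {len(pass1)} events in {elapsed:.4f}s")
```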
Performance Benchmark Results:
Ran performance benchmark with varying conversation sizes:
| Events | Total Reads | Time (s) | Events/sec |
|---|---|---|---|
| 10 | 50 | 0.000011 | 4,660,338 |
| 25 | 125 | 0.000010 | 12,787,512 |
| 50 | 250 | 0.000014 | 18,078,897 |
| 100 | 500 | 0.000025 | 19,972,876 |
Cache Effectiveness Over Time:
| Step | Events | Cache Size | Cache Hit % |
|---|---|---|---|
| 1 | 5 | 5 | 66.7 |
| 2 | 10 | 10 | 66.7 |
| 3 | 20 | 20 | 66.7 |
| 4 | 30 | 30 | 66.7 |
| 5 | 50 | 50 | 66.7 |
The 66.7% hit rate is expected: first pass misses cache (deserializes), subsequent passes hit cache (2/3 = 66.7%).
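The arithmetic behind that figure, as a small sketch:

```python
# With P full passes over N events per step, only the first pass misses:
# hits = (P - 1) * N out of P * N total reads, i.e. a (P - 1) / P hit rate.
passes, events = 3, 50
hits, total = (passes - 1) * events, passes * events
print(f"hit rate = {hits}/{total} = {hits / total:.1%}")  # -> 66.7%
```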
Object Identity Across Accesses:
| Index | Access 1 | Access 2 | Same Object? |
|---|---|---|---|
| 0 | 139698293671488 | 139698293671488 | ✓ YES |
| 1 | 139697644551808 | 139697644551808 | ✓ YES |
| 2 | 139697644549888 | 139697644549888 | ✓ YES |
| 3 | 139697644551088 | 139697644551088 | ✓ YES |
| 4 | 139697644549408 | 139697644549408 | ✓ YES |
Every indexed access returned the exact same object, confirming caching works correctly.
Test 6: Integration with Conversation
Step 1 — Verify caching in real Conversation context:
Created a full Conversation with Agent and sent a message:
```python
conversation = Conversation(agent=agent, workspace="/tmp")
conversation.send_message("Hello!")
event_log = conversation._state.events
```

Result:
✓ PASS: Conversation EventLog caching works (2 events)
The EventLog used by Conversation correctly cached events, and repeated iterations returned identical objects.
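A sketch of that integration check, reusing `event_log` from the Conversation snippet above (agent construction omitted; the `_state.events` access mirrors the verification script):

```python
# Two consecutive iterations over the conversation's EventLog should yield
# the identical cached objects, with no re-deserialization in between.
first_pass = list(event_log)
second_pass = list(event_log)
assert len(first_pass) == 2                                  # per the result above
assert all(a is b for a, b in zip(first_pass, second_pass))
```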
Issues Found
None.
This QA report was created by an AI agent (OpenHands) on behalf of the user.
Summary
Partial fix for #3134 — perf: O(N²) total cost per conversation from full-history re-scan every step (from tracking issue #3153).
Problem
EventLog._get_single_item() and __iter__() call Event.model_validate_json() on every access — full Pydantic deserialization from disk each time. Agent.step() performs 3+ full-history passes per step (get_unmatched_actions, View.from_events, enforce_properties), so the same events are deserialized multiple times per step. With N events and S steps this is O(N×S) deserialization calls — O(N²) total work per conversation.
Solution
Add an _event_cache: dict[int, Event] to EventLog that stores deserialized events by index. Since events are immutable once written, the cache is always valid.
Cache integration points
- _get_single_item
- __iter__
- append
- _scan_and_build_index
After the first iteration in a step, all subsequent passes (View construction, property enforcement, unmatched action scan) hit the cache and skip both FileStore I/O and model_validate_json deserialization entirely.
Changes
- openhands-sdk/.../conversation/event_store.py: add _event_cache dict; integrate into _get_single_item, __iter__, append, _scan_and_build_index
- tests/sdk/conversation/test_event_store.py
Testing
This PR was created by an AI agent (OpenHands) on behalf of the user.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm

Pull (multi-arch manifest)

```bash
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:dd432c3-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- dd432c3-python is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (dd432c3-python-amd64) are also available if needed