
feat(memory): reduce corpus write serialization with exact apply #2210

Open: huangruiteng wants to merge 54 commits into volcengine:main from huangruiteng:feat/memory-versioned-apply




@huangruiteng commented May 24, 2026

Summary

This PR adds server-side pieces for faster agent-memory corpus preparation while preserving the normal per-trajectory experience semantics and Memory V2 graph consistency.

The practical target is Vaka / TAU-style agent-memory iteration: clients can submit many session commits concurrently, while the server owns file-level safety, patch shape, graph cleanup, retry telemetry, and bounded apply backpressure. This PR no longer includes batch experience consolidation or a batch provider. The acceleration path now has three parts: concurrent session commits, concurrent same-session per-trajectory experience phases, and the operation-exact apply window.

Product behavior changes only for operation-exact phases:

  • memory.operation_exact_apply_window_seconds now defaults to 10.0s.
  • Agent experience / trajectory and standard long-term extraction can opt into operation_exact apply locks.
  • memory.agent_experience_per_trajectory_max_concurrency lets same-session per-trajectory experience phases run concurrently when operation-exact apply is enabled.
  • memory.long_term_extraction_enabled=false can skip standard long-term user/tool/skill memories while preserving agent memories for agent-memory-only corpus builds.
  • TAU corpus-build config records and validates corpus_session_commit_concurrency, expected lock modes, long-term extraction mode, and apply-window settings.
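
A minimal sketch of how a cap such as memory.agent_experience_per_trajectory_max_concurrency could bound same-session phases, assuming an asyncio-based server. The names run_bounded and fake_experience_phase are hypothetical and are not taken from this PR; the sleep stands in for the real extraction call.

```python
import asyncio


async def run_bounded(items, worker, max_concurrency=4):
    """Run worker(item) for every item with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_experience_phase(trajectory_id):
        # Hypothetical stand-in for one per-trajectory experience phase.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return f"experience-for-{trajectory_id}"

    results = await run_bounded(range(10), fake_experience_phase, max_concurrency=4)
    return results, peak


results, peak = asyncio.run(demo())
print(len(results), "phases finished; peak concurrency", peak)
```

The semaphore only limits how many phases run at once; it does not order their writes, which is the job the operation-exact apply path is described as handling.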

Design Boundary

This PR is deliberately not a semantic reconcile system.

  • Operation-exact phases lock only the concrete files / overviews that extracted operations will apply to.
  • The apply window briefly queues requests that target overlapping concrete file sets; one owner applies queued operations in arrival order under the union of exact locks.
  • The window is a scheduling / ordering primitive, not a late reconcile prompt.
  • Complete string outputs are normalized into structured SEARCH/REPLACE patches before apply whenever possible, so the apply layer can replay a delta on latest file content instead of replacing a whole stale snapshot.
  • Experience supersedes / delete is treated as a graph rewrite, not ordinary metadata:
    • peer links / backlinks that point at superseded experiences are migrated to the replacement URI;
    • stale old-URI edges are cleaned before delete;
    • deleted-link endpoints are included in operation-exact lock/write sets;
    • superseded target reads are version-tracked so late endpoint drift triggers stale-read retry;
    • supersedes is only consumed after the target is resolved.
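
The apply-window behavior in the list above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: ApplyRequest, ApplyBatch, and group_window are hypothetical names, and the sketch only shows how requests with overlapping file sets could be merged into one batch that is applied in arrival order under the union of their locks.

```python
from dataclasses import dataclass, field


@dataclass
class ApplyRequest:
    seq: int          # arrival order inside the window
    files: frozenset  # concrete files the extracted operations target


@dataclass
class ApplyBatch:
    lock_set: set = field(default_factory=set)   # union of exact locks
    requests: list = field(default_factory=list)  # kept in arrival order


def group_window(requests):
    """Merge requests whose file sets overlap (directly or transitively)."""
    batches = []
    for req in sorted(requests, key=lambda r: r.seq):
        overlapping = [b for b in batches if b.lock_set & req.files]
        untouched = [b for b in batches if not (b.lock_set & req.files)]
        merged = ApplyBatch()
        for batch in overlapping:
            merged.lock_set |= batch.lock_set
            merged.requests.extend(batch.requests)
        merged.lock_set |= req.files
        merged.requests.append(req)
        merged.requests.sort(key=lambda r: r.seq)
        batches = untouched + [merged]
    return batches


window = [
    ApplyRequest(0, frozenset({"a.md"})),
    ApplyRequest(1, frozenset({"b.md"})),
    ApplyRequest(2, frozenset({"a.md", "c.md"})),
    ApplyRequest(3, frozenset({"d.md"})),
    ApplyRequest(4, frozenset({"c.md", "b.md"})),
]
batches = group_window(window)
for batch in batches:
    print(sorted(batch.lock_set), [r.seq for r in batch.requests])
```

In this example request 4 touches both c.md and b.md, so it pulls two previously independent groups into a single batch, while the request on d.md stays independent and could be applied in parallel.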

The intended contract is: clients may be aggressive about concurrency; server-side Memory V2 should provide safe apply semantics and telemetry without asking every client to implement its own serial discipline.

What Changed

  • Adds operation-exact apply modes for agent experiences, agent trajectories, and standard long-term memory extraction.
  • Adds memory.operation_exact_apply_window_seconds, default 10.0.
  • Adds memory.agent_experience_per_trajectory_max_concurrency, default 4.
  • Converts complete string outputs into structured StrPatch(blocks=[...]) for both merge_op=patch and merge_op=replace string fields when old field content is available.
  • Lets MemoryUpdater apply a structured string patch through PatchOp even if the schema field is normally merge_op=replace.
  • Keeps replace/delete/unknown/plain unstructured operations conflict-sensitive, while allowing structured string patches and safe merge ops to apply against latest content.
  • Migrates replacement peer links / backlinks, cleans remaining old-URI edges for delete operations, includes deleted-link endpoints in operation-exact lock/write sets, and tracks superseded target reads for exact-apply stale retry.
  • Resolves experience supersedes as a graph rewrite:
    • resolved targets are queued for delete through the normal updater path;
    • source trajectory links are inherited onto the superseding experience;
    • other inherited graph links/backlinks are rewritten from the superseded URI to the replacement URI, with same-replacement self-links dropped;
    • prefetched targets that disappear trigger operation-exact retry;
    • never-resolvable targets mark the operation invalid instead of silently creating a near-duplicate;
    • comma / semicolon / newline separated multi-target supersedes strings are supported, while an exact raw name is tried first so valid filenames containing separators still work.
  • Adds memory.long_term_extraction_enabled, default true.
  • Extends TAU benchmark config, preflight, generated commands, and corpus manifests to record and verify expected memory config.
  • Adds phase telemetry for conflict-sensitive buckets/reasons, conflicts, retries, structured string conversion, apply-window leader/follower/wait signals, and per-trajectory experience concurrency.
  • Adds read-only GET /api/v1/stats/memory-graph to let clients inspect memory graph health after concurrent corpus writes, including memory type counts, source links, backlinks, broken endpoints, missing reciprocal links, and violation samples.
  • Adds async/sync local and HTTP client helpers for the same memory graph health summary, so corpus runners can gate on graph integrity without issuing raw stats HTTP calls.
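
To illustrate the string-conversion items in the list above, here is a small sketch of turning a complete string output into SEARCH/REPLACE blocks and replaying them on newer content. It uses Python's difflib and is only an assumption about how such a conversion could work; to_blocks, apply_blocks, and PatchConflict are hypothetical names, not the PR's StrPatch or PatchOp types, and the sketch assumes non-empty old content.

```python
import difflib


class PatchConflict(Exception):
    """Raised when a SEARCH text is missing or ambiguous in the latest content."""


def to_blocks(old: str, new: str):
    """Diff old vs. new line-wise and return (search, replace) pairs."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    blocks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i1 == i2:
            # Pure insertion: borrow a neighbouring unchanged line as context
            # so the SEARCH side is not empty.
            if i1 > 0:
                i1, j1 = i1 - 1, j1 - 1
            elif i2 < len(old_lines):
                i2, j2 = i2 + 1, j2 + 1
        blocks.append(("".join(old_lines[i1:i2]), "".join(new_lines[j1:j2])))
    return blocks


def apply_blocks(latest: str, blocks):
    """Replay each block on the latest content; fail if the anchor is gone."""
    for search, replace in blocks:
        if not search or latest.count(search) != 1:
            raise PatchConflict(f"cannot locate unique SEARCH text: {search!r}")
        latest = latest.replace(search, replace, 1)
    return latest


old = "name: search_tool\ncalls: 3\nnotes: works for order lookup\n"
new = "name: search_tool\ncalls: 4\nnotes: works for order lookup\n"
# Another writer appended a line after the extractor read `old`.
latest = old + "tips: retry on timeout\n"

blocks = to_blocks(old, new)
merged = apply_blocks(latest, blocks)
print(merged)
```

Replacing the whole field with `new` would have dropped the concurrently added tips line; replaying the delta keeps it, and a changed anchor surfaces as a conflict instead of a silent overwrite.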

Validation Signal

Small Throughput Probe

Small TAU-2 retail corpus-prepare probe, cached train transcripts, 8 successful sessions, wait timeout 3600s.

Phase times below are attributed from server telemetry, so they should not be added directly to task wall time. The tree-control run did not record a complete long-term / other phase; its row instead reports the tree-lock wait bucket, which accounts for the bottleneck that the missing phase would have shown.

| mode | tasks | total task duration | experience phase | trajectory phase | long-term / other phase | read |
| --- | --- | --- | --- | --- | --- | --- |
| tree control | 8/8 | sum 3750.6s, max 748.3s | 62.7s, 5 calls, 0 retry | 942.5s, incl. 691.2s tree wait | phase not recorded; tree wait mostly tools+skills=2982.7s | baseline directory-lock control |
| all exact before merge-safe stale | 8/8 | sum 3221.1s, max 884.5s | 110.8s, 7 calls, 0 retry / 0 conflict | 262.4s, 8 calls, 0 retry / 0 conflict | 3199.1s, 26 calls, 18 retries / 44 conflicts | stale retries moved cost from lock wait to LLM rerun |
| all exact + merge-safe stale | 8/8 | sum 2608.4s, max 637.1s | 110.0s, 7 calls, 0 retry / 0 conflict | 200.7s, 8 calls, 0 retry / 0 conflict | 2582.1s, 22 calls, 14 retries / 41 conflicts | remaining conflicts all tools + plain_string_patch |

Read: this is a corpus-prepare throughput signal, not a benchmark-score claim. Experience and trajectory phases were already clean in this probe; the long tail was standard long-term memory extraction, especially tools updates that were still emitted as plain-string patches. The conversion layer addresses that patch-shape problem directly; the latest patch also removes batch consolidation and instead allows same-session per-trajectory experience phases to run concurrently.

Full Retail Corpus Graph Check

Cached TAU-2 retail train transcripts, success-only agent-memory corpus build, corpus_session_commit_concurrency=4, operation_exact_apply_window_seconds=10.0, long-term extraction disabled.

| run | committed / skipped | experiences | trajectories | links / backlinks | broken endpoints | missing backlinks | lingering supersedes | exp without source link | read |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pre-heal full run | 59 / 15 | 118 | 143 | 370 / 367 | 0 | 3 | 0 | 0 | graph endpoints existed, but three source-link backlinks were missing |
| patched full run | 59 / 15 | 108 | 135 | 278 / 278 | 0 | 0 | 0 | 0 | final links/backlinks are balanced; no duplicate experience stems |

The full patched run validates the main invariant: source lineage and replacement cleanup are handled by the server apply path, not by best-effort post-apply metadata edits.
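
The kind of check behind the broken-endpoint and missing-backlink columns can be sketched like this. It is a toy model under assumptions: graph_health and its dictionary inputs are hypothetical and do not mirror the response schema of GET /api/v1/stats/memory-graph.

```python
def graph_health(nodes, links, backlinks):
    """Count links and report broken endpoints and missing reciprocal backlinks.

    nodes:     set of existing memory URIs
    links:     {source_uri: set of target_uris}
    backlinks: {target_uri: set of source_uris}
    """
    broken, missing = [], []
    for source, targets in links.items():
        for target in sorted(targets):
            if target not in nodes:
                broken.append((source, target))
            elif source not in backlinks.get(target, set()):
                missing.append((source, target))
    return {
        "links": sum(len(t) for t in links.values()),
        "backlinks": sum(len(s) for s in backlinks.values()),
        "broken_endpoints": broken,
        "missing_backlinks": missing,
    }


nodes = {"exp/a", "exp/b", "traj/1"}
links = {
    "exp/a": {"traj/1", "exp/b"},
    "exp/b": {"traj/1", "exp/gone"},  # exp/gone was deleted
}
backlinks = {
    "traj/1": {"exp/a"},  # the backlink from exp/b is missing
    "exp/b": {"exp/a"},
}
report = graph_health(nodes, links, backlinks)
print(report)
```

A balanced graph in this model has equal link and backlink counts and two empty violation lists, which is the shape of the patched full run row above.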

That full run also surfaced two multi-target supersedes strings from the extractor, for example one field naming several older experience candidates. The latest commit adds server-side parsing for that edge: try the exact raw target first, then split comma/semicolon/newline candidates; resolve every valid target; inherit all source trajectory links; and only treat the operation as invalid when no target can be resolved. This multi-target parser and replacement-link migration are covered by targeted tests; I did not rerun the full 59-session corpus after these small graph-rewrite follow-ups. The latest follow-up also records the superseded file base digest during graph rewrite, so if the old card gains new links before lock/apply, exact apply retries with a refreshed cleanup plan rather than mutating an endpoint that was not part of the original lock set.
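
The resolution order described above (exact raw name first, then split candidates, invalid only when nothing resolves) can be sketched as follows. resolve_supersedes and the example names are hypothetical; the real resolver works against stored experience files rather than an in-memory set.

```python
import re


def resolve_supersedes(raw: str, existing_names: set) -> list:
    """Return the existing experience names that a supersedes string refers to."""
    raw = raw.strip()
    if not raw:
        return []
    # 1. Try the exact raw name first, so names containing separators still work.
    if raw in existing_names:
        return [raw]
    # 2. Otherwise split on comma / semicolon / newline and keep every valid target.
    resolved = []
    for candidate in re.split(r"[,;\n]", raw):
        name = candidate.strip()
        if name and name in existing_names and name not in resolved:
            resolved.append(name)
    return resolved


existing = {"refund_flow", "exchange_flow", "cancel, then refund"}

multi = resolve_supersedes("refund_flow; exchange_flow", existing)
exact = resolve_supersedes("cancel, then refund", existing)
partial = resolve_supersedes("refund_flow, missing_card", existing)
unresolved = resolve_supersedes("missing_a, missing_b", existing)

# An operation would be treated as invalid only in the last case.
print(multi, exact, partial, unresolved)
```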

TAU Runner Contract

The TAU runner now supports and records the intended client-side write mode:

memory:
  agent_memory_enabled: true
  agent_experience_apply_lock_mode: operation_exact
  agent_trajectory_apply_lock_mode: operation_exact
  long_term_apply_lock_mode: operation_exact
  operation_exact_apply_window_seconds: 10.0
  long_term_extraction_enabled: false
openviking:
  corpus_session_commit_concurrency: 4

--strict-preflight checks the running OpenViking config before a matrix run. The cached corpus manifest records corpus_session_commit_concurrency, corpus_prepare_mode, stable input-order rows, and the expected OpenViking memory config so mismatched reruns fail fast instead of silently reusing an incompatible corpus.
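
A fail-fast comparison of expected and running config could look like the sketch below. It is illustrative only: config_mismatches is a hypothetical helper, the expected values are copied from the YAML above, and the running values are invented for the example.

```python
_MISSING = object()


def config_mismatches(expected: dict, actual: dict, prefix: str = "") -> list:
    """Return one message per expected key whose running value differs."""
    problems = []
    for key, want in expected.items():
        path = f"{prefix}{key}"
        got = actual.get(key, _MISSING)
        if isinstance(want, dict):
            nested = got if isinstance(got, dict) else {}
            problems += config_mismatches(want, nested, prefix=path + ".")
        elif got is _MISSING:
            problems.append(f"{path}: expected {want!r}, but key is missing")
        elif got != want:
            problems.append(f"{path}: expected {want!r}, got {got!r}")
    return problems


expected = {
    "memory": {
        "agent_memory_enabled": True,
        "agent_experience_apply_lock_mode": "operation_exact",
        "agent_trajectory_apply_lock_mode": "operation_exact",
        "long_term_apply_lock_mode": "operation_exact",
        "operation_exact_apply_window_seconds": 10.0,
        "long_term_extraction_enabled": False,
    },
    "openviking": {"corpus_session_commit_concurrency": 4},
}

# Invented running config: two values differ from what the manifest expects.
running = {
    "memory": dict(expected["memory"], operation_exact_apply_window_seconds=2.0),
    "openviking": {"corpus_session_commit_concurrency": 8},
}

problems = config_mismatches(expected, running)
for line in problems:
    print(line)
```

A runner following this pattern would abort before the matrix run whenever the list is non-empty, instead of reusing a corpus built under different settings.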

Follow-up / Open Questions

  • Validate the apply-window owner path on a larger full-corpus prepare with the batch provider removed and same-session per-trajectory concurrency enabled.
  • Decide whether tools / skills should emit structured patches directly instead of relying on the compatibility conversion bridge.
  • Replace free-text supersedes names with stable candidate URI/id when the extractor can expose replace candidates safely.
  • Keep staleness as telemetry / evaluation-policy language for now: record source policy, patch base, apply version, window wait/order, and retry/conflict reason before turning it into a product default budget.

Tests

Latest validation:

  • source /Users/bytedance/Documents/agent-harness/scripts/load_local_env.sh && uv run --with ruff ruff check openviking/session/compressor_v2.py tests/session/memory/test_compressor_v2.py openviking/session/memory/memory_updater.py tests/session/memory/test_memory_updater.py

  • source /Users/bytedance/Documents/agent-harness/scripts/load_local_env.sh && uv run --with ruff ruff format --check openviking/session/compressor_v2.py tests/session/memory/test_compressor_v2.py openviking/session/memory/memory_updater.py tests/session/memory/test_memory_updater.py

  • OPENVIKING_CONFIG_FILE=tests/api_test/ov.conf.template .venv/bin/python -m pytest tests/session/memory/test_compressor_v2.py::test_source_trajectory_links_attach_before_exact_lock tests/session/memory/test_compressor_v2.py::test_resolve_supersedes_tracks_deleted_file_version_for_exact_retry tests/session/memory/test_compressor_v2.py::test_resolve_supersedes_consumes_field_only_after_resolved tests/session/memory/test_compressor_v2.py::test_resolve_supersedes_migrates_peer_links_to_replacement_uri tests/session/memory/test_compressor_v2.py::test_resolve_supersedes_accepts_multiple_replaced_experiences tests/session/memory/test_compressor_v2.py::test_resolve_supersedes_keeps_partial_multi_target_resolution tests/session/memory/test_compressor_v2.py::test_resolve_supersedes_prefers_exact_name_before_splitting tests/session/memory/test_compressor_v2.py::test_resolve_supersedes_retries_when_prefetched_target_disappears tests/session/memory/test_compressor_v2.py::test_resolve_supersedes_unresolved_target_marks_operations_invalid tests/session/memory/test_memory_updater.py::TestMemoryUpdater::test_apply_operations_cleans_peer_backlinks_before_delete tests/session/memory/test_memory_updater.py::TestMemoryUpdater::test_apply_operations_migrates_replacement_links_and_cleans_old_uri tests/session/memory/test_memory_updater.py::TestMemoryUpdater::test_apply_operations_heals_preserved_forward_links_on_upsert tests/session/memory/test_memory_updater.py::TestMemoryUpdater::test_apply_operations_cleans_links_added_after_delete_snapshot (14 passed)

  • .venv/bin/python -m compileall -q openviking/session/compressor_v2.py tests/session/memory/test_compressor_v2.py openviking/session/memory/memory_updater.py tests/session/memory/test_memory_updater.py

  • OPENVIKING_CONFIG_FILE=$(mktemp-empty-config) .venv/bin/python -m pytest tests/server/test_api_stats_memory_graph.py -q (2 passed)

  • OPENVIKING_CONFIG_FILE=tests/api_test/ov.conf.template .venv/bin/python -m pytest tests/server/test_api_stats_memory_graph.py tests/client/test_rebuild_clients.py -q (12 passed)

@github-actions

Persistent review updated to latest commit 2f09aa9

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Add max retry limit for version conflicts

Add a maximum retry limit for operation-exact version conflicts to prevent potential
infinite loops. Use a config setting similar to v2_lock_max_retries.

openviking/session/compressor_v2.py [1058-1075]

 if not _operation_exact_retry_driver:
     next_attempt = operation_exact_version_attempt
+    max_retries = getattr(config.memory, "v2_version_conflict_max_retries", 10)
     while True:
+        if next_attempt > max_retries:
+            raise RuntimeError(f"[{phase_label}] Exceeded maximum version conflict retries ({max_retries})")
         result = await self._run_extract_phase(
             provider=provider,
             messages=messages,
             ctx=ctx,
             strict_extract_errors=strict_extract_errors,
             phase_label=phase_label,
             post_apply=post_apply,
             force_tree_lock=force_tree_lock,
             operation_exact_version_attempt=next_attempt,
             _operation_exact_retry_driver=True,
         )
         if isinstance(result, _OperationExactRetrySignal):
             next_attempt = result.next_attempt
             continue
         return result
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies a potential infinite loop risk from unbounded version conflict retries, which is a valid concern. However, the provided improved code would cause a NameError because config is not defined before it's used in the outer retry loop (config is fetched later in the method). The core idea is sound, but the implementation needs adjustment to fetch config earlier.

Impact: Low
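
One way to address the ordering problem noted in the review is to resolve the retry limit before entering the loop and pass it in explicitly. The sketch below is standalone and hedged: run_with_bounded_retries and RetrySignal are hypothetical stand-ins for the PR's retry driver and _OperationExactRetrySignal, and v2_version_conflict_max_retries is the setting name proposed in the suggestion, not an existing option.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class RetrySignal:
    """Stand-in for a result that asks the driver to rerun the phase."""
    next_attempt: int


async def run_with_bounded_retries(run_once, *, max_retries, phase_label, first_attempt=0):
    """Rerun run_once(attempt) while it returns RetrySignal, up to max_retries."""
    attempt = first_attempt
    while True:
        if attempt > max_retries:
            raise RuntimeError(
                f"[{phase_label}] exceeded maximum version conflict retries ({max_retries})"
            )
        result = await run_once(attempt)
        if isinstance(result, RetrySignal):
            attempt = result.next_attempt
            continue
        return result


async def demo():
    # The limit is read once, before the loop, so no name is used before it exists.
    settings = SimpleNamespace(v2_version_conflict_max_retries=3)
    max_retries = getattr(settings, "v2_version_conflict_max_retries", 10)

    calls = []

    async def flaky(attempt):
        calls.append(attempt)
        return RetrySignal(attempt + 1) if attempt < 2 else "applied"

    async def always_conflicts(attempt):
        return RetrySignal(attempt + 1)

    ok = await run_with_bounded_retries(flaky, max_retries=max_retries, phase_label="long_term")
    try:
        await run_with_bounded_retries(
            always_conflicts, max_retries=max_retries, phase_label="long_term"
        )
        error = None
    except RuntimeError as exc:
        error = str(exc)
    return ok, calls, error


ok, calls, error = asyncio.run(demo())
print(ok, calls, error)
```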

@qin-ctx qin-ctx requested a review from chenjw May 25, 2026 03:26
@huangruiteng changed the title from "feat(memory): add experimental exact-apply mode for long-term memory" to "feat(memory): reduce corpus write serialization with exact apply" May 25, 2026
