volcengine · huangruiteng · May 22, 2026 · May 22, 2026 · May 22, 2026 · May 23, 2026
diff --git a/benchmark/tau2/llm/README.md b/benchmark/tau2/llm/README.md
@@ -242,6 +242,31 @@ Start the OpenViking service before executing memory cells, and verify it with
 Memory V2 baseline. For trajectory memory evidence, start the service from this
 branch and inspect generated trajectory files; changing `search_uri` alone does
 not prove the new trajectory prompt was used.
+Agent Harness / TAU-2 corpus preparation opts into the faster agent-memory
+write path. The default evidence path keeps experience consolidation on the
+normal per-trajectory route and relies on concurrent session commits plus
+server-side exact apply. Configure the running OpenViking server with:
+
+- `memory.agent_memory_enabled=true`
+- `memory.agent_experience_apply_lock_mode="operation_exact"`
+- `memory.agent_trajectory_apply_lock_mode="operation_exact"`
+- `memory.long_term_apply_lock_mode="operation_exact"`
+- `memory.operation_exact_apply_window_seconds=10.0`
+- `memory.long_term_extraction_enabled=false`
+
+`--strict-preflight` checks `OPENVIKING_CONFIG_FILE` (or `~/.openviking/ov.conf`)
+and fails fast if the server-side memory config does not match the experiment
+config. The `10.0s` operation-exact apply window is now also the OpenViking
+product default; the remaining settings are benchmark / Vaka corpus-prepare
+defaults for faster iteration. Experience consolidation keeps the normal
+per-trajectory semantics, while same-session experience phases may run
+concurrently when operation-exact apply is enabled. The operation-exact
+apply window is a server-side owner primitive: requests for the same
+concrete target set queue during a short engineering window, then one
+owner acquires the union of exact locks and applies the queued patch
+timeline in order against locked, latest content. It is not a
+client-side sleep and does not require the benchmark runner to
+serialize session commits.
 
 ## Memory Adapter
 
@@ -272,8 +297,12 @@ is retrieved during eval (`experiences` by default, `trajectories` for
 `config/trajectory.yaml`). The runner prepares each distinct
 `domain + corpus_id` once and reuses it across eval run ids when the cached
 `corpus_manifest.json` is present. Different corpora may be prepared in
-parallel with `benchmark.corpus_prepare_concurrency`; session commits inside one
-corpus remain serial to preserve OpenViking write semantics.
+parallel with `benchmark.corpus_prepare_concurrency`. Session commits inside one
+corpus can also be submitted concurrently with
+`openviking.corpus_session_commit_concurrency`; the default benchmark config uses
+`4`, while `1` keeps the historical serial commit / wait behavior. The corpus
+manifest records both the configured concurrency and stable input-order rows so
+later eval runs can fail fast on mismatched corpus-build semantics.
 
 By default, trajectory extraction is transcript-only: the runner replays TAU-2
 messages into an OpenViking session and does not expose held-out reward or
@@ -283,9 +312,10 @@ session, skip failed train sessions when building positive procedure memory, and
 cap injected memory by total character budget for content-shape ablations.
 
 Eval cells run in parallel with `benchmark.strategy_concurrency` by default and
-can be overridden with `--strategy-concurrency`. This only parallelizes read-only
-TAU-2 eval cells; corpus writes inside one corpus are still serialized by the
-prepare step.
+can be overridden with `--strategy-concurrency`. This parallelizes read-only
+TAU-2 eval cells; corpus writes are controlled separately by
+`benchmark.corpus_prepare_concurrency` across corpora and
+`openviking.corpus_session_commit_concurrency` within a corpus.
 
 ## User Simulator Policy
 
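Review note: the README hunk above describes the operation-exact apply window only in prose. The toy model below is one way to read that description, not the OpenViking implementation. `ApplyWindow`, `submit`, and `flush` are hypothetical names, the timer that would close the window is replaced by an explicit `flush()` call, and patches are modelled as plain functions from the latest content to new content.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Patch = Callable[[str], str]  # receives the latest content, returns new content


@dataclass
class ApplyWindow:
    """Hypothetical model of a server-side apply window (illustration only)."""

    store: Dict[str, str]
    locks: Dict[str, threading.Lock] = field(default_factory=dict)
    queue: List[Tuple[Tuple[str, ...], Patch]] = field(default_factory=list)

    def submit(self, targets, patch):
        # Requests queue in arrival order instead of applying immediately.
        self.queue.append((tuple(targets), patch))

    def flush(self):
        # One owner drains the queue when the window closes.
        pending, self.queue = self.queue, []
        # Union of exact targets, locked in sorted order to avoid lock-order deadlocks.
        union = sorted({t for targets, _ in pending for t in targets})
        held = [self.locks.setdefault(t, threading.Lock()) for t in union]
        for lock in held:
            lock.acquire()
        try:
            # Apply the patch timeline in order against the latest locked content.
            for targets, patch in pending:
                for t in targets:
                    self.store[t] = patch(self.store.get(t, ""))
        finally:
            for lock in reversed(held):
                lock.release()
        return len(pending)


window = ApplyWindow(store={"exp/a.md": "v0"})
window.submit(["exp/a.md"], lambda text: text + "+p1")
window.submit(["exp/a.md"], lambda text: text + "+p2")
applied = window.flush()
```

In this model the second patch sees the output of the first, which is the property the README calls applying "against locked, latest content". The callers never sleep or serialize themselves; ordering comes from the queue.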

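Review note: the README says the corpus manifest records the configured commit concurrency and "stable input-order rows". A minimal sketch of how concurrent submission can still produce input-ordered rows is shown below. `commit_sessions`, `commit_one`, and the manifest field names other than `corpus_session_commit_concurrency` are assumptions made for illustration; the real manifest schema is not shown in this diff.

```python
from concurrent.futures import ThreadPoolExecutor


def commit_sessions(session_ids, commit_one, concurrency=4):
    """Commit sessions with bounded concurrency and return input-ordered rows.

    commit_one is a hypothetical callable that commits a single session.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map yields results in input order, whatever order the
        # individual commits finish in.
        results = list(pool.map(commit_one, session_ids))
    return {
        "corpus_session_commit_concurrency": concurrency,
        "rows": [
            {"index": i, "session_id": sid, "result": res}
            for i, (sid, res) in enumerate(zip(session_ids, results))
        ],
    }


manifest = commit_sessions(["s1", "s2", "s3"], lambda sid: "ok:" + sid)
```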
diff --git a/benchmark/tau2/llm/config/baseline.yaml b/benchmark/tau2/llm/config/baseline.yaml
@@ -48,6 +48,21 @@ openviking:
   url: ${OPENVIKING_URL:-http://localhost:1933}
   account: ${OPENVIKING_ACCOUNT:-default}
   agent_id: ${OPENVIKING_AGENT_ID:-tau2-openviking-agent}
+  # Agent Harness / TAU-2 experiment corpus preparation defaults to concurrent
+  # per-session commits plus server-side exact apply.
+  agent_memory_enabled: true
+  agent_experience_apply_lock_mode: operation_exact
+  agent_trajectory_apply_lock_mode: operation_exact
+  long_term_apply_lock_mode: operation_exact
+  operation_exact_apply_window_seconds: 10.0
+  long_term_extraction_enabled: false
+  corpus_session_commit_concurrency: 4
+  # Corpus prepare can legitimately take far longer than the low-level client
+  # default on tree-lock paths. Keep tree/exact lock experiments comparable by
+  # making both the HTTP client and task wait timeout explicit in the generated
+  # run plan.
+  timeout_seconds: 3600
+  wait_timeout_seconds: 3600
   reuse_corpus_across_runs: true
   retrieval_top_k: 4
   prewrite_retrieval_top_k: 6