Open
55 commits
f690c88
feat: batch agent experience consolidation
huangruiteng May 22, 2026
f33643e
style: format batch experience tests
huangruiteng May 22, 2026
309f916
test: cover batch experience chunk sizing
huangruiteng May 22, 2026
bf6acfb
fix: derive batch experience prompt from single provider
huangruiteng May 23, 2026
098e50b
style: format batch experience prompt adapter
huangruiteng May 23, 2026
f0de40e
feat(memory): align experience prompt with atomic intent
huangruiteng May 23, 2026
f8dacfa
fix(memory): match atomic experience prompt archive
huangruiteng May 23, 2026
76ebbf6
fix: satisfy batch experience lint
huangruiteng May 23, 2026
d13ccbf
fix: preserve batch experience granularity
huangruiteng May 23, 2026
dc03db1
style: format batch experience test
huangruiteng May 23, 2026
941fa18
feat: expose agent memory phase telemetry
huangruiteng May 23, 2026
abe602b
feat: surface commit telemetry in benchmark manifests
huangruiteng May 23, 2026
8cce894
fix: preserve batch action boundaries
huangruiteng May 23, 2026
3fe7bc5
feat: audit experience corpus quality
huangruiteng May 23, 2026
223a473
chore: trim batch experience diagnostics
huangruiteng May 23, 2026
cf6a7c7
chore: tighten batch prompt adapter
huangruiteng May 23, 2026
beaf4b9
chore: default TAU corpus prep to batch mode
huangruiteng May 23, 2026
1ae63ba
chore: format TAU batch eval config
huangruiteng May 23, 2026
f581f90
feat: prototype exact-lock experience apply
huangruiteng May 23, 2026
e81af24
feat: retry stale experience exact-lock applies
huangruiteng May 23, 2026
5ecc6ac
fix: limit read version tracking to experiences
huangruiteng May 23, 2026
bb72a75
chore: expose experience exact-lock target diagnostics
huangruiteng May 23, 2026
454d760
chore: make TAU OpenViking wait timeout explicit
huangruiteng May 23, 2026
f385e1d
chore: raise default TAU OpenViking wait timeout
huangruiteng May 23, 2026
fe2c33e
chore: expose memory phase lock plans
huangruiteng May 23, 2026
165c22c
chore: expose lock acquire bucket telemetry
huangruiteng May 24, 2026
ee54577
feat: add exact apply mode for agent trajectories
huangruiteng May 24, 2026
533f03c
feat: add operation-exact long-term apply diagnostics
huangruiteng May 24, 2026
2f09aa9
style: format exact-apply changes
huangruiteng May 24, 2026
d9ec156
feat(memory): report stale-read telemetry
huangruiteng May 24, 2026
9d14920
feat(memory): allow agent-only memory extraction
huangruiteng May 24, 2026
afa092d
fix(tau2): validate cached memory config
huangruiteng May 24, 2026
1c0a8b8
feat(memory): convert plain patches and commit corpora concurrently
huangruiteng May 25, 2026
4937d8b
docs(tau2): default corpus prepare to efficient agent memory writes
huangruiteng May 25, 2026
7676788
docs(tau2): use exact apply as corpus prepare default
huangruiteng May 25, 2026
2c84e89
feat(memory): add operation exact apply window
huangruiteng May 25, 2026
08652a4
chore(tau2): use engineering apply window
huangruiteng May 25, 2026
bb01758
feat(memory): enable default exact apply window
huangruiteng May 25, 2026
5c22e32
style(memory): format exact apply changes
huangruiteng May 25, 2026
27ebda9
Merge latest main into memory versioned apply
huangruiteng May 25, 2026
be07939
chore(memory): set exact apply window default to ten seconds
huangruiteng May 25, 2026
adf369c
chore(memory): fix lint after merge
huangruiteng May 25, 2026
096ff99
feat(memory): use per-trajectory concurrency for exact apply
huangruiteng May 25, 2026
95c8a60
fix(memory): clean graph links on superseded deletes
huangruiteng May 25, 2026
f210f8e
fix(memory): retry stale supersedes replacements
huangruiteng May 25, 2026
a4a5e78
fix(memory): include source links in exact apply
huangruiteng May 25, 2026
0d2fe59
fix(memory): heal preserved graph links on upsert
huangruiteng May 25, 2026
9c39d8d
fix(memory): resolve multi-target supersedes replacements
huangruiteng May 25, 2026
66b8d18
fix(memory): migrate replacement graph links
huangruiteng May 26, 2026
9cdbb1c
style(memory): format supersedes tests
huangruiteng May 26, 2026
972ddce
fix(memory): track supersedes reads for exact cleanup
huangruiteng May 26, 2026
db808bf
feat(memory): expose memory graph health stats
huangruiteng May 26, 2026
47f0aee
feat(client): expose memory graph health helper
huangruiteng May 26, 2026
7974e06
test(memory): align graph rewrite expectations
huangruiteng May 26, 2026
1381f4f
Merge remote-tracking branch 'origin/main' into codex/memory-versione…
huangruiteng May 26, 2026
40 changes: 35 additions & 5 deletions benchmark/tau2/llm/README.md
@@ -242,6 +242,31 @@ Start the OpenViking service before executing memory cells, and verify it with
Memory V2 baseline. For trajectory memory evidence, start the service from this
branch and inspect generated trajectory files; changing `search_uri` alone does
not prove the new trajectory prompt was used.
Agent Harness / TAU-2 corpus preparation opts into the faster agent-memory
write path. The default evidence path keeps experience consolidation on the
normal per-trajectory route and relies on concurrent session commits plus
server-side exact apply. Configure the running OpenViking server with:

- `memory.agent_memory_enabled=true`
- `memory.agent_experience_apply_lock_mode="operation_exact"`
- `memory.agent_trajectory_apply_lock_mode="operation_exact"`
- `memory.long_term_apply_lock_mode="operation_exact"`
- `memory.operation_exact_apply_window_seconds=10.0`
- `memory.long_term_extraction_enabled=false`
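
A fail-fast comparison of these settings against a loaded server config could be sketched as below. This is an illustration only: `EXPECTED`, `lookup`, and `find_mismatches` are hypothetical names, and the sketch assumes the server config has already been parsed into a nested dict. It does not assume anything about the on-disk format of the config file or about how the runner actually performs its check.

```python
from typing import Any

# Expected server-side settings, copied from the list above.
EXPECTED = {
    "memory.agent_memory_enabled": True,
    "memory.agent_experience_apply_lock_mode": "operation_exact",
    "memory.agent_trajectory_apply_lock_mode": "operation_exact",
    "memory.long_term_apply_lock_mode": "operation_exact",
    "memory.operation_exact_apply_window_seconds": 10.0,
    "memory.long_term_extraction_enabled": False,
}

_MISSING = object()  # sentinel, so a stored None is not confused with "absent"


def lookup(config: dict[str, Any], dotted_key: str) -> Any:
    """Walk a nested dict along a dotted path; return _MISSING if absent."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def find_mismatches(config: dict[str, Any], expected: dict[str, Any]) -> list[str]:
    """Return one readable line per setting that differs from `expected`."""
    problems = []
    for key, want in expected.items():
        got = lookup(config, key)
        if got is _MISSING:
            problems.append(f"{key}: missing (expected {want!r})")
        # Treat bool and non-bool as different so that 1 does not pass for True.
        elif isinstance(got, bool) != isinstance(want, bool) or got != want:
            problems.append(f"{key}: got {got!r}, expected {want!r}")
    return problems
```

A caller would treat a non-empty result as a hard error before any corpus write starts, which is cheaper than discovering a lock-mode mismatch after a long prepare run.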

`--strict-preflight` checks `OPENVIKING_CONFIG_FILE` (or `~/.openviking/ov.conf`)
and fails fast if the server-side memory config does not match the experiment
config. The `10.0s` operation-exact apply window is now also the OpenViking
product default; the remaining settings are benchmark / Vaka corpus-prepare
defaults for faster iteration. Experience consolidation keeps the normal
per-trajectory semantics, while same-session experience phases may run
concurrently when operation-exact apply is enabled. The operation-exact apply
window is a server-side owner
primitive: requests for the same concrete target set queue during a short
engineering window, then one owner acquires the union of exact locks and applies
the queued patch timeline in order against locked, latest content. It is not a
client-side sleep and does not require the benchmark runner to serialize session
commits.
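
The owner behavior described above can be illustrated with a small in-process model. This is a sketch under stated assumptions, not the server's implementation: `ApplyWindow`, `submit`, and `close` are hypothetical names, each request carries a single target for brevity, the end of the window is modeled by an explicit `close()` call rather than a timer, and locks are taken in sorted order as one common way to avoid lock-order deadlocks.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable

Patch = Callable[[str], str]  # takes the latest content, returns the new content


@dataclass
class ApplyWindow:
    """Hypothetical stand-in for one open operation-exact apply window."""

    store: dict[str, str]  # target URI -> current content
    locks: dict[str, threading.Lock] = field(default_factory=dict)
    queued: list[tuple[str, Patch]] = field(default_factory=list)

    def submit(self, target: str, patch: Patch) -> None:
        # Requests arriving inside the window are queued, not applied.
        self.queued.append((target, patch))

    def close(self) -> list[str]:
        """Act as the single owner: lock the union of targets, apply in order."""
        targets = sorted({target for target, _ in self.queued})
        for target in targets:  # union of exact locks, acquired in a fixed order
            self.locks.setdefault(target, threading.Lock()).acquire()
        try:
            # Arrival order is the patch timeline; each patch sees the content
            # left by the previous one, never a stale snapshot.
            for target, patch in self.queued:
                self.store[target] = patch(self.store.get(target, ""))
        finally:
            for target in reversed(targets):
                self.locks[target].release()
        self.queued.clear()
        return targets
```

In this model two patches queued for the same target are applied back to back under one lock acquisition, so the second patch is computed against the first patch's output.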

## Memory Adapter

@@ -272,8 +297,12 @@ is retrieved during eval (`experiences` by default, `trajectories` for
`config/trajectory.yaml`). The runner prepares each distinct
`domain + corpus_id` once and reuses it across eval run ids when the cached
`corpus_manifest.json` is present. Different corpora may be prepared in
-parallel with `benchmark.corpus_prepare_concurrency`; session commits inside one
-corpus remain serial to preserve OpenViking write semantics.
+parallel with `benchmark.corpus_prepare_concurrency`. Session commits inside one
+corpus can also be submitted concurrently with
+`openviking.corpus_session_commit_concurrency`; the default benchmark config uses
+`4`, while `1` keeps the historical serial commit / wait behavior. The corpus
+manifest records both the configured concurrency and stable input-order rows so
+later eval runs can fail fast on mismatched corpus-build semantics.
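
One way to combine bounded commit concurrency with stable input-order rows is sketched below. The function name, the `commit_one` callback, and the manifest field names other than the concurrency setting are hypothetical; the real runner and `corpus_manifest.json` layout may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def commit_sessions(
    session_ids: list[str],
    commit_one: Callable[[str], str],  # hypothetical: commits one session, returns a status
    concurrency: int = 4,
) -> dict:
    """Commit sessions with bounded concurrency; report rows in input order."""
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map yields results in input order, whatever the completion order.
        statuses = list(pool.map(commit_one, session_ids))
    return {
        "corpus_session_commit_concurrency": concurrency,
        "rows": [
            {"index": i, "session_id": sid, "status": status}
            for i, (sid, status) in enumerate(zip(session_ids, statuses))
        ],
    }
```

With `concurrency=1` the pool has a single worker, so commits run one after another, and the rows are identical in both modes as long as `commit_one` is deterministic.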

By default, trajectory extraction is transcript-only: the runner replays TAU-2
messages into an OpenViking session and does not expose held-out reward or
@@ -283,9 +312,10 @@ session, skip failed train sessions when building positive procedure memory, and
cap injected memory by total character budget for content-shape ablations.

Eval cells run in parallel with `benchmark.strategy_concurrency` by default and
-can be overridden with `--strategy-concurrency`. This only parallelizes read-only
-TAU-2 eval cells; corpus writes inside one corpus are still serialized by the
-prepare step.
+can be overridden with `--strategy-concurrency`. This parallelizes read-only
+TAU-2 eval cells; corpus writes are controlled separately by
+`benchmark.corpus_prepare_concurrency` across corpora and
+`openviking.corpus_session_commit_concurrency` within a corpus.

## User Simulator Policy

15 changes: 15 additions & 0 deletions benchmark/tau2/llm/config/baseline.yaml
@@ -48,6 +48,21 @@ openviking:
url: ${OPENVIKING_URL:-http://localhost:1933}
account: ${OPENVIKING_ACCOUNT:-default}
agent_id: ${OPENVIKING_AGENT_ID:-tau2-openviking-agent}
# Agent Harness / TAU-2 experiment corpus preparation defaults to concurrent
# per-session commits plus server-side exact apply.
agent_memory_enabled: true
agent_experience_apply_lock_mode: operation_exact
agent_trajectory_apply_lock_mode: operation_exact
long_term_apply_lock_mode: operation_exact
operation_exact_apply_window_seconds: 10.0
long_term_extraction_enabled: false
corpus_session_commit_concurrency: 4
# Corpus prepare can legitimately take far longer than the low-level client
# default on tree-lock paths. Keep tree/exact lock experiments comparable by
# making both the HTTP client and task wait timeout explicit in the generated
# run plan.
timeout_seconds: 3600
wait_timeout_seconds: 3600
reuse_corpus_across_runs: true
retrieval_top_k: 4
prewrite_retrieval_top_k: 6
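
The `${NAME:-default}` placeholders in the config above follow the shell convention of falling back to a default when the variable is unset or empty. The diff does not show how the runner expands them, so the following is only a minimal sketch of that convention; `expand` is a hypothetical helper, not part of the benchmark code.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME:-default}; the default may be empty but cannot contain "}".
_PLACEHOLDER = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]*)\}")


def expand(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME:-default} with env[NAME], or the default if unset or empty."""
    source = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        current = source.get(match.group("name"))
        return current if current else match.group("default")

    return _PLACEHOLDER.sub(substitute, value)
```

For example, `expand("${OPENVIKING_URL:-http://localhost:1933}", env={})` falls back to the localhost URL, while any non-empty `OPENVIKING_URL` in the mapping takes precedence.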