Skip to content

feat(memory): reconcile stale exact-lock updates#2335

Draft
huangruiteng wants to merge 24 commits into
volcengine:mainfrom
huangruiteng:feat/memory-file-lock-patch-rewrite-pr1
Draft

feat(memory): reconcile stale exact-lock updates#2335
huangruiteng wants to merge 24 commits into
volcengine:mainfrom
huangruiteng:feat/memory-file-lock-patch-rewrite-pr1

Conversation

@huangruiteng
Copy link
Copy Markdown
Contributor

@huangruiteng huangruiteng commented May 30, 2026

Summary

This PR adds a narrow, opt-in write path for concurrent memory updates:

  • wrap runtime StrPatch values with the field value they were generated against (StrPatchWithBase)
  • wrap runtime string replacements with the field value they were generated against (ReplaceValueWithBase)
  • acquire exact file locks during memory apply, then read the latest target file before merging
  • rewrite stale SEARCH/REPLACE patches against the latest field content when configured
  • synthesize stale string replacements against the latest field content when configured
  • fail fast in exact-file-lock mode when the exact lock, latest read, merge/rewrite/synthesis, or endpoint link update cannot be applied safely

It also carries the minimal validation surface needed to run this end to end:

  • read-only memory graph health inspection under /api/v1/stats/memory-graph, exposed through local/http sync/async clients
  • graph-health signals for quality gates: link/backlink consistency, source-link coverage, schema-heading completeness, and lightweight near-duplicate/source-density diagnostics
  • TAU runner support for configurable concurrent corpus session commits plus cache identity checks
  • corpus gates for committed sessions, extracted memories, concrete scoped search/read, graph health, retrieval trace coverage, and completed scoreboards
  • corpus provenance in manifests: train results sha256, TAU-2 commit, OpenViking commit, and OpenViking config path/hash

Scope boundary

Included in this PR:

  • same-URI patch + StrPatch stale SEARCH/REPLACE rewrite primitive
  • same-URI string replace stale synthesis primitive; stale full-value proposals must be synthesized against latest content or fail fast
  • exact-file-lock operation-level fail-fast safety
  • exact-file-lock apply runtime-scoped to agent memory extraction for trajectories and experiences
  • legacy plain-string patch in the agent exact path interpreted as a base-aware full replacement, not as substring patch
  • agent-only validation config via memory.long_term_extraction_enabled=false, memory.agent_memory_enabled=true, and default/off memory.session_skill_extraction_enabled=false
  • deterministic transcript-local delta handling for tool/skill SUM counters
  • generic delete-time StoredLink cleanup for peer links/backlinks
  • per-phase memory_diff_<phase>.json archives with apply_trace fields (uri, field, merge_op, input/wrapper shape, stale/rewrite status, applied/failed status)
  • TAU corpus-build runner plumbing for commit concurrency, provenance manifesting, and manifest/cache validation

Not included here:

  • exact-file-lock apply for standard long-term user memory, tool/skill memory, or session_skills
  • non-string replace synthesis / typed schema merge
  • tools/skills exact-file-lock adapter
  • cross-URI create-new semantic consolidation/admission
  • session skill asset adapter/lifecycle validation
  • failure-experience retrieval or prompt changes
  • a TAU reward/effect uplift claim, a full-C10 recommended-default claim, or a linear wall-time improvement claim

The new memory-write behavior is off by default behind:

  • memory.memory_apply_exact_file_lock_enabled
  • memory.stale_patch_rewrite_enabled

TAU corpus commit concurrency also defaults to serial (1) unless explicitly configured.

Validation

Latest GitHub checks at head 9c22aebe:

  • check-deps: passed
  • lint / lint: passed
  • API & CLI Integration Tests (ubuntu-24.04): passed
  • pr_review: passed
  • build: skipped

Targeted local validation on this branch includes:

  • tests/session/memory/test_memory_timestamp_parsing.py tests/session/memory/test_memory_updater.py tests/session/memory/test_merge_ops.py
    • Result after duplicate-delete cleanup: 109 passed
  • targeted stale rewrite / stale replacement / stale delete trace tests in tests/session/memory/test_memory_updater.py and tests/session/memory/test_merge_ops.py
  • session-skill / agent-only config gates, extract-loop match-text, tool/skill transcript-local deltas, schema exact-boundary tests, compressor exact-boundary tests, TAU runner/analyzer gates, and phase memory-diff trace coverage
  • uvx ruff and git diff --check on touched runtime/test files: passed in the final local validation sequence

Full-train corpus gate at head 9c22aebe:

  • Exact C4 corpus artifact: /private/tmp/pr2335_9ef7e46d_full_train_exact_c4_deleteskip_20260602_010530
  • Frozen tree C1 corpus artifact: /private/tmp/pr2335_fe114b20_full_train_tree_c1_20260601_231521
  • Four corpus manifests (retail/airline x exact/tree) passed strict_corpus_gate with claim_valid=true
  • All four corpora had healthy graph checks: source-linkless, broken endpoint, missing backlink, missing forward link, parse error, and missing required heading counts were all 0
  • Scoped search/read probes returned 10 hits with 2 non-empty concrete reads for each corpus

Full-train corpus shape:

Corpus Sessions committed / skipped Memories extracted Experiences Trajectories Source avg Single-source Near-dup note
exact retail C4 63 / 11 691 78 180 5.90 29.49% name-similar pairs 113
tree retail C1 63 / 11 670 62 174 7.11 33.87% name-similar pairs 49
exact airline C4 26 / 4 136 25 45 3.44 36.00% content-similar pairs 1
tree airline C1 26 / 4 133 32 44 2.75 50.00% content-similar pairs 3

Weighted source-density read:

  • exact: 103 experiences / 546 source links / source avg 5.30 / single-source 31.1%
  • tree: 94 experiences / 529 source links / source avg 5.63 / single-source 39.4%

Read: source-density and near-dup are measured and did not hit graph/schema hard failures. They are still a quality boundary, especially for exact-retail semantic fragmentation / create-new consolidation. This PR should not be read as semantic applicability parity.

Full held-out r4 eval using the frozen full-train corpora:

  • Exact eval artifact: /private/tmp/pr2335_9c22aebe_full_eval_plan_20260602_032646/exact_result/exact_full_eval_r4_exec_20260602_033640
  • Tree eval artifact: /private/tmp/pr2335_9c22aebe_full_eval_plan_20260602_032646/tree_result/tree_full_eval_r4_exec_20260602_041900
  • Both runs: retail + airline, full held-out test split, 4 repeats, 8/8 cells completed, 0 failed, 240 simulations
Mode Retail reward / DB Airline reward / DB Task-weighted reward Task-weighted DB Retrieval aggregate
exact 0.81875 / 0.81875 0.78750 / 0.81250 0.80833 0.81667 513 events / 1026 injected / 1026 non-empty reads
tree 0.84375 / 0.84375 0.75000 / 0.76250 0.81250 0.81667 482 events / 964 injected / 964 non-empty reads

Same-protocol exact vs tree:

  • task-weighted reward delta: -0.0042
  • task-weighted DB delta: 0.0
  • retail exact lower by 0.025 reward/DB
  • airline exact higher by 0.0375 reward and 0.05 DB
  • all exact/tree summaries had corpus_evidence.claim_valid=true and effect_evidence.claim_valid=true

Read: this supports no catastrophic scoreboard/corpus/retrieval-plumbing regression for the PR-1 exact path. It does not claim reward uplift or semantic relevance of every retrieved memory.

Apply trace summary on full-train corpus writes:

Mode Apply trace rows Applied Skipped stale-deleted Stale detected / rewrite attempted
exact C4 2062 2056 6 87 / 87
tree C1 1992 1992 0 0 / 0

Read: stale/delete behavior is recorded explicitly instead of being hidden as success. Rewrite latency/cost and full-C10 stress rewrite rate remain follow-up telemetry, not claims in this PR.

Claim boundary

This is a common server-side write-safety primitive, not a TAU reward claim. The validated runtime scope is agent trajectories / experiences under the agent-only extraction config. The full-train and full held-out r4 evidence supports corpus health, graph/schema health, retrieval plumbing, and non-catastrophic score regression versus the frozen tree-lock baseline. It does not claim exact-file-lock coverage for standard long-term user memory, tools/skills, or session_skills; it does not solve cross-URI create-new duplicates or experience granularity drift; and it does not claim full-C10 wall-time linear improvement.

@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 90
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Extract: Skip schema tree lock when exact file lock mode is enabled

Relevant files:

  • openviking/session/compressor_v2.py
  • tests/session/memory/test_compressor_v2.py

Sub-PR theme: Memory apply: Exact file lock + stale patch rewrite

Relevant files:

  • openviking/session/memory/memory_updater.py
  • openviking/session/memory/merge_op/base.py
  • openviking/session/memory/merge_op/patch.py
  • openviking_cli/utils/config/memory_config.py
  • tests/session/memory/test_memory_updater.py
  • tests/session/memory/test_merge_ops.py

⚡ Recommended focus areas for review

Minor code duplication

Duplicate _text_digest function exists in both memory_updater.py and patch.py. Consider extracting to a shared utility module for consistency.

def _text_digest(value: Optional[str]) -> Optional[str]:
    if value is None:
        return None
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
LLM resilience considerations

The _rewrite_stale_patch function calls StructuredLLM().complete_model_async without explicit timeout/retry configuration in the visible code. Ensure the underlying LLM client has proper resilience for production use (this may already be handled by StructuredLLM internally).

async def _rewrite_stale_patch(
    *,
    current_value: str,
    patch_value: StrPatchWithBase,
    error: Exception,
) -> Optional[StrPatch]:
    """Ask the configured LLM to rewrite a stale string patch for latest content."""
    from openviking_cli.utils.llm import StructuredLLM

    original_blocks = [
        {"search": block.search, "replace": block.replace} for block in patch_value.blocks
    ]
    prompt = (
        "Rewrite a stale SEARCH/REPLACE patch so it applies to the latest memory field.\n"
        "Return ONLY the JSON schema requested by the caller.\n\n"
        "Rules:\n"
        "- Preserve the user's intended edit from the original patch.\n"
        "- Use exact SEARCH text copied from the latest field value.\n"
        "- Do not include line-number prefixes unless they are truly part of the field.\n"
        "- If no safe rewrite exists, return an empty blocks list.\n\n"
        f"Patch base value:\n{patch_value.base_value or ''}\n\n"
        f"Latest field value:\n{current_value}\n\n"
        f"Original patch blocks:\n{original_blocks}\n\n"
        f"Patch error:\n{error}"
    )
    rewritten = await StructuredLLM().complete_model_async(prompt, StrPatch)
    if rewritten and rewritten.blocks:
        return rewritten
    return None

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

@huangruiteng huangruiteng changed the title feat(memory): add exact-lock stale patch rewrite feat(memory): reconcile stale exact-lock updates May 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant