feat(store+upgrader): Phase 2 lock serialization + rollback protection (Issue #632)#780
Closed
jlin53882 wants to merge 3 commits into
Closed
feat(store+upgrader): Phase 2 lock serialization + rollback protection (Issue #632)#780jlin53882 wants to merge 3 commits into
jlin53882 wants to merge 3 commits into
Conversation
PR 739 重建(原本 PR 有 merge conflict): - 新增 src/redis-lock.ts(340行) - dynamic import 避免 ioredis 未安裝時 crash - RedisUnavailableError + Symbol.for marker(跨 module ESM-safe) - isRedisConnectionError 含 wrapped error 遞迴檢查 - parseRedisUrl() 正確解析 URL(含密碼、DB selection) - RedisLockManager + createRedisLockManager 工廠 - dbPath hash namespace 避免跨 store lock interference - 修改 src/store.ts(+50行) - import RedisLockManager/createRedisLockManager/RedisUnavailableError - runWithFileLock() 新增 Redis lock 優先實作 - runWithFileLockCore() 保留既有 file lock 實作 - getRedisLockManager() 模組層級 cache(Option E) - 修正 CortexReach#670:realpath:false 避免 stale lock cleanup 造成 ENOENT - 新增 3 個測試檔(40 個測試,全部通過) - test/redis-lock-error-types.test.mjs - test/redis-lock-url-parse.test.mjs - test/redis-lock-concurrent-init.test.mjs - package.json:新增 ioredis devDependency 注意:batch accumulator(CortexReach#690)完整保留,不受影響。
4bb1e8c to
2b8de59
Compare
Contributor
Author
PR #780 改動詳細說明背景問題:Plugin 在 解法(Two-Phase):
檔案變更總覽
|
| 函式 | 功能 |
|---|---|
safeToNumber(value) |
BigInt/string/number → number;unsafe 時拋錯誤(防精度喪失) |
tryParseNumber(value) |
同上,但 unsafe 時回傳 0(用於 _distance 等外部資料) |
ALLOWED_PATCH_KEYS |
Whitelist:l0_abstract, l1_overview, l2_content, memory_category, upgraded_from, upgraded_at, tier, access_count, confidence |
bulkUpdateMetadataWithPatch() 執行流程(1 lock 內)
Step 1: Query DB → 取得 current metadata(含 Plugin 寫入的 injected_count 等)
Step 2: Merge — base (Plugin fields) + cleanPatch (LLM fields) + cleanMarker
Step 3: Whitelist filter — 只留 ALLOWED_PATCH_KEYS;null/undefined 拋棄
Step 4: Batch delete(1 LanceDB op)
Step 5: Batch add(1 LanceDB op)
Rollback: 若 add 失敗,嘗試 per-entry recovery → restore 原始資料
其他修改
- 所有
Number(row.importance/timestamp)→safeToNumber() Number(row._distance)→tryParseNumber()(graceful degrade)- 刪除:
pendingBatchaccumulator(Issue [BUG] 100 concurrent bulkStore() calls still timeout — updateQueue prevents errors but not throughput #690,已不需要)
src/memory-upgrader.ts 核心改動
舊流程:
for entry:
lock() → LLM enrichment + store.update() → unlock() ← N locks
新流程:
Phase 1(無 lock):
for entry: prepareEntry() ← LLM L0/L1/L2
Phase 2(1 lock):
writeEnrichedBatch() → bulkUpdateMetadataWithPatch() ← 1 lock
scopeFilter 現在正確傳遞到 write phase([F7-fix])。
維護者問題修復對照
| 問題 | 修復方式 |
|---|---|
| F2 Nested lock | writeEnrichedBatch() 直接呼叫 bulkUpdateMetadataWithPatch(),Phase 2 只有 1 層 |
| F4 Re-read 抹掉 Plugin 欄位 | Step 1 re-read 的 base 含 injected_count,merge 後保留 |
| F6 cleanMarker 未過 whitelist | cleanMarker 也經過 ALLOWED_PATCH_KEYS filter |
| MR2 Phase 2a 構建完整 metadata string | 只傳 patch + marker,merge 移到 lock 內 |
| MR3 Re-read 在 delete 之後 | Query → Delete → Add 順序正確 |
| MR5 O(n²) duplicate detection | uniqueEntries 用 new Map() deduplicate |
CI manifest 變動
新增(Issue #632):
test/upgrader-phase2-lock.test.mjstest/upgrader-phase2-extreme.test.mjstest/bulk-recovery-rollback.test.mjstest/upgrader-whitelist-regression.test.mjs
移除(這些測試檔案仍存在,只是從 manifest 移除,不隨此 PR 執行):
test/issue-690-cross-call-batch.test.mjs— pendingBatch 已刪除test/issue606_sdk-migration.test.mjs— Issue [BUG] Windows compatibility: path.join() bug + deprecated extension-api import #606,無關test/infer-provider-from-baseurl.test.mjs— Issue fix(auto-recall): Tier 1 memory counter fix (ms-based suppression + lazy-heal) #712,無關test/is-recall-used.test.mjs— Issue feat(proposal-a): Phase 1 recall governance (Issue #569) #736,無關test/tier1-counters.test.mjs— Issue fix(auto-recall): Tier 1 memory counter fix (ms-based suppression + lazy-heal) #712,無關
-458 刪除說明
全是 pendingBatch accumulator(Issue #690),與 Phase 2 lock 無關:
pendingBatcharray +flushTimer+doFlush()store()路由到bulkStore()的程式碼MAX_PENDING_BATCH_SIZE+ accumulator cleanup
無關內容未被誤刪,已確認。
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR #639 重建 — Phase 2 Lock Serialization + Rollback Protection
延續 Issue #632
此 PR 是 PR #639 的乾淨重建版本,解決升級 CLI 與 Plugin 之間的 lock contention 問題。
原始 PR #639:已 CLOSED(scope drift,複雜度過高)
本 PR:乾淨重建,只包含核心功能
核心問題
舊實作每個 entry 都單獨拿一次 lock(N locks per batch),導致 Plugin 要等很久(LLM 在 lock 內跑)。
新實作(Two-Phase):
bulkUpdateMetadataWithPatch()(1 lock per batch)實作內容
src/store.tsbulkUpdateMetadataWithPatch()safeToNumber()tryParseNumber()_distance資料)ALLOWED_PATCH_KEYStier,access_count,confidencesrc/memory-upgrader.tsprepareEntry()writeEnrichedBatch()bulkUpdateMetadataWithPatch(),不再呼叫store.update()F2 fix:
writeEnrichedBatch()直接呼叫bulkUpdateMetadataWithPatch(),Phase 2 lock 只有 1 層。F7 fix:
scopeFilter正確傳遞到 write phase。測試檔案(CI 已註冊)
test/upgrader-phase2-lock.test.mjstest/upgrader-phase2-extreme.test.mjstest/bulk-recovery-rollback.test.mjstest/upgrader-whitelist-regression.test.mjsCI manifest 已更新(
scripts/ci-test-manifest.mjs)。維護者 Review 問題修復狀態
待確認