
feat(redis-lock): distributed Redis lock implementation, store.ts integration + file-lock fallback #788

Closed
jlin53882 wants to merge 3 commits into CortexReach:master from jlin53882:james/pr739-redis-lock-clean

Conversation

@jlin53882
Contributor

PR 739 — Redis Distributed Lock Implementation

Goal

Use a distributed Redis lock to resolve lock contention under highly concurrent writes, while seamlessly degrading to the existing file lock whenever Redis is unavailable.

Implementation

Option E: Redis lock first + file lock fallback

  • When Redis is available: use the Redis lock (distributed, cross-process)
  • When Redis is unavailable (connection error): seamlessly degrade to the file lock
  • Both locks use the same lock key: .memory-write.lock
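The Option E flow above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `LockManager`, `RedisUnavailableError`, and `acquireFileLock` are stand-ins for the real implementations in src/redis-lock.ts and src/store.ts.

```typescript
// Illustrative sketch of "Redis lock first, file lock fallback".
class RedisUnavailableError extends Error {}

interface LockManager {
  // Resolves to a release function once the lock is held.
  acquire(key: string): Promise<() => Promise<void>>;
}

async function runWithLock<T>(
  manager: LockManager | null,
  key: string,
  fn: () => Promise<T>,
  acquireFileLock: (key: string) => Promise<() => Promise<void>>,
): Promise<T> {
  if (manager) {
    try {
      const release = await manager.acquire(key);
      try {
        return await fn();
      } finally {
        await release();
      }
    } catch (err) {
      // Only a Redis connectivity failure falls through to the file lock;
      // any other error (including timeouts) propagates to the caller.
      if (!(err instanceof RedisUnavailableError)) throw err;
    }
  }
  // Redis unavailable (or never configured): degrade to the file lock.
  const release = await acquireFileLock(key);
  try {
    return await fn();
  } finally {
    await release();
  }
}
```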

Fixed issues

Issue     Problem                                           Fix
#670      stale lock cleanup caused ENOENT                  existing realpath: false setting kept
Issue 1   ioredis top-level import crashed the process      switched to a dynamic import
Issue 3   parseRedisUrl() dropped the DB selection          parse correctly with the URL API
Issue 4   stores with different dbPath blocked each other   dbPath hash namespace
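The Issue 3 fix can be sketched with the WHATWG URL API, which keeps the DB index in the path instead of losing it to naive host:port splitting. The real parseRedisUrl in src/redis-lock.ts may differ; this is an illustrative version under that assumption.

```typescript
// Sketch of URL-based Redis URL parsing that preserves the DB selection.
function parseRedisUrl(raw: string): {
  host: string;
  port: number;
  db: number;
  password?: string;
} {
  const url = new URL(raw);
  if (url.protocol !== "redis:" && url.protocol !== "rediss:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  // "redis://host:6380/2" → pathname "/2" → DB 2; empty path → default DB 0.
  const dbPart = url.pathname.replace(/^\//, "");
  return {
    host: url.hostname || "127.0.0.1",
    port: url.port ? Number(url.port) : 6379,
    db: dbPart ? Number(dbPart) : 0,
    password: url.password || undefined,
  };
}
```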

Changed files

File                                      Change
src/redis-lock.ts                         new (327 lines) — RedisLockManager implementation
src/store.ts                              +49 lines — runWithFileLock integrates the Redis lock
test/redis-lock-url-parse.test.mjs        new (125 lines, 15 tests)
test/redis-lock-error-types.test.mjs      new (138 lines, 6 tests)
test/redis-lock-concurrent-init.test.mjs  new (84 lines, 6 tests)
scripts/ci-test-manifest.mjs              registers the 3 tests

Tests

  • All 40 redis-lock tests pass
  • Existing lock-related tests (lock-release-on-error.test.mjs) pass

- src/redis-lock.ts: RedisLockManager implementation (dynamic import, parseRedisUrl fix, dbPath namespace)
- src/store.ts: runWithFileLock upgraded — Redis lock first, fallback to the file lock on RedisUnavailableError
- test/redis-lock-*.test.mjs: 3 unit test files (URL parse / error types / concurrent init)
- scripts/ci-test-manifest.mjs: registers the 3 redis-lock tests

Option E strategy: use the Redis lock when Redis is available; seamlessly degrade to the file lock when it is not.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 605747abfb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/store.ts
Comment on lines +281 to +284
} catch (err) {
// M4 fix: RedisUnavailableError → seamless fallback to file lock
if (err instanceof Error && Symbol.for("RedisUnavailableError") in err) {
// Redis connection failed; take the file lock fallback

P1 Badge Do not fall back to file locks after Redis was selected

When Redis was healthy at startup, a later RedisUnavailableError during manager.acquire() falls through to the file-lock path. In a multi-process deployment, other processes that still have Redis connectivity continue guarding the same dbPath only with Redis and never acquire this file lock, so both lock domains can enter fn() concurrently for the same LanceDB store. This defeats the cross-process write serialization and can corrupt or lose writes; runtime Redis failures should not silently switch a single process to a different lock domain unless all participants are coordinated.


Comment thread src/store.ts Outdated
Comment on lines +89 to +92
if (redisLockInitAttempted) {
return redisLockManager;
}
redisLockInitAttempted = true;

P1 Badge Await the in-flight Redis initialization

If two writes start before the first Redis initialization finishes, the first call sets redisLockInitAttempted = true and awaits createRedisLockManager(), while the second call immediately returns the still-null redisLockManager and uses the file lock. With Redis available, the first call then uses a Redis lock, so startup writes can run under different lock domains for the same dbPath. Cache and await the initialization promise instead of marking initialization complete before the manager is assigned.


Comment thread src/store.ts Outdated
Comment on lines +88 to +90
async function getRedisLockManager(dbPath: string) {
if (redisLockInitAttempted) {
return redisLockManager;

P1 Badge Cache Redis lock managers per dbPath

The process-wide singleton ignores dbPath after the first store initializes. If a process opens store A first and later writes store B, it will still acquire Redis keys namespaced with A's hash, while another process that opens B first uses B's hash; those two processes can then write to the same store B without sharing a Redis lock. Since dbPath is part of the lock namespace, the cache needs to be keyed by dbPath rather than returning one global manager.

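Both review findings (await the in-flight initialization; cache managers per dbPath) can be addressed together by caching the initialization promise itself, keyed by dbPath. A minimal sketch, assuming stand-in names for `RedisLockManager` and `createRedisLockManager`:

```typescript
// Sketch: per-dbPath cache of initialization promises. A second caller
// arriving mid-init shares the same promise instead of observing null,
// and each dbPath gets its own manager/namespace.
interface RedisLockManager {
  namespace: string;
}

async function createRedisLockManager(
  dbPath: string,
): Promise<RedisLockManager | null> {
  // The real code would dynamically import ioredis and connect,
  // returning null when Redis is unreachable.
  return { namespace: `lock:${dbPath}` };
}

const redisLockCache = new Map<string, Promise<RedisLockManager | null>>();

function getRedisLockManager(dbPath: string): Promise<RedisLockManager | null> {
  let p = redisLockCache.get(dbPath);
  if (!p) {
    // Store the promise synchronously, before any await, so concurrent
    // callers for the same dbPath always share one initialization.
    p = createRedisLockManager(dbPath).catch(() => null);
    redisLockCache.set(dbPath, p);
  }
  return p;
}
```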

Contributor

Review finding:

P1: runtime Redis fallback can split the lock domain

In src/store.ts, once getRedisLockManager(this.config.dbPath) returns a manager, runWithFileLock() tries Redis first but falls back to the local file lock when manager.acquire() throws RedisUnavailableError:

  • src/store.ts:272-275
  • src/store.ts:282-284

That is unsafe for a distributed lock. If one process has selected Redis locking and another process later sees a transient Redis failure and falls back to file locking, those two processes no longer coordinate with each other and can write concurrently. Runtime Redis lock acquisition failures should probably fail closed, or the process needs a globally consistent mode switch that guarantees all writers are in the same lock domain before entering the critical section.

P2: Redis namespace is process-global but initialized from the first dbPath

redisLockManager and redisLockInitAttempted are singletons (src/store.ts:81-92), while the namespace is derived from the first dbPath. Later MemoryStore instances with a different dbPath reuse the first manager/namespace, so the PR's dbPath isolation guarantee does not hold inside one process. Consider caching managers by normalized dbPath instead of using a single process-wide instance.

R2 (CRITICAL): Per-dbPath cache replaces singleton
- redisLockCache Map<dbPath, {manager, initPromise}> per-dbPath instead of global singleton
- initPromise=null → fully initialized; in-flight Promise shared for concurrent calls
- Fixes: singleton ignored dbPath, all instances shared one manager

R3 (HIGH): Race-safe Promise-based init
- Concurrent getRedisLockManager('/same') share one init Promise
- TOCTOU window eliminated via Map synchronization

R1 (MEDIUM): release() error non-fatal handling
- isRedisConnectionError(releaseErr) in finally block catches and warns
- Connection errors don't propagate; unexpected errors still throw

Test: test/redis-lock-cache-r1-r2-r3.test.mjs (9 subtests, all pass)
CI: scripts/ci-test-manifest.mjs updated
@jlin53882
Contributor Author

R1/R2/R3 fix notes

The three issues the maintainer raised have been fixed in commit 51f3df4 on james/pr739-redis-lock-clean.


R2 (CRITICAL) — Singleton ignored dbPath ✅

The previous code used a global `let redisLockManager` singleton: every dbPath shared one manager instance, and the dbPath argument was ignored.

Fix: switched to a redisLockCache Map<dbPath, {manager, initPromise}>, so each dbPath manages its own lock manager.


R3 (HIGH) — Init race condition ✅

redisLockInitAttempted was a plain boolean, and a TOCTOU race let concurrent callers run initialization twice.

Fix: the Map itself is a synchronous container, and the in-flight Promise is shared by concurrent callers, so the manager is never created twice.


R1 (MEDIUM) — release() errors threw mid-operation ✅

With `finally { await release(); }`, a dropped Redis connection made release() throw, masking the result of an otherwise successful operation.

Fix: a try-catch inside the finally block; connection errors are logged with console.warn (non-fatal), while any other error is still rethrown.
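The R1 fix can be sketched as follows. `isRedisConnectionError` and `withLock` are illustrative stand-ins for the PR's actual helpers, and the error-message patterns are assumptions about how connection failures surface:

```typescript
// Sketch: a release() failure caused by a dropped Redis connection is
// logged but not rethrown, so it cannot mask a successful write;
// unexpected release errors remain fatal.
function isRedisConnectionError(err: unknown): boolean {
  return (
    err instanceof Error &&
    /ECONNREFUSED|ECONNRESET|Connection is closed/.test(err.message)
  );
}

async function withLock<T>(
  acquire: () => Promise<() => Promise<void>>,
  fn: () => Promise<T>,
): Promise<T> {
  const release = await acquire();
  try {
    return await fn();
  } finally {
    try {
      await release();
    } catch (releaseErr) {
      if (!isRedisConnectionError(releaseErr)) throw releaseErr; // unexpected: still fatal
      console.warn("redis-lock: release failed after connection loss", releaseErr);
    }
  }
}
```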


Test verification

9 subtests, all passing:

  • R2: createRedisLockManager returns null / namespace isolation / per-dbPath isolation
  • R3: concurrent calls return consistent results / different dbPaths init separately
  • R1: isRedisConnectionError detection / non-connection errors still throw / release errors caught

Test file: test/redis-lock-cache-r1-r2-r3.test.mjs

@jlin53882
Contributor Author

RED #1 tracking update

Problem: validateStoragePath (store.ts:142) performs 5 synchronous I/O calls, blocking the event loop during plugin init.

Assessment: this does not fit PR #788 or PR #794; each has an independent scope.

PR        Goal
PR #788   Redis lock
PR #794   runWithFileLock internal sync I/O

Tracked in the newly filed Issue #795:
#795

Fix direction: → async + fire-and-forget call site ().

P3: null manager cache entry race - all concurrent calls return identical results (no null manager leak) ✅ VERIFIED PASS

P4: cache has no TTL/refresh - repeated calls return the same cached value (no retry mechanism exists) ✅ VERIFIED CONFIRMED

The uncaughtException in the test run is a cleanup artifact (noisy, but tests pass). Both issues are observable and correctly handled.
@TurboTheTurtle
Contributor

Thanks for the quick fix on the per-dbPath Redis manager cache. That resolves the namespace/singleton part of my comment.

I think the P1 lock-domain split is still unresolved on the current head, though. runWithFileLock() still catches RedisUnavailableError from manager.acquire() and then falls through to the local file lock path (src/store.ts:306-314). Once a process has selected Redis locking, a transient runtime Redis acquire failure should not let that same write enter the local file lock domain while other writers may still be coordinating through Redis.

Could this fail closed for runtime Redis acquisition failures instead, or otherwise document/enforce a globally consistent fallback mode before entering the critical section? Initial Redis unavailability before a manager exists can reasonably choose file locking, but falling back after Redis mode is active is the part that still seems unsafe for distributed writers.

@jlin53882
Contributor Author

👋 @TurboTheTurtle here is my analysis:


The core question: fail closed vs. fallback

Your concern: if a runtime acquire() failure falls back to the file lock, some processes end up in the Redis domain and others in the file-lock domain, a split-brain state that breaks distributed coordination.


The two real-world acquire() failure scenarios

Transient outage (Redis unreachable)

  • set() throws → isRedisConnectionError() = true → RedisUnavailableError → catch block
  • This takes the file lock fallback path

Lock competition (held by another owner)

  • set() returns null → retry loop → timeout → throw Error("Lock acquisition timeout")
  • Not covered by the catch block; it propagates and the write fails
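The two failure modes can be sketched as below. The `set(key, value, "PX", ttl, "NX")` shape mimics the ioredis SET command (resolving "OK" or null); the names, timings, and retry policy are illustrative, not the PR's exact code.

```typescript
// Sketch of acquire() with its two distinct failure modes.
class RedisUnavailableError extends Error {}

interface MinimalRedis {
  set(
    key: string,
    value: string,
    px: "PX",
    ttl: number,
    nx: "NX",
  ): Promise<"OK" | null>;
}

async function acquire(
  client: MinimalRedis,
  key: string,
  timeoutMs = 200,
  retryMs = 20,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    let result: "OK" | null;
    try {
      result = await client.set(key, "owner", "PX", 30_000, "NX");
    } catch (err) {
      // Mode 1: connection error → RedisUnavailableError → caller may fall back.
      throw new RedisUnavailableError(String(err));
    }
    if (result === "OK") return; // lock acquired
    // Mode 2: lock held by another owner → retry until timeout, then throw.
    if (Date.now() >= deadline) throw new Error("Lock acquisition timeout");
    await new Promise((r) => setTimeout(r, retryMs));
  }
}
```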

Why fail closed is not the better choice in this system

Fail closed = give up the write when Redis blips (throw, no fallback).

Cost comparison

Approach             During a transient Redis outage
Fail closed          Write abandoned → data not persisted → upstream retries also fail → total write outage
File lock fallback   Write proceeds → data persisted → coordination within a single machine stays correct

But does split-brain actually happen?

After reading the code, the key is here:

// store.ts runWithFileLock (simplified)
const manager = await getRedisLockManager(this.config.dbPath);
if (manager) {
  // Inside the Redis lock domain
  try {
    const release = await manager.acquire(...);
    // ... run fn() under the Redis lock, then release ...
  } catch (err) {
    if (err instanceof RedisUnavailableError) {
      // Reached only on a Redis blip while this process's manager != null
      // → fall through to the file lock below
    }
  }
}
// The file lock fallback sits outside the if (manager) block

The path to split-brain

  • Process A: getRedisLockManager() returns a manager (Redis reachable at startup)
  • Process B: getRedisLockManager() returns a manager (Redis reachable at startup)
  • Redis has a transient outage
  • Process A: acquire() fails → falls back to the file lock
  • Process B: acquire() is still retrying, or also falls back

But fail closed does not fully eliminate split-brain either

Fail closed only means the process that hit the blip gives up its write, while other healthy processes stay in the Redis domain. At that point:

  • Some processes are in the Redis domain (operating normally)
  • Some processes failed closed (abandoned their writes)
  • That is still a split, only now "some processes gave up" rather than "some processes fell back"

The one real advantage of fail closed: no unpredictable "two lock types at once" behavior.


建議的處理方式

方案 行為 評估
Fail closed(你建議的) 瞬斷時放棄寫入 看起來安全,但代價是「瞬斷 = 全面停擺」,沒有任何額外保護
文件說明 不改 code,只說明「瞬斷時 fallback 是預期行為」 最輕量,但可能不接受
Process-level domain lock 第一次 fallback 後,process 內設定「此後只用 file lock」 複雜,需額外同步

Questions to confirm

I would like to confirm your actual deployment architecture:

  1. Single-machine, multi-process deployment: the file lock fallback is fully correct and coordination works. Fail closed would only cause unnecessary write failures.
  2. Multi-machine distributed deployment: a transient Redis outage already breaks cross-machine coordination; with or without fail closed, cross-machine writes will fail.

Knowing the actual deployment architecture is what lets us judge how high the real split-brain risk is.


Closing this PR for now.

Redis-based distributed locking is a substantial optional architecture change, and the current branch still has failing CI. We should only take this complexity when there is a clear product requirement for multi-process/distributed writes and a green, narrowly scoped implementation. A future PR can revisit this with that requirement and validation in place.

@rwmjhb rwmjhb closed this May 12, 2026