fix(retriever): AbortSignal threading + embedQuery BM25 fallback (PR668/PR708 clean rebuild)#783
fix(retriever): AbortSignal threading + embedQuery BM25 fallback (PR668/PR708 clean rebuild)#783jlin53882 wants to merge 4 commits into
Conversation
PR 739 重建(原本 PR 有 merge conflict): - 新增 src/redis-lock.ts(340行) - dynamic import 避免 ioredis 未安裝時 crash - RedisUnavailableError + Symbol.for marker(跨 module ESM-safe) - isRedisConnectionError 含 wrapped error 遞迴檢查 - parseRedisUrl() 正確解析 URL(含密碼、DB selection) - RedisLockManager + createRedisLockManager 工廠 - dbPath hash namespace 避免跨 store lock interference - 修改 src/store.ts(+50行) - import RedisLockManager/createRedisLockManager/RedisUnavailableError - runWithFileLock() 新增 Redis lock 優先實作 - runWithFileLockCore() 保留既有 file lock 實作 - getRedisLockManager() 模組層級 cache(Option E) - 修正 CortexReach#670:realpath:false 避免 stale lock cleanup 造成 ENOENT - 新增 3 個測試檔(40 個測試,全部通過) - test/redis-lock-error-types.test.mjs - test/redis-lock-url-parse.test.mjs - test/redis-lock-concurrent-init.test.mjs - package.json:新增 ioredis devDependency 注意:batch accumulator(CortexReach#690)完整保留,不受影響。
- Add signal?: AbortSignal parameter to vectorOnlyRetrieval and hybridRetrieval - Pass signal to embedQuery(query, signal) for true cancellation - Remove useLightweightAutoRecall (replaced by James's design: embed fails → BM25 fallback) - Add try/catch in embedQuery calls: if embed fails, fallback to BM25-only retrieval - Fix retrieveWithTrace() missing mode/diagnostics declarations - retrieve() and retrieveWithTrace() no longer force BM25 for auto-recall; uses normal mode with signal cancellation + fallback on error
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6ba24558d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| this.redis = new Redis({ | ||
| host: parsed.host, | ||
| port: parsed.port, | ||
| db: parsed.db, | ||
| lazyConnect: true, |
There was a problem hiding this comment.
Preserve Redis URL auth and TLS options
When redisUrl points at an authenticated or TLS Redis instance, e.g. redis://:password@host:6379/1 or rediss://..., parseRedisUrl drops the password/protocol and the client is constructed with only host/port/db. That makes managed Redis deployments fail with authentication/TLS errors even though these URL forms are accepted by the API/tests. Pass through password/username and TLS options (or hand the URL directly to ioredis) instead of discarding them.
Useful? React with 👍 / 👎.
| constructor(private readonly config?: LockConfig) { | ||
| // Issue 4 fix: namespace key with dbPath hash,避免不同 dbPath 的 store 互相 blocking | ||
| this._lockNamespace = config?.dbPath ? hashString(config.dbPath) : "default"; |
There was a problem hiding this comment.
Honor LockConfig timing values
In scenarios that pass custom ttl, maxWait, or retryDelay to RedisLockManager, those values are never copied from config, so acquire() always uses the hard-coded 60s TTL/wait and 100ms retry delay. This means callers/tests using maxWait: 500 to fail fast on lock contention can still block for up to a minute, and custom lock TTLs are ignored.
Useful? React with 👍 / 👎.
Summary
Rebuild of PRs #668 and #708 with cleaner scope — fixing the "stuck session" issue (embedQuery HTTP call holds lock after timeout) and adding BM25 fallback when embedding fails.
Problem
PR #668: Session lock held after timeout
Auto-recall uses — when timeout resolves after 5s, the hook returns but the in-flight embedQuery HTTP call keeps running. This holds the per-agent memory-runtime mutex for minutes.
PR #708: Auto-recall always uses BM25 (quality regression)
PR #708 tried to fix the above by forcing auto-recall to BM25 mode (no embedding, no network). But this changes the default behavior — manual/CLI retrieval still uses hybrid, while auto-recall always uses BM25. Quality regression for normal auto-recall scenarios.
Solution (James's design)
Instead of always using BM25 for auto-recall, use BM25 as a fallback when embedding fails:
Changes
Key behaviors
Comparison