
fix(retriever): AbortSignal threading + embedQuery BM25 fallback (PR668/PR708 clean rebuild)#783

Closed
jlin53882 wants to merge 4 commits into CortexReach:master from jlin53882:james/pr746-clean-v2

Conversation

@jlin53882
Contributor

Summary

Rebuild of PRs #668 and #708 with a cleaner scope: fix the "stuck session" issue (the embedQuery HTTP call keeps holding the lock after a timeout) and add a BM25 fallback when embedding fails.

Problem

PR #668: Session lock held after timeout

Auto-recall bounds retrieval with a 5s timeout. When the timeout resolves, the hook returns, but the in-flight embedQuery HTTP call keeps running, and it holds the per-agent memory-runtime mutex for minutes.
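For illustration, a minimal sketch of the failure mode. The names here (async-mutex, the helper functions, the endpoint) are assumptions for this example, not the project's actual code:

```ts
import { Mutex } from "async-mutex"; // assumption: any per-agent mutex behaves the same way

const memoryRuntimeMutex = new Mutex();

// Stand-in for the real embedding call: a fetch() with no AbortSignal,
// so nothing can cancel it once it is in flight.
async function embedQuery(query: string): Promise<number[]> {
  const res = await fetch("https://embeddings.example.com/v1/embed", {
    method: "POST",
    body: JSON.stringify({ query }),
  });
  return (await res.json()).embedding;
}

async function retrieve(query: string): Promise<string[]> {
  // The critical section stays occupied until embedQuery() settles.
  return memoryRuntimeMutex.runExclusive(async () => {
    const vector = await embedQuery(query);
    return vector.map(String); // placeholder for the real vector search
  });
}

async function autoRecallHook(query: string): Promise<string[] | null> {
  const timeout = new Promise<null>((resolve) => setTimeout(() => resolve(null), 5_000));
  // Promise.race never cancels the loser: after 5s the hook returns null,
  // but retrieve() -- and the mutex it holds -- lives on until the
  // embedding HTTP request finally finishes, blocking later sessions.
  return Promise.race([retrieve(query), timeout]);
}
```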

PR #708: Auto-recall always uses BM25 (quality regression)

PR #708 tried to fix the above by forcing auto-recall into BM25 mode (no embedding, no network). But this changes the default behavior: manual/CLI retrieval still uses hybrid search, while auto-recall always uses BM25, which is a quality regression for normal auto-recall scenarios.

Solution (James's design)

Instead of always using BM25 for auto-recall, use BM25 as a fallback when embedding fails:

Changes

| File | Change |
| --- | --- |
| retriever.ts | Thread `signal` through `retrieve()` / `retrieveWithTrace()` |
| retriever.ts | Add `signal?: AbortSignal` param to `vectorOnlyRetrieval` + `hybridRetrieval` |
| retriever.ts | Pass `signal` to `embedQuery(query, signal)` |
| retriever.ts | Add try/catch around the `embedQuery` calls: catch → BM25 fallback |
| retriever.ts | Remove `useLightweightAutoRecall` (replaced by the fallback design) |
| retriever.ts | Fix missing `mode`/`diagnostics` declarations in `retrieveWithTrace()` |
| tests | Unit tests for signal propagation + BM25 fallback |
| tests | Register the new test in the test suite |

Key behaviors

  1. Signal threading: the caller's AbortSignal flows through vectorOnlyRetrieval/hybridRetrieval into embedQuery, so the HTTP request is cancelled on abort.
  2. BM25 fallback: if embedQuery throws (timeout, network error, API error), vectorOnlyRetrieval/hybridRetrieval catch the error and try BM25 instead (see the sketch below). The error is only re-thrown if BM25 also returns no results.
  3. No behavior change for normal cases: if embedQuery completes normally, retrieval works exactly as before.
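A minimal sketch of this design, using the function names from the PR text; the bodies, helper functions, and endpoint are illustrative assumptions, not the repository's implementation:

```ts
// embedQuery now accepts the caller's AbortSignal and hands it to fetch(),
// so aborting really cancels the HTTP request.
async function embedQuery(query: string, signal?: AbortSignal): Promise<number[]> {
  const res = await fetch("https://embeddings.example.com/v1/embed", {
    method: "POST",
    body: JSON.stringify({ query }),
    signal,
  });
  return (await res.json()).embedding;
}

async function hybridRetrieval(query: string, signal?: AbortSignal): Promise<string[]> {
  let vector: number[];
  try {
    vector = await embedQuery(query, signal);
  } catch (err) {
    // Embedding failed (timeout, network error, API error): try BM25 only.
    const fallback = await bm25Retrieval(query);
    if (fallback.length > 0) return fallback;
    throw err; // re-throw only when BM25 also came back empty
  }
  // Normal path: unchanged hybrid behavior.
  const vectorHits = await vectorSearch(vector);
  const bm25Hits = await bm25Retrieval(query);
  return mergeResults(vectorHits, bm25Hits);
}

// Illustrative stubs so the sketch is self-contained.
async function bm25Retrieval(query: string): Promise<string[]> { return []; }
async function vectorSearch(vector: number[]): Promise<string[]> { return []; }
function mergeResults(a: string[], b: string[]): string[] { return [...new Set([...a, ...b])]; }
```

Note that in this sketch an abort also lands in the catch block; the real code may prefer to re-throw an AbortError immediately instead of attempting BM25.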

Comparison

| | PR #668 (original) | PR #708 | This PR |
| --- | --- | --- | --- |
| Cancel embed on timeout | ✅ | ❌ | ✅ |
| Auto-recall always BM25 | ❌ | ✅ | ❌ |
| Embed fail → BM25 fallback | ❌ | ❌ | ✅ |
| Backward compatible | ✅ | ❌ | ✅ |
| Scope | index.ts + retriever.ts | retriever.ts only | retriever.ts only |

jlin53882 added 4 commits May 10, 2026 00:21
PR 739 rebuild (the original PR had a merge conflict):
- Add src/redis-lock.ts (340 lines)
  - dynamic import, so a missing ioredis install does not crash
  - RedisUnavailableError + Symbol.for marker (ESM-safe across modules)
  - isRedisConnectionError recursively checks wrapped errors
  - parseRedisUrl() parses the URL correctly (including password and DB selection)
  - RedisLockManager + createRedisLockManager factory
  - dbPath hash namespace to avoid lock interference across stores

- Modify src/store.ts (+50 lines)
  - import RedisLockManager/createRedisLockManager/RedisUnavailableError
  - runWithFileLock() now prefers the Redis lock implementation
  - runWithFileLockCore() keeps the existing file-lock implementation
  - getRedisLockManager() module-level cache (Option E)
  - Fix CortexReach#670: realpath:false so stale-lock cleanup no longer causes ENOENT

- Add 3 test files (40 tests, all passing)
  - test/redis-lock-error-types.test.mjs
  - test/redis-lock-url-parse.test.mjs
  - test/redis-lock-concurrent-init.test.mjs

- package.json: add ioredis as a devDependency

Note: the batch accumulator (CortexReach#690) is fully preserved and unaffected.
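As a rough sketch of the dynamic-import and cross-module error-marker ideas listed above; details such as the symbol key and error message are assumptions, not the actual src/redis-lock.ts:

```ts
// Marker registered in the global symbol registry so the check still works
// when two copies of the module end up loaded (ESM-safe).
const REDIS_UNAVAILABLE_MARKER = Symbol.for("cortexreach.redis-unavailable");

class RedisUnavailableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RedisUnavailableError";
    (this as any)[REDIS_UNAVAILABLE_MARKER] = true;
  }
}

function isRedisUnavailableError(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as any)[REDIS_UNAVAILABLE_MARKER] === true;
}

// Load ioredis lazily: environments without the optional dependency get a
// typed error instead of a crash at import time.
async function loadIoredis(): Promise<any> {
  try {
    const mod: any = await import("ioredis");
    return mod.default ?? mod;
  } catch (err) {
    throw new RedisUnavailableError(`ioredis is not installed: ${String(err)}`);
  }
}
```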
- Add signal?: AbortSignal parameter to vectorOnlyRetrieval and hybridRetrieval
- Pass signal to embedQuery(query, signal) for true cancellation
- Remove useLightweightAutoRecall (replaced by James's design: embed fails → BM25 fallback)
- Add try/catch in embedQuery calls: if embed fails, fallback to BM25-only retrieval
- Fix retrieveWithTrace() missing mode/diagnostics declarations
- retrieve() and retrieveWithTrace() no longer force BM25 for auto-recall; they use normal mode with signal cancellation + fallback on error
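For illustration, a hedged sketch of the caller side this commit implies; the hook name and exact wiring are assumptions, and hybridRetrieval has the signature sketched earlier in the description:

```ts
// Assumed signature from the earlier sketch, declared so this compiles standalone.
declare function hybridRetrieval(query: string, signal?: AbortSignal): Promise<string[]>;

async function autoRecall(query: string): Promise<string[] | null> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 5_000);
  try {
    return await hybridRetrieval(query, controller.signal);
  } catch (err) {
    if (controller.signal.aborted) return null; // budget exceeded: give up cleanly
    throw err;
  } finally {
    clearTimeout(timer); // cancel the abort timer once retrieval settles
  }
}
```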

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6ba24558d


Comment thread: src/redis-lock.ts
Comment on lines +151 to +155

    this.redis = new Redis({
      host: parsed.host,
      port: parsed.port,
      db: parsed.db,
      lazyConnect: true,


P2: Preserve Redis URL auth and TLS options

When redisUrl points at an authenticated or TLS Redis instance, e.g. redis://:password@host:6379/1 or rediss://..., parseRedisUrl drops the password/protocol and the client is constructed with only host/port/db. That makes managed Redis deployments fail with authentication/TLS errors even though these URL forms are accepted by the API/tests. Pass through password/username and TLS options (or hand the URL directly to ioredis) instead of discarding them.
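One way to follow this suggestion, sketched here as an assumption rather than a patch: ioredis accepts the connection string directly, so the username, password, db index, and rediss:// TLS settings survive without re-parsing.

```ts
import Redis from "ioredis";

// Let ioredis parse redis:// and rediss:// itself; auth and TLS options
// embedded in the URL are preserved instead of being dropped.
function createRedisClient(redisUrl: string): Redis {
  return new Redis(redisUrl, { lazyConnect: true });
}
```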


Comment thread: src/redis-lock.ts
Comment on lines +133 to +135

    constructor(private readonly config?: LockConfig) {
      // Issue 4 fix: namespace the key with a dbPath hash so stores with different dbPaths don't block each other
      this._lockNamespace = config?.dbPath ? hashString(config.dbPath) : "default";


P2: Honor LockConfig timing values

In scenarios that pass custom ttl, maxWait, or retryDelay to RedisLockManager, those values are never copied from config, so acquire() always uses the hard-coded 60s TTL/wait and 100ms retry delay. This means callers/tests using maxWait: 500 to fail fast on lock contention can still block for up to a minute, and custom lock TTLs are ignored.
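A possible shape for the fix, assuming the field names the comment mentions (ttl, maxWait, retryDelay) and the previously hard-coded defaults; this is a sketch, not the repository's code:

```ts
interface LockConfig {
  dbPath?: string;
  ttl?: number;        // ms a held lock stays valid
  maxWait?: number;    // ms acquire() waits before giving up
  retryDelay?: number; // ms between acquisition attempts
}

class RedisLockManager {
  private readonly ttl: number;
  private readonly maxWait: number;
  private readonly retryDelay: number;

  constructor(private readonly config?: LockConfig) {
    // Copy caller-supplied timings instead of ignoring them,
    // falling back to the old hard-coded values.
    this.ttl = config?.ttl ?? 60_000;
    this.maxWait = config?.maxWait ?? 60_000;
    this.retryDelay = config?.retryDelay ?? 100;
  }
}
```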

