
fix: block recursive BrainLayer ingest #287

Merged
EtanHey merged 15 commits into main from fix/urgent-audit-recursion-ingest-guard
May 16, 2026

Conversation

@EtanHey
Owner

@EtanHey EtanHey commented May 16, 2026

Summary

  • Add a shared ingestion guard for recursive BrainLayer MCP output signatures: ┌─ brain_search:, JSON-RPC 2.0 payloads, and MCP BrainLayer Memory: Invalid JSON-RPC message.
  • Wire the guard into watcher ingestion, queue drain paths, direct brain_store, arbitrated store queueing, and vector upserts.
  • Fold in search-side audit filtering in the style of PR #277 (fix: filter audit recursion from default search), with an include_audit opt-in, plus content-signature filtering for existing recursive MCP output.
  • Strengthen PreCompact filtering so default search excludes checkpoint-looking content even if chunk_origin backfill missed it.
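
A minimal sketch of what a shared guard for the three signatures listed above might look like. The function names and the "brain_search_output" reason string appear elsewhere in this PR; the other constants, the remaining reason strings, and the choice to raise ValueError are assumptions made for illustration, not the contents of ingest_guard.py.

```python
import re

# Signatures taken from the PR summary; constant names other than
# _BRAIN_SEARCH_BOX_PREFIX are hypothetical.
_BRAIN_SEARCH_BOX_PREFIX = "┌─ brain_search:"
_INVALID_JSONRPC_MARKER = "mcp brainlayer memory: invalid json-rpc message"
_JSONRPC_RE = re.compile(r'"jsonrpc"\s*:\s*"2\.0"')


def recursive_mcp_output_reason(content):
    """Return a short reason if content looks like recursive MCP output, else None."""
    if not isinstance(content, str):
        return None
    stripped = content.lstrip()
    if stripped.startswith(_BRAIN_SEARCH_BOX_PREFIX):
        return "brain_search_output"
    if _INVALID_JSONRPC_MARKER in content.lower():
        return "invalid_jsonrpc_message"  # assumed reason string
    if _JSONRPC_RE.search(content):
        return "jsonrpc_payload"  # assumed reason string
    return None


def reject_recursive_mcp_output(content):
    """Raise if the content matches a recursive-output signature (assumed behavior)."""
    reason = recursive_mcp_output_reason(content)
    if reason is not None:
        raise ValueError(f"recursive MCP output rejected: {reason}")
```

Keeping detection (which returns a reason) separate from rejection (which raises) lets paths such as the watcher skip matching items quietly while direct stores fail loudly.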

Test plan

  • pytest tests/test_ingest_guard.py -q -> 5 passed
  • pytest tests/test_audit_recursion_filter.py tests/test_ingest_guard.py -q -> 12 passed
  • pytest tests/test_precompact_chunk_origin.py tests/test_search_exact_chunk_id.py tests/test_audit_recursion_filter.py tests/test_ingest_guard.py -q -> 47 passed
  • pytest tests/test_ingest_guard.py tests/test_audit_recursion_filter.py tests/test_precompact_chunk_origin.py tests/test_search_exact_chunk_id.py tests/test_arbitration.py tests/test_watcher_bridge.py tests/test_3tool_aliases.py -q -> 126 passed
  • ruff check ... -> All checks passed
  • python3 -m py_compile src/brainlayer/ingest_guard.py src/brainlayer/search_repo.py src/brainlayer/mcp/search_handler.py src/brainlayer/vector_store.py -> passed
  • pytest -q --ignore=tests/regression/test_drift_detection.py --ignore=tests/test_eval_framework.py --ignore=tests/test_eval_baselines.py --ignore=tests/test_vector_store.py -> 1970 passed, 8 skipped, 1 xfailed, 100 warnings
  • Pre-push gate -> 1926 passed, 9 skipped, 75 deselected, 1 xfailed, 102 warnings; MCP registration 3 passed; isolated eval/hook routing 32 passed; Bun 1 pass; FTS5 determinism PASS

Notes

  • Full unfiltered local pytest -q is blocked by local optional eval environment and live DB assertions: missing deepchecks, numba incompatible with NumPy 2.4 under system Python 3.13, plus live DB/eval baseline drift. The venv pre-push gate passed.
  • No live DB backfill or mutation was run for this PR.

Note

Medium Risk
Changes affect core ingest + retrieval paths (queue drain, watcher, vector upserts, MCP tools) and alter default search/recall behavior, so regressions could hide expected results or drop writes if heuristics misclassify content.

Overview
Prevents BrainLayer from re-ingesting its own MCP/tool output by adding ingest_guard.py (detect + reject recursive brain_* box output and JSON-RPC signatures) and wiring it into all write paths (brain_store, drain/apply handlers, watcher bridge, VectorStore.upsert_chunks/update_chunk, etc.).

Adds an include_audit opt-in (default false) across engine (think/recall), MCP handlers (brain_search/brain_recall/brain_entity), KG queries, context expansion, and SearchMixin so audit/eval + recursive MCP-output chunks are excluded from search/KG/context results by default.

Strengthens default PreCompact checkpoint exclusion to be content-based (not just chunk_origin) and improves KNN overfetch/caching to avoid starving normal results when many filtered (checkpoint/audit) neighbors exist; includes extensive new tests for these filters and ingest guards.

Reviewed by Cursor Bugbot for commit 25cbf6d. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by CodeRabbit

  • New Features

    • Added an opt-in "include_audit" flag to searches and recalls to allow returning audit/audit-recursion results when requested.
  • Bug Fixes

    • Added write- and ingest-time guards to reject recursive agent/tool outputs and prevent them from being stored or reprocessed.
    • Default search/recall now excludes audit-recursion content.
  • Tests

    • Added extensive tests covering audit-recursion filtering, ingest guards, and related search behaviors.

Review Change Stack

Note

Block recursive BrainLayer MCP output from being ingested or returned in search results

  • Adds recursive_mcp_output_reason in ingest_guard.py to detect content that looks like BrainLayer MCP output (framed boxes, JSON-RPC markers, etc.), and uses it to reject writes at every ingestion entry point: store_memory, upsert_chunks, update_chunk, drain handlers, watcher bridge filters, and the arbitrated store handler.
  • Adds an include_audit flag (default False) threaded through all search paths — vector, FTS, binary KNN, hybrid, KG facts, entity evidence, and context expansion — so audit/recursive content is excluded from results by default.
  • Exposes include_audit as a boolean tool parameter in the brain_search and brain_digest MCP schemas, forwarded through the full call stack to the underlying store and engine methods.
  • Extends checkpoint content filtering to also cover precompact/session-restore-like content identified by content heuristics, not just chunk_origin.
  • Risk: upsert_chunks now silently drops individual invalid chunks from a batch and raises ValueError only when the entire batch is rejected; callers that relied on all-or-nothing batch behavior are affected.
📊 Macroscope summarized 25cbf6d. 6 files reviewed, 1 issue evaluated, 0 issues filtered, 1 comment posted
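
The batch behavior flagged as a risk above can be sketched as follows. This is an illustration under stated assumptions, not the code in VectorStore.upsert_chunks: filter_batch and toy_reason are hypothetical names, and returning the dropped items is an addition so that callers can notice partial drops.

```python
def filter_batch(chunks, rejection_reason):
    """Drop rejected chunks; raise only when a non-empty batch is rejected entirely."""
    kept = []
    dropped = []
    for chunk in chunks:
        reason = rejection_reason(chunk.get("content", ""))
        if reason is None:
            kept.append(chunk)
        else:
            dropped.append((chunk.get("id"), reason))
    if chunks and not kept:
        raise ValueError(f"all {len(chunks)} chunks rejected")
    return kept, dropped


def toy_reason(content):
    # Stand-in predicate for illustration only.
    if content.lstrip().startswith("┌─ brain_search:"):
        return "brain_search_output"
    return None
```

A caller that needs all-or-nothing semantics could check whether the dropped list is non-empty and abort before writing anything.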


@EtanHey
Owner Author

EtanHey commented May 16, 2026

@coderabbitai review

@EtanHey
Owner Author

EtanHey commented May 16, 2026

@codex review

@EtanHey
Owner Author

EtanHey commented May 16, 2026

@cursor @BugBot review

@coderabbitai

coderabbitai Bot commented May 16, 2026

Walkthrough

Adds detection of recursive MCP outputs, rejects them at ingestion points, and threads an opt-in include_audit flag through engine/MCP/search layers to exclude or include audit-recursion results at read time.

Changes

Audit Recursion Prevention: Detection, Rejection, and Filtering

| Layer | File(s) | Summary |
|---|---|---|
| Recursive MCP output detection | `src/brainlayer/ingest_guard.py` | New module detects recursive MCP outputs and provides recursive_mcp_output_reason() and reject_recursive_mcp_output() for classification and rejection. |
| Write-side rejection guards | `src/brainlayer/store.py`, `src/brainlayer/vector_store.py`, `src/brainlayer/watcher_bridge.py`, `src/brainlayer/drain.py`, `src/brainlayer/mcp/store_handler.py` | Detection is enforced at ingestion: store_memory() calls rejection, VectorStore.upsert_chunks rejects per-chunk, watcher/drain skip matching items, and MCP _store invokes rejection when arbitrated. VectorStore init and cache invalidation were extended for audit-recursion accounting. |
| Search repository filtering | `src/brainlayer/search_repo.py`, `src/brainlayer/kg_repo.py` | Adds include_audit to search APIs, builds SQL predicates to exclude audit-recursion-tagged/content chunks by default, includes the flag in hybrid cache keys, adjusts KNN overfetch sizing, and applies post-RRF filtering when include_audit=False. KG facts queries also support audit exclusion. |
| MCP handler and engine wiring | `src/brainlayer/mcp/search_handler.py`, `src/brainlayer/mcp/__init__.py`, `src/brainlayer/engine.py` | Threads include_audit through MCP handlers and public engine entrypoints (think, recall); MCP tool schemas expose the include_audit input and routing forwards the flag into _brain_recall/_search paths. _exact_chunk_lookup_result() behavior standardized with an empty-result helper. |
| Guard and filtering tests | `tests/test_ingest_guard.py`, `tests/test_audit_recursion_filter.py`, `tests/test_precompact_chunk_origin.py`, `tests/test_search_exact_chunk_id.py`, `tests/test_phase6_critical.py` | Adds and updates tests covering ingestion rejection, hybrid/vector search exclusion/inclusion, substring-safety, legacy JSON-tag fallback, overfetch behavior, KG facts filtering, exact-chunk lookup empty-result semantics, and engine wiring of include_audit. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • EtanHey/brainlayer#79: Both PRs modify src/brainlayer/search_repo.py's hybrid search cache keying and related scoping logic.

"I nibble patterns in the hay,
I guard the memory all day,
No echo back to haunt the stream,
Clean thoughts hop in every dream.
🐇🌿"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.75% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'fix: block recursive BrainLayer ingest' directly and clearly describes the main change: preventing recursive BrainLayer MCP output from being re-ingested through write-side guards. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@cursor

cursor Bot commented May 16, 2026

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@coderabbitai

coderabbitai Bot commented May 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a14c082639

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/brainlayer/search_repo.py Outdated
recursive_content_filter = (
f"LTRIM({content_expr}) NOT LIKE '┌─ brain_search:%' "
f"AND LOWER({content_expr}) NOT LIKE '%mcp brainlayer memory: invalid json-rpc message%' "
f"AND REPLACE(LOWER({content_expr}), ' ', '') NOT LIKE '%\"jsonrpc\":\"2.0\"%'"

P2: Normalize all JSON-RPC whitespace before filtering

When existing recursive MCP output contains a formatted JSON-RPC payload such as {"jsonrpc" :\n "2.0"}, the write guard would classify it as recursive because it uses \s*, but this SQL predicate only strips literal spaces before matching. The direct VectorStore.search() text/vector paths rely on this SQL filter without the hybrid post-filter, so those rows still appear in default search results until include_audit is requested; normalize tabs/newlines as well or reuse an equivalent predicate here.
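
A small reproduction of the gap described in this comment, using Python's standard sqlite3 module and a hypothetical chunks table. The helper name normalized_jsonrpc_sql is illustrative, not the PR's; it strips spaces, tabs, newlines, and carriage returns with nested REPLACE calls before the LIKE check, which is one possible way to bring the SQL predicate closer to the `\s*` regex on the write side.

```python
import sqlite3


def normalized_jsonrpc_sql(content_expr):
    """Build a predicate that removes common whitespace before matching."""
    expr = f"LOWER({content_expr})"
    for code in (32, 9, 10, 13):  # space, tab, newline, carriage return
        expr = f"REPLACE({expr}, char({code}), '')"
    return f"{expr} NOT LIKE '%\"jsonrpc\":\"2.0\"%'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (content TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?)",
    [('{"jsonrpc" :\n "2.0", "id": 1}',), ("an ordinary memory",)],
)

# Predicate shaped like the one under review: only literal spaces are removed.
space_only = "REPLACE(LOWER(content), ' ', '') NOT LIKE '%\"jsonrpc\":\"2.0\"%'"
rows_space_only = conn.execute(
    f"SELECT content FROM chunks WHERE {space_only}"
).fetchall()

rows_normalized = conn.execute(
    f"SELECT content FROM chunks WHERE {normalized_jsonrpc_sql('content')}"
).fetchall()
conn.close()
```

With the space-only predicate the formatted payload survives the filter because a newline still separates the colon from the version string; the normalized predicate excludes it. The column expression is interpolated here only because it is a fixed identifier, never user input.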



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/brainlayer/mcp/search_handler.py`:
- Around line 478-491: The explicit chunk_id path is applying checkpoint
filtering but ignores the include_audit flag and can return audit-recursive
content via _context even when include_audit is False; update the chunk_id
branch in search_handler.py to also check include_audit by inspecting the
fetched chunk (via _get_vector_store() and store.get_chunk(chunk_id)) and, if
the chunk is an audit/recursive chunk and include_audit is False, return the
same "No results found." empty payload (or otherwise prevent calling _context) —
mirror the same predicate logic used for checkpoint filtering (use
CHUNK_ORIGIN_PRECOMPACT_CHECKPOINT, is_precompact_checkpoint_content, and the
include_checkpoints handling) and only call await _context(chunk_id=chunk_id,
before=before, after=after) when include_audit allows it.

In `@src/brainlayer/search_repo.py`:
- Around line 209-223: The _audit_recursion_count() currently runs a full
COUNT(*) and returns 0 on any DB error which hides SQLITE_BUSY contention and
hurts recall/latency; update _audit_recursion_count to retry when catching
apsw.Error that indicates SQLITE_BUSY (with an exponential backoff and a small
max retries), ensure each retry uses the caller's/read connection via
_read_cursor(), and on repeated failures fall back to returning a cached
"data-version" count value (maintained elsewhere or via a lightweight cached
attribute) instead of zero so callers can make better overfetch decisions;
reference the _audit_recursion_count function and the
_audit_recursion_exclusion_sql helper when implementing the retry+cache
behavior.

In `@tests/test_ingest_guard.py`:
- Around line 35-49: The test opens a VectorStore without using a context
manager so an assertion failure can leak the DB handle; change the
test_direct_store_rejects_recursive_mcp_output (and the other similar test
around the same file) to open VectorStore via a context manager (e.g., with
VectorStore(...) as store:) or ensure a try/finally that always calls
store.close() so the APSW connection is closed even on assertion failures;
adjust usage locations where store is passed to store_memory to use the scoped
store variable.
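
The retry-and-fallback behavior requested for _audit_recursion_count could be sketched roughly as follows. This is not the PR's code: it uses the standard library's sqlite3.OperationalError as a stand-in for the APSW busy error, and count_with_retry, cached_count, and the injectable sleep parameter are hypothetical names chosen so the sketch can be exercised without a real lock.

```python
import sqlite3
import time


def count_with_retry(run_count, cached_count=None, max_retries=3,
                     base_delay=0.01, sleep=time.sleep):
    """Call run_count(), retrying busy/locked errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return run_count()
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            if "locked" not in message and "busy" not in message:
                raise  # not contention, so surface the real error
            if attempt < max_retries:
                sleep(base_delay * (2 ** attempt))
    # Contention persisted: prefer the last known count over a misleading zero.
    return cached_count if cached_count is not None else 0
```

Returning the cached value rather than zero keeps the overfetch decision conservative while the database is contended.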

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2b20103d-2837-4ce9-88bf-819f3afb269f

📥 Commits

Reviewing files that changed from the base of the PR and between a670747 and a14c082.

📒 Files selected for processing (14)
  • src/brainlayer/drain.py
  • src/brainlayer/engine.py
  • src/brainlayer/ingest_guard.py
  • src/brainlayer/mcp/__init__.py
  • src/brainlayer/mcp/search_handler.py
  • src/brainlayer/mcp/store_handler.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/store.py
  • src/brainlayer/vector_store.py
  • src/brainlayer/watcher_bridge.py
  • tests/test_audit_recursion_filter.py
  • tests/test_ingest_guard.py
  • tests/test_precompact_chunk_origin.py
  • tests/test_search_exact_chunk_id.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Macroscope - Correctness Check
  • GitHub Check: Cursor Bugbot
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.13)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • src/brainlayer/ingest_guard.py
  • tests/test_search_exact_chunk_id.py
  • src/brainlayer/watcher_bridge.py
  • src/brainlayer/engine.py
  • src/brainlayer/drain.py
  • src/brainlayer/mcp/__init__.py
  • src/brainlayer/vector_store.py
  • src/brainlayer/store.py
  • tests/test_precompact_chunk_origin.py
  • src/brainlayer/mcp/store_handler.py
  • tests/test_ingest_guard.py
  • tests/test_audit_recursion_filter.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/mcp/search_handler.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/ingest_guard.py
  • src/brainlayer/watcher_bridge.py
  • src/brainlayer/engine.py
  • src/brainlayer/drain.py
  • src/brainlayer/mcp/__init__.py
  • src/brainlayer/vector_store.py
  • src/brainlayer/store.py
  • src/brainlayer/mcp/store_handler.py
  • src/brainlayer/search_repo.py
  • src/brainlayer/mcp/search_handler.py
🪛 OpenGrep (1.20.0)
tests/test_audit_recursion_filter.py

[ERROR] 120-120: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

src/brainlayer/search_repo.py

[ERROR] 212-218: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

🔇 Additional comments (14)
tests/test_audit_recursion_filter.py (1)

8-288: LGTM!

tests/test_precompact_chunk_origin.py (1)

341-377: LGTM!

tests/test_search_exact_chunk_id.py (1)

95-95: LGTM!

Also applies to: 109-110, 166-168

src/brainlayer/ingest_guard.py (1)

7-35: LGTM!

src/brainlayer/store.py (1)

44-45: LGTM!

Also applies to: 109-112

src/brainlayer/vector_store.py (1)

51-52: LGTM!

Also applies to: 117-118, 527-528, 1363-1365

src/brainlayer/watcher_bridge.py (1)

26-27: LGTM!

Also applies to: 131-133, 149-151

src/brainlayer/drain.py (1)

31-32: LGTM!

Also applies to: 163-167, 247-250, 282-285

src/brainlayer/mcp/store_handler.py (1)

504-505: LGTM!

Also applies to: 514-515

tests/test_ingest_guard.py (1)

1-33: LGTM!

Also applies to: 73-114

src/brainlayer/engine.py (1)

181-181: LGTM!

Also applies to: 210-210, 244-244, 284-284, 306-306

src/brainlayer/mcp/__init__.py (1)

514-518: LGTM!

Also applies to: 836-840, 1268-1268, 1357-1357

src/brainlayer/mcp/search_handler.py (1)

13-15: LGTM!

Also applies to: 43-45, 179-179, 190-195, 197-197, 202-205, 214-217, 220-225, 438-438, 495-500, 512-517, 547-547, 552-557, 570-573, 592-593, 639-639, 712-713, 910-910, 969-970, 1031-1032, 1093-1094, 1376-1381, 1397-1403, 1419-1424, 1446-1447

src/brainlayer/search_repo.py (1)

20-20: LGTM!

Also applies to: 44-49, 95-95, 117-117, 133-197, 202-207, 229-233, 262-266, 466-466, 528-530, 615-617, 829-830, 880-882, 1021-1082, 1098-1141, 1201-1203, 1350-1352

Comment thread src/brainlayer/mcp/search_handler.py Outdated
Comment thread src/brainlayer/search_repo.py Outdated
Comment thread tests/test_ingest_guard.py
Comment thread src/brainlayer/search_repo.py Outdated
Comment thread src/brainlayer/mcp/search_handler.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16ef4328dc


Comment thread src/brainlayer/search_repo.py Outdated
Comment on lines 578 to 580
or (not include_audit and self._audit_recursion_count() > 0)
)
effective_k = self._effective_knn_k(n_results, bool(needs_overfetch), include_checkpoints)

P2: Size audit overfetch by the filtered rows

When more than 1000 audit/recursive chunks are nearer than the first valid memory, default vector searches can still return empty or incomplete results: the audit predicate is applied after sqlite-vec chooses its k candidates, but this only flips needs_overfetch and _effective_knn_k caps that overfetch at 1000. This PR is meant to recover search quality from existing recursive pollution, so databases with heavy MCP recursion need the audit count added to k the same way checkpoint filtering does rather than just enabling the generic cap.
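
One way to read this suggestion, sketched with hypothetical names and an assumed overfetch factor of 4: when the number of filtered rows is known, request enough neighbors to cover the worst case where every filtered row ranks ahead of the first valid one, instead of relying only on the generic cap of 1000 mentioned above.

```python
GENERIC_OVERFETCH_CAP = 1000  # cap referenced in the review comment


def effective_knn_k(n_results, needs_overfetch, filtered_rows=0, overfetch_factor=4):
    """Pick k for the KNN query so post-filtering still leaves n_results rows."""
    if not needs_overfetch:
        return n_results
    generic = min(n_results * overfetch_factor, GENERIC_OVERFETCH_CAP)
    # Worst case: all filtered rows are nearer than any valid row.
    return max(generic, n_results + filtered_rows)
```

For example, effective_knn_k(10, True, filtered_rows=1500) asks for 1510 candidates, while the generic path alone would stop at 40. Whether the vector index accepts a k that large is a separate limit this sketch does not model.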


Comment on lines +232 to +235
audit_count = int(row[0]) if row else 0
setattr(self, "_audit_recursion_count_cache", audit_count)
setattr(self, "_audit_recursion_count_cache_data_version", current_data_version)
return audit_count

P2: Invalidate the new audit-count cache after writes

The new _audit_recursion_count_cache is populated here, but write paths only invalidate the checkpoint cache; PRAGMA data_version does not change for commits made on the same connection. If a store searches before audit-tagged rows exist and then ingests/imports them through the same VectorStore, subsequent vector searches keep using the cached zero count, skip audit overfetch, and can lose normal results behind filtered audit neighbors until the process reconnects.
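
The PRAGMA data_version behavior this comment relies on can be observed with the standard sqlite3 module. The sketch below uses a throwaway database file and a hypothetical chunks table; it shows that a connection's own commit leaves its data_version unchanged, while a commit from a second connection changes it.

```python
import os
import sqlite3
import tempfile


def data_version(conn):
    # fetchall() drains the statement so no read transaction lingers.
    return conn.execute("PRAGMA data_version").fetchall()[0][0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    a = sqlite3.connect(path)
    b = sqlite3.connect(path)
    a.execute("CREATE TABLE chunks (content TEXT)")
    a.commit()

    before = data_version(a)

    a.execute("INSERT INTO chunks VALUES ('written on connection a')")
    a.commit()
    after_own_commit = data_version(a)

    b.execute("INSERT INTO chunks VALUES ('written on connection b')")
    b.commit()
    after_other_commit = data_version(a)

    a.close()
    b.close()
```

A cache keyed only on data_version therefore needs explicit invalidation on the write paths of the same connection, which is the gap the comment points out.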


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 585cd9d208


entity_name=entity_name,
project_filter=normalized_project,
include_checkpoints=include_checkpoints,
include_audit=include_audit,

P2: Filter audit chunks from KG fact lookups

When a query takes the entity route, include_audit is only forwarded to kg_hybrid_search's chunk search here; the SQL fact lookup above still calls _kg_facts_sql(..., include_checkpoints=...), and kg_hybrid_search's kg_search also ignores include_audit. In databases where a KG relation has source_chunk_id pointing at an audit-tagged or recursive MCP-output chunk, default brain_search can still return that relation in the facts payload even though default search is meant to hide audit/recursive memories. Please plumb the audit predicate through the fact lookups or suppress facts sourced from filtered chunks unless include_audit=True.


Comment thread src/brainlayer/search_repo.py Outdated
Comment thread src/brainlayer/search_repo.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c73f0138b3


Comment on lines +521 to +523
not include_audit
and isinstance(chunk, dict)
and _is_audit_recursion_metadata({"tags": _parsed_chunk_tags(chunk)}, chunk.get("content"))

P2: Filter recursive neighbors from chunk context

When brain_search is called with a normal chunk_id and nonzero before/after, this new audit guard only checks the target chunk before delegating to _context; store.get_context() then returns every surrounding turn without any audit/recursive predicate. In conversations where a nearby turn is a recursive BrainLayer/JSON-RPC output, default chunk-context lookup still emits the polluted content despite include_audit defaulting to false.
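
A possible shape for the missing step, assuming the context lookup returns a list of dicts with a content key. filter_context_turns and the is_hidden predicate are hypothetical; in practice the predicate would be whatever audit/recursive detector the search paths already use.

```python
def filter_context_turns(turns, is_hidden, include_audit=False):
    """Drop surrounding turns that the default search would hide."""
    if include_audit:
        return list(turns)
    return [turn for turn in turns if not is_hidden(turn.get("content") or "")]
```

Applying the same predicate to neighbors as to the target chunk keeps the default chunk-context view consistent with default search results.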


Comment on lines 383 to 384
AND (
r.source_chunk_id IS NULL

P2: Exclude content-detected checkpoints from KG facts

The commit now treats legacy checkpoint text as hidden in chunk search/exact lookup via is_precompact_checkpoint_content/_precompact_content_exclusion_sql, but the KG fact path still filters only chunk_origin. If an older relation points to a chunk whose content starts with [PreCompact checkpoint] but whose chunk_origin is missing or unknown, default entity searches still return that checkpoint-sourced fact; I verified the same leak through _kg_facts_sql and kg_hybrid_search with such a legacy source chunk.


Comment thread src/brainlayer/search_repo.py Outdated


def _precompact_content_exclusion_sql(content_expr: str) -> str:
normalized = f"LOWER(LTRIM(COALESCE(CAST({content_expr} AS TEXT), '')))"

Inconsistent LTRIM strips only spaces, not all whitespace

Medium Severity

_precompact_content_exclusion_sql uses LTRIM(...) without a character set argument, which in SQLite only strips leading spaces. The corresponding Python function is_precompact_checkpoint_content uses .lstrip() which strips all whitespace (tabs, newlines, etc.), and the sibling SQL function _audit_recursion_exclusion_sql correctly uses LTRIM(..., char(9) || char(10) || char(11) || char(12) || char(13) || char(32)). Content with leading tabs or newlines before precompact markers would be caught by Python checks but leak through the SQL filter in search queries.
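
The mismatch is easy to confirm with the standard sqlite3 module. The sample text is hypothetical; the character set is the one quoted above from _audit_recursion_exclusion_sql.

```python
import sqlite3

# Tab, LF, VT, FF, CR, space: the set quoted in the review comment.
ALL_WS = "char(9) || char(10) || char(11) || char(12) || char(13) || char(32)"

conn = sqlite3.connect(":memory:")
text = "\n\t[PreCompact checkpoint] session state"

# One-argument LTRIM removes only leading spaces, so the newline stays.
default_trim = conn.execute("SELECT LTRIM(?)", (text,)).fetchone()[0]

# Passing the character set strips the leading newline and tab as well.
full_trim = conn.execute(f"SELECT LTRIM(?, {ALL_WS})", (text,)).fetchone()[0]
conn.close()
```

Here default_trim is identical to the input, while full_trim matches what Python's text.lstrip() returns for this sample, so a prefix check on the one-argument form would miss the marker.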


Reviewed by Cursor Bugbot for commit c73f013. Configure here.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7b18695d5


Comment on lines +18 to +19
if stripped.startswith(_BRAIN_SEARCH_BOX_PREFIX):
return "brain_search_output"

P2: Reject non-search BrainLayer MCP boxes

When the watcher or brain_store sees formatted output from other BrainLayer MCP paths, such as format_entity_card()/format_kg_search() output beginning with ┌─ Entity: or ┌─ Entity search:, this guard returns None because it only recognizes the brain_search box and the JSON-RPC markers. In that scenario the recursive MCP response can still be ingested, and the default search filters use the same detector so existing rows with those boxes are not hidden either.
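
One way to close this gap is a small prefix table checked in a loop. The sketch below is an assumption about shape, not the project's code: `detect_mcp_box` and the two entity reason strings are hypothetical names, while the three box prefixes and the `brain_search_output` reason are the ones quoted in this thread.

```python
from typing import Optional

# (prefix, reason) pairs. Only "brain_search_output" is taken from the PR;
# the other two reason strings are invented for this illustration.
_MCP_BOX_PREFIXES = (
    ("┌─ brain_search:", "brain_search_output"),
    ("┌─ Entity search:", "entity_search_output"),
    ("┌─ Entity:", "entity_card_output"),
)


def detect_mcp_box(content: str) -> Optional[str]:
    """Return a rejection reason if content starts with a known MCP box header."""
    stripped = content.lstrip()
    for prefix, reason in _MCP_BOX_PREFIXES:
        if stripped.startswith(prefix):
            return reason
    # Box characters that appear mid-text are left alone on purpose.
    return None
```

Because the check is anchored at the start of the stripped content, a note that merely quotes a box header later in its body is not rejected.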

EtanHey commented May 16, 2026

@coderabbitai review

coderabbitai Bot commented May 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36dda5f696

Comment on lines +30 to +31
if _JSONRPC_MESSAGE_RE.search(stripped):
    return "jsonrpc_message"

P2: Narrow the JSON-RPC guard to BrainLayer output

This rejects any content containing a JSON-RPC 2.0 field, not just recursive BrainLayer MCP output. When a user stores or indexes legitimate MCP/JSON-RPC notes or examples such as {"jsonrpc":"2.0","method":"tools/call"}, store_memory, watcher ingestion, and drain all call this guard and will raise or silently drop that memory, so BrainLayer loses valid technical recall for JSON-RPC work.

Comment thread src/brainlayer/search_repo.py Outdated

EtanHey commented May 16, 2026

@coderabbitai review

EtanHey commented May 16, 2026

@cursor @BugBot re-review

EtanHey commented May 16, 2026

@codex review

cursor Bot commented May 16, 2026

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a0d745f8c

Comment on lines +27 to +28
folded = stripped.casefold()
if _INVALID_JSONRPC_MARKER in folded:

P2: Restrict JSON-RPC rejection to BrainLayer output

When a legitimate memory contains a JSON-RPC 2.0 example or debugging payload, this unconditional regex match makes store_memory, upsert_chunks, watcher/drain ingestion, and updates reject or skip it as recursive MCP output even if it is just technical content. Please gate this on BrainLayer/MCP output markers or BrainLayer tool names rather than any "jsonrpc":"2.0" occurrence so valid protocol notes are not silently dropped.
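
A minimal sketch of the gating suggested here, assuming the guard can see the full content string. `classify_jsonrpc` and both regexes are hypothetical; the tool names are the ones mentioned elsewhere in this thread, and the real list may differ.

```python
import re
from typing import Optional

# Matches the protocol field itself, e.g. {"jsonrpc": "2.0", ...}.
_JSONRPC_FIELD_RE = re.compile(r'"jsonrpc"\s*:\s*"2\.0"')
# Hypothetical allow-list of BrainLayer tool names used as the second signal.
_BRAINLAYER_TOOL_RE = re.compile(r"\bbrain_(?:search|store|recall|entity)\b")


def classify_jsonrpc(content: str) -> Optional[str]:
    """Flag JSON-RPC payloads only when they also mention a BrainLayer tool."""
    if not _JSONRPC_FIELD_RE.search(content):
        return None
    if _BRAINLAYER_TOOL_RE.search(content):
        return "jsonrpc_message"
    # Generic protocol notes and examples pass through untouched.
    return None
```

With this shape, a stored note such as `{"jsonrpc":"2.0","method":"tools/call"}` is kept, while a payload that names `brain_search` is still rejected. The trade-off is that recursive output which never names a tool would slip through, so a real fix would likely combine this with other markers.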

Comment thread src/brainlayer/search_repo.py Outdated
audit_count = self._audit_recursion_count()
if audit_count <= 0:
    return n_results
return min(n_results + audit_count, max(n_results, _FILTERED_KNN_MAX))

P1: Avoid starving results after 2000 filtered memories

When more than ~2000 audit/recursive chunks are closer to the query than any normal chunk, this cap limits the KNN window to 2000, the SQL/post filters discard the whole window, and search can return no results even though matching non-audit memories exist just beyond it. Since this change is meant to protect retrieval from audit-recursion pollution, use a paged/iterative fallback or size the fetch past the filtered count instead of hard-capping below it.
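
The iterative fallback mentioned above could look roughly like the following. This is a generic sketch, not BrainLayer's search code: `search_with_widening`, `fetch_nearest`, `keep`, and `total_rows` are all hypothetical names, and `fetch_nearest(k)` is assumed to return the `k` nearest candidates in distance order.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def search_with_widening(
    fetch_nearest: Callable[[int], List[T]],
    keep: Callable[[T], bool],
    n_results: int,
    total_rows: int,
) -> List[T]:
    """Double the KNN window until enough unfiltered rows are found."""
    window = max(n_results, 1)
    while True:
        candidates = fetch_nearest(window)
        kept = [c for c in candidates if keep(c)]
        # Stop when the index has nothing more to offer.
        exhausted = len(candidates) < window or window >= total_rows
        if len(kept) >= n_results or exhausted:
            return kept[:n_results]
        window = min(window * 2, total_rows)
```

Doubling keeps the number of round trips logarithmic in the amount of filtered content, at the cost of refetching earlier candidates on each pass; a cursor-based pager would avoid the refetch if the vector index supports one.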

EtanHey commented May 16, 2026

@coderabbitai review

EtanHey commented May 16, 2026

@cursor @BugBot re-review

EtanHey commented May 16, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

return await _entity_handler(query=query, entity_type=entity_type)

P2: Apply audit filtering to entity recall evidence

include_audit is now exposed on brain_recall, but the mode="entity" branch ignores it and delegates to _brain_entity, whose lookup still returns evidence via store.get_entity_chunks() without the new audit-recursion filters. In contexts where a recursive/audit chunk is linked to an entity, brain_recall(mode="entity") and smart-routed entity queries can still surface the polluted evidence by default, bypassing the filtering added for search/KG hybrid paths.

if filter_meta_noise and _contains_meta_noise(doc):
    continue
if not include_audit and _is_audit_recursion_metadata(meta, doc):

P2: Honor include_audit before meta-noise filtering

When callers set include_audit=True to inspect audit/recursive MCP-output memories, hybrid search still drops any candidate whose content matches META_NOISE_PATTERNS before reaching this new audit gate. For example, an audit-tagged chunk containing brain_search("...") is retrievable through the plain text search(..., include_audit=True) path but disappears from public brain_search/hybrid results, so the advertised opt-in cannot recover a class of recursive MCP outputs.
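
The ordering problem can be shown with a small predicate that decides the audit case first. Everything here is a stand-in for illustration: `keep_candidate`, `looks_like_meta_noise`, `is_audit_chunk`, and the `chunk_origin == "audit"` convention are assumptions, not the real `_contains_meta_noise` or `_is_audit_recursion_metadata` logic.

```python
from typing import Any, Dict


def looks_like_meta_noise(doc: str) -> bool:
    # Stand-in for the project's meta-noise patterns.
    return 'brain_search("' in doc


def is_audit_chunk(meta: Dict[str, Any], doc: str) -> bool:
    # Stand-in: treat a hypothetical chunk_origin value as the audit signal.
    return meta.get("chunk_origin") == "audit"


def keep_candidate(
    doc: str,
    meta: Dict[str, Any],
    *,
    include_audit: bool = False,
    filter_meta_noise: bool = True,
) -> bool:
    """Decide the audit case first so the opt-in is not pre-empted by noise filtering."""
    if is_audit_chunk(meta, doc):
        return include_audit
    if filter_meta_noise and looks_like_meta_noise(doc):
        return False
    return True
```

With the audit check first, `include_audit=True` recovers audit-tagged chunks even when their text also matches a noise pattern, while non-audit noise is still dropped by default.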

Comment thread src/brainlayer/search_repo.py Outdated

EtanHey commented May 16, 2026

@codex review
@cursor @BugBot re-review
@coderabbitai review

coderabbitai Bot commented May 16, 2026

@EtanHey: Triggering a full review of PR #287. I'll examine all the changed files including ingest_guard.py, the guard integration across drain.py, store.py, vector_store.py, watcher_bridge.py, and store_handler.py, the include_audit flag threading through search_repo.py, kg_repo.py, engine.py, mcp/search_handler.py, and mcp/__init__.py, as well as the new test suites.

✅ Actions performed

Full review triggered.

🟢 Low

elif name == "brain_recall":

When brain_recall is called with source_filter or correction_category parameters, they are silently dropped and not forwarded to _brain_recall(). This means filtered recall queries return unfiltered results, breaking the caller's intent. Consider adding source_filter and correction_category to the _brain_recall() call at lines 1333-1364.

Evidence trail:
src/brainlayer/mcp/__init__.py lines 1333-1364 (brain_recall dispatcher - no source_filter or correction_category); src/brainlayer/mcp/__init__.py lines 1242-1275 (brain_search dispatcher - passes source_filter line 1270, correction_category line 1271); src/brainlayer/mcp/search_handler.py lines 923-954 (_brain_recall signature - accepts source_filter line 951, correction_category line 952); src/brainlayer/mcp/__init__.py lines 689-843 (brain_recall tool schema - no source_filter or correction_category properties)
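
As a rough illustration of the forwarding this comment asks for, the sketch below uses a toy dispatcher. The `_brain_recall` here is a stand-in that simply echoes its arguments, and `dispatch` is a hypothetical name; the real handler has a much longer signature. Per the evidence trail, the tool schema would also need the two properties before callers could send them.

```python
import asyncio
from typing import Any, Dict, Optional


async def _brain_recall(
    query: str,
    source_filter: Optional[str] = None,
    correction_category: Optional[str] = None,
) -> Dict[str, Any]:
    # Stand-in handler: echoes what it received so forwarding is observable.
    return {
        "query": query,
        "source_filter": source_filter,
        "correction_category": correction_category,
    }


async def dispatch(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    if name == "brain_recall":
        return await _brain_recall(
            query=arguments.get("query", ""),
            # Forward the filters instead of silently dropping them.
            source_filter=arguments.get("source_filter"),
            correction_category=arguments.get("correction_category"),
        )
    raise ValueError(f"unknown tool: {name}")


print(asyncio.run(dispatch("brain_recall", {"query": "q", "source_filter": "manual"})))
```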

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25cbf6dc3a

store: VectorStore,
embed_fn: Callable[[str], List[float]],
entity_type: Optional[str] = None,
include_audit: bool = False,

P2: Filter audit-sourced entity relations

With the new default include_audit=False contract on entity lookups, brain_entity still returns relations that were created from audit/recursive chunks because entity_lookup() only passes the flag to get_entity_chunks() while relations_raw = store.get_entity_relations(entity_id) remains unfiltered by source_chunk_id. In a KG where an audit-tagged chunk added a relation, looking up that entity now hides the audit evidence but still displays the audit-sourced relationship, so the default entity path remains polluted despite the opt-in flag.
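
A sketch of filtering relations by their source chunk, under stated assumptions: rows are plain dicts, chunks carry an `id` key, relations carry the `source_chunk_id` mentioned above, and `filter_entity_view` and `is_audit_chunk` are hypothetical names rather than existing store methods.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def filter_entity_view(
    chunks: List[Row],
    relations: List[Row],
    *,
    include_audit: bool,
    is_audit_chunk: Callable[[Row], bool],
) -> Tuple[List[Row], List[Row]]:
    """Hide audit evidence and any relation that was derived from it."""
    if include_audit:
        return list(chunks), list(relations)
    audit_ids = {c["id"] for c in chunks if is_audit_chunk(c)}
    kept_chunks = [c for c in chunks if c["id"] not in audit_ids]
    # Relations without a source_chunk_id are kept; their origin is unknown.
    kept_relations = [r for r in relations if r.get("source_chunk_id") not in audit_ids]
    return kept_chunks, kept_relations
```

This only catches relations whose source chunk is present in the evidence list passed in. A relation sourced from an audit chunk that is not among the fetched evidence would need a lookup or a join on the chunk table instead.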

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit 25cbf6d. Configure here.

  cursor.execute(
      """
-     SELECT conversation_id, position, content, metadata, content_type
+     SELECT conversation_id, position, content, metadata, content_type, tags, chunk_origin

get_context unconditionally queries missing chunk_origin column

High Severity

The get_context method now unconditionally selects chunk_origin from the chunks table in both the target chunk query and the surrounding chunks query. However, this column may not exist on old readonly databases where _has_chunk_origin is False. This will cause a SQL error at runtime. Other methods like get_chunk handle this correctly by conditionally including the column (e.g., chunk_origin_select = ", chunk_origin" if has_chunk_origin else "").
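
The conditional-column pattern quoted above can be sketched with `sqlite3` and `PRAGMA table_info`. The helper names `has_column` and `build_context_select`, the `WHERE`/`ORDER BY` clause, and the assumption that `tags` exists on older databases are all illustrative guesses rather than the project's actual query.

```python
import sqlite3


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def build_context_select(conn: sqlite3.Connection) -> str:
    """Select chunk_origin only when the column exists on this database."""
    chunk_origin_select = ", chunk_origin" if has_column(conn, "chunks", "chunk_origin") else ""
    return (
        "SELECT conversation_id, position, content, metadata, content_type, tags"
        f"{chunk_origin_select} FROM chunks WHERE conversation_id = ? ORDER BY position"
    )
```

In practice the column check would be computed once (as `_has_chunk_origin` appears to be) rather than on every call, and the row-unpacking code has to tolerate the shorter tuple when the column is absent.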

Reviewed by Cursor Bugbot for commit 25cbf6d. Configure here.

@EtanHey EtanHey merged commit bd27412 into main May 16, 2026
7 checks passed