fix: block recursive BrainLayer ingest #287
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
@coderabbitai review
@codex review
Walkthrough
Adds detection of recursive MCP outputs, rejects them at ingestion points, and threads an opt-in include_audit flag through engine/MCP/search layers to exclude or include audit-recursion results at read time.
Changes
Audit Recursion Prevention: Detection, Rejection, and Filtering
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks
✅ 4 passed | ❌ 1 failed (1 warning)
You need to increase your spend limit or enable usage-based billing to run background agents.
✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a14c082639
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    recursive_content_filter = (
        f"LTRIM({content_expr}) NOT LIKE '┌─ brain_search:%' "
        f"AND LOWER({content_expr}) NOT LIKE '%mcp brainlayer memory: invalid json-rpc message%' "
        f"AND REPLACE(LOWER({content_expr}), ' ', '') NOT LIKE '%\"jsonrpc\":\"2.0\"%'"
Normalize all JSON-RPC whitespace before filtering
When existing recursive MCP output contains a formatted JSON-RPC payload such as {"jsonrpc" :\n "2.0"}, the write guard would classify it as recursive because it uses \s*, but this SQL predicate only strips literal spaces before matching. The direct VectorStore.search() text/vector paths rely on this SQL filter without the hybrid post-filter, so those rows still appear in default search results until include_audit is requested; normalize tabs/newlines as well or reuse an equivalent predicate here.
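One way to close the gap described above is to delete every ASCII whitespace character in SQL before matching, so the predicate accepts the same inputs as a `\s*` regex. The sketch below uses the standard-library `sqlite3` module and a throwaway `chunks` table; the helper names `strip_whitespace_sql` and `jsonrpc_exclusion_sql` are hypothetical and are not taken from the PR.

```python
import sqlite3

# ASCII whitespace code points: tab, LF, VT, FF, CR, space.
_WHITESPACE_CODES = (9, 10, 11, 12, 13, 32)


def strip_whitespace_sql(expr: str) -> str:
    """Wrap `expr` in nested REPLACE calls that delete each whitespace char."""
    for code in _WHITESPACE_CODES:
        expr = f"REPLACE({expr}, char({code}), '')"
    return expr


def jsonrpc_exclusion_sql(content_expr: str) -> str:
    # `content_expr` is assumed to be a trusted column expression,
    # never user input, because it is interpolated into the SQL text.
    normalized = strip_whitespace_sql(f"LOWER({content_expr})")
    return f"{normalized} NOT LIKE '%\"jsonrpc\":\"2.0\"%'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO chunks (content) VALUES (?)",
    [
        ('{"jsonrpc" :\n\t"2.0", "id": 1}',),  # formatted payload
        ("plain note about search quality",),
    ],
)
rows = conn.execute(
    f"SELECT content FROM chunks WHERE {jsonrpc_exclusion_sql('content')}"
).fetchall()
visible = [row[0] for row in rows]
```

With the nested `REPLACE` calls, the payload that contains a newline and a tab between the key and the value is filtered out, while the plain note stays visible.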
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/brainlayer/mcp/search_handler.py`:
- Around line 478-491: The explicit chunk_id path is applying checkpoint
filtering but ignores the include_audit flag and can return audit-recursive
content via _context even when include_audit is False; update the chunk_id
branch in search_handler.py to also check include_audit by inspecting the
fetched chunk (via _get_vector_store() and store.get_chunk(chunk_id)) and, if
the chunk is an audit/recursive chunk and include_audit is False, return the
same "No results found." empty payload (or otherwise prevent calling _context) —
mirror the same predicate logic used for checkpoint filtering (use
CHUNK_ORIGIN_PRECOMPACT_CHECKPOINT, is_precompact_checkpoint_content, and the
include_checkpoints handling) and only call await _context(chunk_id=chunk_id,
before=before, after=after) when include_audit allows it.
In `@src/brainlayer/search_repo.py`:
- Around line 209-223: The _audit_recursion_count() currently runs a full
COUNT(*) and returns 0 on any DB error which hides SQLITE_BUSY contention and
hurts recall/latency; update _audit_recursion_count to retry when catching
apsw.Error that indicates SQLITE_BUSY (with an exponential backoff and a small
max retries), ensure each retry uses the caller's/read connection via
_read_cursor(), and on repeated failures fall back to returning a cached
"data-version" count value (maintained elsewhere or via a lightweight cached
attribute) instead of zero so callers can make better overfetch decisions;
reference the _audit_recursion_count function and the
_audit_recursion_exclusion_sql helper when implementing the retry+cache
behavior.
In `@tests/test_ingest_guard.py`:
- Around line 35-49: The test opens a VectorStore without using a context
manager so an assertion failure can leak the DB handle; change the
test_direct_store_rejects_recursive_mcp_output (and the other similar test
around the same file) to open VectorStore via a context manager (e.g., with
VectorStore(...) as store:) or ensure a try/finally that always calls
store.close() so the APSW connection is closed even on assertion failures;
adjust usage locations where store is passed to store_memory to use the scoped
store variable.
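The retry-plus-cache behavior requested for `_audit_recursion_count()` can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard-library `sqlite3.OperationalError` as a stand-in for the APSW busy error, and the `CachedCount` class and its parameters are hypothetical names, not part of BrainLayer.

```python
import sqlite3
import time
from typing import Callable, Optional


class CachedCount:
    """Retry a COUNT query on lock contention, then fall back to the
    last known value instead of reporting zero (hypothetical helper)."""

    def __init__(
        self,
        run_count: Callable[[], int],
        max_retries: int = 3,
        base_delay: float = 0.01,
    ) -> None:
        self._run_count = run_count
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._last_good: Optional[int] = None

    def get(self) -> Optional[int]:
        for attempt in range(self._max_retries + 1):
            try:
                self._last_good = self._run_count()
                return self._last_good
            except sqlite3.OperationalError as exc:
                message = str(exc).lower()
                # Only contention is retried; other errors propagate.
                if "locked" not in message and "busy" not in message:
                    raise
                if attempt < self._max_retries:
                    # Exponential backoff: base, 2*base, 4*base, ...
                    time.sleep(self._base_delay * (2 ** attempt))
        # Every attempt hit contention: return the cached value,
        # which is None if no query has ever succeeded.
        return self._last_good
```

Returning `None` when no value has ever been cached lets the caller distinguish "unknown" from "zero" and choose a conservative overfetch in that case.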
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2b20103d-2837-4ce9-88bf-819f3afb269f
📒 Files selected for processing (14)
- src/brainlayer/drain.py
- src/brainlayer/engine.py
- src/brainlayer/ingest_guard.py
- src/brainlayer/mcp/__init__.py
- src/brainlayer/mcp/search_handler.py
- src/brainlayer/mcp/store_handler.py
- src/brainlayer/search_repo.py
- src/brainlayer/store.py
- src/brainlayer/vector_store.py
- src/brainlayer/watcher_bridge.py
- tests/test_audit_recursion_filter.py
- tests/test_ingest_guard.py
- tests/test_precompact_chunk_origin.py
- tests/test_search_exact_chunk_id.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Macroscope - Correctness Check
- GitHub Check: Cursor Bugbot
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
**/*.py: Use `paths.py:get_db_path()` for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches
Files:
- src/brainlayer/ingest_guard.py
- tests/test_search_exact_chunk_id.py
- src/brainlayer/watcher_bridge.py
- src/brainlayer/engine.py
- src/brainlayer/drain.py
- src/brainlayer/mcp/__init__.py
- src/brainlayer/vector_store.py
- src/brainlayer/store.py
- tests/test_precompact_chunk_origin.py
- src/brainlayer/mcp/store_handler.py
- tests/test_ingest_guard.py
- tests/test_audit_recursion_filter.py
- src/brainlayer/search_repo.py
- src/brainlayer/mcp/search_handler.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve `ai_code`, `stack_trace`, and `user_message` verbatim; skip `noise` entries entirely and summarize `build_log` and `dir_listing` entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var
Configure enrichment rate via `BRAINLAYER_ENRICH_RATE` environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table; exclude lifecycle-managed chunks from default search; allow `include_archived=True` to show history
Implement `brain_supersede` with safety gate for personal data (journals, notes, health/finance); use soft-delete for `brain_archive` with timestamp
Add `supersedes` parameter to `brain_store` for atomic store-and-replace operations
Run linting and formatting with: `ruff check src/ && ruff format src/`
Run tests with `pytest`
Use `PRAGMA wal_checkpoint(FULL)` before and after bulk database operations to prevent WAL bloat
Files:
- src/brainlayer/ingest_guard.py
- src/brainlayer/watcher_bridge.py
- src/brainlayer/engine.py
- src/brainlayer/drain.py
- src/brainlayer/mcp/__init__.py
- src/brainlayer/vector_store.py
- src/brainlayer/store.py
- src/brainlayer/mcp/store_handler.py
- src/brainlayer/search_repo.py
- src/brainlayer/mcp/search_handler.py
🪛 OpenGrep (1.20.0)
tests/test_audit_recursion_filter.py
[ERROR] 120-120: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.
(coderabbit.sql-injection.python-fstring-execute)
src/brainlayer/search_repo.py
[ERROR] 212-218: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.
(coderabbit.sql-injection.python-fstring-execute)
🔇 Additional comments (14)
- tests/test_audit_recursion_filter.py (1): 8-288: LGTM!
- tests/test_precompact_chunk_origin.py (1): 341-377: LGTM!
- tests/test_search_exact_chunk_id.py (1): 95-95: LGTM! Also applies to: 109-110, 166-168
- src/brainlayer/ingest_guard.py (1): 7-35: LGTM!
- src/brainlayer/store.py (1): 44-45: LGTM! Also applies to: 109-112
- src/brainlayer/vector_store.py (1): 51-52: LGTM! Also applies to: 117-118, 527-528, 1363-1365
- src/brainlayer/watcher_bridge.py (1): 26-27: LGTM! Also applies to: 131-133, 149-151
- src/brainlayer/drain.py (1): 31-32: LGTM! Also applies to: 163-167, 247-250, 282-285
- src/brainlayer/mcp/store_handler.py (1): 504-505: LGTM! Also applies to: 514-515
- tests/test_ingest_guard.py (1): 1-33: LGTM! Also applies to: 73-114
- src/brainlayer/engine.py (1): 181-181: LGTM! Also applies to: 210-210, 244-244, 284-284, 306-306
- src/brainlayer/mcp/__init__.py (1): 514-518: LGTM! Also applies to: 836-840, 1268-1268, 1357-1357
- src/brainlayer/mcp/search_handler.py (1): 13-15: LGTM! Also applies to: 43-45, 179-179, 190-195, 197-197, 202-205, 214-217, 220-225, 438-438, 495-500, 512-517, 547-547, 552-557, 570-573, 592-593, 639-639, 712-713, 910-910, 969-970, 1031-1032, 1093-1094, 1376-1381, 1397-1403, 1419-1424, 1446-1447
- src/brainlayer/search_repo.py (1): 20-20: LGTM! Also applies to: 44-49, 95-95, 117-117, 133-197, 202-207, 229-233, 262-266, 466-466, 528-530, 615-617, 829-830, 880-882, 1021-1082, 1098-1141, 1201-1203, 1350-1352
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16ef4328dc
        or (not include_audit and self._audit_recursion_count() > 0)
    )
    effective_k = self._effective_knn_k(n_results, bool(needs_overfetch), include_checkpoints)
Size audit overfetch by the filtered rows
When more than 1000 audit/recursive chunks are nearer than the first valid memory, default vector searches can still return empty or incomplete results: the audit predicate is applied after sqlite-vec chooses its k candidates, but this only flips needs_overfetch and _effective_knn_k caps that overfetch at 1000. This PR is meant to recover search quality from existing recursive pollution, so databases with heavy MCP recursion need the audit count added to k the same way checkpoint filtering does rather than just enabling the generic cap.
    audit_count = int(row[0]) if row else 0
    setattr(self, "_audit_recursion_count_cache", audit_count)
    setattr(self, "_audit_recursion_count_cache_data_version", current_data_version)
    return audit_count
Invalidate the new audit-count cache after writes
The new _audit_recursion_count_cache is populated here, but write paths only invalidate the checkpoint cache; PRAGMA data_version does not change for commits made on the same connection. If a store searches before audit-tagged rows exist and then ingests/imports them through the same VectorStore, subsequent vector searches keep using the cached zero count, skip audit overfetch, and can lose normal results behind filtered audit neighbors until the process reconnects.
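The `PRAGMA data_version` behavior this comment relies on can be observed directly. The sketch below uses the standard-library `sqlite3` module with a temporary database file and an illustrative `chunks` table (both are assumptions for the demo, not BrainLayer's schema). It records the pragma value on one connection before and after that connection commits a write, and again after a second connection commits.

```python
import os
import sqlite3
import tempfile


def data_version(conn: sqlite3.Connection) -> int:
    """Read SQLite's per-connection change indicator."""
    return conn.execute("PRAGMA data_version").fetchone()[0]


path = os.path.join(tempfile.mkdtemp(), "demo.db")
reader = sqlite3.connect(path)  # connection that owns the cache
other = sqlite3.connect(path)   # a separate writer connection

reader.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, tags TEXT)")
reader.commit()

before = data_version(reader)

# A commit on the SAME connection leaves data_version unchanged.
reader.execute("INSERT INTO chunks (tags) VALUES ('audit')")
reader.commit()
after_own_write = data_version(reader)

# A commit on a DIFFERENT connection changes it.
other.execute("INSERT INTO chunks (tags) VALUES ('audit')")
other.commit()
after_other_write = data_version(reader)

reader.close()
other.close()
```

Because `after_own_write` equals `before`, a cache keyed only on `data_version` cannot notice writes made through the same connection, which is why those write paths would need to clear the cached count explicitly.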
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 585cd9d208
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    entity_name=entity_name,
    project_filter=normalized_project,
    include_checkpoints=include_checkpoints,
    include_audit=include_audit,
Filter audit chunks from KG fact lookups
When a query takes the entity route, include_audit is only forwarded to kg_hybrid_search's chunk search here; the SQL fact lookup above still calls _kg_facts_sql(..., include_checkpoints=...), and kg_hybrid_search's kg_search also ignores include_audit. In databases where a KG relation has source_chunk_id pointing at an audit-tagged or recursive MCP-output chunk, default brain_search can still return that relation in the facts payload even though default search is meant to hide audit/recursive memories. Please plumb the audit predicate through the fact lookups or suppress facts sourced from filtered chunks unless include_audit=True.
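A rough sketch of suppressing facts whose source chunk would be hidden is shown below. The `kg_relations` and `chunks` tables, the `kg_facts` function, and the single box-prefix predicate are all simplified assumptions for illustration; the real schema and audit predicate in BrainLayer are not shown in this thread.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chunks (id TEXT PRIMARY KEY, content TEXT);
    CREATE TABLE kg_relations (
        id INTEGER PRIMARY KEY, fact TEXT, source_chunk_id TEXT
    );
    INSERT INTO chunks VALUES ('c1', 'normal memory');
    INSERT INTO chunks VALUES ('c2', '┌─ brain_search: foo');
    INSERT INTO kg_relations (fact, source_chunk_id) VALUES
        ('alice works_on brainlayer', 'c1'),
        ('alice searched foo', 'c2'),
        ('alice knows bob', NULL);
    """
)


def kg_facts(db: sqlite3.Connection, include_audit: bool = False) -> List[str]:
    sql = (
        "SELECT r.fact FROM kg_relations r "
        "LEFT JOIN chunks c ON c.id = r.source_chunk_id"
    )
    if not include_audit:
        # Keep facts with no source chunk; otherwise require that the
        # source chunk does not look like recursive MCP output.
        sql += (
            " WHERE (c.id IS NULL OR "
            "LTRIM(COALESCE(c.content, '')) NOT LIKE '┌─ brain_search:%')"
        )
    sql += " ORDER BY r.id"
    return [row[0] for row in db.execute(sql)]


default_facts = kg_facts(conn)
all_facts = kg_facts(conn, include_audit=True)
```

The `LEFT JOIN` keeps relations without a source chunk, so only facts traceable to a filtered chunk disappear from the default view.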
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c73f0138b3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    not include_audit
    and isinstance(chunk, dict)
    and _is_audit_recursion_metadata({"tags": _parsed_chunk_tags(chunk)}, chunk.get("content"))
Filter recursive neighbors from chunk context
When brain_search is called with a normal chunk_id and nonzero before/after, this new audit guard only checks the target chunk before delegating to _context; store.get_context() then returns every surrounding turn without any audit/recursive predicate. In conversations where a nearby turn is a recursive BrainLayer/JSON-RPC output, default chunk-context lookup still emits the polluted content despite include_audit defaulting to false.
    AND (
        r.source_chunk_id IS NULL
Exclude content-detected checkpoints from KG facts
The commit now treats legacy checkpoint text as hidden in chunk search/exact lookup via is_precompact_checkpoint_content/_precompact_content_exclusion_sql, but the KG fact path still filters only chunk_origin. If an older relation points to a chunk whose content starts with [PreCompact checkpoint] but whose chunk_origin is missing or unknown, default entity searches still return that checkpoint-sourced fact; I verified the same leak through _kg_facts_sql and kg_hybrid_search with such a legacy source chunk.
    def _precompact_content_exclusion_sql(content_expr: str) -> str:
        normalized = f"LOWER(LTRIM(COALESCE(CAST({content_expr} AS TEXT), '')))"
Inconsistent LTRIM strips only spaces, not all whitespace
Medium Severity
_precompact_content_exclusion_sql uses LTRIM(...) without a character set argument, which in SQLite only strips leading spaces. The corresponding Python function is_precompact_checkpoint_content uses .lstrip() which strips all whitespace (tabs, newlines, etc.), and the sibling SQL function _audit_recursion_exclusion_sql correctly uses LTRIM(..., char(9) || char(10) || char(11) || char(12) || char(13) || char(32)). Content with leading tabs or newlines before precompact markers would be caught by Python checks but leak through the SQL filter in search queries.
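The difference between the one-argument and two-argument forms of `LTRIM` is easy to check with the standard-library `sqlite3` module. The snippet below is only a demonstration of SQLite's built-in behavior; the sample content string is made up.

```python
import sqlite3

# The same character set the sibling SQL helper is described as using:
# tab, LF, VT, FF, CR, space.
WS = "char(9) || char(10) || char(11) || char(12) || char(13) || char(32)"

conn = sqlite3.connect(":memory:")
content = "\n\t[PreCompact checkpoint] summary"

# One-argument LTRIM removes leading spaces only, so the newline and
# tab survive and a prefix match on the marker would fail.
one_arg = conn.execute("SELECT LTRIM(?)", (content,)).fetchone()[0]

# Two-argument LTRIM removes any leading character in the given set.
two_arg = conn.execute(f"SELECT LTRIM(?, {WS})", (content,)).fetchone()[0]
```

For this input the two-argument form agrees with Python's `str.lstrip()`. Note that `lstrip()` also strips some characters outside this six-character set (for example Unicode spaces), so the two are close but not identical in general.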
Additional Locations (1)
Reviewed by Cursor Bugbot for commit c73f013. Configure here.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7b18695d5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if stripped.startswith(_BRAIN_SEARCH_BOX_PREFIX):
        return "brain_search_output"
Reject non-search BrainLayer MCP boxes
When the watcher or brain_store sees formatted output from other BrainLayer MCP paths, such as format_entity_card()/format_kg_search() output beginning with ┌─ Entity: or ┌─ Entity search:, this guard returns None because it only recognizes the brain_search box and the JSON-RPC markers. In that scenario the recursive MCP response can still be ingested, and the default search filters use the same detector so existing rows with those boxes are not hidden either.
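A table of box prefixes is one way to widen the guard beyond the `brain_search` box. In the sketch below, the three prefixes are the ones quoted in this thread; the function name `box_output_reason` and the reason strings for the two entity boxes are hypothetical.

```python
from typing import Optional

# (prefix, reason) pairs. Only "brain_search_output" appears in the
# reviewed code; the other two reason labels are illustrative.
_BOX_PREFIXES = (
    ("┌─ brain_search:", "brain_search_output"),
    ("┌─ Entity search:", "kg_search_output"),
    ("┌─ Entity:", "entity_card_output"),
)


def box_output_reason(content: str) -> Optional[str]:
    """Return a rejection reason if content starts with a known MCP box."""
    stripped = content.lstrip()
    for prefix, reason in _BOX_PREFIXES:
        if stripped.startswith(prefix):
            return reason
    return None
```

Keeping the prefixes in one table means the ingest guard and any read-time filter derived from it can stay in sync when a new formatter is added.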
@coderabbitai review
✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36dda5f696
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if _JSONRPC_MESSAGE_RE.search(stripped):
        return "jsonrpc_message"
Narrow the JSON-RPC guard to BrainLayer output
This rejects any content containing a JSON-RPC 2.0 field, not just recursive BrainLayer MCP output. When a user stores or indexes legitimate MCP/JSON-RPC notes or examples such as {"jsonrpc":"2.0","method":"tools/call"}, store_memory, watcher ingestion, and drain all call this guard and will raise or silently drop that memory, so BrainLayer loses valid technical recall for JSON-RPC work.
@coderabbitai review
@codex review
✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3a0d745f8c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    folded = stripped.casefold()
    if _INVALID_JSONRPC_MARKER in folded:
Restrict JSON-RPC rejection to BrainLayer output
When a legitimate memory contains a JSON-RPC 2.0 example or debugging payload, this unconditional regex match makes store_memory, upsert_chunks, watcher/drain ingestion, and updates reject or skip it as recursive MCP output even if it is just technical content. Please gate this on BrainLayer/MCP output markers or BrainLayer tool names rather than any "jsonrpc":"2.0" occurrence so valid protocol notes are not silently dropped.
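The gating suggested here could look roughly like the sketch below: a JSON-RPC 2.0 field alone is not enough, and the content must also carry a BrainLayer marker. The marker pattern (any `brain_*` tool name, or the `MCP BrainLayer Memory` error prefix) is an assumption chosen for illustration, and `jsonrpc_recursion_reason` is a hypothetical name.

```python
import re
from typing import Optional

_JSONRPC_RE = re.compile(r'"jsonrpc"\s*:\s*"2\.0"')

# Assumed marker set: BrainLayer tool names and its MCP error prefix.
_BRAINLAYER_MARKER_RE = re.compile(
    r"\bbrain_[a-z]+\b|mcp brainlayer memory", re.IGNORECASE
)


def jsonrpc_recursion_reason(content: str) -> Optional[str]:
    """Flag JSON-RPC payloads only when they also mention BrainLayer."""
    if not _JSONRPC_RE.search(content):
        return None
    if _BRAINLAYER_MARKER_RE.search(content):
        return "jsonrpc_message"
    return None
```

This remains a heuristic: a hand-written note that shows a JSON-RPC call to a `brain_*` tool would still be rejected, so the marker set trades some recall of technical notes for protection against recursion.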
    audit_count = self._audit_recursion_count()
    if audit_count <= 0:
        return n_results
    return min(n_results + audit_count, max(n_results, _FILTERED_KNN_MAX))
Avoid starving results after 2000 filtered memories
When more than ~2000 audit/recursive chunks are closer to the query than any normal chunk, this cap limits the KNN window to 2000, the SQL/post filters discard the whole window, and search can return no results even though matching non-audit memories exist just beyond it. Since this change is meant to protect retrieval from audit-recursion pollution, use a paged/iterative fallback or size the fetch past the filtered count instead of hard-capping below it.
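The iterative fallback mentioned above can be sketched as a loop that widens the KNN window until enough unfiltered rows survive or the table is exhausted. Everything here is a simplified model: `fetch_nearest` stands in for the vector query, `is_hidden` for the audit/checkpoint predicate, and `filtered_top_k` is a hypothetical name.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def filtered_top_k(
    fetch_nearest: Callable[[int], Sequence[T]],
    is_hidden: Callable[[T], bool],
    n_results: int,
    total_rows: int,
    start_k: Optional[int] = None,
) -> List[T]:
    """Double the fetch window until n_results visible rows are found."""
    k = max(n_results, start_k or n_results)
    while True:
        candidates = fetch_nearest(k)
        kept = [c for c in candidates if not is_hidden(c)]
        # Stop when enough rows survive, or nothing more can be fetched.
        exhausted = k >= total_rows or len(candidates) < k
        if len(kept) >= n_results or exhausted:
            return kept[:n_results]
        k = min(k * 2, total_rows)


# Toy corpus ordered by distance: 3000 hidden rows, then 5 visible ones.
corpus = [f"audit{i}" for i in range(3000)] + [f"m{i}" for i in range(5)]
windows: List[int] = []


def fetch(k: int) -> List[str]:
    windows.append(k)
    return corpus[:k]


result = filtered_top_k(
    fetch, lambda c: c.startswith("audit"), 3, len(corpus), start_k=1000
)
```

In this toy run a fixed cap of 2000 would return nothing, while the widening loop reaches the visible rows on its third fetch. The cost is repeated queries in the polluted case, which may or may not be acceptable for latency.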
@coderabbitai review
@codex review
✅ Actions performed: Review triggered.
💡 Codex Review
brainlayer/src/brainlayer/mcp/search_handler.py
Line 1021 in e2accbc
include_audit is now exposed on brain_recall, but the mode="entity" branch ignores it and delegates to _brain_entity, whose lookup still returns evidence via store.get_entity_chunks() without the new audit-recursion filters. In contexts where a recursive/audit chunk is linked to an entity, brain_recall(mode="entity") and smart-routed entity queries can still surface the polluted evidence by default, bypassing the filtering added for search/KG hybrid paths.
    if filter_meta_noise and _contains_meta_noise(doc):
        continue
    if not include_audit and _is_audit_recursion_metadata(meta, doc):
Honor include_audit before meta-noise filtering
When callers set include_audit=True to inspect audit/recursive MCP-output memories, hybrid search still drops any candidate whose content matches META_NOISE_PATTERNS before reaching this new audit gate. For example, an audit-tagged chunk containing brain_search("...") is retrievable through the plain text search(..., include_audit=True) path but disappears from public brain_search/hybrid results, so the advertised opt-in cannot recover a class of recursive MCP outputs.
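One possible ordering that honors the opt-in is to classify a candidate as audit content first and let `include_audit` alone decide its fate, applying the meta-noise filter only to non-audit candidates. The sketch below passes the two predicates in as callables; `keep_candidate` and its parameters are hypothetical names, not the PR's code.

```python
from typing import Any, Callable, Mapping


def keep_candidate(
    doc: str,
    meta: Mapping[str, Any],
    include_audit: bool,
    filter_meta_noise: bool,
    is_audit: Callable[[Mapping[str, Any], str], bool],
    is_meta_noise: Callable[[str], bool],
) -> bool:
    """Decide whether a hybrid-search candidate survives filtering."""
    if is_audit(meta, doc):
        # Audit chunks are governed only by the opt-in flag, so the
        # meta-noise filter cannot hide them when include_audit=True.
        return include_audit
    if filter_meta_noise and is_meta_noise(doc):
        return False
    return True
```

Whether audit chunks should bypass the meta-noise filter entirely is a design decision for the maintainers; this sketch only shows the ordering that would make the opt-in recover them.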
@codex review
✅ Actions performed: Full review triggered.
🟢 Low
brainlayer/src/brainlayer/mcp/__init__.py
Line 1333 in 25cbf6d
When brain_recall is called with source_filter or correction_category parameters, they are silently dropped and not forwarded to _brain_recall(). This means filtered recall queries return unfiltered results, breaking the caller's intent. Consider adding source_filter and correction_category to the _brain_recall() call at lines 1333-1364.
Evidence trail:
src/brainlayer/mcp/__init__.py lines 1333-1364 (brain_recall dispatcher - no source_filter or correction_category); src/brainlayer/mcp/__init__.py lines 1242-1275 (brain_search dispatcher - passes source_filter line 1270, correction_category line 1271); src/brainlayer/mcp/search_handler.py lines 923-954 (_brain_recall signature - accepts source_filter line 951, correction_category line 952); src/brainlayer/mcp/__init__.py lines 689-843 (brain_recall tool schema - no source_filter or correction_category properties)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 25cbf6dc3a
    store: VectorStore,
    embed_fn: Callable[[str], List[float]],
    entity_type: Optional[str] = None,
    include_audit: bool = False,
Filter audit-sourced entity relations
With the new default include_audit=False contract on entity lookups, brain_entity still returns relations that were created from audit/recursive chunks because entity_lookup() only passes the flag to get_entity_chunks() while relations_raw = store.get_entity_relations(entity_id) remains unfiltered by source_chunk_id. In a KG where an audit-tagged chunk added a relation, looking up that entity now hides the audit evidence but still displays the audit-sourced relationship, so the default entity path remains polluted despite the opt-in flag.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 25cbf6d. Configure here.
    cursor.execute(
        """
    -   SELECT conversation_id, position, content, metadata, content_type
    +   SELECT conversation_id, position, content, metadata, content_type, tags, chunk_origin
get_context unconditionally queries missing chunk_origin column
High Severity
The get_context method now unconditionally selects chunk_origin from the chunks table in both the target chunk query and the surrounding chunks query. However, this column may not exist on old readonly databases where _has_chunk_origin is False. This will cause a SQL error at runtime. Other methods like get_chunk handle this correctly by conditionally including the column (e.g., chunk_origin_select = ", chunk_origin" if has_chunk_origin else "").
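A guarded column list, along the lines of what the comment says `get_chunk` already does, might look like the following sketch. It probes the schema with `PRAGMA table_info` and substitutes `NULL AS <column>` when an optional column is absent, so row shapes stay stable. The helper names are hypothetical, and treating `tags` as optional too is an assumption made for the demo.

```python
import sqlite3


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column; index 1 is the name.
    # `table` is a trusted identifier here, not user input.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def context_select_sql(conn: sqlite3.Connection) -> str:
    cols = ["conversation_id", "position", "content", "metadata", "content_type"]
    for optional in ("tags", "chunk_origin"):
        if has_column(conn, "chunks", optional):
            cols.append(optional)
        else:
            # Keep the result width constant on legacy databases.
            cols.append(f"NULL AS {optional}")
    return f"SELECT {', '.join(cols)} FROM chunks"


# Legacy-style table without chunk_origin.
legacy = sqlite3.connect(":memory:")
legacy.execute(
    "CREATE TABLE chunks (conversation_id TEXT, position INTEGER, "
    "content TEXT, metadata TEXT, content_type TEXT, tags TEXT)"
)
legacy.execute("INSERT INTO chunks VALUES ('conv', 1, 'hello', '{}', 'note', '[]')")
sql = context_select_sql(legacy)
row = legacy.execute(sql).fetchone()
```

On the legacy table the generated query still runs and yields a seven-column row with `None` in the `chunk_origin` position, rather than failing with a missing-column error.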
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 25cbf6d. Configure here.


Summary
- […] `┌─ brain_search:`, JSON-RPC 2.0 payloads, and `MCP BrainLayer Memory: Invalid JSON-RPC message`.
- […] `brain_store`, arbitrated store queueing, and vector upserts.
- […] `include_audit` opt-in, plus content-signature filtering for existing recursive MCP output.
- […] `chunk_origin` backfill missed it.

Test plan
- `pytest tests/test_ingest_guard.py -q` -> 5 passed
- `pytest tests/test_audit_recursion_filter.py tests/test_ingest_guard.py -q` -> 12 passed
- `pytest tests/test_precompact_chunk_origin.py tests/test_search_exact_chunk_id.py tests/test_audit_recursion_filter.py tests/test_ingest_guard.py -q` -> 47 passed
- `pytest tests/test_ingest_guard.py tests/test_audit_recursion_filter.py tests/test_precompact_chunk_origin.py tests/test_search_exact_chunk_id.py tests/test_arbitration.py tests/test_watcher_bridge.py tests/test_3tool_aliases.py -q` -> 126 passed
- `ruff check ...` -> All checks passed
- `python3 -m py_compile src/brainlayer/ingest_guard.py src/brainlayer/search_repo.py src/brainlayer/mcp/search_handler.py src/brainlayer/vector_store.py` -> passed
- `pytest -q --ignore=tests/regression/test_drift_detection.py --ignore=tests/test_eval_framework.py --ignore=tests/test_eval_baselines.py --ignore=tests/test_vector_store.py` -> 1970 passed, 8 skipped, 1 xfailed, 100 warnings

Notes
- `pytest -q` is blocked by local optional eval environment and live DB assertions: missing `deepchecks`, `numba` incompatible with NumPy 2.4 under system Python 3.13, plus live DB/eval baseline drift. The venv pre-push gate passed.

Note
Medium Risk
Changes affect core ingest + retrieval paths (queue drain, watcher, vector upserts, MCP tools) and alter default search/recall behavior, so regressions could hide expected results or drop writes if heuristics misclassify content.

Overview
Prevents BrainLayer from re-ingesting its own MCP/tool output by adding `ingest_guard.py` (detect + reject recursive `brain_*` box output and JSON-RPC signatures) and wiring it into all write paths (`brain_store`, drain/apply handlers, watcher bridge, `VectorStore.upsert_chunks`/`update_chunk`, etc.).
Adds an `include_audit` opt-in (default false) across engine (`think`/`recall`), MCP handlers (`brain_search`/`brain_recall`/`brain_entity`), KG queries, context expansion, and `SearchMixin` so audit/eval + recursive MCP-output chunks are excluded from search/KG/context results by default.
Strengthens default PreCompact checkpoint exclusion to be content-based (not just `chunk_origin`) and improves KNN overfetch/caching to avoid starving normal results when many filtered (checkpoint/audit) neighbors exist; includes extensive new tests for these filters and ingest guards.
Reviewed by Cursor Bugbot for commit 25cbf6d. Bugbot is set up for automated code reviews on this repo.
Note
Block recursive BrainLayer MCP output from being ingested or returned in search results
- […] `recursive_mcp_output_reason` in ingest_guard.py to detect content that looks like BrainLayer MCP output (framed boxes, JSON-RPC markers, etc.), and uses it to reject writes at every ingestion entry point: `store_memory`, `upsert_chunks`, `update_chunk`, drain handlers, watcher bridge filters, and the arbitrated store handler.
- […] `include_audit` flag (default `False`) threaded through all search paths (vector, FTS, binary KNN, hybrid, KG facts, entity evidence, and context expansion) so audit/recursive content is excluded from results by default.
- Exposes `include_audit` as a boolean tool parameter in the `brain_search` and `brain_digest` MCP schemas, forwarded through the full call stack to the underlying store and engine methods.
- […] `chunk_origin`.
- `upsert_chunks` now silently drops individual invalid chunks from a batch and raises `ValueError` only when the entire batch is rejected; callers that relied on all-or-nothing batch behavior are affected.
📊 Macroscope summarized 25cbf6d. 6 files reviewed, 1 issue evaluated, 0 issues filtered, 1 comment posted