
Conversation

Contributor

@ChrisCoder9000 ChrisCoder9000 commented Jan 27, 2026

Summary by CodeRabbit

  • New Features

    • Entity-level retrieval: info, context, synergies and status endpoints for richer entity exploration
    • Graph consolidation workflow to normalize and clean relationships
    • Plugin guidance for extending functionality
    • Backup endpoints scaffolded (create/reset/delete)
  • Improvements

    • Token usage tracking with detailed cost reporting and configurable pricing
    • Better long-running task support with increased timeouts
  • Updates

    • Version bumped to 2.2.0 and license year updated

✏️ Tip: You can customize this high-level summary in your review settings.

Contributor

coderabbitai bot commented Jan 27, 2026

📝 Walkthrough

Walkthrough

This PR bumps the project to 2.2.0 and adds extensive features: new graph traversal and existence APIs, agent refactors with token accounting and tool modes, graph consolidation and ingestion managers, expanded search/retrieval endpoints, token utilities, and multiple new tools and background tasks. No breaking removals.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Version & Build**<br>`.dockerignore`, `LICENSE`, `README.md`, `pyproject.toml`, `Makefile` | Version bump to 2.2.0-dev; expanded `.dockerignore`; license year update; Makefile ENV handling adjusted. |
| **Config & Pricing**<br>`src/config.py` | Added `PricingConfig` and `Config.pricing`; added `run_graph_consolidator` flag. |
| **Data adapters & interfaces**<br>`src/adapters/data.py`, `src/adapters/interfaces/data.py`, `src/lib/mongo/client.py` | Added `update_structured_data` method to adapters/interfaces and Mongo client; minor formatting changes. |
| **Embeddings API**<br>`src/adapters/embeddings.py`, `src/adapters/interfaces/embeddings.py` | `search_similar_by_ids` return type changed from `list[Vector]` to `dict[str, list[Vector]]`; docstrings adjusted. |
| **Graph adapter & Neo4j client**<br>`src/adapters/graph.py`, `src/adapters/interfaces/graph.py`, `src/lib/neo4j/client.py` | Added `check_node_existence`, `get_neighborhood`, `get_next_by_flow_key`; similarity filtering in `get_2nd_degree_hops`; Neo4j client extended to store/return `polarity`, `metadata`, `flow_key`, `amount`, `event_names`, plus richer node/predicate fields. |
| **Constants / Models**<br>`src/constants/agents.py`, `src/constants/kg.py` | New architect-agent models and `TokenDetail` family; extended `Node`/`Predicate`/`Relationship` schemas (`polarity`, `metadata`, `flow_key`, `amount`); new `EntityInfo`/`EntitySynergy` types. |
| **Prompts**<br>`src/constants/prompts/*.py` | Large prompt expansions for architect, janitor, kg consolidator, scout; new and updated prompt constants. |
| **Agents — Architect / Janitor / Scout / KG**<br>`src/core/agents/*.py` | Major refactors: `ArchitectAgent` dual-mode (single/tooler), new tools integration, `run_tooler`, token accounting; `JanitorAgent`: `run_graph_consolidator`, `run_atomic_janitor`, multi-schema support, token accounting; `ScoutAgent` adds polarity and token accounting; `KGAgent` adds graph-consolidator mode, retries, token accounting. |
| **Agent Tools — Architect**<br>`src/core/agents/tools/architect_agent/*` | Added `CreateRelationship`, `GetRemainingEntitiesToProcess`, `MarkEntitiesAsUsed`, `CheckUsedEntities` tools (creation, validation, marking, serialization). |
| **Agent Tools — Janitor / KG**<br>`src/core/agents/tools/janitor_agent/*`, `src/core/agents/tools/kg_agent/*` | Janitor schema/search tools extended (`event_names`, multi-query); KG execute tool adds `brain_id` support. |
| **Token utilities**<br>`src/utils/tokens.py` | New token helpers: `token_detail_from_token_counts` and `merge_token_details`. |
| **Ingestion & Auto-KG**<br>`src/core/saving/ingestion_manager.py`, `src/core/saving/auto_kg.py` | New `IngestionManager` for node/relationship vectorization; `auto_kg` refactored to use scout/architect tooler flows and `consolidate_graph`; prior `IngestionManager` implementation removed from `auto_kg`. |
| **Graph consolidation layer**<br>`src/core/layers/graph_consolidation/*` | New `consolidate_graph` implementation orchestrating `JanitorAgent` and `KGAgent` per batch, returning a `TokenDetail` summary. |
| **Search / Retrieval**<br>`src/core/search/*.py`, `src/core/search/entity_*` | New `extract_str_entities_from_text`, `EntityContext`, `EventSynergyRetriever` (`MatchPath`), `EntitySinergyRetriever`, and other search helpers for contextual and synergy retrieval. |
| **API models, controllers & routes**<br>`src/services/api/constants/requests.py`, `src/services/api/controllers/entities.py`, `src/services/api/routes/retrieve.py` | New request/response models for entity info/context/synergies/status; new controllers and four new endpoints wired into retrieve routes. |
| **Workers / Tasks**<br>`src/workers/app.py`, `src/workers/tasks/ingestion.py` | Increased Celery timeouts to 24h; added `process_architect_relationships` Celery task and enriched ingestion/structured-data flow. |
| **Backups, Plugins, Placeholders & Docs**<br>`plugins/readme.txt`, `src/core/backups/*`, `src/core/plugins/*`, many `__init__.py` and README/readme.txt additions | Added plugin README and many placeholder modules/docstrings (backups, schedulers, package init files, metadata headers). |
| **Misc — LLM & Embeddings clients**<br>`src/lib/llm/client_small.py`, `src/lib/embeddings/client_small.py` | LangChain client lazy init, larger default `max_new_tokens`, improved JSON decode/retry handling, new module-level `embeddings_small` client instance. |
| **API System routes**<br>`src/services/api/routes/system.py` | Added unimplemented endpoints for brain reset/delete/create-backup (raise `NotImplementedError`). |
| **Various metadata/headers & small fixes**<br>Multiple files | Header metadata changes, docstrings, minor formatting adjustments. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Architect as ArchitectAgent<br/>(tooler)
    participant Tools as Architect Tools
    participant Janitor as JanitorAgent
    participant KG as GraphAdapter
    participant Emb as EmbeddingsAdapter

    Client->>Architect: run_tooler(text, entities)
    Architect->>Tools: request tool actions (create/get remaining/mark used)
    Tools->>Janitor: validate/normalize relationships
    Janitor->>KG: neighborhood/exists checks & normalization
    Janitor-->>Tools: return fixes / status
    Tools->>Architect: update relationships_set / used_entities_set
    Architect->>Emb: similarity checks for deduplication
    Architect-->>Client: List[ArchitectAgentRelationship]
```
```mermaid
sequenceDiagram
    participant Client as Client
    participant API as Retrieve Controller
    participant Retriever as EventSynergyRetriever
    participant Emb as EmbeddingsAdapter
    participant VS as VectorStoreAdapter
    participant KG as GraphAdapter

    Client->>API: GET /retrieve/entity/info (target, query, depth)
    API->>Retriever: retrieve_matches(target, query, depth)
    Retriever->>Emb: embed(query)
    Emb-->>Retriever: query_embedding
    Retriever->>Emb: embed(target)
    Emb-->>Retriever: target_embedding
    Retriever->>VS: search(target_embedding)
    VS-->>Retriever: target_node_vector
    Retriever->>KG: get_nodes_by_uuid(target_uuid)
    KG-->>Retriever: target_node
    loop depth
      Retriever->>KG: get_neighbors(current_node)
      KG-->>Retriever: neighbors
      Retriever->>Emb: embed(neighbor descriptions)
      Emb-->>Retriever: neighbor_embeddings
      Retriever->>Retriever: compute similarities and build MatchPath
    end
    Retriever-->>API: MatchPath
    API-->>Client: GetEntityInfoResponse
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

Poem

🐰 I hopped through nodes and tokens bright,
Tools in paw, I joined the fight,
Architects build, janitors mend,
Siblings, synergies — graphs extend,
Version 2.2.0 — hop, hop, delight! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Featured graph consolidation layer (5th agent)' accurately reflects the primary objective of introducing a new graph consolidation layer with associated agents and workflow orchestration. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 96.53%, which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
src/lib/llm/client_small.py (1)

98-114: Default max_new_tokens=100000 exceeds Gemini model limits and will cause API failures.

The hardcoded 100,000 default exceeds the output token limit for all current Gemini models (most support 65,536 max; Gemini 2.0 Flash supports only 8,192). Requests with max_output_tokens above the model's limit will fail parameter validation. Default to None instead and only send max_output_tokens when explicitly provided.

Also fix the conditional check from if max_new_tokens else {} to if max_new_tokens is not None to correctly handle max_new_tokens=0.

🔧 Suggested adjustment
    def generate_text(
-        self, prompt: str, max_new_tokens: int = 100000, timeout: int = None
+        self, prompt: str, max_new_tokens: int | None = None, timeout: int = None
    ) -> str:
         if not prompt or len(prompt.strip()) == 0:
             return "Input prompt is empty"

         timeout = timeout or self.default_timeout
         from google.genai.types import GenerateContentConfig

         def _generate():
             return self.client.models.generate_content(
                 model=self.model,
                 contents=[prompt],
                 config=GenerateContentConfig(
                     response_mime_type="text/plain",
-                    **({"max_output_tokens": max_new_tokens} if max_new_tokens else {}),
+                    **(
+                        {"max_output_tokens": max_new_tokens}
+                        if max_new_tokens is not None
+                        else {}
+                    ),
                 ),
             )

    def generate_json(
         self,
         prompt: str,
-        max_new_tokens: int = 100000,
+        max_new_tokens: int | None = None,
         max_retries: int = 3,
         timeout: int = None,
     ) -> dict:
         timeout = timeout or self.default_timeout
         from google.genai.types import GenerateContentConfig

         def _generate():
             _response = self.client.models.generate_content(
                 model=self.model,
                 contents=[prompt],
                 config=GenerateContentConfig(
                     response_mime_type="application/json",
-                    **({"max_output_tokens": max_new_tokens} if max_new_tokens else {}),
+                    **(
+                        {"max_output_tokens": max_new_tokens}
+                        if max_new_tokens is not None
+                        else {}
+                    ),
                 ),
             )
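The `is not None` distinction matters because `0` is falsy in Python, so a truthy check silently drops an explicit zero. A standalone sketch of just the conditional pattern (illustrative only, not the client itself):

```python
def build_config(max_new_tokens=None):
    # Truthy check: an explicit 0 is silently dropped, so the model default applies.
    truthy = {**({"max_output_tokens": max_new_tokens} if max_new_tokens else {})}
    # Explicit None check: 0 is forwarded; only a missing value is omitted.
    explicit = {
        **(
            {"max_output_tokens": max_new_tokens}
            if max_new_tokens is not None
            else {}
        )
    }
    return truthy, explicit

truthy, explicit = build_config(0)
# truthy == {} while explicit == {"max_output_tokens": 0}
```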
src/workers/app.py (1)

48-54: visibility_timeout is much lower than task time limits.

The visibility_timeout (7200s = 2 hours) in broker_transport_options is significantly lower than the task time limits (86400s = 24 hours). If a task runs longer than the visibility timeout, the broker may re-deliver it to another worker, causing duplicate execution. Consider increasing visibility_timeout to at least match task_time_limit.

🐛 Proposed fix
     broker_transport_options={
-        "visibility_timeout": 7200,
+        "visibility_timeout": 86400,  # Match task_time_limit to prevent duplicate execution
         "fanout_prefix": True,
         "fanout_patterns": True,
     },
src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py (1)

71-79: Fix _queries initialization and empty-result check.

_queries is undefined when args are provided (UnboundLocalError), and len(found_entities) is the wrong predicate; it should check founds, otherwise the first query always gets the “not found” message.

🐛 Proposed fix
-        _query = ""
-        if len(args) > 0:
-            _query = args[0]
-        else:
-            _queries = kwargs.get("queries", [])
+        _queries = []
+        if len(args) > 0:
+            _queries = args[0] if isinstance(args[0], list) else [args[0]]
+        else:
+            _queries = kwargs.get("queries", [])
@@
-            if len(found_entities) == 0:
+            if len(founds) == 0:
                 found_entities[_clean_query(_query)] = (
                     "Knowledge graph does not contain any entities that match the query"
                 )
             else:
                 found_entities[_clean_query(_query)] = [
                     f.model_dump(mode="json") for f in founds
                 ]

Also applies to: 119-126

src/services/api/routes/retrieve.py (1)

122-127: Incomplete endpoint implementation.

The /context endpoint has no implementation - the function body is empty with no return statement. This will return None implicitly, which FastAPI will convert to a null JSON response.

Suggested fix

Either implement the endpoint or remove it if it's not ready:

 @retrieve_router.get("/context")
 async def get_context(request):
     """
     Get the context of an entity.
     """
+    raise NotImplementedError("Endpoint not yet implemented")

Or remove the endpoint entirely if not needed.

src/core/agents/kg_agent.py (1)

171-171: Pass type_="normal" to _get_agent to avoid a misconfigured prompt.

_get_agent now requires type_, but several callers still pass the old positional args. This results in system_prompt=None, which can break agent behavior.

🐛 Suggested fix
-        self._get_agent(identification_params, metadata, brain_id=brain_id)
+        self._get_agent(
+            type_="normal",
+            identification_params=identification_params,
+            metadata=metadata,
+            brain_id=brain_id,
+        )
-        self._get_agent(identification_params, metadata={}, brain_id=brain_id)
+        self._get_agent(
+            type_="normal",
+            identification_params=identification_params,
+            metadata={},
+            brain_id=brain_id,
+        )
-        self._get_agent(
-            identification_params={},
-            metadata={},
-            tools=[
-                KGAgentExecuteGraphOperationTool(
-                    self,
-                    self.kg,
-                    self.database_desc,
-                )
-            ],
-            output_schema=RetrieveNeighborsOutputSchema,
-            extra_system_prompt=extra_system_prompt_str,
-        )
+        self._get_agent(
+            type_="normal",
+            identification_params={},
+            metadata={},
+            tools=[
+                KGAgentExecuteGraphOperationTool(
+                    self,
+                    self.kg,
+                    self.database_desc,
+                )
+            ],
+            output_schema=RetrieveNeighborsOutputSchema,
+            extra_system_prompt=extra_system_prompt_str,
+        )

Also applies to: 210-210, 249-261

src/lib/neo4j/client.py (2)

202-233: Avoid overwriting ingestion metadata when node metadata exists.

metadata from ingestion is replaced by node.metadata, which drops caller-supplied metadata. Merge both and avoid re-overwriting in the attribute loop.

🛠️ Proposed fix
-            all_properties = {
-                **(node.properties or {}),
-                "metadata": metadata,
-            }
+            merged_metadata = {**(metadata or {}), **(node.metadata or {})}
+            all_properties = {
+                **(node.properties or {}),
+                "metadata": merged_metadata or None,
+            }
@@
-            for attr in attributes:
-                if getattr(node, attr, None):
-                    all_properties[attr] = getattr(node, attr)
+            for attr in attributes:
+                if attr == "metadata":
+                    continue
+                if getattr(node, attr, None) is not None:
+                    all_properties[attr] = getattr(node, attr)

486-504: Return the newly added node fields in get_by_uuids.

You now map polarity, happened_at, last_updated, observations, and metadata from the record, but the Cypher doesn’t return them—so they’ll always be defaults.

🛠️ Proposed fix
-        RETURN n.uuid as uuid, n.name as name, labels(n) as labels, n.description as description, properties(n) as properties
+        RETURN n.uuid as uuid, n.name as name, labels(n) as labels, n.description as description,
+               properties(n) as properties,
+               n.polarity as polarity, n.happened_at as happened_at, n.last_updated as last_updated,
+               n.observations as observations, n.metadata as metadata
src/core/agents/architect_agent.py (1)

370-379: Include required token fields in ArchitectAgentResponse.

ArchitectAgentResponse requires input_tokens and output_tokens, but the success path omits them, which will raise validation errors.

🛠️ Proposed fix
         return ArchitectAgentResponse(
             new_nodes=[
                 ArchitectAgentNew(**new_node.model_dump(mode="json"))
                 for new_node in all_new_nodes
             ],
             relationships=[
                 ArchitectAgentRelationship(**relationship.model_dump(mode="json"))
                 for relationship in all_relationships
             ],
+            input_tokens=self.input_tokens,
+            output_tokens=self.output_tokens,
         )
🤖 Fix all issues with AI agents
In `@src/adapters/graph.py`:
- Around line 537-540: There is an incomplete if statement checking
similarity_threshold and fd_pred["metadata"]["v_id"] in src/adapters/graph.py
which will cause a syntax error; either remove the unused conditional or
complete it by explicitly comparing fd_pred["metadata"]["v_id"] to the intended
value and adding the body (for example, perform the intended filter action such
as continuing the loop or applying a threshold check). Locate the block
referencing similarity_threshold and fd_pred (use those exact symbols) and
either delete the partial "if fd_pred['metadata']['v_id']" line or replace it
with a full condition and a concrete action (e.g., if
fd_pred["metadata"]["v_id"] != v_id: continue) to implement the intended
filtering.
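As a self-contained illustration of the completed conditional (the comparison value and threshold handling are assumptions based on the symbols named above, not the file's actual logic):

```python
def filter_predicates(predicates, v_id, similarity_threshold=0.8):
    """Keep predicates whose metadata v_id matches the target and whose
    similarity clears the threshold; skip entries missing either field."""
    kept = []
    for fd_pred in predicates:
        meta = fd_pred.get("metadata") or {}
        if meta.get("v_id") != v_id:
            continue  # the previously dangling condition, made explicit
        if fd_pred.get("similarity", 0.0) < similarity_threshold:
            continue
        kept.append(fd_pred)
    return kept
```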

In `@src/core/agents/architect_agent.py`:
- Around line 314-317: The code calls a nonexistent method
self.get_token_summary(...) which will raise at runtime; replace that call with
the correct token-update method (e.g.
self.update_token_summary(m.usage_metadata)) or implement a get_token_summary
wrapper that forwards to the class's existing token-summary logic; update the
loop in architect_agent.py to invoke the actual token-summary method (replace
self.get_token_summary with self.update_token_summary or add a get_token_summary
method that calls update_token_summary) so usage metadata is processed without
raising.
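A minimal sketch of the wrapper option, so existing call sites keep working while the real accounting lives in one place (class shape and field names are assumptions based on the comment, not the project's actual API):

```python
class TokenAccounting:
    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def update_token_summary(self, usage_metadata: dict) -> None:
        # Accumulate counts from a single model message.
        self.input_tokens += usage_metadata.get("input_tokens", 0)
        self.output_tokens += usage_metadata.get("output_tokens", 0)

    def get_token_summary(self, usage_metadata: dict) -> None:
        # Thin compatibility wrapper forwarding to the real method.
        self.update_token_summary(usage_metadata)
```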

In `@src/core/search/entity_info.py`:
- Around line 201-205: Node.happened_at is Optional[str|None] but the code (in
the recency calculation block using days_ago and recency) does datetime.now() -
node.happened_at which will TypeError when happened_at is a string; update the
block that computes days_ago/recency (referencing Node.happened_at and recency)
to first parse string dates into a datetime (e.g., try parsing with
dateutil.parser.parse or datetime.strptime for expected formats), catch parsing
exceptions and fall back to treating the value as missing (days_ago = 0), and
only perform datetime arithmetic when you have a valid datetime object so the
recency formula remains safe.

In `@src/core/search/entity_sibilings.py`:
- Around line 32-41: The code assumes vector_store_adapter.search_vectors(...)
returns a non-empty list and directly accesses target_node_vs[0], which can
raise an IndexError; update the logic in the function containing
target_embedding/target_node_vs so you first check if target_node_vs is truthy
(or len(target_node_vs) > 0) and if it's empty return the same fallback (e.g.,
return None, []) before accessing target_node_vs[0] and calling
graph_adapter.get_by_uuid(target_node_id, brain_id=self.brain_id).
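The guard described above can be expressed as a tiny helper (name and fallback shape are illustrative, mirroring the `(None, [])` fallback mentioned in the comment):

```python
def first_hit_or_fallback(results):
    """Return the first vector-store hit, or the caller's empty-result
    fallback when the search came back empty (avoids IndexError)."""
    if not results:
        return (None, [])
    return results[0]
```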

In `@src/services/api/controllers/retrieve.py`:
- Around line 144-151: The filter condition in retrieve.py that builds
fd_v_neighbors_embeddings_map is wrongly requiring v.id.isdigit(), which never
holds for UUID vector IDs (created via node.uuid/str(uuid4())); update the
if-check in the block using cosine_similarity and v.id to remove the digit-only
constraint so it only checks cosine_similarity(looking_for_v.embeddings,
v.embeddings) > 0.5, that v.id exists, and that not v.id.replace("-",
"").isalpha(); remove the v.id.isdigit() clause so the neighbor map is populated
correctly.
🟠 Major comments (19)
src/core/agents/tools/chat_agent/__init__.py-4-7 (1)

4-7: Remove personal email/PII from source header.

Embedding personal email addresses in source code can violate privacy/compliance policies and creates unnecessary PII retention. Prefer SCM metadata or a team alias without PII.

src/core/plugins/__init__.py-1-9 (1)

1-9: Remove or obfuscate personal email addresses to avoid PII exposure.

The docstring contains personal email addresses that will be committed to version control and remain in git history. This could raise compliance/privacy concerns (GDPR, CCPA) and violates best practices for handling PII in source code.

Consider using a generic team email, GitHub username, or removing the email entirely.

🔒 Proposed fix to remove email addresses
 """
 File: /__init__.py
 Created Date: Monday January 12th 2026
-Author: Christian Nonis <alch.infoemail@gmail.com>
+Author: Christian Nonis
 -----
 Last Modified: Monday January 12th 2026 9:22:09 pm
-Modified By: the developer formerly known as Christian Nonis at <alch.infoemail@gmail.com>
+Modified By: Christian Nonis
 -----
 """
src/core/layers/outputs/__init__.py-1-8 (1)

1-8: Remove personal email/PII from module header.

The docstring embeds a personal email address, which is a privacy/compliance risk and unnecessary in source headers. Prefer VCS history, CODEOWNERS, or org-level contact info.

🧹 Proposed fix
-"""
-File: /__init__.py
-Created Date: Tuesday January 13th 2026
-Author: Christian Nonis <alch.infoemail@gmail.com>
------
-Last Modified: Tuesday January 13th 2026 8:48:57 pm
-Modified By: the developer formerly known as Christian Nonis at <alch.infoemail@gmail.com>
------
-"""
+"""Outputs layer package initializer."""
src/core/layers/graph_consolidation/graph_consolidation.py-132-140 (1)

132-140: Token details collected before operation executes.

Token counts are appended to token_details on lines 132-139 before run_graph_consolidator_operator is called on line 140. This means the captured tokens will be zero or initialization values, not reflecting the actual LLM usage from the operation.

🐛 Proposed fix: move token collection after the operation
             kg_agent = KGAgent(
                 llm_adapter=llm_small_adapter,
                 cache_adapter=cache_adapter,
                 kg=graph_adapter,
                 vector_store=vector_store_adapter,
                 embeddings=embeddings_adapter,
                 database_desc=_neo4j_client.graphdb_description,
             )
-            token_details.append(
-                token_detail_from_token_counts(
-                    kg_agent.input_tokens,
-                    kg_agent.output_tokens,
-                    kg_agent.cached_tokens,
-                    kg_agent.reasoning_tokens,
-                )
-            )
             kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)
+            token_details.append(
+                token_detail_from_token_counts(
+                    kg_agent.input_tokens,
+                    kg_agent.output_tokens,
+                    kg_agent.cached_tokens,
+                    kg_agent.reasoning_tokens,
+                )
+            )
src/core/backups/backup_creator.py-13-31 (1)

13-31: Fix undefined Backup and avoid silent None returns.

Backup is undefined (will cause runtime NameError) and is not imported or defined anywhere in the codebase. The stub returns None implicitly via pass, which violates the declared return type contract. Use a forward reference string annotation and raise NotImplementedError to provide a clear signal that the function is not yet implemented.

🔧 Proposed fix
-def create_backup(brain_id: str) -> Backup:
+def create_backup(brain_id: str) -> "Backup":
@@
-    pass
+    raise NotImplementedError("create_backup is not implemented yet")
src/core/search/entity_sibilings.py-43-46 (1)

43-46: Potential KeyError when accessing neighbors dictionary.

If target_node_id is not present in _neighbors, line 46 will raise a KeyError. Use .get() with a default value for safer access.

🐛 Proposed fix
         _neighbors = graph_adapter.get_neighbors(
             [target_node_id], brain_id=self.brain_id
         )
-        neighbors = _neighbors[target_node_id]
+        neighbors = _neighbors.get(target_node_id, [])
src/workers/app.py-48-49 (1)

48-49: Soft time limit equals hard time limit, defeating graceful shutdown.

Setting task_soft_time_limit equal to task_time_limit (both 86400s) defeats the purpose of having a soft limit. The soft limit should be lower to give tasks time to clean up gracefully before the hard limit kills them. Consider setting task_soft_time_limit to a lower value (e.g., 82800s = 23 hours).

🐛 Proposed fix
-    task_time_limit=86400,
-    task_soft_time_limit=86400,
+    task_time_limit=86400,
+    task_soft_time_limit=82800,  # 23 hours - allows 1 hour for graceful shutdown
src/lib/mongo/client.py-400-408 (1)

400-408: Update operation does not verify document existence or update success.

The update_one call returns the input structured_data regardless of whether a document was actually found and updated. Consider checking matched_count or modified_count from the update result to handle the case where no document matches the query.

🐛 Proposed fix
     def update_structured_data(
         self, structured_data: StructuredData, brain_id: str
     ) -> StructuredData:
         collection = self.get_collection("structured_data", database=brain_id)
-        collection.update_one(
+        result = collection.update_one(
             {"id": structured_data.id},
             {"$set": structured_data.model_dump(mode="json")},
         )
+        if result.matched_count == 0:
+            raise ValueError(f"No structured data found with id: {structured_data.id}")
         return structured_data

Alternatively, if you want upsert behavior (create if not exists):

-        collection.update_one(
+        collection.update_one(
             {"id": structured_data.id},
             {"$set": structured_data.model_dump(mode="json")},
+            upsert=True,
         )
src/core/search/entity_sibilings.py-26-28 (1)

26-28: The polarity parameter is unused and indicates incomplete implementation.

The method declares polarity: Literal["same", "opposite"] but never uses it in the method body. The type hint suggests the parameter should filter results by same or opposite polarity connections, but this logic is not implemented. The method always returns the same results regardless of the polarity argument value. This needs to be completed or removed.
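One way the parameter could be honored, sketched generically (field names and the match/oppose semantics are assumptions, since the intended behavior is not implemented):

```python
from typing import Literal

def filter_by_polarity(
    neighbors: list,
    target_polarity: str,
    polarity: Literal["same", "opposite"],
) -> list:
    """Keep neighbors whose stored polarity matches (or opposes) the
    target node's polarity, depending on the caller's choice."""
    if polarity == "same":
        return [n for n in neighbors if n.get("polarity") == target_polarity]
    return [n for n in neighbors if n.get("polarity") != target_polarity]
```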

src/core/search/entity_context.py-25-53 (1)

25-53: Return arity/type mismatch in get_context.

The signature says 3-tuple, early returns return 2 values, and the final return has 4 values. Also, text_contexts is a set[str], not list[dict]. This will break callers expecting a consistent tuple shape.

🐛 Proposed fix
-from typing import Tuple
+from typing import Optional, Tuple, List
@@
-    def get_context(
-        self, context_depth: int = 3
-    ) -> Tuple[Node, list[dict], list[dict]]:
+    def get_context(
+        self, context_depth: int = 3
+    ) -> Tuple[Optional[Node], list[dict], list[str], list[dict]]:
@@
-        if not target_node_vs:
-            return (None, [])
+        if not target_node_vs:
+            return (None, [], [], [])
@@
-        if not target_node:
-            return (None, [])
+        if not target_node:
+            return (None, [], [], [])

Also applies to: 87-89

src/core/search/entity_context.py-55-81 (1)

55-81: Guard against missing predicate/node before attribute access.

predicate.description, predicate.direction, and node.name are used unconditionally. If any neighborhood entry lacks these, this will throw.

🐛 Proposed fix
             for nn in neighbors:
                 node = nn.get("node")
                 if node and hasattr(node, "description") and node.description:
                     text_contexts.add(node.description)
                 predicate = nn.get("predicate")
@@
-                entry = {
-                    "description": predicate.description,
-                    "information_direction": predicate.direction,
-                    predicate.name: {
-                        node.name: node.description,
-                        "info": nested_info,
-                    },
-                }
+                if not node or not predicate:
+                    continue
+                entry = {
+                    "description": predicate.description,
+                    "information_direction": predicate.direction,
+                    predicate.name: {
+                        node.name: node.description,
+                        "info": nested_info,
+                    },
+                }
src/services/api/routes/retrieve.py-401-410 (1)

401-410: Mutable default argument.

Using [] as a default argument (line 404) is problematic because mutable defaults are shared across function calls. Use None and initialize inside the function.

Suggested fix
 @retrieve_router.get(path="/entity/status")
 async def get_entity_status(
     target: str,
-    types: Optional[List[str]] = [],
+    types: Optional[List[str]] = None,
     brain_id: str = "default",
 ):
     """
     Get the entity status for a given target.
     """
-    return await get_entity_status_controller(target, types, brain_id)
+    return await get_entity_status_controller(target, types or [], brain_id)
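The pitfall is easy to reproduce in isolation: the default list is created once at function definition time and shared across calls.

```python
def bad(target, types=[]):
    # The same list object is reused across calls, leaking state.
    types.append(target)
    return types

def good(target, types=None):
    types = list(types) if types else []  # fresh list per call
    types.append(target)
    return types

bad("a"); second = bad("b")   # second == ["a", "b"]: state leaked
good("a"); clean = good("b")  # clean == ["b"]: no leak
```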
src/workers/tasks/ingestion.py-494-532 (1)

494-532: Ensure new structured data is persisted and guard textual_data access.

new_structured_data remains None when there’s no existing record, so save_structured_data will likely error or no-op. Also, len(element.textual_data.keys()) will raise if textual_data is None.

🐛 Suggested fix
-            new_structured_data = None
-
-            if existing_structured_data:
+            new_structured_data = None
+
+            if existing_structured_data:
                 new_structured_data = existing_structured_data.model_copy(
                     update={
                         "data": {
                             **(
                                 existing_structured_data.data
                                 if existing_structured_data.data
                                 else {}
                             ),
                             **(element.json_data if element.json_data else {}),
                         },
                         "types": [
                             *existing_structured_data.types,
                             *element.types,
                         ],
                         "metadata": {
                             **(
                                 existing_structured_data.metadata
                                 if existing_structured_data.metadata
                                 else {}
                             ),
                             **(element.metadata if element.metadata else {}),
                         },
                         "brain_version": BRAIN_VERSION,
                     },
                 )
+            else:
+                new_structured_data = StructuredData(
+                    id=uuid,
+                    data=element.json_data if element.json_data else {},
+                    types=element.types,
+                    metadata=element.metadata if element.metadata else {},
+                    brain_version=BRAIN_VERSION,
+                )
 ...
-                (
-                    json.dumps(element.textual_data, indent=2)
-                    if len(element.textual_data.keys()) > 0
-                    else description
-                ),
+                (
+                    json.dumps(element.textual_data, indent=2)
+                    if element.textual_data and len(element.textual_data) > 0
+                    else description
+                ),

Also applies to: 584-589

src/core/saving/auto_kg.py-70-76 (1)

70-76: Cap the timeout derived from input length.

timeout=360 * len(input) becomes extremely large for long inputs and zero for empty input, which can stall workers or force immediate timeouts. Consider a min/max bound.

🐛 Suggested fix
+    timeout = max(30, min(1800, 2 * len(input)))
     architect_agent.run_tooler(
         input,
         entities.entities,
         targeting=targeting,
         brain_id=brain_id,
-        timeout=360 * len(input),
+        timeout=timeout,
     )
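As a sanity check on the proposed bounds, a standalone sketch of the clamping (names and constants are illustrative, matching the suggested fix above rather than any existing helper):

```python
def clamp_timeout(n_chars: int, lo: int = 30, hi: int = 1800, per_char: int = 2) -> int:
    # Bound the length-derived timeout so empty input never yields 0
    # and huge inputs never stall a worker for hours.
    return max(lo, min(hi, per_char * n_chars))

assert clamp_timeout(0) == 30            # empty input: floor applies
assert clamp_timeout(450) == 900         # mid-range input scales linearly
assert clamp_timeout(10_000_000) == 1800 # very long input: ceiling applies
```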
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-400-413 (1)

400-413: Avoid enqueuing ingestion before validating wrong_relationships.

The ingestion task is dispatched and relationships are appended even if janitor_response.wrong_relationships is non-empty (checked later). This can persist invalid relationships despite returning an error.

🐛 Suggested fix
-        if relationships_data:
-            from src.workers.tasks.ingestion import process_architect_relationships
-
-            print(
-                "[DEBUG (architect_agent_create_relationship)]: Sending relationships to ingestion task"
-            )
-            process_architect_relationships.delay(
-                {
-                    "relationships": relationships_data,
-                    "brain_id": self.brain_id,
-                }
-            )
-
-        self.architect_agent.relationships_set.extend(output_rels)
-
-        wrong_relationships = []
-        if getattr(janitor_response, "wrong_relationships", []):
-            wrong_relationships = getattr(janitor_response, "wrong_relationships", [])
+        wrong_relationships = getattr(janitor_response, "wrong_relationships", []) or []
+        if wrong_relationships:
+            return {
+                "status": "ERROR",
+                "wrong_relationships": janitor_response.wrong_relationships,
+                "newly_created_nodes": newly_created_nodes,
+            }
+
+        if relationships_data:
+            from src.workers.tasks.ingestion import process_architect_relationships
+
+            print(
+                "[DEBUG (architect_agent_create_relationship)]: Sending relationships to ingestion task"
+            )
+            process_architect_relationships.delay(
+                {
+                    "relationships": relationships_data,
+                    "brain_id": self.brain_id,
+                }
+            )
+
+        self.architect_agent.relationships_set.extend(output_rels)
src/services/api/controllers/entities.py-89-118 (1)

89-118: Guard against empty graph lookups and missing neighbors.

get_by_uuids(...)[0] and rel_tuples[target_node.uuid] can raise when the vector index is stale or a node has been deleted. Add defensive checks to keep the endpoint from 500s.

🐛 Suggested fix
-    for target_node_v in target_node_vs:
-        target_node_id = target_node_v.metadata.get("uuid")
-        target_node = graph_adapter.get_by_uuids([target_node_id], brain_id=brain_id)[0]
+    for target_node_v in target_node_vs:
+        target_node_id = target_node_v.metadata.get("uuid")
+        if not target_node_id:
+            continue
+        nodes = graph_adapter.get_by_uuids([target_node_id], brain_id=brain_id)
+        if not nodes:
+            continue
+        target_node = nodes[0]
 ...
-    rel_tuples = graph_adapter.get_neighbors([target_node], brain_id=brain_id)
+    rel_tuples = graph_adapter.get_neighbors([target_node], brain_id=brain_id)
+    neighbors = rel_tuples.get(target_node.uuid, [])
 ...
-        has_relationships=len(rel_tuples[target_node.uuid]) > 0,
-        relationships=rel_tuples[target_node.uuid],
+        has_relationships=len(neighbors) > 0,
+        relationships=neighbors,
src/constants/prompts/architect_agent.py-40-94 (1)

40-94: Make example JSON valid to avoid malformed outputs.

The sample outputs contain "relationships: [ (missing quote) and // comments, which contradict “Return ONLY JSON” and can prompt invalid responses from the model.

🛠️ Proposed fix
-    "relationships: [
+    "relationships": [
@@
-    "new_nodes": [] // No new nodes were created in this example
+    "new_nodes": []
@@
-            "reason": "The entity was missing from the entities found by the scout." // Why the node was created by you
+            "reason": "The entity was missing from the entities found by the scout."

Also applies to: 112-144

src/lib/neo4j/client.py-280-303 (1)

280-303: Only use predicate attributes when populating relationship fields.

The current loop copies subject/object attributes onto the relationship, which can overwrite relationship metadata and leak node properties into r.*.

🛠️ Proposed fix
-        objects = [subject, to_object, predicate]
+        objects = [predicate]
@@
-                if value:
+                if value is not None:
                     extra_ops += f"""
             SET r.{attr} = {self._format_value(value)}
             """
src/constants/agents.py-78-85 (1)

78-85: Make flow_key optional or provide a default.

Upstream outputs (_ArchitectAgentRelationship) don’t include flow_key, so constructing ArchitectAgentRelationship will fail unless every response includes it.

🛠️ Proposed fix
-    flow_key: str
+    flow_key: Optional[str] = None
🟡 Minor comments (16)
src/core/plugins/__init__.py-2-2 (1)

2-2: Correct the file path in the docstring.

The file path shows /__init__.py but the actual file is located at src/core/plugins/__init__.py. This inconsistency could cause confusion during maintenance.

📝 Proposed fix
-File: /__init__.py
+File: /src/core/plugins/__init__.py
src/lib/milvus/readme.txt-4-4 (1)

4-4: Pin the Docker image tag for reproducibility.

The zilliz/attu:latest tag makes the setup non-deterministic across time. The docker-compose.yaml in this directory already pins all other service versions (etcd:v3.5.18, minio and milvus:v2.6.4), so the documentation should follow the same practice for consistency.

🔧 Suggested update
-docker run -p 3500:3000 --network milvus -e MILVUS_URL=standalone:19530 zilliz/attu:latest
+docker run -p 3500:3000 --network milvus -e MILVUS_URL=standalone:19530 zilliz/attu:<PINNED_VERSION>
Makefile-64-64 (1)

64-64: Shell syntax issue in environment variable export.

The export $(DEBUG_ENVS) ENV="development" syntax is problematic. The recipe body is already wrapped in bash -c "…", so the inner double quotes around "development" terminate that outer string early; "development" lands outside the quoted command, and ENV may not be set correctly in all shells.

🔧 Suggested fix
-		bash -c "export $(DEBUG_ENVS) ENV="development" && poetry run celery -A src.workers.app worker --loglevel=info --pool=threads --concurrency=10"; \
+		bash -c "export $(DEBUG_ENVS) ENV='development' && poetry run celery -A src.workers.app worker --loglevel=info --pool=threads --concurrency=10"; \

Or more explicitly:

-		bash -c "export $(DEBUG_ENVS) ENV="development" && poetry run celery -A src.workers.app worker --loglevel=info --pool=threads --concurrency=10"; \
+		bash -c "export $(DEBUG_ENVS) && export ENV='development' && poetry run celery -A src.workers.app worker --loglevel=info --pool=threads --concurrency=10"; \
src/core/agents/tools/janitor_agent/JanitorAgentGetSchemaTool.py-30-34 (1)

30-34: Update the target description to include event_names.
The enum includes event_names, but the description still says only node/relationship types.

✏️ Suggested fix
-                "description": "The target to get the schema of. (node_labels or relationship_types)",
+                "description": "The target to get the schema of. (node_labels, relationship_types, or event_names)",
src/config.py-193-197 (1)

193-197: Normalize RUN_GRAPH_CONSOLIDATOR parsing to avoid case-sensitivity pitfalls.
Common env values like True or 1 will currently resolve to False.

🛠️ Suggested fix
-        self.run_graph_consolidator = (
-            os.getenv("RUN_GRAPH_CONSOLIDATOR", "true") == "true"
-        )
+        self.run_graph_consolidator = os.getenv(
+            "RUN_GRAPH_CONSOLIDATOR", "true"
+        ).lower() in {"true", "1", "yes"}
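A quick, runnable illustration of the case-sensitivity pitfall being fixed here:

```python
import os

os.environ["RUN_GRAPH_CONSOLIDATOR"] = "True"  # a value users commonly set

strict = os.getenv("RUN_GRAPH_CONSOLIDATOR", "true") == "true"
normalized = os.getenv("RUN_GRAPH_CONSOLIDATOR", "true").lower() in {"true", "1", "yes"}

assert strict is False     # "True" != "true" under exact comparison
assert normalized is True  # lowercasing accepts the common spellings
```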
src/core/search/entity_sibilings.py-22-28 (1)

22-28: Class and method naming inconsistencies with typos.

The class name EntitySinergyRetriever should be EntitySynergyRetriever, and the method retrieve_sibilings should be retrieve_siblings. Additionally, the filename entity_sibilings.py should be entity_siblings.py. These naming inconsistencies will make the codebase harder to search and maintain.

src/constants/prompts/janitor_agent.py-96-163 (1)

96-163: Additional typos in ATOMIC_JANITOR_AGENT_SYSTEM_PROMPT.

  • Line 118: "exaustive" → "exhaustive"
  • Line 153: "wiht" → "with"
📝 Proposed fix
-When you execute search operations in the knowledge graph and the results are empty (meaning that entities are not found in the knowledge graph), don't complain,
-it simply mean that the entities are new and not in the knowledge graph yet, be exaustive the first time you search with different options for the same entity.
+When you execute search operations in the knowledge graph and the results are empty (meaning that entities are not found in the knowledge graph), don't complain,
+it simply means that the entities are new and not in the knowledge graph yet, be exhaustive the first time you search with different options for the same entity.
...
-   "fixed_relationships": [{{...}}], // return here the relationships that you already fixed wiht small fixes.
+   "fixed_relationships": [{{...}}], // return here the relationships that you already fixed with small fixes.
src/constants/prompts/janitor_agent.py-48-86 (1)

48-86: Multiple typos in the prompt text.

Several typos exist in this prompt that could affect clarity:

  • Line 58: "rapresent" → "represent"
  • Line 61: "belong" → "they belong"
  • Line 72: "reffer" → "refer"
📝 Proposed fix
-   (eg: (PERSON:John {{ "uuid": "uuid_1" }})-[ACCOMPLISHED_ACTION]->(EVENT:Trip {{ "uuid": "uuid_3" }})-[INTO_LOCATION]->(CITY:New York {{ "uuid": "uuid_4" }}) and (PERSON:John {{ "uuid": "uuid_1" }})-[MOVED]->(EVENT:Went to {{ "uuid": "uuid_5" }})-[HAPPENED_WITHIN]->(CITY:New York {{ "uuid": "uuid_4" }})",
-   In the example above Trip and Went to are unified into a single event node since they rapresent the same thing.
+   In the example above Trip and Went to are unified into a single event node since they represent the same thing.
    
 - Adding connections between nodes that are not directly connected but are related to create a more coherent graph:
-   Connect nodes that are far from each other in the graph because the belong to different contexts but are related.
+   Connect nodes that are far from each other in the graph because they belong to different contexts but are related.
...
-Each task should include info in also the nodes/relationships/properties/triplets that are involved in the task so that the kg_agent can execute the tasks and reffer to the correct nodes/relationships/properties/triplets.
+Each task should include info on the nodes/relationships/properties/triplets that are involved in the task so that the kg_agent can execute the tasks and refer to the correct nodes/relationships/properties/triplets.
src/services/api/routes/retrieve.py-44-49 (1)

44-49: Typo in function name: "sibilings" should be "siblings".

The import get_entity_sibilings contains a typo. This appears to originate from the controller file (src/services/api/controllers/entities.py). Consider renaming for consistency.

src/constants/kg.py-69-72 (1)

69-72: Typo in description.

"identitier" should be "identifier".

Suggested fix
     flow_key: Optional[str] = Field(
         default=None,
-        description="Unique identitier for contextualizing the predicate into the context flow",
+        description="Unique identifier for contextualizing the predicate into the context flow",
     )
src/constants/prompts/scout_agent.py-67-68 (1)

67-68: Typo in example.

"Partecipated" should be "Participated".

Suggested fix
-    {{"type": "EVENT", "name": "Partecipated in", "description": "The company was part of a joint venture with Apple and Google", "polarity": "positive"}},
+    {{"type": "EVENT", "name": "Participated in", "description": "The company was part of a joint venture with Apple and Google", "polarity": "positive"}},
src/constants/prompts/scout_agent.py-54-58 (1)

54-58: Typo and pronoun inconsistency in example.

  1. "Partecipated" should be "Participated" (line 56)
  2. "his colleagues" should be "her colleagues" since it refers to Mary (lines 56-58)
Suggested fix
-    {{"type": "EVENT", "name": "Partecipated in", "description": "Mary was doing meetings with his colleagues in San Francisco", "polarity": "neutral"}},
-    {{"type": "EVENT", "name": "Meetings", "description": "Mary was doing meetings with his colleagues in San Francisco", "polarity": "neutral"}},
-    {{"type": "PERSON", "name": "Colleagues", "description": "The colleagues Mary was doing meetings with in San Francisco", "polarity": "neutral"}},
+    {{"type": "EVENT", "name": "Participated in", "description": "Mary was doing meetings with her colleagues in San Francisco", "polarity": "neutral"}},
+    {{"type": "EVENT", "name": "Meetings", "description": "Mary was doing meetings with her colleagues in San Francisco", "polarity": "neutral"}},
+    {{"type": "PERSON", "name": "Colleagues", "description": "The colleagues Mary was doing meetings with in San Francisco", "polarity": "neutral"}},
src/core/saving/auto_kg.py-106-118 (1)

106-118: Guard against divide-by-zero when input is empty.

len(input) can be 0, which would raise ZeroDivisionError.

🐛 Suggested fix
+    char_count = max(len(input), 1)
     print(
-        f"> Cost/Character -> ${((token_details.input.total * config.pricing.input_token_price + token_details.output.total * config.pricing.output_token_price) / len(input)):.12f}"
+        f"> Cost/Character -> ${((token_details.input.total * config.pricing.input_token_price + token_details.output.total * config.pricing.output_token_price) / char_count):.12f}"
     )
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-260-268 (1)

260-268: Token detail should reflect JanitorAgent usage, not ArchitectAgent.

janitor_token_detail is currently computed from architect_agent counters, which likely double-counts or misses janitor usage.

🐛 Suggested fix
-        janitor_token_detail = token_detail_from_token_counts(
-            self.architect_agent.input_tokens,
-            self.architect_agent.output_tokens,
-            self.architect_agent.cached_tokens,
-            self.architect_agent.reasoning_tokens,
-        )
+        janitor_token_detail = token_detail_from_token_counts(
+            janitor_agent.input_tokens,
+            janitor_agent.output_tokens,
+            janitor_agent.cached_tokens,
+            janitor_agent.reasoning_tokens,
+        )
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-424-438 (1)

424-438: Normalize tool return type—all branches should return JSON strings, not dicts.

The _run method is declared to return str, but the error branch (line 425) returns a dict while the success branch (line 434) returns a JSON string. This type inconsistency can break downstream consumers expecting uniform handling. Convert the error dict to json.dumps() to match the success path and the declared return type.

Suggested fix
-            return {
-                "status": "ERROR",
-                "wrong_relationships": janitor_response.wrong_relationships,
-                "newly_created_nodes": newly_created_nodes,
-            }
+            return json.dumps(
+                {
+                    "status": "ERROR",
+                    "wrong_relationships": janitor_response.wrong_relationships,
+                    "newly_created_nodes": newly_created_nodes,
+                }
+            )
src/core/agents/janitor_agent.py-175-186 (1)

175-186: Reset token counters at the start of run methods.

Token counters accumulate across repeated calls via the += operator in _update_token_counts. To track per-call metrics, reset them at method entry. The run method in architect_agent.py uses this pattern (lines 195-196), confirming it as the intended approach.

🛠️ Proposed fix
     def run_graph_consolidator(
         self,
         new_relationships: List[ArchitectAgentRelationship],
         brain_id: str = "default",
         timeout: int = 90,
         max_retries: int = 3,
     ) -> list[str]:
         """
         Runs a final janitor consolidation across the kg snapshot
         """
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None
@@
     def run(
         self,
         input_output: JanitorAgentInputOutput,
         text: str,
         targeting: Optional[Node] = None,
         brain_id: str = "default",
         timeout: int = 90,
         max_retries: int = 3,
     ) -> JanitorAgentInputOutput:
         """
         Run the janitor agent.
         """
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None
@@
     def run_atomic_janitor(
         self,
         input_relationships: List[ArchitectAgentRelationship],
         text: str,
         targeting: Optional[Node] = None,
         brain_id: str = "default",
         timeout: int = 90,
         max_retries: int = 3,
     ) -> AtomicJanitorAgentInputOutput | str:
         """
         Run the atomic janitor agent.
         Can return istructions on how to correct the wrong relationships or 'OK' if the relationships are correct.
         """
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None

@coderabbitai
Contributor

coderabbitai bot commented Jan 27, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #7

Docstrings generation was requested by @ChrisCoder9000.

* #6 (comment)

The following files were modified:

* `src/adapters/data.py`
* `src/adapters/embeddings.py`
* `src/adapters/graph.py`
* `src/adapters/interfaces/data.py`
* `src/adapters/interfaces/embeddings.py`
* `src/adapters/interfaces/graph.py`
* `src/config.py`
* `src/core/agents/architect_agent.py`
* `src/core/agents/janitor_agent.py`
* `src/core/agents/kg_agent.py`
* `src/core/agents/scout_agent.py`
* `src/core/agents/tools/architect_agent/ArchitectAgentCheckUsedEntitiesTool.py`
* `src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py`
* `src/core/agents/tools/architect_agent/ArchitectAgentGetRemainingEntitiesToProcessTool.py`
* `src/core/agents/tools/architect_agent/ArchitectAgentMarkEntitiesAsUsedTool.py`
* `src/core/agents/tools/janitor_agent/JanitorAgentGetSchemaTool.py`
* `src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py`
* `src/core/agents/tools/kg_agent/KGAgentExecuteGraphOperationTool.py`
* `src/core/backups/backup_creator.py`
* `src/core/layers/graph_consolidation/graph_consolidation.py`
* `src/core/saving/auto_kg.py`
* `src/core/saving/ingestion_manager.py`
* `src/core/search/entities.py`
* `src/core/search/entity_context.py`
* `src/core/search/entity_info.py`
* `src/core/search/entity_sibilings.py`
* `src/lib/embeddings/client_small.py`
* `src/lib/llm/client_small.py`
* `src/lib/mongo/client.py`
* `src/lib/neo4j/client.py`
* `src/services/api/controllers/entities.py`
* `src/services/api/controllers/retrieve.py`
* `src/services/api/routes/retrieve.py`
* `src/utils/tokens.py`
* `src/workers/tasks/ingestion.py`
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
src/lib/llm/client_small.py (3)

224-225: Inconsistent JSON response stripping compared to LLMClientLarge.

LLMClientLarge.generate_json uses .strip("```json").strip("```") to handle markdown code blocks, but this implementation uses .strip("").strip("```"). The .strip("") is a no-op, and the missing "```json" strip may leave residual markers if the model returns JSON in a fenced code block with a language tag.

🐛 Proposed fix
-            _response = _response.text.strip("").strip("```")
+            _response = _response.text.strip("```json").strip("```")

227-234: Same overly broad retry issue as generate_text.

The Exception base class in the retry trigger will mask programming errors. Narrow to specific recoverable exceptions.


152-157: Overly broad exception type in retry logic will mask programming errors.

Using Exception as a retry trigger will retry on any exception, including programming errors like TypeError, AttributeError, or KeyError. This masks bugs and makes debugging difficult. Consider narrowing to specific transient/recoverable exceptions from the Google GenAI SDK (e.g., google.genai.errors.APIError), or rely on the SDK's built-in HttpRetryOptions instead of a manual retry decorator.
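A standalone sketch of the narrowing (with_retries is a hypothetical helper; the real code would list the SDK's transient error types in recoverable):

```python
def with_retries(fn, recoverable=(TimeoutError, ConnectionError), attempts=3):
    # Retry only recoverable failures; programming errors surface immediately.
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except recoverable:
            if attempt == attempts:
                raise

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

# Transient errors are retried until the call succeeds.
assert with_retries(flaky) == "ok" and calls["n"] == 3

def buggy():
    raise TypeError("a bug, not a network blip")

try:
    with_retries(buggy)
except TypeError:
    pass  # not retried: re-raised on the first call
```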

src/adapters/interfaces/embeddings.py (1)

59-81: Interface signature change is correct; all implementations properly aligned.

The return type change from list[Vector] to dict[str, list[Vector]] is correctly implemented across all code paths:

  • Interface definition: src/adapters/interfaces/embeddings.py (line 67)
  • Adapter delegation: src/adapters/embeddings.py
  • Milvus implementation: src/lib/milvus/client.py (line 285)

All implementations match the interface contract and enable per-ID result mapping as intended.

Fix the EN DASH character on line 75: Change "0.0–1.0" to use a hyphen-minus (0.0-1.0) for ASCII consistency in docstrings.

src/core/agents/tools/kg_agent/KGAgentExecuteGraphOperationTool.py (1)

73-87: Avoid clobbering a valid positional query.
If args[0] is valid but kwargs["args"] is empty/partial, the current logic overwrites _query with "", leading to false “No query provided.” Only parse kwargs when _query is still empty.

✅ Guard kwargs parsing
-        if len(kwargs) > 0:
+        if not _query and len(kwargs) > 0:
             args_query = kwargs.get("args", {})
             if isinstance(args_query, dict):
                 _query = args_query.get("query", "")
src/core/agents/architect_agent.py (1)

489-498: Include token counts in the success response.

The empty-path includes token counts but the success path omits them, despite the docstring.

🐛 Proposed fix
         return ArchitectAgentResponse(
             new_nodes=[
                 ArchitectAgentNew(**new_node.model_dump(mode="json"))
                 for new_node in all_new_nodes
             ],
             relationships=[
                 ArchitectAgentRelationship(**relationship.model_dump(mode="json"))
                 for relationship in all_relationships
             ],
+            input_tokens=self.input_tokens,
+            output_tokens=self.output_tokens,
         )
🤖 Fix all issues with AI agents
In `@src/core/backups/backup_creator.py`:
- Line 13: The function create_backup declares a return type Backup which is not
defined or imported in this module; fix by either importing Backup from its
defining module (add an import for Backup), declaring Backup locally if
intended, or enable postponed evaluation of annotations by adding from
__future__ import annotations and changing the annotation to a forward-reference
string "Backup" (or just use "Backup" with the future import); update the
create_backup signature accordingly so the module no longer raises NameError on
import.

In `@src/core/layers/graph_consolidation/graph_consolidation.py`:
- Around line 137-145: The token snapshot is taken too early:
token_detail_from_token_counts(kg_agent.input_tokens, kg_agent.output_tokens,
kg_agent.cached_tokens, kg_agent.reasoning_tokens) is invoked before
kg_agent.run_graph_consolidator_operator(...) runs so all counts are zero; move
the token collection/token_details.append call to immediately after
kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id) so it reads
the populated kg_agent.input_tokens, output_tokens, cached_tokens and
reasoning_tokens values returned/updated by the operation.

In `@src/core/search/entity_context.py`:
- Around line 32-66: The get_context method returns inconsistent tuple sizes and
has an incorrect type annotation; change its signature to return
Tuple[Optional[Node], list[dict], list[dict], list[dict]] (import Optional if
needed), and update all early-return paths (e.g., where it currently returns
(None, []) after target_node_vs or target_node is missing) to return a
consistent four-element tuple like (None, [], [], []); ensure the final
successful return also matches this 4-tuple shape so callers that unpack four
values do not fail.

In `@src/lib/neo4j/client.py`:
- Around line 1412-1426: In get_schema, change the event_names_query Cypher to
use list membership ('EVENT' IN labels(n)) instead of CONTAINS and ensure the
returned field matches the record key: update the query to RETURN n.name AS name
(so record["name"] works) or adjust the record access to the actual alias;
specifically fix event_names_query and the code that builds event_names (refer
to event_names_query and the list comprehension for event_names) so the Cypher
uses "'EVENT' IN labels(n)" and returns an aliased name that matches
record["name"].

In `@src/services/api/routes/retrieve.py`:
- Around line 131-142: The get_context endpoint currently only has a docstring
and returns None; either implement its logic or explicitly mark it as
unimplemented—import and raise fastapi.HTTPException(status_code=501,
detail="get_context not implemented") (or return fastapi.responses.JSONResponse
with 501) from the get_context function to avoid implicit None responses;
reference the get_context function and ensure you add the necessary import
(HTTPException or JSONResponse) at the top of the module.
🟠 Major comments (20)
src/services/api/routes/system.py-37-42 (1)

37-42: Use POST or DELETE instead of GET for destructive operations.

The reset endpoint performs a destructive action (resetting brain state) but uses GET. This violates HTTP semantics where GET should be safe and idempotent. GET requests can be inadvertently triggered by browser prefetching, link previews, crawlers, or cached proxies, potentially causing unintended data loss.

Proposed fix
-@system_router.get(path="/brains/{brain_id}/reset")  # TODO
+@system_router.post(path="/brains/{brain_id}/reset")  # TODO
 async def reset(brain_id: str):
src/services/api/routes/system.py-45-50 (1)

45-50: Use DELETE method for the delete endpoint.

Similar to the reset endpoint, delete is a destructive operation that should not use GET. The DELETE HTTP method is semantically appropriate here, or POST if you prefer a uniform approach for all mutations.

Proposed fix
-@system_router.get(path="/brains/{brain_id}/delete")  # TODO
+@system_router.delete(path="/brains/{brain_id}")  # TODO
 async def delete(brain_id: str):
src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py-94-99 (1)

94-99: Fix kwargs path: it always returns before processing queries.

The function returns immediately even when queries is provided via kwargs, so kwargs-based calls never execute.

✅ Suggested fix
-        if len(args) > 0:
-            _queries = args[0] if isinstance(args[0], list) else [args[0]]
-        else:
-            _queries = kwargs.get("queries", [])
-            return "No queries provided in the arguments or kwargs"
+        if len(args) > 0:
+            _queries = args[0] if isinstance(args[0], list) else [args[0]]
+        else:
+            _queries = kwargs.get("queries", [])
+
+        if not _queries:
+            return "No queries provided in the arguments or kwargs"
src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py-129-135 (1)

129-135: Vector search is never reached due to embed_text return type mismatch.

EmbeddingsAdapter.embed_text returns a list[float], but the code expects .embeddings, causing an exception and skipping vector enrichment. This effectively disables vector search.

✅ Suggested fix
-                query_vector = self.embeddings.embed_text(_query)
-                if query_vector.embeddings and len(query_vector.embeddings) > 0:
-                    v_results = self.vector_store.search_vectors(
-                        query_vector.embeddings,
+                query_vector = self.embeddings.embed_text(_query)
+                if query_vector and len(query_vector) > 0:
+                    v_results = self.vector_store.search_vectors(
+                        query_vector,
                         store="nodes",
                         brain_id=self.brain_id,
                     )
src/core/search/entity_sibilings.py-32-34 (1)

32-34: Unused polarity parameter indicates incomplete implementation.

The polarity parameter (expected values: "same" or "opposite") is declared and documented as a directive to filter relation types, but is never used in the method body. The method builds connections without any filtering logic based on this parameter, suggesting the implementation is incomplete. Either remove the parameter if it's not needed, or implement the filtering logic to use it.

src/core/agents/scout_agent.py-92-108 (1)

92-108: Reset token counters per run to keep per-run usage accurate.
input_tokens, output_tokens, cached_tokens, and reasoning_tokens accumulate across multiple run() calls, but the response docstring implies per-run counts. Reset them (and token_detail) at the start of run().

🧮 Reset counters at run start
     def run(
         self,
         text: str,
         targeting: Optional[Node] = None,
         brain_id: str = "default",
         timeout: int = 90,
         max_retries: int = 3,
     ) -> ScoutAgentResponse:
         """
@@
         Raises:
             TimeoutError: If a single invocation exceeds `timeout`, or if all retry attempts fail due to timeouts.
         """
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None

Also applies to: 140-252, 254-275

src/core/agents/tools/architect_agent/ArchitectAgentMarkEntitiesAsUsedTool.py-76-94 (1)

76-94: Positional string UUIDs are never recognized.
The docstring claims a single string is supported, but a positional "uuid" currently falls through and gets ignored.

✅ Handle single string positional input
-        elif args and len(args) > 0 and isinstance(args[0], list):
-            entities_to_mark = args[0]
+        elif args and len(args) > 0 and isinstance(args[0], str):
+            entities_to_mark = [args[0]]
+        elif args and len(args) > 0 and isinstance(args[0], list):
+            entities_to_mark = args[0]
src/services/api/controllers/entities.py-124-133 (1)

124-133: Guard against missing nodes from get_by_uuids.
If the vector store references a UUID that is absent from the graph, `get_by_uuids` returns an empty list and indexing `[0]` raises an IndexError. Skip missing IDs before selecting.

🐛 Suggested fix
-    for target_node_v in target_node_vs:
-        target_node_id = target_node_v.metadata.get("uuid")
-        target_node = graph_adapter.get_by_uuids([target_node_id], brain_id=brain_id)[0]
+    for target_node_v in target_node_vs:
+        target_node_id = target_node_v.metadata.get("uuid")
+        if not target_node_id:
+            continue
+        nodes = graph_adapter.get_by_uuids([target_node_id], brain_id=brain_id)
+        if not nodes:
+            continue
+        target_node = nodes[0]
src/core/saving/auto_kg.py-122-126 (1)

122-126: Guard cost-per-character against empty input.
len(input) can be 0, which raises ZeroDivisionError during cost reporting.

🐛 Suggested fix
-    print(
-        f"> Cost/Character -> ${((token_details.input.total * config.pricing.input_token_price + token_details.output.total * config.pricing.output_token_price) / len(input)):.12f}"
-    )
+    total_cost = (
+        token_details.input.total * config.pricing.input_token_price
+        + token_details.output.total * config.pricing.output_token_price
+    )
+    cost_per_char = (total_cost / len(input)) if len(input) > 0 else 0.0
+    print(f"> Cost/Character -> ${cost_per_char:.12f}")
src/core/saving/ingestion_manager.py-59-76 (1)

59-76: Cache key and early return skip v_id propagation.
Keying by name can collide across entities, and the early return never applies the cached v_id back to node_data.properties, so downstream ingestion can miss vector IDs. Prefer keying by UUID (or name+type) and set the cached id on hit.

🐛 Suggested fix
-        if node_data.name in self.resolved_cache:
-            return node_data.uuid
+        cached_v_id = self.resolved_cache.get(node_data.uuid)
+        if cached_v_id:
+            node_data.properties = {
+                **(node_data.properties or {}),
+                "v_id": cached_v_id,
+            }
+            return node_data.uuid
@@
-            self.resolved_cache[node_data.name] = v_ids[0]
+            self.resolved_cache[node_data.uuid] = v_ids[0]
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-423-457 (1)

423-457: Skip ingestion when Janitor reports wrong relationships.
You enqueue ingestion and mutate relationships_set before checking wrong_relationships, so error paths still write to the KG. Move the error check ahead of side effects.

🐛 Suggested fix
-        relationships_data = [
-            rel.model_dump(mode="json")
-            for rel in output_rels
-            if isinstance(rel, ArchitectAgentRelationship)
-        ]
-
-        if relationships_data:
-            from src.workers.tasks.ingestion import process_architect_relationships
-
-            print(
-                "[DEBUG (architect_agent_create_relationship)]: Sending relationships to ingestion task"
-            )
-            process_architect_relationships.delay(
-                {
-                    "relationships": relationships_data,
-                    "brain_id": self.brain_id,
-                }
-            )
-
-        self.architect_agent.relationships_set.extend(output_rels)
-
-        wrong_relationships = []
-        if getattr(janitor_response, "wrong_relationships", []):
-            wrong_relationships = getattr(janitor_response, "wrong_relationships", [])
-
-        if len(wrong_relationships) > 0:
+        wrong_relationships = getattr(janitor_response, "wrong_relationships", []) or []
+        if wrong_relationships:
             print(
                 "[DEBUG (architect_agent_create_relationship)]: Wrong relationships: ",
                 getattr(janitor_response, "wrong_relationships", []),
             )
             return {
                 "status": "ERROR",
                 "wrong_relationships": janitor_response.wrong_relationships,
                 "newly_created_nodes": newly_created_nodes,
             }
+
+        relationships_data = [
+            rel.model_dump(mode="json")
+            for rel in output_rels
+            if isinstance(rel, ArchitectAgentRelationship)
+        ]
+
+        if relationships_data:
+            from src.workers.tasks.ingestion import process_architect_relationships
+
+            print(
+                "[DEBUG (architect_agent_create_relationship)]: Sending relationships to ingestion task"
+            )
+            process_architect_relationships.delay(
+                {
+                    "relationships": relationships_data,
+                    "brain_id": self.brain_id,
+                }
+            )
+
+        self.architect_agent.relationships_set.extend(output_rels)
src/adapters/graph.py-429-453 (1)

429-453: v_id extraction never finds IDs and can raise KeyError.
getattr(n.properties, "v_id", None) on a dict is always None, so v_ids is empty; direct indexing can also throw when the property is missing. Use .get with guards.

🐛 Suggested fix
-        v_ids = [
-            n.properties["v_id"] for n in nodes if getattr(n.properties, "v_id", None)
-        ]
+        v_ids = [
+            getattr(n, "properties", {}).get("v_id")
+            for n in nodes
+            if getattr(n, "properties", {}).get("v_id")
+        ]
@@
-        all_fd_v_ids: List[str] = [
-            fd[1].properties["v_id"]
-            for fds in all_fd_nodes.values()
-            for fd in fds
-            if fd[1].properties["v_id"]
-        ]  # TODO: [missing_property] check why sometime v_id is not present
+        all_fd_v_ids: List[str] = [
+            getattr(fd[1], "properties", {}).get("v_id")
+            for fds in all_fd_nodes.values()
+            for fd in fds
+            if getattr(fd[1], "properties", {}).get("v_id")
+        ]  # TODO: [missing_property] check why sometime v_id is not present
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-289-297 (1)

289-297: Janitor token accounting uses the wrong counters.
The token detail is built from self.architect_agent instead of janitor_agent, which can double-count or miss Janitor usage.

🐛 Suggested fix
-        janitor_token_detail = token_detail_from_token_counts(
-            self.architect_agent.input_tokens,
-            self.architect_agent.output_tokens,
-            self.architect_agent.cached_tokens,
-            self.architect_agent.reasoning_tokens,
-        )
+        janitor_token_detail = token_detail_from_token_counts(
+            janitor_agent.input_tokens,
+            janitor_agent.output_tokens,
+            janitor_agent.cached_tokens,
+            janitor_agent.reasoning_tokens,
+        )
src/services/api/routes/retrieve.py-448-465 (1)

448-465: Mutable default argument will cause unexpected behavior.

Line 451 uses a mutable default types: Optional[List[str]] = []. This is a common Python pitfall where the same list instance is shared across all calls. Use None as the default and initialize inside the function.

🐛 Proposed fix
 @retrieve_router.get(path="/entity/status")
 async def get_entity_status(
     target: str,
-    types: Optional[List[str]] = [],
+    types: Optional[List[str]] = None,
     brain_id: str = "default",
 ):
src/core/agents/janitor_agent.py-419-421 (1)

419-421: Same per-run counter reset needed here.

Without a reset, token accounting will include previous calls.

🐛 Proposed fix
         accumulated_messages = []
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None
src/core/agents/janitor_agent.py-592-594 (1)

592-594: Reset counters before atomic-janitor invocation.

Otherwise token_detail and counters may include earlier runs.

🐛 Proposed fix
         accumulated_messages = []
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None
src/core/agents/architect_agent.py-257-259 (1)

257-259: Reset cached/reasoning counters per run.

Only input/output are reset; cached/reasoning and token_detail can leak across runs and skew accounting.

🐛 Proposed fix
         self.input_tokens = 0
         self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None
src/core/agents/architect_agent.py-529-541 (1)

529-541: Clear tooler state between invocations.

relationships_set, used_entities_set, and token counters persist across runs; run_tooler returns the accumulated set, so stale data can leak.

🐛 Proposed fix
         entities_dict = {entity.uuid: entity for entity in entities}
         self.entities = entities_dict
+        self.relationships_set = []
+        self.used_entities_set = []
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None

         self._get_agent(
             type_="tooler",
             text=text,
             brain_id=brain_id,
src/core/agents/janitor_agent.py-233-234 (1)

233-234: Reset token counters at the start of this run.

Counters currently accumulate across invocations, so token_detail can reflect prior runs. Apply the same reset in run and run_atomic_janitor as well.

🐛 Proposed fix
         accumulated_messages = []
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None
src/core/agents/janitor_agent.py-141-176 (1)

141-176: Fix type_ parameter signature and add missing "atomic-janitor" type.

The parameter signature has a type annotation mismatch: it declares type_: str but assigns a Literal[...] typing object as the default, so any caller that omits the parameter receives the typing construct itself rather than a string and fails the type dispatch. Additionally, the Literal is missing "atomic-janitor" even though the docstring and function logic both handle this type (lines 159, 171–174).

Proposed fix
-        type_: str = Literal["janitor", "graph-janitor"],
+        type_: Literal["janitor", "graph-janitor", "atomic-janitor"] = "janitor",
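The failure mode is subtle because the broken signature still parses; a minimal reproduction (function names are illustrative):

```python
from typing import Literal


def run_broken(type_: str = Literal["janitor", "graph-janitor"]):
    # The default is the typing construct itself, so string comparisons fail.
    return type_ == "janitor"


def run_fixed(type_: Literal["janitor", "graph-janitor", "atomic-janitor"] = "janitor"):
    return type_ == "janitor"


print(run_broken())  # False: a Literal object is compared against "janitor"
print(run_fixed())   # True
```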
🟡 Minor comments (10)
src/core/search/entity_sibilings.py-60-63 (1)

60-63: Missing guard for empty neighbors dictionary.

If get_neighbors returns an empty dict or the target_node_id key is missing, accessing _neighbors[target_node_id] will raise a KeyError.

🐛 Proposed fix
         _neighbors = graph_adapter.get_neighbors(
             [target_node_id], brain_id=self.brain_id
         )
-        neighbors = _neighbors[target_node_id]
+        neighbors = _neighbors.get(target_node_id, [])
src/core/search/entities.py-34-214 (1)

34-214: Deduplicate the stop-word set (Ruff B033), and consider module-level constants.
There are multiple duplicate entries flagged by Ruff; cleaning them up will avoid lint noise. While updating, consider hoisting the stop-word set and compiled regex to module scope to avoid re-allocating them on every call.

src/core/agents/tools/architect_agent/ArchitectAgentCheckUsedEntitiesTool.py-51-54 (1)

51-54: Handle plain dict entities safely.
If used_entities_set contains raw dicts, entity.dict() will raise. Add an explicit dict branch (or a generic fallback).

🛠️ Safer serialization
-        entities_list = [
-            entity.model_dump() if hasattr(entity, "model_dump") else entity.dict()
-            for entity in self.architect_agent.used_entities_set
-        ]
+        def _serialize_entity(entity):
+            if hasattr(entity, "model_dump"):
+                return entity.model_dump()
+            if isinstance(entity, dict):
+                return entity
+            if hasattr(entity, "dict"):
+                return entity.dict()
+            return dict(entity)
+
+        entities_list = [
+            _serialize_entity(entity)
+            for entity in self.architect_agent.used_entities_set
+        ]
src/adapters/embeddings.py-86-100 (1)

86-100: Docstring promises ordering that isn’t enforced.
The adapter returns the backend’s lists verbatim; unless every VectorStoreClient guarantees sorted similarity, the “ordered by descending similarity” claim is risky. Either sort here or soften the docstring.

✏️ Suggested docstring tweak
-            Each list contains at most `limit` items, includes only vectors with similarity >= `min_similarity`,
-            and is ordered by descending similarity.
+            Each list contains at most `limit` items and includes only vectors with similarity >= `min_similarity`.
+            Ordering is determined by the underlying vector store implementation.
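If the ordering guarantee is worth keeping, the adapter could enforce it instead of softening the docstring; a sketch with a stand-in `Vector` type (the real record shape may differ):

```python
from dataclasses import dataclass


@dataclass
class Vector:
    id: str
    similarity: float


def order_results(
    results: dict[str, list[Vector]], limit: int
) -> dict[str, list[Vector]]:
    # Sort each bucket by descending similarity before truncating to `limit`,
    # so the docstring's ordering claim holds regardless of the backend.
    return {
        key: sorted(vectors, key=lambda v: v.similarity, reverse=True)[:limit]
        for key, vectors in results.items()
    }


raw = {"q1": [Vector("a", 0.72), Vector("b", 0.91), Vector("c", 0.65)]}
print([v.id for v in order_results(raw, limit=2)["q1"]])  # ['b', 'a']
```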
src/core/agents/tools/architect_agent/ArchitectAgentCheckUsedEntitiesTool.py-42-42 (1)

42-42: Silence Ruff ARG002 for unused args/kwargs.
The method doesn’t use args/kwargs; rename them to dummy names so Ruff’s dummy-variable pattern ignores them (inserting a statement before the docstring would demote the docstring to a plain string).

🧹 Minimal fix
-    def _run(self, *args, **kwargs) -> str:
+    def _run(self, *_args, **_kwargs) -> str:
         """
src/core/agents/tools/architect_agent/ArchitectAgentMarkEntitiesAsUsedTool.py-19-32 (1)

19-32: Mark args_schema as a ClassVar to avoid mutable-field warnings.
Ruff RUF012 will flag this; marking it as ClassVar prevents accidental mutation and model-field treatment.

🧹 ClassVar tweak
-import json
-from langchain.tools import BaseTool
+import json
+from typing import ClassVar
+from langchain.tools import BaseTool
@@
-    args_schema: dict = {
+    args_schema: ClassVar[dict] = {
src/core/agents/scout_agent.py-109-118 (1)

109-118: Silence Ruff ARG002 for unused brain_id.
Rename the parameter to _brain_id (or add a no-op assignment) to avoid lint failures.

🧹 Quick fix
-    def _get_tools(self, brain_id: str = "default") -> List[BaseTool]:
+    def _get_tools(self, _brain_id: str = "default") -> List[BaseTool]:
         """
src/services/api/controllers/entities.py-100-104 (1)

100-104: Change types parameter default from mutable list to None.

Using [] as a default argument creates a shared mutable object across all function calls, violating Python best practices. Change to Optional[List[str]] = None and initialize with types = types or [] inside the function body.

♻️ Suggested fix
-from typing import List, Literal
+from typing import List, Literal, Optional
@@
-async def get_entity_status(
-    target: str,
-    types: List[str] = [],
-    brain_id: str = "default",
-) -> GetEntityStatusResponse:
+async def get_entity_status(
+    target: str,
+    types: Optional[List[str]] = None,
+    brain_id: str = "default",
+) -> GetEntityStatusResponse:
@@
     """
     Retrieve status information for an entity matching the provided target text.
@@ -115,6 +115,7 @@
     """
 
+    types = types or []
     target_embeddings = embeddings_adapter.embed_text(target)
src/core/agents/kg_agent.py-438-442 (1)

438-442: Add exception chaining for proper traceback preservation.

When re-raising TimeoutError from FutureTimeoutError, use raise ... from to preserve the exception chain for debugging.

🐛 Proposed fix
             except FutureTimeoutError:
-                raise TimeoutError(
+                raise TimeoutError(
                     f"Graph consolidator operator invoke timed out after {timeout} seconds. "
                     "This may indicate a network issue or the LLM service is unresponsive."
-                )
+                ) from None
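The `from None` form also sets the suppress-context flag, which keeps tracebacks clean when the original FutureTimeoutError carries no extra information; a self-contained sketch of the pattern:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeoutError


def invoke_with_timeout(timeout: float) -> str:
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(time.sleep, timeout * 10)
        try:
            return future.result(timeout=timeout)
        except FutureTimeoutError:
            # `from None` suppresses the implicit "During handling of the
            # above exception..." chain; use `from exc` to keep it instead.
            raise TimeoutError(f"invoke timed out after {timeout} seconds") from None


try:
    invoke_with_timeout(0.01)
except TimeoutError as exc:
    print(exc.__suppress_context__)  # True
```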
src/workers/tasks/ingestion.py-532-569 (1)

532-569: Add missing else-branch or guard for None structured_data.

When existing_structured_data is None (line 538 condition is false), new_structured_data remains None (line 536) and is passed to save_structured_data at line 567. The function's type hint expects StructuredData, not Optional[StructuredData], and the implementation calls .model_dump() on the input, which will fail with AttributeError if None is passed. Either add an else-branch to create new structured data when none exists, or add a guard to skip saving when new_structured_data is None.

🧹 Nitpick comments (18)
src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py (1)

147-148: Avoid bare Exception + print; use narrowed exceptions and structured logging.

Catching Exception hides unexpected errors and print bypasses app logging. Prefer handling known embedding/vector-store errors and logging through your configured logger.

src/config.py (1)

199-220: Normalize RUN_GRAPH_CONSOLIDATOR parsing for common truthy values.

Right now only the exact string "true" enables the flag. Consider accepting common variants to avoid misconfiguration when ops uses "True", "1", or "yes".

♻️ Proposed tweak
-        self.run_graph_consolidator = (
-            os.getenv("RUN_GRAPH_CONSOLIDATOR", "true") == "true"
-        )
+        self.run_graph_consolidator = (
+            os.getenv("RUN_GRAPH_CONSOLIDATOR", "true").strip().lower()
+            in {"true", "1", "yes"}
+        )
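A small helper makes the accepted spellings explicit; the environment variable name follows the review, while the helper itself is hypothetical:

```python
import os


def env_flag(name: str, default: str = "true") -> bool:
    # Treat common truthy spellings as enabled; anything else disables.
    return os.getenv(name, default).strip().lower() in {"true", "1", "yes"}


os.environ["RUN_GRAPH_CONSOLIDATOR"] = "True"
print(env_flag("RUN_GRAPH_CONSOLIDATOR"))  # True
os.environ["RUN_GRAPH_CONSOLIDATOR"] = "0"
print(env_flag("RUN_GRAPH_CONSOLIDATOR"))  # False
```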
src/core/agents/tools/architect_agent/ArchitectAgentGetRemainingEntitiesToProcessTool.py (1)

41-53: Make serialization resilient to non-Pydantic entities.

Other tools already guard against entities lacking model_dump. Mirroring that pattern avoids potential AttributeError if the entity set contains plain dict-like objects.

♻️ Proposed tweak
-        remaining_entities = json.dumps(
-            [
-                entity.model_dump(mode="json")
-                for entity in self.architect_agent.entities.values()
-            ]
-        )
+        remaining_entities = json.dumps(
+            [
+                entity.model_dump(mode="json")
+                if hasattr(entity, "model_dump")
+                else (entity.dict() if hasattr(entity, "dict") else entity)
+                for entity in self.architect_agent.entities.values()
+            ]
+        )
src/utils/tokens.py (1)

25-44: Consider extracting the zero-initialized TokenDetail into a helper.

The same zero-initialized TokenDetail construction appears twice (lines 26-33 and 37-44). This could be extracted into a private helper or a module-level constant to reduce duplication.

♻️ Suggested refactor
+def _empty_token_detail() -> TokenDetail:
+    return TokenDetail(
+        input=TokenInputDetail(total=0, uncached=0, cached=0, cache_percentage=0.0),
+        output=TokenOutputDetail(
+            total=0, regular=0, reasoning=0, reasoning_percentage=0.0
+        ),
+        grand_total=0,
+        effective_total=0,
+    )
+
+
 def merge_token_details(token_details: list[TokenDetail]) -> TokenDetail:
     # ...docstring...
     if not token_details:
-        return TokenDetail(
-            input=TokenInputDetail(total=0, uncached=0, cached=0, cache_percentage=0.0),
-            output=TokenOutputDetail(
-                total=0, regular=0, reasoning=0, reasoning_percentage=0.0
-            ),
-            grand_total=0,
-            effective_total=0,
-        )
+        return _empty_token_detail()

     token_details = [detail for detail in token_details if detail is not None]
     if not token_details:
-        return TokenDetail(
-            input=TokenInputDetail(total=0, uncached=0, cached=0, cache_percentage=0.0),
-            output=TokenOutputDetail(
-                total=0, regular=0, reasoning=0, reasoning_percentage=0.0
-            ),
-            grand_total=0,
-            effective_total=0,
-        )
+        return _empty_token_detail()
src/lib/llm/client_small.py (2)

109-111: Use explicit None union type for optional parameters.

Per PEP 484, use int | None instead of defaulting to None with a bare int type hint.

♻️ Proposed fix
     def generate_text(
-        self, prompt: str, max_new_tokens: int = 65536, timeout: int = None
+        self, prompt: str, max_new_tokens: int = 65536, timeout: int | None = None
     ) -> str:

186-192: Use explicit None union type for optional timeout parameter.

Same PEP 484 issue as in generate_text.

♻️ Proposed fix
     def generate_json(
         self,
         prompt: str,
         max_new_tokens: int = 65536,
         max_retries: int = 3,
-        timeout: int = None,
+        timeout: int | None = None,
     ) -> dict:
src/core/search/entity_sibilings.py (3)

11-11: Unused import: dataclass.

The dataclass decorator is imported but never used in this file.

♻️ Proposed fix
-from dataclasses import dataclass
 from typing import List, Literal, Tuple

133-137: Loop variable v_id is unused; use _ to indicate intent.

The loop iterates over all_similar_dict.items() but only uses similar_vectors. Use _ for the unused key to clarify intent.

♻️ Proposed fix
-        for v_id, similar_vectors in all_similar_dict.items():
-            for similar_node_v in similar_vectors:
+        for _, similar_vectors in all_similar_dict.items():
+            for similar_node_v in similar_vectors:

22-23: Typos in class and method names: "Sinergy" → "Synergy", "sibilings" → "siblings".

The class EntitySinergyRetriever and method retrieve_sibilings contain spelling errors that may cause confusion and make the API harder to discover.

Also applies to: 32-33

src/workers/app.py (1)

48-54: 24-hour task limits and visibility timeout raise reliability concerns.

Increasing task_time_limit, task_soft_time_limit, and visibility_timeout to 86400 seconds (24 hours) has implications:

  1. Worker crash recovery: If a worker dies mid-task, the task won't be retried until the visibility timeout expires—24 hours later.
  2. Resource utilization: Long-running tasks block worker slots for extended periods.
  3. Debugging difficulty: Tasks that hang will take a day to surface.

If certain tasks genuinely require 24 hours, consider:

  • Using separate queues with different timeouts for long-running vs. normal tasks
  • Implementing periodic heartbeats/checkpoints for long tasks
  • Adding monitoring/alerting for tasks exceeding expected durations
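One way to realize the separate-queue suggestion is Celery's routing and per-task limit configuration; the broker URL, task names, and queue names below are placeholders, not the project's actual settings:

```python
from celery import Celery

app = Celery("workers", broker="redis://localhost:6379/0")
app.conf.update(
    task_default_queue="default",
    task_time_limit=600,        # normal tasks: hard limit of 10 minutes
    task_soft_time_limit=540,
    # Route known long-running tasks to a dedicated queue/worker pool.
    task_routes={"workers.tasks.ingestion.*": {"queue": "long_running"}},
)


# Per-task limits override the global defaults for the long-running path.
@app.task(time_limit=86400, soft_time_limit=86000)
def process_large_ingestion(payload: dict) -> str:
    return "ok"
```

Workers for the `long_running` queue can then be started separately (e.g. `celery -A workers worker -Q long_running`) so hung long tasks never block the normal pool.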
src/core/search/entities.py (1)

268-270: Use keyword arguments for graph_adapter.search_entities.
This avoids subtle bugs if the adapter signature changes or the call order is misunderstood.

🔧 Safer call site
-    result = graph_adapter.search_entities(
-        brain_id, limit, skip, node_labels, query_text
-    )
+    result = graph_adapter.search_entities(
+        brain_id=brain_id,
+        limit=limit,
+        skip=skip,
+        node_labels=node_labels,
+        query_text=query_text,
+    )
src/core/layers/graph_consolidation/graph_consolidation.py (3)

12-22: Remove unused imports.

The following imports are not used in this file: json, concurrent.futures (ThreadPoolExecutor, FutureTimeoutError), and tenacity modules (retry, stop_after_attempt, wait_exponential, retry_if_exception_type, RetryError). Additionally, HumanMessage and RemoveMessage from langchain_core are unused.

♻️ Proposed cleanup
-import json
 from typing import List
-from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeoutError
-from tenacity import (
-    retry,
-    stop_after_attempt,
-    wait_exponential,
-    retry_if_exception_type,
-    RetryError,
-)

 from pydantic import BaseModel
-from langchain_core.messages import HumanMessage, RemoveMessage
 from src.config import config

86-89: Replace debug print statements with structured logging.

Multiple print statements are used for debugging (lines 86-89, 111-114, 149-156). Consider using the project's logging utilities (src/utils/logging) for consistent, configurable log output.


26-42: Additional unused imports from project modules.

Several imports appear unused: ArchitectAgentEntity, GraphConsolidatorOutput, Node, Predicate, JANITOR_AGENT_GRAPH_NORMALIZATOR_PROMPT, HISTORY_MAX_MESSAGES, HISTORY_MAX_MESSAGES_DELETE, and JanitorAgentExecuteGraphReadOperationTool.

src/core/search/entity_info.py (1)

50-60: Remove stale commented-out type hint.

Line 60 has a commented-out return type # -> List[SynergyPath]: that doesn't match the actual return type. The docstring correctly describes the return type. Remove the stale comment.

♻️ Proposed fix
     def _recursive_explorer(
         self,
         current_node_id: str,
         query_embedding: List[float],
         depth: int,
         visited_ids: Set[str],
         most_similar_conn_rel_tuple: Optional[
             Tuple[Tuple[Predicate, Node], float]
         ] = None,
-    ):
-        # -> List[SynergyPath]:
+    ) -> List[Tuple[Tuple[Predicate, Node], float]]:
src/workers/tasks/ingestion.py (1)

262-262: Unused variable v_id from tuple unpacking.

The v_id variable is unpacked but never used. Prefix with underscore to indicate intentional discard.

♻️ Proposed fix
-                    v_id, v_rel_id = future.result(timeout=180)
+                    _v_id, v_rel_id = future.result(timeout=180)
src/core/agents/kg_agent.py (1)

444-455: Unused response variable.

The response variable at line 445 is assigned but never used. The function always returns "OK" regardless of the response content. Either use the response or prefix with underscore.

♻️ Proposed fix
         try:
-            response = _invoke_agent_with_retry()
+            _invoke_agent_with_retry()
         except RetryError as e:
src/lib/neo4j/client.py (1)

1607-1616: Remove unnecessary f-string prefix.

The Cypher query string at line 1607 uses an f-string prefix but contains no placeholders. Remove the f prefix.

♻️ Proposed fix
-        cypher_query = f"""
+        cypher_query = """
         MATCH ()-[r]-(m)-[r2]-(b)
         WHERE r.uuid = $predicate_uuid
         AND r2.flow_key = $flow_key

"""


def create_backup(brain_id: str) -> Backup:

⚠️ Potential issue | 🔴 Critical

Undefined type Backup will cause import failure.

The return type annotation references Backup, but this type is neither imported nor defined in the module. This will raise a NameError when Python evaluates the annotation at import time.

Either import the Backup type from its definition module, define it locally, or use a forward reference string annotation ("Backup") with from __future__ import annotations if the type will be defined later.

Proposed fix (assuming Backup is defined elsewhere)
+from __future__ import annotations
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from src.core.backups.models import Backup  # adjust import path as needed
+

 def create_backup(brain_id: str) -> Backup:

Or if Backup doesn't exist yet:

+from __future__ import annotations
+
+
 def create_backup(brain_id: str) -> Backup:
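The import-time failure is easy to confirm: annotations in a normal module are evaluated when the `def` statement executes, while the `__future__` import defers them to strings. A minimal reproduction using `exec` as a stand-in for importing the module:

```python
broken = "def create_backup(brain_id: str) -> Backup: ..."

try:
    exec(compile(broken, "<module>", "exec"), {})
    failed = False
except NameError:
    failed = True
print(failed)  # True: `Backup` is looked up at definition time

# With postponed evaluation, the annotation stays a string and import succeeds.
deferred = "from __future__ import annotations\n" + broken
ns = {}
exec(compile(deferred, "<module>", "exec"), ns)
print(ns["create_backup"].__annotations__["return"])  # 'Backup' (a plain string)
```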
🧰 Tools
🪛 Ruff (0.14.14)

13-13: Undefined name Backup

(F821)

🤖 Prompt for AI Agents
In `@src/core/backups/backup_creator.py` at line 13, The function create_backup
declares a return type Backup which is not defined or imported in this module;
fix by either importing Backup from its defining module (add an import for
Backup), declaring Backup locally if intended, or enable postponed evaluation of
annotations by adding from __future__ import annotations and changing the
annotation to a forward-reference string "Backup" (or just use "Backup" with the
future import); update the create_backup signature accordingly so the module no
longer raises NameError on import.

Comment on lines +137 to +145
            token_details.append(
                token_detail_from_token_counts(
                    kg_agent.input_tokens,
                    kg_agent.output_tokens,
                    kg_agent.cached_tokens,
                    kg_agent.reasoning_tokens,
                )
            )
            kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)

⚠️ Potential issue | 🔴 Critical

Token accounting captured before agent execution returns zero tokens.

The token_detail_from_token_counts is called immediately after KGAgent instantiation but before run_graph_consolidator_operator executes. At this point, kg_agent.input_tokens, output_tokens, etc. are all initialized to 0, so the captured token details will always be empty.

Move the token collection after the agent operation completes:

🐛 Proposed fix
         for task in tasks:
             kg_agent = KGAgent(
                 llm_adapter=llm_small_adapter,
                 cache_adapter=cache_adapter,
                 kg=graph_adapter,
                 vector_store=vector_store_adapter,
                 embeddings=embeddings_adapter,
                 database_desc=_neo4j_client.graphdb_description,
             )
+            kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)
             token_details.append(
                 token_detail_from_token_counts(
                     kg_agent.input_tokens,
                     kg_agent.output_tokens,
                     kg_agent.cached_tokens,
                     kg_agent.reasoning_tokens,
                 )
             )
-            kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            token_details.append(
-                token_detail_from_token_counts(
-                    kg_agent.input_tokens,
-                    kg_agent.output_tokens,
-                    kg_agent.cached_tokens,
-                    kg_agent.reasoning_tokens,
-                )
-            )
-            kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)
+            kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)
+            token_details.append(
+                token_detail_from_token_counts(
+                    kg_agent.input_tokens,
+                    kg_agent.output_tokens,
+                    kg_agent.cached_tokens,
+                    kg_agent.reasoning_tokens,
+                )
+            )
🤖 Prompt for AI Agents
In `@src/core/layers/graph_consolidation/graph_consolidation.py` around lines 137
- 145, The token snapshot is taken too early:
token_detail_from_token_counts(kg_agent.input_tokens, kg_agent.output_tokens,
kg_agent.cached_tokens, kg_agent.reasoning_tokens) is invoked before
kg_agent.run_graph_consolidator_operator(...) runs so all counts are zero; move
the token collection/ token_details.append call to immediately after
kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id) so it reads
the populated kg_agent.input_tokens, output_tokens, cached_tokens and
reasoning_tokens values returned/updated by the operation.

Comment on lines +32 to +66
    def get_context(
        self, context_depth: int = 3
    ) -> Tuple[Node, list[dict], list[dict]]:
        """
        Retrieve contextual information related to the target entity up to a specified depth.

        Returns the target node, its neighborhood structure, a list of unique textual descriptions from the node and predicate metadata, and a structured natural language representation of the neighborhood.

        @param context_depth The depth of the neighborhood context to retrieve around the target entity.

        @returns A tuple containing:
        - The target node object.
        - A list representing the neighborhood of the target node.
        - A list of unique descriptive texts collected from the nodes and predicates.
        - A structured list suitable for natural language processing, representing relationships and descriptions within the neighborhood.
        """

        text_contexts = set()
        natural_language_web = list()

        target_embedding = embeddings_adapter.embed_text(self.target)
        target_node_vs = vector_store_adapter.search_vectors(
            target_embedding.embeddings, store="nodes", brain_id=self.brain_id
        )

        if not target_node_vs:
            return (None, [])

        target_node_v = target_node_vs[0]
        target_node_id = target_node_v.metadata.get("uuid")

        target_node = graph_adapter.get_by_uuid(target_node_id, brain_id=self.brain_id)

        if not target_node:
            return (None, [])

⚠️ Potential issue | 🔴 Critical

Return tuple size is inconsistent with callers.
On a miss, you return a 2‑tuple, but callers unpack 4 values; the annotation also only covers 3. This will raise at runtime. Return a consistent 4‑tuple and make the type explicit.

🐛 Suggested fix
-from typing import Tuple
+from typing import Optional, Tuple
@@
-    def get_context(
-        self, context_depth: int = 3
-    ) -> Tuple[Node, list[dict], list[dict]]:
+    def get_context(
+        self, context_depth: int = 3
+    ) -> Tuple[Optional[Node], list[dict], list[str], list[dict]]:
@@
-        if not target_node_vs:
-            return (None, [])
+        if not target_node_vs:
+            return (None, [], [], [])
@@
-        target_node_id = target_node_v.metadata.get("uuid")
+        target_node_id = target_node_v.metadata.get("uuid")
+        if not target_node_id:
+            return (None, [], [], [])
@@
-        if not target_node:
-            return (None, [])
+        if not target_node:
+            return (None, [], [], [])
🤖 Prompt for AI Agents
In src/core/search/entity_context.py around lines 32 - 66: the get_context
method returns inconsistent tuple sizes and has an incorrect type annotation.
Change its signature to return Tuple[Optional[Node], list[dict], list[str],
list[dict]] (importing Optional if needed), update every early-return path
(e.g., where it currently returns (None, []) when target_node_vs or target_node
is missing) to return a consistent four-element tuple such as (None, [], [], []),
and ensure the final successful return matches the same 4-tuple shape so callers
that unpack four values do not fail.
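A tiny demonstration of why the inconsistent arity matters: get_context_bad and get_context_good are simplified stand-ins for the method, with strings in place of Node objects.

```python
from typing import Optional, Tuple

def get_context_bad(found: bool):
    if not found:
        return (None, [])           # 2-tuple on the miss path
    return ("node", [], [], [])     # 4-tuple on success

def get_context_good(found: bool) -> Tuple[Optional[str], list, list, list]:
    if not found:
        return (None, [], [], [])   # consistent 4-tuple on every path
    return ("node", [], [], [])

# Callers unpack four values, so the 2-tuple miss path raises ValueError.
try:
    node, hood, texts, web = get_context_bad(False)
    unpack_ok = True
except ValueError:
    unpack_ok = False

# The consistent version unpacks cleanly on the same miss path.
node, hood, texts, web = get_context_good(False)
print(unpack_ok, node)  # False None
```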

Comment on lines +1412 to +1426
event_names_query = "MATCH (n) WHERE labels(n) CONTAINS 'EVENT' RETURN n.name"

labels_result = self.driver.execute_query(labels_query, database_=brain_id)
relationships_result = self.driver.execute_query(
relationships_query, database_=brain_id
)
event_names_result = self.driver.execute_query(
event_names_query, database_=brain_id
)

labels = [record["label"] for record in labels_result.records]
relationships = [
record["relationshipType"] for record in relationships_result.records
]
event_names = [record["name"] for record in event_names_result.records]
⚠️ Potential issue | 🔴 Critical

Cypher syntax error and record key mismatch in get_schema.

  1. Line 1412: labels(n) CONTAINS 'EVENT' is invalid Cypher. labels(n) returns a list, and CONTAINS is for string matching. Use 'EVENT' IN labels(n) instead.

  2. Line 1426: The query returns n.name but the code accesses record["name"]. The alias should match.

🐛 Proposed fix
-        event_names_query = "MATCH (n) WHERE labels(n) CONTAINS 'EVENT' RETURN n.name"
+        event_names_query = "MATCH (n) WHERE 'EVENT' IN labels(n) RETURN n.name as name"
         ...
-        event_names = [record["name"] for record in event_names_result.records]
+        event_names = [record["name"] for record in event_names_result.records]
🤖 Prompt for AI Agents
In src/lib/neo4j/client.py around lines 1412 - 1426: in get_schema, change
event_names_query to use list membership ('EVENT' IN labels(n)) instead of
CONTAINS, and make the returned field match the record key by writing RETURN
n.name AS name (so record["name"] works) or by adjusting the record access to
the actual alias. Fix both event_names_query and the list comprehension that
builds event_names.

Comment on lines +131 to 142
@retrieve_router.get("/context")
async def get_context(request):
"""
Get the context of an entity.
Handle an HTTP request to retrieve an entity's contextual information.

Parameters:
request: The incoming FastAPI request containing the parameters (query string or body) used to identify the target entity and any context options.

Returns:
The HTTP response payload to be returned to the client containing the entity context.
"""

⚠️ Potential issue | 🔴 Critical

Incomplete endpoint implementation.

The get_context endpoint has a docstring but no implementation body. It will implicitly return None, which is likely not the intended behavior. Either implement the endpoint or mark it as not implemented.

🐛 Proposed fix - mark as not implemented
 @retrieve_router.get("/context")
 async def get_context(request):
     """
     Handle an HTTP request to retrieve an entity's contextual information.
     ...
     """
+    raise NotImplementedError("This endpoint is not yet implemented")
🤖 Prompt for AI Agents
In src/services/api/routes/retrieve.py around lines 131 - 142: the get_context
endpoint currently has only a docstring and implicitly returns None. Either
implement its logic or explicitly mark it as unimplemented by raising
fastapi.HTTPException(status_code=501, detail="get_context not implemented")
(or returning a fastapi.responses.JSONResponse with status 501) from
get_context, and add the necessary import (HTTPException or JSONResponse) at
the top of the module.
