Featured graph consolidation layer (5th agent) #6
📝 Walkthrough
This PR bumps the project to 2.2.0 and adds extensive features: new graph traversal and existence APIs, agent refactors with token accounting and tool modes, graph consolidation and ingestion managers, expanded search/retrieval endpoints, token utilities, and multiple new tools and background tasks. No breaking removals.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Client as Client
participant Architect as ArchitectAgent<br/>(tooler)
participant Tools as Architect Tools
participant Janitor as JanitorAgent
participant KG as GraphAdapter
participant Emb as EmbeddingsAdapter
Client->>Architect: run_tooler(text, entities)
Architect->>Tools: request tool actions (create/get remaining/mark used)
Tools->>Janitor: validate/normalize relationships
Janitor->>KG: neighborhood/exists checks & normalization
Janitor-->>Tools: return fixes / status
Tools->>Architect: update relationships_set / used_entities_set
Architect->>Emb: similarity checks for deduplication
Architect-->>Client: List[ArchitectAgentRelationship]
```

```mermaid
sequenceDiagram
participant Client as Client
participant API as Retrieve Controller
participant Retriever as EventSynergyRetriever
participant Emb as EmbeddingsAdapter
participant VS as VectorStoreAdapter
participant KG as GraphAdapter
Client->>API: GET /retrieve/entity/info (target, query, depth)
API->>Retriever: retrieve_matches(target, query, depth)
Retriever->>Emb: embed(query)
Emb-->>Retriever: query_embedding
Retriever->>Emb: embed(target)
Emb-->>Retriever: target_embedding
Retriever->>VS: search(target_embedding)
VS-->>Retriever: target_node_vector
Retriever->>KG: get_nodes_by_uuid(target_uuid)
KG-->>Retriever: target_node
loop depth
Retriever->>KG: get_neighbors(current_node)
KG-->>Retriever: neighbors
Retriever->>Emb: embed(neighbor descriptions)
Emb-->>Retriever: neighbor_embeddings
Retriever->>Retriever: compute similarities and build MatchPath
end
Retriever-->>API: MatchPath
API-->>Client: GetEntityInfoResponse
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
Actionable comments posted: 5
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
src/lib/llm/client_small.py (1)
98-114: Default `max_new_tokens=100000` exceeds Gemini model limits and will cause API failures.
The hardcoded 100,000 default exceeds the output token limit for all current Gemini models (most support 65,536 max; Gemini 2.0 Flash supports only 8,192). Requests with `max_output_tokens` above the model's limit will fail parameter validation. Default to `None` instead and only send `max_output_tokens` when explicitly provided.
Also fix the conditional check from `if max_new_tokens else {}` to `if max_new_tokens is not None` to correctly handle `max_new_tokens=0`.
🔧 Suggested adjustment
```diff
 def generate_text(
-    self, prompt: str, max_new_tokens: int = 100000, timeout: int = None
+    self, prompt: str, max_new_tokens: int | None = None, timeout: int = None
 ) -> str:
     if not prompt or len(prompt.strip()) == 0:
         return "Input prompt is empty"
     timeout = timeout or self.default_timeout
     from google.genai.types import GenerateContentConfig

     def _generate():
         return self.client.models.generate_content(
             model=self.model,
             contents=[prompt],
             config=GenerateContentConfig(
                 response_mime_type="text/plain",
-                **({"max_output_tokens": max_new_tokens} if max_new_tokens else {}),
+                **(
+                    {"max_output_tokens": max_new_tokens}
+                    if max_new_tokens is not None
+                    else {}
+                ),
             ),
         )

 def generate_json(
     self,
     prompt: str,
-    max_new_tokens: int = 100000,
+    max_new_tokens: int | None = None,
     max_retries: int = 3,
     timeout: int = None,
 ) -> dict:
     timeout = timeout or self.default_timeout
     from google.genai.types import GenerateContentConfig

     def _generate():
         _response = self.client.models.generate_content(
             model=self.model,
             contents=[prompt],
             config=GenerateContentConfig(
                 response_mime_type="application/json",
-                **({"max_output_tokens": max_new_tokens} if max_new_tokens else {}),
+                **(
+                    {"max_output_tokens": max_new_tokens}
+                    if max_new_tokens is not None
+                    else {}
+                ),
             ),
         )
```
src/workers/app.py (1)
48-54: `visibility_timeout` is much lower than task time limits.
The `visibility_timeout` (7200s = 2 hours) in `broker_transport_options` is significantly lower than the task time limits (86400s = 24 hours). If a task runs longer than the visibility timeout, the broker may re-deliver it to another worker, causing duplicate execution. Consider increasing `visibility_timeout` to at least match `task_time_limit`.
🐛 Proposed fix
```diff
 broker_transport_options={
-    "visibility_timeout": 7200,
+    "visibility_timeout": 86400,  # Match task_time_limit to prevent duplicate execution
     "fanout_prefix": True,
     "fanout_patterns": True,
 },
```
src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py (1)
71-79: Fix `_queries` initialization and empty-result check.
`_queries` is undefined when args are provided (UnboundLocalError), and `len(found_entities)` is the wrong predicate; it should check `founds`, otherwise the first query always gets the "not found" message.
🐛 Proposed fix
```diff
-    _query = ""
-    if len(args) > 0:
-        _query = args[0]
-    else:
-        _queries = kwargs.get("queries", [])
+    _queries = []
+    if len(args) > 0:
+        _queries = args[0] if isinstance(args[0], list) else [args[0]]
+    else:
+        _queries = kwargs.get("queries", [])
@@
-    if len(found_entities) == 0:
+    if len(founds) == 0:
         found_entities[_clean_query(_query)] = (
             "Knowledge graph does not contain any entities that match the query"
         )
     else:
         found_entities[_clean_query(_query)] = [
             f.model_dump(mode="json") for f in founds
         ]
```
Also applies to: 119-126
src/services/api/routes/retrieve.py (1)
122-127: Incomplete endpoint implementation.
The `/context` endpoint has no implementation: the function body is empty with no return statement. This will return `None` implicitly, which FastAPI will convert to a `null` JSON response.
Suggested fix
Either implement the endpoint or remove it if it's not ready:
```diff
 @retrieve_router.get("/context")
 async def get_context(request):
     """
     Get the context of an entity.
     """
+    raise NotImplementedError("Endpoint not yet implemented")
```
Or remove the endpoint entirely if not needed.
src/core/agents/kg_agent.py (1)
171-171: Pass `type_="normal"` to `_get_agent` to avoid a misconfigured prompt.
`_get_agent` now requires `type_`, but several callers still pass the old positional args. This results in `system_prompt=None`, which can break agent behavior.
🐛 Suggested fix
```diff
-        self._get_agent(identification_params, metadata, brain_id=brain_id)
+        self._get_agent(
+            type_="normal",
+            identification_params=identification_params,
+            metadata=metadata,
+            brain_id=brain_id,
+        )
```
```diff
-        self._get_agent(identification_params, metadata={}, brain_id=brain_id)
+        self._get_agent(
+            type_="normal",
+            identification_params=identification_params,
+            metadata={},
+            brain_id=brain_id,
+        )
```
```diff
-        self._get_agent(
-            identification_params={},
-            metadata={},
-            tools=[
-                KGAgentExecuteGraphOperationTool(
-                    self,
-                    self.kg,
-                    self.database_desc,
-                )
-            ],
-            output_schema=RetrieveNeighborsOutputSchema,
-            extra_system_prompt=extra_system_prompt_str,
-        )
+        self._get_agent(
+            type_="normal",
+            identification_params={},
+            metadata={},
+            tools=[
+                KGAgentExecuteGraphOperationTool(
+                    self,
+                    self.kg,
+                    self.database_desc,
+                )
+            ],
+            output_schema=RetrieveNeighborsOutputSchema,
+            extra_system_prompt=extra_system_prompt_str,
+        )
```
Also applies to: 210-210, 249-261
src/lib/neo4j/client.py (2)
202-233: Avoid overwriting ingestion metadata when node metadata exists.
`metadata` from ingestion is replaced by `node.metadata`, which drops caller-supplied metadata. Merge both and avoid re-overwriting in the attribute loop.
🛠️ Proposed fix
```diff
-        all_properties = {
-            **(node.properties or {}),
-            "metadata": metadata,
-        }
+        merged_metadata = {**(metadata or {}), **(node.metadata or {})}
+        all_properties = {
+            **(node.properties or {}),
+            "metadata": merged_metadata or None,
+        }
@@
-        for attr in attributes:
-            if getattr(node, attr, None):
-                all_properties[attr] = getattr(node, attr)
+        for attr in attributes:
+            if attr == "metadata":
+                continue
+            if getattr(node, attr, None) is not None:
+                all_properties[attr] = getattr(node, attr)
```
486-504: Return the newly added node fields in `get_by_uuids`.
You now map `polarity`, `happened_at`, `last_updated`, `observations`, and `metadata` from the record, but the Cypher doesn't return them, so they'll always be defaults.
🛠️ Proposed fix
```diff
-RETURN n.uuid as uuid, n.name as name, labels(n) as labels, n.description as description, properties(n) as properties
+RETURN n.uuid as uuid, n.name as name, labels(n) as labels, n.description as description,
+       properties(n) as properties,
+       n.polarity as polarity, n.happened_at as happened_at, n.last_updated as last_updated,
+       n.observations as observations, n.metadata as metadata
```
src/core/agents/architect_agent.py (1)
370-379: Include required token fields in `ArchitectAgentResponse`.
`ArchitectAgentResponse` requires `input_tokens` and `output_tokens`, but the success path omits them, which will raise validation errors.
🛠️ Proposed fix
```diff
 return ArchitectAgentResponse(
     new_nodes=[
         ArchitectAgentNew(**new_node.model_dump(mode="json"))
         for new_node in all_new_nodes
     ],
     relationships=[
         ArchitectAgentRelationship(**relationship.model_dump(mode="json"))
         for relationship in all_relationships
     ],
+    input_tokens=self.input_tokens,
+    output_tokens=self.output_tokens,
 )
```
🤖 Fix all issues with AI agents
In `@src/adapters/graph.py`:
- Around line 537-540: There is an incomplete if statement checking
similarity_threshold and fd_pred["metadata"]["v_id"] in src/adapters/graph.py
which will cause a syntax error; either remove the unused conditional or
complete it by explicitly comparing fd_pred["metadata"]["v_id"] to the intended
value and adding the body (for example, perform the intended filter action such
as continuing the loop or applying a threshold check). Locate the block
referencing similarity_threshold and fd_pred (use those exact symbols) and
either delete the partial "if fd_pred['metadata']['v_id']" line or replace it
with a full condition and a concrete action (e.g., if
fd_pred["metadata"]["v_id"] != v_id: continue) to implement the intended
filtering.
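For illustration, a minimal sketch of the completed filter under the assumptions above; `fd_preds` and the dict shape are hypothetical stand-ins for the real loop in `src/adapters/graph.py`:
```python
from typing import Any


def filter_predicates_by_vector_id(
    fd_preds: list[dict[str, Any]], v_id: str
) -> list[dict[str, Any]]:
    """Keep only predicates whose stored vector ID matches v_id."""
    kept: list[dict[str, Any]] = []
    for fd_pred in fd_preds:
        # Completed form of the dangling `if fd_pred["metadata"]["v_id"]` check:
        # skip entries that point at a different vector.
        if fd_pred.get("metadata", {}).get("v_id") != v_id:
            continue
        kept.append(fd_pred)
    return kept
```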
In `@src/core/agents/architect_agent.py`:
- Around line 314-317: The code calls a nonexistent method
self.get_token_summary(...) which will raise at runtime; replace that call with
the correct token-update method (e.g.
self.update_token_summary(m.usage_metadata)) or implement a get_token_summary
wrapper that forwards to the class's existing token-summary logic; update the
loop in architect_agent.py to invoke the actual token-summary method (replace
self.get_token_summary with self.update_token_summary or add a get_token_summary
method that calls update_token_summary) so usage metadata is processed without
raising.
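A sketch of the wrapper option, assuming `update_token_summary` exists on the class as the instruction states; the attribute names here are illustrative, not verified source:
```python
class TokenSummaryShim:
    """Illustrative stand-in for the agent class discussed above."""

    def __init__(self) -> None:
        self.input_tokens = 0
        self.output_tokens = 0

    def update_token_summary(self, usage_metadata) -> None:
        # The real class carries the accounting logic; this is a placeholder.
        self.input_tokens += getattr(usage_metadata, "input_tokens", 0)
        self.output_tokens += getattr(usage_metadata, "output_tokens", 0)

    def get_token_summary(self, usage_metadata) -> None:
        # Forwarding shim so the existing `self.get_token_summary(m.usage_metadata)`
        # call site keeps working without further changes.
        self.update_token_summary(usage_metadata)
```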
In `@src/core/search/entity_info.py`:
- Around line 201-205: Node.happened_at is Optional[str|None] but the code (in
the recency calculation block using days_ago and recency) does datetime.now() -
node.happened_at which will TypeError when happened_at is a string; update the
block that computes days_ago/recency (referencing Node.happened_at and recency)
to first parse string dates into a datetime (e.g., try parsing with
dateutil.parser.parse or datetime.strptime for expected formats), catch parsing
exceptions and fall back to treating the value as missing (days_ago = 0), and
only perform datetime arithmetic when you have a valid datetime object so the
recency formula remains safe.
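One safe shape for the recency computation, as a sketch that assumes `happened_at` is an ISO-8601 string (the stored format is not confirmed here; `dateutil.parser.parse` is a more lenient alternative):
```python
from datetime import datetime, timezone
from typing import Optional


def days_since(happened_at: Optional[str]) -> int:
    """Return whole days since `happened_at`, or 0 when missing/unparseable."""
    if not happened_at:
        return 0
    try:
        parsed = datetime.fromisoformat(happened_at)
    except ValueError:
        return 0  # fall back to "treat as missing", as described above
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return max((datetime.now(timezone.utc) - parsed).days, 0)
```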
In `@src/core/search/entity_sibilings.py`:
- Around line 32-41: The code assumes vector_store_adapter.search_vectors(...)
returns a non-empty list and directly accesses target_node_vs[0], which can
raise an IndexError; update the logic in the function containing
target_embedding/target_node_vs so you first check if target_node_vs is truthy
(or len(target_node_vs) > 0) and if it's empty return the same fallback (e.g.,
return None, []) before accessing target_node_vs[0] and calling
graph_adapter.get_by_uuid(target_node_id, brain_id=self.brain_id).
In `@src/services/api/controllers/retrieve.py`:
- Around line 144-151: The filter condition in retrieve.py that builds
fd_v_neighbors_embeddings_map is wrongly requiring v.id.isdigit(), which never
holds for UUID vector IDs (created via node.uuid/str(uuid4())); update the
if-check in the block using cosine_similarity and v.id to remove the digit-only
constraint so it only checks cosine_similarity(looking_for_v.embeddings,
v.embeddings) > 0.5, that v.id exists, and that not v.id.replace("-",
"").isalpha(); remove the v.id.isdigit() clause so the neighbor map is populated
correctly.
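A sketch of the corrected predicate; `looking_for_v` and `v` stand in for the vector objects named above, assumed to expose `.id` and `.embeddings`:
```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keep_neighbor(looking_for_v, v) -> bool:
    # The v.id.isdigit() clause is dropped: UUID-style IDs are never all
    # digits, so the old check silently excluded every neighbor.
    return (
        bool(v.id)
        and not v.id.replace("-", "").isalpha()
        and cosine_similarity(looking_for_v.embeddings, v.embeddings) > 0.5
    )
```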
🟠 Major comments (19)
src/core/agents/tools/chat_agent/__init__.py-4-7 (1)
4-7: Remove personal email/PII from source header.
Embedding personal email addresses in source code can violate privacy/compliance policies and creates unnecessary PII retention. Prefer SCM metadata or a team alias without PII.
src/core/plugins/__init__.py-1-9 (1)
1-9: Remove or obfuscate personal email addresses to avoid PII exposure.
The docstring contains personal email addresses that will be committed to version control and remain in git history. This could raise compliance/privacy concerns (GDPR, CCPA) and violates best practices for handling PII in source code.
Consider using a generic team email, GitHub username, or removing the email entirely.
🔒 Proposed fix to remove email addresses
""" File: /__init__.py Created Date: Monday January 12th 2026 -Author: Christian Nonis <alch.infoemail@gmail.com> +Author: Christian Nonis ----- Last Modified: Monday January 12th 2026 9:22:09 pm -Modified By: the developer formerly known as Christian Nonis at <alch.infoemail@gmail.com> +Modified By: Christian Nonis ----- """src/core/layers/outputs/__init__.py-1-8 (1)
1-8: Remove personal email/PII from module header.
The docstring embeds a personal email address, which is a privacy/compliance risk and unnecessary in source headers. Prefer VCS history, CODEOWNERS, or org-level contact info.
🧹 Proposed fix
-""" -File: /__init__.py -Created Date: Tuesday January 13th 2026 -Author: Christian Nonis <alch.infoemail@gmail.com> ------ -Last Modified: Tuesday January 13th 2026 8:48:57 pm -Modified By: the developer formerly known as Christian Nonis at <alch.infoemail@gmail.com> ------ -""" +"""Outputs layer package initializer."""src/core/layers/graph_consolidation/graph_consolidation.py-132-140 (1)
132-140: Token details collected before operation executes.
Token counts are appended to `token_details` on lines 132-139 before `run_graph_consolidator_operator` is called on line 140. This means the captured tokens will be zero or initialization values, not reflecting the actual LLM usage from the operation.
🐛 Proposed fix: move token collection after the operation
```diff
 kg_agent = KGAgent(
     llm_adapter=llm_small_adapter,
     cache_adapter=cache_adapter,
     kg=graph_adapter,
     vector_store=vector_store_adapter,
     embeddings=embeddings_adapter,
     database_desc=_neo4j_client.graphdb_description,
 )
-token_details.append(
-    token_detail_from_token_counts(
-        kg_agent.input_tokens,
-        kg_agent.output_tokens,
-        kg_agent.cached_tokens,
-        kg_agent.reasoning_tokens,
-    )
-)
 kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)
+token_details.append(
+    token_detail_from_token_counts(
+        kg_agent.input_tokens,
+        kg_agent.output_tokens,
+        kg_agent.cached_tokens,
+        kg_agent.reasoning_tokens,
+    )
+)
```
src/core/backups/backup_creator.py-13-31 (1)
13-31: Fix undefined `Backup` and avoid silent `None` returns.
`Backup` is undefined (will cause a runtime `NameError`) and is not imported or defined anywhere in the codebase. The stub returns `None` implicitly via `pass`, which violates the declared return type contract. Use a forward-reference string annotation and raise `NotImplementedError` to provide a clear signal that the function is not yet implemented.
🔧 Proposed fix
```diff
-def create_backup(brain_id: str) -> Backup:
+def create_backup(brain_id: str) -> "Backup":
@@
-    pass
+    raise NotImplementedError("create_backup is not implemented yet")
```
src/core/search/entity_sibilings.py-43-46 (1)
43-46: Potential KeyError when accessing neighbors dictionary.
If `target_node_id` is not present in `_neighbors`, line 46 will raise a `KeyError`. Use `.get()` with a default value for safer access.
🐛 Proposed fix
```diff
 _neighbors = graph_adapter.get_neighbors(
     [target_node_id], brain_id=self.brain_id
 )
-neighbors = _neighbors[target_node_id]
+neighbors = _neighbors.get(target_node_id, [])
```
src/workers/app.py-48-49 (1)
48-49: Soft time limit equals hard time limit, defeating graceful shutdown.
Setting `task_soft_time_limit` equal to `task_time_limit` (both 86400s) defeats the purpose of having a soft limit. The soft limit should be lower to give tasks time to clean up gracefully before the hard limit kills them. Consider setting `task_soft_time_limit` to a lower value (e.g., 82800s = 23 hours).
🐛 Proposed fix
```diff
 task_time_limit=86400,
-task_soft_time_limit=86400,
+task_soft_time_limit=82800,  # 23 hours - allows 1 hour for graceful shutdown
```
src/lib/mongo/client.py-400-408 (1)
400-408: Update operation does not verify document existence or update success.
The `update_one` call returns the input `structured_data` regardless of whether a document was actually found and updated. Consider checking `matched_count` or `modified_count` from the update result to handle the case where no document matches the query.
🐛 Proposed fix
```diff
 def update_structured_data(
     self, structured_data: StructuredData, brain_id: str
 ) -> StructuredData:
     collection = self.get_collection("structured_data", database=brain_id)
-    collection.update_one(
+    result = collection.update_one(
         {"id": structured_data.id},
         {"$set": structured_data.model_dump(mode="json")},
     )
+    if result.matched_count == 0:
+        raise ValueError(f"No structured data found with id: {structured_data.id}")
     return structured_data
```
Alternatively, if you want upsert behavior (create if not exists):
```diff
 collection.update_one(
     {"id": structured_data.id},
     {"$set": structured_data.model_dump(mode="json")},
+    upsert=True,
 )
```
src/core/search/entity_sibilings.py-26-28 (1)
26-28: The `polarity` parameter is unused and indicates incomplete implementation.
The method declares `polarity: Literal["same", "opposite"]` but never uses it in the method body. The type hint suggests the parameter should filter results by same or opposite polarity connections, but this logic is not implemented. The method always returns the same results regardless of the polarity argument value. This needs to be completed or removed.
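If the parameter is kept, one plausible reading of the intended semantics, sketched under the assumption that nodes carry a `polarity` attribute with values like `"positive"`/`"negative"` (seen in the scout prompt examples elsewhere in this review; the real intent is unconfirmed):
```python
from typing import Literal


def filter_by_polarity(
    neighbors: list, target_polarity: str, polarity: Literal["same", "opposite"]
) -> list:
    """Keep neighbors whose polarity matches (or opposes) the target's."""
    opposite = {"positive": "negative", "negative": "positive"}
    wanted = (
        target_polarity if polarity == "same" else opposite.get(target_polarity)
    )
    return [n for n in neighbors if getattr(n, "polarity", None) == wanted]
```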
25-53: Return arity/type mismatch in `get_context`.
The signature says 3-tuple, early returns return 2 values, and the final return has 4 values. Also, `text_contexts` is a `set[str]`, not `list[dict]`. This will break callers expecting a consistent tuple shape.
🐛 Proposed fix
```diff
-from typing import Tuple
+from typing import Optional, Tuple, List
@@
-    def get_context(
-        self, context_depth: int = 3
-    ) -> Tuple[Node, list[dict], list[dict]]:
+    def get_context(
+        self, context_depth: int = 3
+    ) -> Tuple[Optional[Node], list[dict], list[str], list[dict]]:
@@
-        if not target_node_vs:
-            return (None, [])
+        if not target_node_vs:
+            return (None, [], [], [])
@@
-        if not target_node:
-            return (None, [])
+        if not target_node:
+            return (None, [], [], [])
```
Also applies to: 87-89
src/core/search/entity_context.py-55-81 (1)
55-81: Guard against missing predicate/node before attribute access.
`predicate.description`, `predicate.direction`, and `node.name` are used unconditionally. If any neighborhood entry lacks these, this will throw.
🐛 Proposed fix
```diff
 for nn in neighbors:
     node = nn.get("node")
     if node and hasattr(node, "description") and node.description:
         text_contexts.add(node.description)
     predicate = nn.get("predicate")
@@
-    entry = {
-        "description": predicate.description,
-        "information_direction": predicate.direction,
-        predicate.name: {
-            node.name: node.description,
-            "info": nested_info,
-        },
-    }
+    if not node or not predicate:
+        continue
+    entry = {
+        "description": predicate.description,
+        "information_direction": predicate.direction,
+        predicate.name: {
+            node.name: node.description,
+            "info": nested_info,
+        },
+    }
```
src/services/api/routes/retrieve.py-401-410 (1)
401-410: Mutable default argument.
Using `[]` as a default argument (line 404) is problematic because mutable defaults are shared across function calls. Use `None` and initialize inside the function.
Suggested fix
```diff
 @retrieve_router.get(path="/entity/status")
 async def get_entity_status(
     target: str,
-    types: Optional[List[str]] = [],
+    types: Optional[List[str]] = None,
     brain_id: str = "default",
 ):
     """
     Get the entity status for a given target.
     """
-    return await get_entity_status_controller(target, types, brain_id)
+    return await get_entity_status_controller(target, types or [], brain_id)
```
src/workers/tasks/ingestion.py-494-532 (1)
494-532: Ensure new structured data is persisted and guard `textual_data` access.
`new_structured_data` remains `None` when there's no existing record, so `save_structured_data` will likely error or no-op. Also, `len(element.textual_data.keys())` will raise if `textual_data` is `None`.
🐛 Suggested fix
```diff
 new_structured_data = None

 if existing_structured_data:
     new_structured_data = existing_structured_data.model_copy(
         update={
             "data": {
                 **(
                     existing_structured_data.data
                     if existing_structured_data.data
                     else {}
                 ),
                 **(element.json_data if element.json_data else {}),
             },
             "types": [
                 *existing_structured_data.types,
                 *element.types,
             ],
             "metadata": {
                 **(
                     existing_structured_data.metadata
                     if existing_structured_data.metadata
                     else {}
                 ),
                 **(element.metadata if element.metadata else {}),
             },
             "brain_version": BRAIN_VERSION,
         },
     )
+else:
+    new_structured_data = StructuredData(
+        id=uuid,
+        data=element.json_data if element.json_data else {},
+        types=element.types,
+        metadata=element.metadata if element.metadata else {},
+        brain_version=BRAIN_VERSION,
+    )
 ...
-(
-    json.dumps(element.textual_data, indent=2)
-    if len(element.textual_data.keys()) > 0
-    else description
-),
+(
+    json.dumps(element.textual_data, indent=2)
+    if element.textual_data and len(element.textual_data) > 0
+    else description
+),
```
Also applies to: 584-589
src/core/saving/auto_kg.py-70-76 (1)
70-76: Cap the timeout derived from input length.
`timeout=360 * len(input)` becomes extremely large for long inputs and zero for empty input, which can stall workers or force immediate timeouts. Consider a min/max bound.
🐛 Suggested fix
```diff
+timeout = max(30, min(1800, 2 * len(input)))
 architect_agent.run_tooler(
     input,
     entities.entities,
     targeting=targeting,
     brain_id=brain_id,
-    timeout=360 * len(input),
+    timeout=timeout,
 )
```
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-400-413 (1)
400-413: Avoid enqueuing ingestion before validating `wrong_relationships`.
The ingestion task is dispatched and relationships are appended even if `janitor_response.wrong_relationships` is non-empty (checked later). This can persist invalid relationships despite returning an error.
🐛 Suggested fix
```diff
-if relationships_data:
-    from src.workers.tasks.ingestion import process_architect_relationships
-
-    print(
-        "[DEBUG (architect_agent_create_relationship)]: Sending relationships to ingestion task"
-    )
-    process_architect_relationships.delay(
-        {
-            "relationships": relationships_data,
-            "brain_id": self.brain_id,
-        }
-    )
-
-self.architect_agent.relationships_set.extend(output_rels)
-
-wrong_relationships = []
-if getattr(janitor_response, "wrong_relationships", []):
-    wrong_relationships = getattr(janitor_response, "wrong_relationships", [])
+wrong_relationships = getattr(janitor_response, "wrong_relationships", []) or []
+if wrong_relationships:
+    return {
+        "status": "ERROR",
+        "wrong_relationships": janitor_response.wrong_relationships,
+        "newly_created_nodes": newly_created_nodes,
+    }
+
+if relationships_data:
+    from src.workers.tasks.ingestion import process_architect_relationships
+
+    print(
+        "[DEBUG (architect_agent_create_relationship)]: Sending relationships to ingestion task"
+    )
+    process_architect_relationships.delay(
+        {
+            "relationships": relationships_data,
+            "brain_id": self.brain_id,
+        }
+    )
+
+self.architect_agent.relationships_set.extend(output_rels)
```
src/services/api/controllers/entities.py-89-118 (1)
89-118: Guard against empty graph lookups and missing neighbors.
`get_by_uuids(...)[0]` and `rel_tuples[target_node.uuid]` can raise when the vector index is stale or a node has been deleted. Add defensive checks to keep the endpoint from 500s.
🐛 Suggested fix
```diff
-for target_node_v in target_node_vs:
-    target_node_id = target_node_v.metadata.get("uuid")
-    target_node = graph_adapter.get_by_uuids([target_node_id], brain_id=brain_id)[0]
+for target_node_v in target_node_vs:
+    target_node_id = target_node_v.metadata.get("uuid")
+    if not target_node_id:
+        continue
+    nodes = graph_adapter.get_by_uuids([target_node_id], brain_id=brain_id)
+    if not nodes:
+        continue
+    target_node = nodes[0]
 ...
-    rel_tuples = graph_adapter.get_neighbors([target_node], brain_id=brain_id)
+    rel_tuples = graph_adapter.get_neighbors([target_node], brain_id=brain_id)
+    neighbors = rel_tuples.get(target_node.uuid, [])
 ...
-    has_relationships=len(rel_tuples[target_node.uuid]) > 0,
-    relationships=rel_tuples[target_node.uuid],
+    has_relationships=len(neighbors) > 0,
+    relationships=neighbors,
```
src/constants/prompts/architect_agent.py-40-94 (1)
40-94: Make example JSON valid to avoid malformed outputs.
The sample outputs contain `"relationships: [` (missing quote) and `//` comments, which contradict "Return ONLY JSON" and can prompt invalid responses from the model.
🛠️ Proposed fix
```diff
-    "relationships: [
+    "relationships": [
@@
-    "new_nodes": [] // No new nodes were created in this example
+    "new_nodes": []
@@
-    "reason": "The entity was missing from the entities found by the scout." // Why the node was created by you
+    "reason": "The entity was missing from the entities found by the scout."
```
Also applies to: 112-144
src/lib/neo4j/client.py-280-303 (1)
280-303: Only use predicate attributes when populating relationship fields.
The current loop copies subject/object attributes onto the relationship, which can overwrite relationship metadata and leak node properties into `r.*`.
🛠️ Proposed fix
```diff
-objects = [subject, to_object, predicate]
+objects = [predicate]
@@
-    if value:
+    if value is not None:
         extra_ops += f"""
             SET r.{attr} = {self._format_value(value)}
         """
```
src/constants/agents.py-78-85 (1)
78-85: Make `flow_key` optional or provide a default.
Upstream outputs (`_ArchitectAgentRelationship`) don't include `flow_key`, so constructing `ArchitectAgentRelationship` will fail unless every response includes it.
🛠️ Proposed fix
```diff
-    flow_key: str
+    flow_key: Optional[str] = None
```
🟡 Minor comments (16)
src/core/plugins/__init__.py-2-2 (1)
2-2: Correct the file path in the docstring.
The file path shows `/__init__.py` but the actual file is located at `src/core/plugins/__init__.py`. This inconsistency could cause confusion during maintenance.
📝 Proposed fix
```diff
-File: /__init__.py
+File: /src/core/plugins/__init__.py
```
src/lib/milvus/readme.txt-4-4 (1)
4-4: Pin the Docker image tag for reproducibility.
The `zilliz/attu:latest` tag makes the setup non-deterministic across time. The `docker-compose.yaml` in this directory already pins all other service versions (etcd `v3.5.18`, `minio`, and milvus `v2.6.4`), so the documentation should follow the same practice for consistency.
🔧 Suggested update
```diff
-docker run -p 3500:3000 --network milvus -e MILVUS_URL=standalone:19530 zilliz/attu:latest
+docker run -p 3500:3000 --network milvus -e MILVUS_URL=standalone:19530 zilliz/attu:<PINNED_VERSION>
```
Makefile-64-64 (1)
64-64: Shell syntax issue in environment variable export.
The `export $(DEBUG_ENVS) ENV="development"` syntax is problematic. The `ENV="development"` after `$(DEBUG_ENVS)` won't be properly exported; it will be interpreted as an argument to `export`, but the lack of separation means it may not set the variable correctly in all shells.
🔧 Suggested fix
```diff
-	bash -c "export $(DEBUG_ENVS) ENV="development" && poetry run celery -A src.workers.app worker --loglevel=info --pool=threads --concurrency=10"; \
+	bash -c "export $(DEBUG_ENVS) ENV='development' && poetry run celery -A src.workers.app worker --loglevel=info --pool=threads --concurrency=10"; \
```
Or more explicitly:
```diff
-	bash -c "export $(DEBUG_ENVS) ENV="development" && poetry run celery -A src.workers.app worker --loglevel=info --pool=threads --concurrency=10"; \
+	bash -c "export $(DEBUG_ENVS) && export ENV='development' && poetry run celery -A src.workers.app worker --loglevel=info --pool=threads --concurrency=10"; \
```
src/core/agents/tools/janitor_agent/JanitorAgentGetSchemaTool.py-30-34 (1)
30-34: Update the `target` description to include `event_names`.
The enum includes `event_names`, but the description still says only node/relationship types.
✏️ Suggested fix
```diff
-    "description": "The target to get the schema of. (node_labels or relationship_types)",
+    "description": "The target to get the schema of. (node_labels, relationship_types, or event_names)",
```
src/config.py-193-197 (1)
193-197: Normalize `RUN_GRAPH_CONSOLIDATOR` parsing to avoid case-sensitivity pitfalls.
Common env values like `True` or `1` will currently resolve to `False`.
🛠️ Suggested fix
```diff
-        self.run_graph_consolidator = (
-            os.getenv("RUN_GRAPH_CONSOLIDATOR", "true") == "true"
-        )
+        self.run_graph_consolidator = os.getenv(
+            "RUN_GRAPH_CONSOLIDATOR", "true"
+        ).lower() in {"true", "1", "yes"}
```
src/core/search/entity_sibilings.py-22-28 (1)
22-28: Class and method naming inconsistencies with typos.
The class name `EntitySinergyRetriever` should be `EntitySynergyRetriever`, and the method `retrieve_sibilings` should be `retrieve_siblings`. Additionally, the filename `entity_sibilings.py` should be `entity_siblings.py`. These naming inconsistencies will make the codebase harder to search and maintain.
src/constants/prompts/janitor_agent.py-96-163 (1)
96-163: Additional typos in ATOMIC_JANITOR_AGENT_SYSTEM_PROMPT.
- Line 118: "exaustive" → "exhaustive"
- Line 153: "wiht" → "with"
📝 Proposed fix
```diff
-When you execute search operations in the knowledge graph and the results are empty (meaning that entities are not found in the knowledge graph), don't complain,
-it simply mean that the entities are new and not in the knowledge graph yet, be exaustive the first time you search with different options for the same entity.
+When you execute search operations in the knowledge graph and the results are empty (meaning that entities are not found in the knowledge graph), don't complain,
+it simply means that the entities are new and not in the knowledge graph yet, be exhaustive the first time you search with different options for the same entity.
 ...
-    "fixed_relationships": [{{...}}], // return here the relationships that you already fixed wiht small fixes.
+    "fixed_relationships": [{{...}}], // return here the relationships that you already fixed with small fixes.
```
src/constants/prompts/janitor_agent.py-48-86 (1)
48-86: Multiple typos in the prompt text.
Several typos exist in this prompt that could affect clarity:
- Line 58: "rapresent" → "represent"
- Line 61: "belong" → "they belong"
- Line 72: "reffer" → "refer"
📝 Proposed fix
```diff
 (eg: (PERSON:John {{ "uuid": "uuid_1" }})-[ACCOMPLISHED_ACTION]->(EVENT:Trip {{ "uuid": "uuid_3" }})-[INTO_LOCATION]->(CITY:New York {{ "uuid": "uuid_4" }}) and (PERSON:John {{ "uuid": "uuid_1" }})-[MOVED]->(EVENT:Went to {{ "uuid": "uuid_5" }})-[HAPPENED_WITHIN]->(CITY:New York {{ "uuid": "uuid_4" }})",
-In the example above Trip and Went to are unified into a single event node since they rapresent the same thing.
+In the example above Trip and Went to are unified into a single event node since they represent the same thing.
 - Adding connections between nodes that are not directly connected but are related to create a more coherent graph:
-- Connect nodes that are far from each other in the graph because the belong to different contexts but are related.
+- Connect nodes that are far from each other in the graph because they belong to different contexts but are related.
 ...
-Each task should include info in also the nodes/relationships/properties/triplets that are involved in the task so that the kg_agent can execute the tasks and reffer to the correct nodes/relationships/properties/triplets.
+Each task should include info on the nodes/relationships/properties/triplets that are involved in the task so that the kg_agent can execute the tasks and refer to the correct nodes/relationships/properties/triplets.
```
src/services/api/routes/retrieve.py-44-49 (1)
44-49: Typo in function name: "sibilings" should be "siblings".
The import `get_entity_sibilings` contains a typo. This appears to originate from the controller file (src/services/api/controllers/entities.py). Consider renaming for consistency.
src/constants/kg.py-69-72 (1)
69-72: Typo in description.
"identitier" should be "identifier".
Suggested fix
```diff
 flow_key: Optional[str] = Field(
     default=None,
-    description="Unique identitier for contextualizing the predicate into the context flow",
+    description="Unique identifier for contextualizing the predicate into the context flow",
 )
```
src/constants/prompts/scout_agent.py-67-68 (1)
67-68: Typo in example.
"Partecipated" should be "Participated".
Suggested fix
```diff
-{{"type": "EVENT", "name": "Partecipated in", "description": "The company was part of a joint venture with Apple and Google", "polarity": "positive"}},
+{{"type": "EVENT", "name": "Participated in", "description": "The company was part of a joint venture with Apple and Google", "polarity": "positive"}},
```
src/constants/prompts/scout_agent.py-54-58 (1)
54-58: Typo and pronoun inconsistency in example.
- "Partecipated" should be "Participated" (line 56)
- "his colleagues" should be "her colleagues" since it refers to Mary (lines 56-58)
Suggested fix
- {{"type": "EVENT", "name": "Partecipated in", "description": "Mary was doing meetings with his colleagues in San Francisco", "polarity": "neutral"}}, - {{"type": "EVENT", "name": "Meetings", "description": "Mary was doing meetings with his colleagues in San Francisco", "polarity": "neutral"}}, - {{"type": "PERSON", "name": "Colleagues", "description": "The colleagues Mary was doing meetings with in San Francisco", "polarity": "neutral"}}, + {{"type": "EVENT", "name": "Participated in", "description": "Mary was doing meetings with her colleagues in San Francisco", "polarity": "neutral"}}, + {{"type": "EVENT", "name": "Meetings", "description": "Mary was doing meetings with her colleagues in San Francisco", "polarity": "neutral"}}, + {{"type": "PERSON", "name": "Colleagues", "description": "The colleagues Mary was doing meetings with in San Francisco", "polarity": "neutral"}},src/core/saving/auto_kg.py-106-118 (1)
106-118: Guard against divide-by-zero when input is empty.
`len(input)` can be 0, which would raise `ZeroDivisionError`.
🐛 Suggested fix
```diff
+char_count = max(len(input), 1)
 print(
-    f"> Cost/Character -> ${((token_details.input.total * config.pricing.input_token_price + token_details.output.total * config.pricing.output_token_price) / len(input)):.12f}"
+    f"> Cost/Character -> ${((token_details.input.total * config.pricing.input_token_price + token_details.output.total * config.pricing.output_token_price) / char_count):.12f}"
 )
```
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-260-268 (1)
260-268: Token detail should reflect JanitorAgent usage, not ArchitectAgent.
`janitor_token_detail` is currently computed from `architect_agent` counters, which likely double-counts or misses janitor usage.
🐛 Suggested fix
```diff
-janitor_token_detail = token_detail_from_token_counts(
-    self.architect_agent.input_tokens,
-    self.architect_agent.output_tokens,
-    self.architect_agent.cached_tokens,
-    self.architect_agent.reasoning_tokens,
-)
+janitor_token_detail = token_detail_from_token_counts(
+    janitor_agent.input_tokens,
+    janitor_agent.output_tokens,
+    janitor_agent.cached_tokens,
+    janitor_agent.reasoning_tokens,
+)
```
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-424-438 (1)
424-438: Normalize tool return type: all branches should return JSON strings, not dicts.
The `_run` method is declared to return `str`, but the error branch (line 425) returns a dict while the success branch (line 434) returns a JSON string. This type inconsistency can break downstream consumers expecting uniform handling. Convert the error dict to `json.dumps()` to match the success path and the declared return type.
Suggested fix
```diff
-return {
-    "status": "ERROR",
-    "wrong_relationships": janitor_response.wrong_relationships,
-    "newly_created_nodes": newly_created_nodes,
-}
+return json.dumps(
+    {
+        "status": "ERROR",
+        "wrong_relationships": janitor_response.wrong_relationships,
+        "newly_created_nodes": newly_created_nodes,
+    }
+)
```
src/core/agents/janitor_agent.py-175-186 (1)
175-186: Reset token counters at the start of run methods.
Token counters accumulate across repeated calls via the `+=` operator in `_update_token_counts`. To track per-call metrics, reset them at method entry. The `run` method in `architect_agent.py` uses this pattern (lines 195-196), confirming it as the intended approach.
🛠️ Proposed fix
```diff
 def run_graph_consolidator(
     self,
     new_relationships: List[ArchitectAgentRelationship],
     brain_id: str = "default",
     timeout: int = 90,
     max_retries: int = 3,
 ) -> list[str]:
     """
     Runs a final janitor consolidation across the kg snapshot
     """
+    self.input_tokens = 0
+    self.output_tokens = 0
+    self.cached_tokens = 0
+    self.reasoning_tokens = 0
+    self.token_detail = None
@@
 def run(
     self,
     input_output: JanitorAgentInputOutput,
     text: str,
     targeting: Optional[Node] = None,
     brain_id: str = "default",
     timeout: int = 90,
     max_retries: int = 3,
 ) -> JanitorAgentInputOutput:
     """
     Run the janitor agent.
     """
+    self.input_tokens = 0
+    self.output_tokens = 0
+    self.cached_tokens = 0
+    self.reasoning_tokens = 0
+    self.token_detail = None
@@
 def run_atomic_janitor(
     self,
     input_relationships: List[ArchitectAgentRelationship],
     text: str,
     targeting: Optional[Node] = None,
     brain_id: str = "default",
     timeout: int = 90,
     max_retries: int = 3,
 ) -> AtomicJanitorAgentInputOutput | str:
     """
     Run the atomic janitor agent. Can return istructions on how to correct the wrong relationships or 'OK' if the relationships are correct.
     """
+    self.input_tokens = 0
+    self.output_tokens = 0
+    self.cached_tokens = 0
+    self.reasoning_tokens = 0
+    self.token_detail = None
```
Note
Docstrings generation - SUCCESS
Docstrings generation was requested by @ChrisCoder9000.
* #6 (comment)

The following files were modified:
* `src/adapters/data.py`
* `src/adapters/embeddings.py`
* `src/adapters/graph.py`
* `src/adapters/interfaces/data.py`
* `src/adapters/interfaces/embeddings.py`
* `src/adapters/interfaces/graph.py`
* `src/config.py`
* `src/core/agents/architect_agent.py`
* `src/core/agents/janitor_agent.py`
* `src/core/agents/kg_agent.py`
* `src/core/agents/scout_agent.py`
* `src/core/agents/tools/architect_agent/ArchitectAgentCheckUsedEntitiesTool.py`
* `src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py`
* `src/core/agents/tools/architect_agent/ArchitectAgentGetRemainingEntitiesToProcessTool.py`
* `src/core/agents/tools/architect_agent/ArchitectAgentMarkEntitiesAsUsedTool.py`
* `src/core/agents/tools/janitor_agent/JanitorAgentGetSchemaTool.py`
* `src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py`
* `src/core/agents/tools/kg_agent/KGAgentExecuteGraphOperationTool.py`
* `src/core/backups/backup_creator.py`
* `src/core/layers/graph_consolidation/graph_consolidation.py`
* `src/core/saving/auto_kg.py`
* `src/core/saving/ingestion_manager.py`
* `src/core/search/entities.py`
* `src/core/search/entity_context.py`
* `src/core/search/entity_info.py`
* `src/core/search/entity_sibilings.py`
* `src/lib/embeddings/client_small.py`
* `src/lib/llm/client_small.py`
* `src/lib/mongo/client.py`
* `src/lib/neo4j/client.py`
* `src/services/api/controllers/entities.py`
* `src/services/api/controllers/retrieve.py`
* `src/services/api/routes/retrieve.py`
* `src/utils/tokens.py`
* `src/workers/tasks/ingestion.py`
Actionable comments posted: 5
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
src/lib/llm/client_small.py (3)
224-225: Inconsistent JSON response stripping compared to `LLMClientLarge`.
`LLMClientLarge.generate_json` uses ``.strip("```json").strip("```")`` to handle markdown code blocks, but this implementation uses ``.strip("").strip("```")``. The `.strip("")` is a no-op, and the missing ``"```json"`` strip may leave residual markers if the model returns JSON in a fenced code block with a language tag.
🐛 Proposed fix
````diff
-            _response = _response.text.strip("").strip("```")
+            _response = _response.text.strip("```json").strip("```")
````
227-234: Same overly broad retry issue as `generate_text`.
The `Exception` base class in the retry trigger will mask programming errors. Narrow to specific recoverable exceptions.
152-157: Overly broad exception type in retry logic will mask programming errors.
Using `Exception` as a retry trigger will retry on any exception, including programming errors like `TypeError`, `AttributeError`, or `KeyError`. This masks bugs and makes debugging difficult. Consider narrowing to specific transient/recoverable exceptions from the Google GenAI SDK (e.g., `google.genai.errors.APIError`), or rely on the SDK's built-in `HttpRetryOptions` instead of a manual retry decorator.
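A sketch of a narrowed retry using `tenacity` and the SDK error type named above; the exact exception hierarchy of the installed `google-genai` version should be verified before adopting this:
```python
from google.genai import errors as genai_errors  # assumed to expose APIError
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)


@retry(
    retry=retry_if_exception_type(genai_errors.APIError),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, max=10),
)
def call_model(client, model: str, prompt: str):
    # Only transient API failures are retried; TypeError, AttributeError,
    # KeyError and other programming errors propagate immediately.
    return client.models.generate_content(model=model, contents=[prompt])
```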
59-81: Interface signature change is correct; all implementations properly aligned.
The return type change from `list[Vector]` to `dict[str, list[Vector]]` is correctly implemented across all code paths:
- Interface definition: src/adapters/interfaces/embeddings.py (line 67)
- Adapter delegation: src/adapters/embeddings.py
- Milvus implementation: src/lib/milvus/client.py (line 285)
All implementations match the interface contract and enable per-ID result mapping as intended.
Fix the EN DASH character on line 75: Change "0.0–1.0" to use a hyphen-minus (0.0-1.0) for ASCII consistency in docstrings.
src/core/agents/tools/kg_agent/KGAgentExecuteGraphOperationTool.py (1)
73-87: Avoid clobbering a valid positional query.
If `args[0]` is valid but `kwargs["args"]` is empty/partial, the current logic overwrites `_query` with `""`, leading to a false "No query provided." Only parse kwargs when `_query` is still empty.
✅ Guard kwargs parsing
```diff
-        if len(kwargs) > 0:
+        if not _query and len(kwargs) > 0:
             args_query = kwargs.get("args", {})
             if isinstance(args_query, dict):
                 _query = args_query.get("query", "")
```
src/core/agents/architect_agent.py (1)
489-498: Include token counts in the success response.
The empty-path includes token counts but the success path omits them, despite the docstring.
🐛 Proposed fix
```diff
 return ArchitectAgentResponse(
     new_nodes=[
         ArchitectAgentNew(**new_node.model_dump(mode="json"))
         for new_node in all_new_nodes
     ],
     relationships=[
         ArchitectAgentRelationship(**relationship.model_dump(mode="json"))
         for relationship in all_relationships
     ],
+    input_tokens=self.input_tokens,
+    output_tokens=self.output_tokens,
 )
```
🤖 Fix all issues with AI agents
In `@src/core/backups/backup_creator.py`:
- Line 13: The function create_backup declares a return type Backup which is not
defined or imported in this module; fix by either importing Backup from its
defining module (add an import for Backup), declaring Backup locally if
intended, or enable postponed evaluation of annotations by adding from
__future__ import annotations and changing the annotation to a forward-reference
string "Backup" (or just use "Backup" with the future import); update the
create_backup signature accordingly so the module no longer raises NameError on
import.
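A minimal sketch of the forward-reference option; the string annotation is never evaluated at import time, so the module stays importable while `Backup` remains undefined:
```python
def create_backup(brain_id: str) -> "Backup":
    # Raising keeps the unimplemented state explicit instead of silently
    # returning None in violation of the annotation.
    raise NotImplementedError("create_backup is not implemented yet")
```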
In `@src/core/layers/graph_consolidation/graph_consolidation.py`:
- Around line 137-145: The token snapshot is taken too early:
token_detail_from_token_counts(kg_agent.input_tokens, kg_agent.output_tokens,
kg_agent.cached_tokens, kg_agent.reasoning_tokens) is invoked before
kg_agent.run_graph_consolidator_operator(...) runs so all counts are zero; move
the token collection/ token_details.append call to immediately after
kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id) so it reads
the populated kg_agent.input_tokens, output_tokens, cached_tokens and
reasoning_tokens values returned/updated by the operation.
In `@src/core/search/entity_context.py`:
- Around line 32-66: The get_context method returns inconsistent tuple sizes and
has an incorrect type annotation; change its signature to return
Tuple[Optional[Node], list[dict], list[dict], list[dict]] (import Optional if
needed), and update all early-return paths (e.g., where it currently returns
(None, []) after target_node_vs or target_node is missing) to return a
consistent four-element tuple like (None, [], [], []); ensure the final
successful return also matches this 4-tuple shape so callers that unpack four
values do not fail.
In `@src/lib/neo4j/client.py`:
- Around line 1412-1426: In get_schema, change the event_names_query Cypher to
use list membership ('EVENT' IN labels(n)) instead of CONTAINS and ensure the
returned field matches the record key: update the query to RETURN n.name AS name
(so record["name"] works) or adjust the record access to the actual alias;
specifically fix event_names_query and the code that builds event_names (refer
to event_names_query and the list comprehension for event_names) so the Cypher
uses "'EVENT' IN labels(n)" and returns an aliased name that matches
record["name"].
In `@src/services/api/routes/retrieve.py`:
- Around line 131-142: The get_context endpoint currently only has a docstring
and returns None; either implement its logic or explicitly mark it as
unimplemented—import and raise fastapi.HTTPException(status_code=501,
detail="get_context not implemented") (or return fastapi.responses.JSONResponse
with 501) from the get_context function to avoid implicit None responses;
reference the get_context function and ensure you add the necessary import
(HTTPException or JSONResponse) at the top of the module.
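A minimal sketch of the explicit-501 option with FastAPI; the router name mirrors the module, but the final route shape is up to the author:
```python
from fastapi import APIRouter, HTTPException

retrieve_router = APIRouter()  # stands in for the router defined in retrieve.py


@retrieve_router.get("/context")
async def get_context():
    """Get the context of an entity (not yet implemented)."""
    raise HTTPException(status_code=501, detail="get_context not implemented")
```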
🟠 Major comments (20)
src/services/api/routes/system.py-37-42 (1)
37-42: Use POST or DELETE instead of GET for destructive operations.
The `reset` endpoint performs a destructive action (resetting brain state) but uses `GET`. This violates HTTP semantics where GET should be safe and idempotent. GET requests can be inadvertently triggered by browser prefetching, link previews, crawlers, or cached proxies, potentially causing unintended data loss.
Proposed fix
```diff
-@system_router.get(path="/brains/{brain_id}/reset")  # TODO
+@system_router.post(path="/brains/{brain_id}/reset")  # TODO
 async def reset(brain_id: str):
```
src/services/api/routes/system.py-45-50 (1)
45-50: Use DELETE method for the delete endpoint.
Similar to the reset endpoint, `delete` is a destructive operation that should not use GET. The DELETE HTTP method is semantically appropriate here, or POST if you prefer a uniform approach for all mutations.
Proposed fix
```diff
-@system_router.get(path="/brains/{brain_id}/delete")  # TODO
+@system_router.delete(path="/brains/{brain_id}")  # TODO
 async def delete(brain_id: str):
```
src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py-94-99 (1)
94-99: Fix kwargs path: it always returns before processing queries.
The function returns immediately even when `queries` is provided via kwargs, so kwargs-based calls never execute.
✅ Suggested fix
```diff
-    if len(args) > 0:
-        _queries = args[0] if isinstance(args[0], list) else [args[0]]
-    else:
-        _queries = kwargs.get("queries", [])
-        return "No queries provided in the arguments or kwargs"
+    if len(args) > 0:
+        _queries = args[0] if isinstance(args[0], list) else [args[0]]
+    else:
+        _queries = kwargs.get("queries", [])
+
+    if not _queries:
+        return "No queries provided in the arguments or kwargs"
```
src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py-129-135 (1)
129-135: Vector search is never reached due to embed_text return type mismatch.
`EmbeddingsAdapter.embed_text` returns a `list[float]`, but the code expects `.embeddings`, causing an exception and skipping vector enrichment. This effectively disables vector search.
✅ Suggested fix
```diff
-    query_vector = self.embeddings.embed_text(_query)
-    if query_vector.embeddings and len(query_vector.embeddings) > 0:
-        v_results = self.vector_store.search_vectors(
-            query_vector.embeddings,
+    query_vector = self.embeddings.embed_text(_query)
+    if query_vector and len(query_vector) > 0:
+        v_results = self.vector_store.search_vectors(
+            query_vector,
             store="nodes",
             brain_id=self.brain_id,
         )
```
src/core/search/entity_sibilings.py-32-34 (1)
32-34: Unused `polarity` parameter indicates incomplete implementation.
The `polarity` parameter (expected values: `"same"` or `"opposite"`) is declared and documented as a directive to filter relation types, but is never used in the method body. The method builds connections without any filtering logic based on this parameter, suggesting the implementation is incomplete. Either remove the parameter if it's not needed, or implement the filtering logic to use it.
92-108: Reset token counters per run to keep per-run usage accurate.
`input_tokens`, `output_tokens`, `cached_tokens`, and `reasoning_tokens` accumulate across multiple `run()` calls, but the response docstring implies per-run counts. Reset them (and `token_detail`) at the start of `run()`.
🧮 Reset counters at run start
```diff
 def run(
     self,
     text: str,
     targeting: Optional[Node] = None,
     brain_id: str = "default",
     timeout: int = 90,
     max_retries: int = 3,
 ) -> ScoutAgentResponse:
     """
 @@
     Raises:
         TimeoutError: If a single invocation exceeds `timeout`, or if all retry attempts fail due to timeouts.
     """
+    self.input_tokens = 0
+    self.output_tokens = 0
+    self.cached_tokens = 0
+    self.reasoning_tokens = 0
+    self.token_detail = None
```
Also applies to: 140-252, 254-275
src/core/agents/tools/architect_agent/ArchitectAgentMarkEntitiesAsUsedTool.py-76-94 (1)
76-94: Positional string UUIDs are never recognized.
The docstring claims a single string is supported, but a positional `"uuid"` currently falls through and gets ignored.
✅ Handle single string positional input
```diff
-    elif args and len(args) > 0 and isinstance(args[0], list):
-        entities_to_mark = args[0]
+    elif args and len(args) > 0 and isinstance(args[0], str):
+        entities_to_mark = [args[0]]
+    elif args and len(args) > 0 and isinstance(args[0], list):
+        entities_to_mark = args[0]
```
src/services/api/controllers/entities.py-124-133 (1)
124-133: Guard against missing nodes from `get_by_uuids`.
If the vector store points to a UUID not present in the graph, `[0]` will raise. Skip missing IDs before selecting.
🐛 Suggested fix
```diff
-for target_node_v in target_node_vs:
-    target_node_id = target_node_v.metadata.get("uuid")
-    target_node = graph_adapter.get_by_uuids([target_node_id], brain_id=brain_id)[0]
+for target_node_v in target_node_vs:
+    target_node_id = target_node_v.metadata.get("uuid")
+    if not target_node_id:
+        continue
+    nodes = graph_adapter.get_by_uuids([target_node_id], brain_id=brain_id)
+    if not nodes:
+        continue
+    target_node = nodes[0]
```
src/core/saving/auto_kg.py-122-126 (1)
122-126: Guard cost-per-character against empty input.
`len(input)` can be 0, which raises `ZeroDivisionError` during cost reporting.
🐛 Suggested fix
```diff
-print(
-    f"> Cost/Character -> ${((token_details.input.total * config.pricing.input_token_price + token_details.output.total * config.pricing.output_token_price) / len(input)):.12f}"
-)
+total_cost = (
+    token_details.input.total * config.pricing.input_token_price
+    + token_details.output.total * config.pricing.output_token_price
+)
+cost_per_char = (total_cost / len(input)) if len(input) > 0 else 0.0
+print(f"> Cost/Character -> ${cost_per_char:.12f}")
```
src/core/saving/ingestion_manager.py-59-76 (1)
59-76: Cache key and early return skip `v_id` propagation.
Keying by name can collide across entities, and the early return never applies the cached `v_id` back to `node_data.properties`, so downstream ingestion can miss vector IDs. Prefer keying by UUID (or name+type) and set the cached id on hit.
🐛 Suggested fix
```diff
-    if node_data.name in self.resolved_cache:
-        return node_data.uuid
+    cached_v_id = self.resolved_cache.get(node_data.uuid)
+    if cached_v_id:
+        node_data.properties = {
+            **(node_data.properties or {}),
+            "v_id": cached_v_id,
+        }
+        return node_data.uuid
@@
-    self.resolved_cache[node_data.name] = v_ids[0]
+    self.resolved_cache[node_data.uuid] = v_ids[0]
```
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-423-457 (1)
423-457: Skip ingestion when Janitor reports wrong relationships.
You enqueue ingestion and mutate `relationships_set` before checking `wrong_relationships`, so error paths still write to the KG. Move the error check ahead of side effects.
🐛 Suggested fix
```diff
-relationships_data = [
-    rel.model_dump(mode="json")
-    for rel in output_rels
-    if isinstance(rel, ArchitectAgentRelationship)
-]
-
-if relationships_data:
-    from src.workers.tasks.ingestion import process_architect_relationships
-
-    print(
-        "[DEBUG (architect_agent_create_relationship)]: Sending relationships to ingestion task"
-    )
-    process_architect_relationships.delay(
-        {
-            "relationships": relationships_data,
-            "brain_id": self.brain_id,
-        }
-    )
-
-self.architect_agent.relationships_set.extend(output_rels)
-
-wrong_relationships = []
-if getattr(janitor_response, "wrong_relationships", []):
-    wrong_relationships = getattr(janitor_response, "wrong_relationships", [])
-
-if len(wrong_relationships) > 0:
+wrong_relationships = getattr(janitor_response, "wrong_relationships", []) or []
+if wrong_relationships:
     print(
         "[DEBUG (architect_agent_create_relationship)]: Wrong relationships: ",
         getattr(janitor_response, "wrong_relationships", []),
     )
     return {
         "status": "ERROR",
         "wrong_relationships": janitor_response.wrong_relationships,
         "newly_created_nodes": newly_created_nodes,
     }
+
+relationships_data = [
+    rel.model_dump(mode="json")
+    for rel in output_rels
+    if isinstance(rel, ArchitectAgentRelationship)
+]
+
+if relationships_data:
+    from src.workers.tasks.ingestion import process_architect_relationships
+
+    print(
+        "[DEBUG (architect_agent_create_relationship)]: Sending relationships to ingestion task"
+    )
+    process_architect_relationships.delay(
+        {
+            "relationships": relationships_data,
+            "brain_id": self.brain_id,
+        }
+    )
+
+self.architect_agent.relationships_set.extend(output_rels)
```
src/adapters/graph.py-429-453 (1)
429-453: `v_id` extraction never finds IDs and can raise KeyError.
`getattr(n.properties, "v_id", None)` on a dict is always None, so `v_ids` is empty; direct indexing can also throw when the property is missing. Use `.get` with guards.
🐛 Suggested fix
```diff
-v_ids = [
-    n.properties["v_id"] for n in nodes if getattr(n.properties, "v_id", None)
-]
+v_ids = [
+    getattr(n, "properties", {}).get("v_id")
+    for n in nodes
+    if getattr(n, "properties", {}).get("v_id")
+]
@@
-all_fd_v_ids: List[str] = [
-    fd[1].properties["v_id"]
-    for fds in all_fd_nodes.values()
-    for fd in fds
-    if fd[1].properties["v_id"]
-]  # TODO: [missing_property] check why sometime v_id is not present
+all_fd_v_ids: List[str] = [
+    getattr(fd[1], "properties", {}).get("v_id")
+    for fds in all_fd_nodes.values()
+    for fd in fds
+    if getattr(fd[1], "properties", {}).get("v_id")
+]  # TODO: [missing_property] check why sometime v_id is not present
```
src/core/agents/tools/architect_agent/ArchitectAgentCreateRelationshipTool.py-289-297 (1)
289-297: Janitor token accounting uses the wrong counters.
The token detail is built from `self.architect_agent` instead of `janitor_agent`, which can double-count or miss Janitor usage.
🐛 Suggested fix
```diff
-janitor_token_detail = token_detail_from_token_counts(
-    self.architect_agent.input_tokens,
-    self.architect_agent.output_tokens,
-    self.architect_agent.cached_tokens,
-    self.architect_agent.reasoning_tokens,
-)
+janitor_token_detail = token_detail_from_token_counts(
+    janitor_agent.input_tokens,
+    janitor_agent.output_tokens,
+    janitor_agent.cached_tokens,
+    janitor_agent.reasoning_tokens,
+)
```
src/services/api/routes/retrieve.py-448-465 (1)
448-465: Mutable default argument will cause unexpected behavior.

Line 451 uses a mutable default, types: Optional[List[str]] = []. This is a common Python pitfall: the same list instance is shared across all calls. Use None as the default and initialize inside the function.

🐛 Proposed fix
 @retrieve_router.get(path="/entity/status")
 async def get_entity_status(
     target: str,
-    types: Optional[List[str]] = [],
+    types: Optional[List[str]] = None,
     brain_id: str = "default",
 ):
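For readers unfamiliar with the pitfall, a minimal standalone illustration (the function is hypothetical, not from this codebase):

def add_item(item, items=[]):  # the [] is evaluated once, at function definition time
    items.append(item)
    return items

print(add_item("a"))  # ['a']
print(add_item("b"))  # ['a', 'b'] -- the same list leaked across calls

def add_item_safe(item, items=None):
    items = [] if items is None else items
    items.append(item)
    return items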
src/core/agents/janitor_agent.py (1)

419-421: Same per-run counter reset needed here.

Without a reset, token accounting will include previous calls.
🐛 Proposed fix
         accumulated_messages = []
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None

src/core/agents/janitor_agent.py (1)
592-594: Reset counters before the atomic-janitor invocation.

Otherwise token_detail and the counters may include earlier runs.
🐛 Proposed fix
         accumulated_messages = []
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None

src/core/agents/architect_agent.py (1)
257-259: Reset cached/reasoning counters per run.

Only input/output are reset; cached/reasoning counters and token_detail can leak across runs and skew accounting.
🐛 Proposed fix
         self.input_tokens = 0
         self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None

src/core/agents/architect_agent.py (1)
529-541: Clear tooler state between invocations.
relationships_set, used_entities_set, and the token counters persist across runs; run_tooler returns the accumulated set, so stale data can leak.

🐛 Proposed fix
         entities_dict = {entity.uuid: entity for entity in entities}
         self.entities = entities_dict
+        self.relationships_set = []
+        self.used_entities_set = []
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None
         self._get_agent(
             type_="tooler",
             text=text,
             brain_id=brain_id,

src/core/agents/janitor_agent.py (1)
233-234: Reset token counters at the start of this run.

Counters currently accumulate across invocations, so token_detail can reflect prior runs. Apply the same reset in run and run_atomic_janitor as well.

🐛 Proposed fix
         accumulated_messages = []
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.cached_tokens = 0
+        self.reasoning_tokens = 0
+        self.token_detail = None

src/core/agents/janitor_agent.py (1)
141-176: Fix the type_ parameter signature and add the missing "atomic-janitor" type.

The parameter signature has a type annotation mismatch: it declares type_: str but assigns a Literal[...] typing object as the default, which would cause a ValueError if any caller omits the parameter. Additionally, the Literal is missing "atomic-janitor" even though the docstring and function logic both handle this type (lines 159, 171–174).

Proposed fix
-        type_: str = Literal["janitor", "graph-janitor"],
+        type_: Literal["janitor", "graph-janitor", "atomic-janitor"] = "janitor",
🟡 Minor comments (10)
src/core/search/entity_sibilings.py (1)
60-63: Missing guard for an empty neighbors dictionary.

If get_neighbors returns an empty dict or the target_node_id key is missing, accessing _neighbors[target_node_id] will raise a KeyError.

🐛 Proposed fix
         _neighbors = graph_adapter.get_neighbors(
             [target_node_id], brain_id=self.brain_id
         )
-        neighbors = _neighbors[target_node_id]
+        neighbors = _neighbors.get(target_node_id, [])

src/core/search/entities.py (1)
34-214: Deduplicate the stop-word set (Ruff B033), and consider module-level constants.
There are multiple duplicate entries flagged by Ruff; cleaning them up will avoid lint noise. While updating, consider hoisting the stop-word set and compiled regex to module scope to avoid re-allocating them on every call.
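A sketch of the suggested hoist; the set contents and pattern below are placeholders, not the real values from entities.py:

import re

# Built once at import time instead of re-allocated on every call.
_STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "or"})
_TOKEN_RE = re.compile(r"[a-z0-9]+")

def strip_stop_words(text: str) -> str:
    tokens = _TOKEN_RE.findall(text.lower())
    return " ".join(t for t in tokens if t not in _STOP_WORDS)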
src/core/agents/tools/architect_agent/ArchitectAgentCheckUsedEntitiesTool.py (1)

51-54: Handle plain dict entities safely.

If used_entities_set contains raw dicts, entity.dict() will raise. Add an explicit dict branch (or a generic fallback).

🛠️ Safer serialization
-        entities_list = [
-            entity.model_dump() if hasattr(entity, "model_dump") else entity.dict()
-            for entity in self.architect_agent.used_entities_set
-        ]
+        def _serialize_entity(entity):
+            if hasattr(entity, "model_dump"):
+                return entity.model_dump()
+            if isinstance(entity, dict):
+                return entity
+            if hasattr(entity, "dict"):
+                return entity.dict()
+            return dict(entity)
+
+        entities_list = [
+            _serialize_entity(entity)
+            for entity in self.architect_agent.used_entities_set
+        ]

src/adapters/embeddings.py (1)
86-100: Docstring promises ordering that isn’t enforced.
The adapter returns the backend's lists verbatim; unless every VectorStoreClient guarantees sorted similarity, the "ordered by descending similarity" claim is risky. Either sort here or soften the docstring.

✏️ Suggested docstring tweak
-        Each list contains at most `limit` items, includes only vectors with similarity >= `min_similarity`,
-        and is ordered by descending similarity.
+        Each list contains at most `limit` items and includes only vectors with similarity >= `min_similarity`.
+        Ordering is determined by the underlying vector store implementation.
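If sorting in the adapter is preferred over softening the docstring, a minimal sketch (it assumes each returned item exposes a numeric similarity score; the attribute name is a guess):

def _sorted_by_similarity(vector_lists):
    # Sort each backend list so the "descending similarity" claim holds.
    return [
        sorted(vectors, key=lambda v: v.score, reverse=True)
        for vectors in vector_lists
    ]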
src/core/agents/tools/architect_agent/ArchitectAgentCheckUsedEntitiesTool.py (1)

42-42: Silence Ruff ARG002 for unused args/kwargs.

The method doesn't use args/kwargs; add a short guard to avoid lint failures.

🧹 Minimal fix
     def _run(self, *args, **kwargs) -> str:
+        _ = args, kwargs
         """

src/core/agents/tools/architect_agent/ArchitectAgentMarkEntitiesAsUsedTool.py (1)
19-32: Mark args_schema as a ClassVar to avoid mutable-field warnings.

Ruff RUF012 will flag this; marking it as ClassVar prevents accidental mutation and model-field treatment.

🧹 ClassVar tweak
-import json
-from langchain.tools import BaseTool
+import json
+from typing import ClassVar
+from langchain.tools import BaseTool
@@
-    args_schema: dict = {
+    args_schema: ClassVar[dict] = {

src/core/agents/scout_agent.py (1)
109-118: Silence Ruff ARG002 for the unused brain_id.

Rename the parameter to _brain_id (or add a no-op assignment) to avoid lint failures.

🧹 Quick fix
-    def _get_tools(self, brain_id: str = "default") -> List[BaseTool]:
+    def _get_tools(self, _brain_id: str = "default") -> List[BaseTool]:
         """

src/services/api/controllers/entities.py (1)
100-104: Change the types parameter default from a mutable list to None.

Using [] as a default argument creates a shared mutable object across all function calls, violating Python best practices. Change it to Optional[List[str]] = None and initialize with types = types or [] inside the function body.

♻️ Suggested fix
-from typing import List, Literal
+from typing import List, Literal, Optional
@@
-async def get_entity_status(
-    target: str,
-    types: List[str] = [],
-    brain_id: str = "default",
-) -> GetEntityStatusResponse:
+async def get_entity_status(
+    target: str,
+    types: Optional[List[str]] = None,
+    brain_id: str = "default",
+) -> GetEntityStatusResponse:
@@
     """
     Retrieve status information for an entity matching the provided target text.
@@ -115,6 +115,7 @@
     """
+    types = types or []
     target_embeddings = embeddings_adapter.embed_text(target)

src/core/agents/kg_agent.py (1)
438-442: Use explicit exception chaining when re-raising.

When re-raising TimeoutError from FutureTimeoutError, use an explicit raise ... from so the relationship to the original exception is deliberate rather than implicit.

🐛 Proposed fix
     except FutureTimeoutError:
-        raise TimeoutError(
+        raise TimeoutError(
             f"Graph consolidator operator invoke timed out after {timeout} seconds. "
             "This may indicate a network issue or the LLM service is unresponsive."
-        )
+        ) from None
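One nuance: raise ... from None suppresses the original traceback, while raise ... from exc preserves it. A standalone illustration of the preserving form:

def parse_port(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as exc:
        # `from exc` keeps the original ValueError in the traceback chain;
        # `from None` would hide it entirely.
        raise RuntimeError(f"invalid port: {raw!r}") from exc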
src/workers/tasks/ingestion.py (1)

532-569: Add a missing else-branch or guard for None structured_data.

When existing_structured_data is None (the line 538 condition is false), new_structured_data remains None (line 536) and is passed to save_structured_data at line 567. The function's type hint expects StructuredData, not Optional[StructuredData], and the implementation calls .model_dump() on the input, which will fail with AttributeError if None is passed. Either add an else-branch to create new structured data when none exists, or add a guard to skip saving when new_structured_data is None (see the sketch below).
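A minimal sketch of the guard option, reusing the names from the comment above (the surrounding function is assumed):

# Only persist when there is actually something to save.
if new_structured_data is not None:
    save_structured_data(new_structured_data)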
🧹 Nitpick comments (18)
src/core/agents/tools/janitor_agent/JanitorAgentSearchEntitiesTool.py (1)
147-148: Avoid catching bare Exception.

Catching Exception hides unexpected errors and …

src/config.py (1)
199-220: Normalize RUN_GRAPH_CONSOLIDATOR parsing for common truthy values.

Right now only the exact string "true" enables the flag. Consider accepting common variants to avoid misconfiguration when ops uses "True", "1", or "yes".

♻️ Proposed tweak
-        self.run_graph_consolidator = (
-            os.getenv("RUN_GRAPH_CONSOLIDATOR", "true") == "true"
-        )
+        self.run_graph_consolidator = (
+            os.getenv("RUN_GRAPH_CONSOLIDATOR", "true").strip().lower()
+            in {"true", "1", "yes"}
+        )

src/core/agents/tools/architect_agent/ArchitectAgentGetRemainingEntitiesToProcessTool.py (1)
41-53: Make serialization resilient to non-Pydantic entities.

Other tools already guard against entities lacking model_dump. Mirroring that pattern avoids a potential AttributeError if the entity set contains plain dict-like objects.

♻️ Proposed tweak
-        remaining_entities = json.dumps(
-            [
-                entity.model_dump(mode="json")
-                for entity in self.architect_agent.entities.values()
-            ]
-        )
+        remaining_entities = json.dumps(
+            [
+                entity.model_dump(mode="json")
+                if hasattr(entity, "model_dump")
+                else (entity.dict() if hasattr(entity, "dict") else entity)
+                for entity in self.architect_agent.entities.values()
+            ]
+        )

src/utils/tokens.py (1)
25-44: Consider extracting the zero-initialized TokenDetail into a helper.

The same zero-initialized TokenDetail construction appears twice (lines 26-33 and 37-44). It could be extracted into a private helper or a module-level constant to reduce duplication.

♻️ Suggested refactor
+def _empty_token_detail() -> TokenDetail:
+    return TokenDetail(
+        input=TokenInputDetail(total=0, uncached=0, cached=0, cache_percentage=0.0),
+        output=TokenOutputDetail(
+            total=0, regular=0, reasoning=0, reasoning_percentage=0.0
+        ),
+        grand_total=0,
+        effective_total=0,
+    )
+
+
 def merge_token_details(token_details: list[TokenDetail]) -> TokenDetail:
     # ...docstring...
     if not token_details:
-        return TokenDetail(
-            input=TokenInputDetail(total=0, uncached=0, cached=0, cache_percentage=0.0),
-            output=TokenOutputDetail(
-                total=0, regular=0, reasoning=0, reasoning_percentage=0.0
-            ),
-            grand_total=0,
-            effective_total=0,
-        )
+        return _empty_token_detail()
     token_details = [detail for detail in token_details if detail is not None]
     if not token_details:
-        return TokenDetail(
-            input=TokenInputDetail(total=0, uncached=0, cached=0, cache_percentage=0.0),
-            output=TokenOutputDetail(
-                total=0, regular=0, reasoning=0, reasoning_percentage=0.0
-            ),
-            grand_total=0,
-            effective_total=0,
-        )
+        return _empty_token_detail()

src/lib/llm/client_small.py (2)
109-111: Use an explicit None union type for optional parameters.

Per PEP 484, use int | None instead of defaulting to None with a bare int type hint.

♻️ Proposed fix
     def generate_text(
-        self, prompt: str, max_new_tokens: int = 65536, timeout: int = None
+        self, prompt: str, max_new_tokens: int = 65536, timeout: int | None = None
     ) -> str:
186-192: Use an explicit None union type for the optional timeout parameter.

Same PEP 484 issue as in generate_text.

♻️ Proposed fix
     def generate_json(
         self,
         prompt: str,
         max_new_tokens: int = 65536,
         max_retries: int = 3,
-        timeout: int = None,
+        timeout: int | None = None,
     ) -> dict:

src/core/search/entity_sibilings.py (3)
11-11: Unused import: dataclass.

The dataclass decorator is imported but never used in this file.

♻️ Proposed fix
-from dataclasses import dataclass
 from typing import List, Literal, Tuple
133-137: Loop variable v_id is unused; use _ to indicate intent.

The loop iterates over all_similar_dict.items() but only uses similar_vectors. Use _ for the unused key to clarify intent.

♻️ Proposed fix
-        for v_id, similar_vectors in all_similar_dict.items():
-            for similar_node_v in similar_vectors:
+        for _, similar_vectors in all_similar_dict.items():
+            for similar_node_v in similar_vectors:
22-23: Typos in class and method names: "Sinergy" → "Synergy", "sibilings" → "siblings".

The class EntitySinergyRetriever and method retrieve_sibilings contain spelling errors that may cause confusion and make the API harder to discover; a backward-compatible rename is sketched below.

Also applies to: 32-33
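If external callers may already import the misspelled names, the rename can keep backward-compatible aliases; a sketch (method bodies elided):

import warnings

class EntitySynergyRetriever:
    def retrieve_siblings(self):
        ...

    def retrieve_sibilings(self):  # deprecated alias for old callers
        warnings.warn(
            "retrieve_sibilings is deprecated; use retrieve_siblings",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.retrieve_siblings()

# Deprecated alias so existing imports keep working.
EntitySinergyRetriever = EntitySynergyRetriever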
src/workers/app.py (1)
48-54: 24-hour task limits and visibility timeout raise reliability concerns.

Increasing task_time_limit, task_soft_time_limit, and visibility_timeout to 86400 seconds (24 hours) has implications:
- Worker crash recovery: If a worker dies mid-task, the task won't be retried until the visibility timeout expires—24 hours later.
- Resource utilization: Long-running tasks block worker slots for extended periods.
- Debugging difficulty: Tasks that hang will take a day to surface.
If certain tasks genuinely require 24 hours, consider the following (a configuration sketch follows this list):
- Using separate queues with different timeouts for long-running vs. normal tasks
- Implementing periodic heartbeats/checkpoints for long tasks
- Adding monitoring/alerting for tasks exceeding expected durations
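A sketch of the separate-queue idea (task and queue names here are hypothetical; Celery sets time limits globally or per task, so the 24-hour budget can be scoped to just the tasks that need it):

from celery import Celery
from kombu import Queue

app = Celery("workers")
app.conf.task_queues = (Queue("default"), Queue("long_running"))
app.conf.task_routes = {
    "workers.tasks.consolidate_graph": {"queue": "long_running"},
}
app.conf.task_time_limit = 600  # modest global default (10 minutes)

@app.task(time_limit=86400, soft_time_limit=86000)
def consolidate_graph(brain_id: str) -> None:
    ...  # only this task gets the 24-hour budget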
src/core/search/entities.py (1)
268-270: Use keyword arguments for graph_adapter.search_entities.

This avoids subtle bugs if the adapter signature changes or the call order is misunderstood.

🔧 Safer call site
-    result = graph_adapter.search_entities(
-        brain_id, limit, skip, node_labels, query_text
-    )
+    result = graph_adapter.search_entities(
+        brain_id=brain_id,
+        limit=limit,
+        skip=skip,
+        node_labels=node_labels,
+        query_text=query_text,
+    )

src/core/layers/graph_consolidation/graph_consolidation.py (3)
12-22: Remove unused imports.

The following imports are not used in this file: json, concurrent.futures (ThreadPoolExecutor, FutureTimeoutError), and the tenacity helpers (retry, stop_after_attempt, wait_exponential, retry_if_exception_type, RetryError). Additionally, HumanMessage and RemoveMessage from langchain_core are unused.

♻️ Proposed cleanup
-import json
 from typing import List
-from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeoutError
-from tenacity import (
-    retry,
-    stop_after_attempt,
-    wait_exponential,
-    retry_if_exception_type,
-    RetryError,
-)
 from pydantic import BaseModel
-from langchain_core.messages import HumanMessage, RemoveMessage
 from src.config import config
86-89: Replace debug print statements with structured logging.

Multiple debug print statements appear in this file; consider replacing them with the project's logging utilities (src/utils/logging) for consistent, configurable log output.
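A minimal sketch of the swap, using the standard library logger (whether src/utils/logging wraps this helper is an assumption):

import logging

logger = logging.getLogger(__name__)

def consolidate(brain_id: str) -> None:
    # Before: print("[DEBUG (graph_consolidation)]: consolidating", brain_id)
    # After: level-filtered and configurable per environment.
    logger.debug("consolidating brain_id=%s", brain_id)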
26-42: Additional unused imports from project modules.

Several imports appear unused: ArchitectAgentEntity, GraphConsolidatorOutput, Node, Predicate, JANITOR_AGENT_GRAPH_NORMALIZATOR_PROMPT, HISTORY_MAX_MESSAGES, HISTORY_MAX_MESSAGES_DELETE, and JanitorAgentExecuteGraphReadOperationTool.

src/core/search/entity_info.py (1)
50-60: Remove the stale commented-out type hint.

Line 60 has a commented-out return type # -> List[SynergyPath]: that doesn't match the actual return type. The docstring already describes the return type correctly; remove the stale comment.

♻️ Proposed fix
     def _recursive_explorer(
         self,
         current_node_id: str,
         query_embedding: List[float],
         depth: int,
         visited_ids: Set[str],
         most_similar_conn_rel_tuple: Optional[
             Tuple[Tuple[Predicate, Node], float]
         ] = None,
-    ):
-        # -> List[SynergyPath]:
+    ) -> List[Tuple[Tuple[Predicate, Node], float]]:

src/workers/tasks/ingestion.py (1)
262-262: Unused variable v_id from tuple unpacking.

The v_id variable is unpacked but never used. Prefix it with an underscore to indicate an intentional discard.

♻️ Proposed fix
-            v_id, v_rel_id = future.result(timeout=180)
+            _v_id, v_rel_id = future.result(timeout=180)

src/core/agents/kg_agent.py (1)
444-455: Unused response variable.

The response variable at line 445 is assigned but never used; the function always returns "OK" regardless of the response content. Either use the response or prefix it with an underscore.

♻️ Proposed fix
     try:
-        response = _invoke_agent_with_retry()
+        _invoke_agent_with_retry()
     except RetryError as e:

src/lib/neo4j/client.py (1)
1607-1616: Remove the unnecessary f-string prefix.

The Cypher query string at line 1607 uses an f-string prefix but contains no placeholders. Remove the f prefix.

♻️ Proposed fix
-        cypher_query = f"""
+        cypher_query = """
             MATCH ()-[r]-(m)-[r2]-(b)
             WHERE r.uuid = $predicate_uuid AND r2.flow_key = $flow_key
| """ | ||
|
|
||
|
|
||
| def create_backup(brain_id: str) -> Backup: |
Undefined type Backup will cause import failure.
The return type annotation references Backup, but this type is neither imported nor defined in the module. This will raise a NameError when Python evaluates the annotation at import time.
Either import the Backup type from its definition module, define it locally, or use a forward reference string annotation ("Backup") with from __future__ import annotations if the type will be defined later.
Proposed fix (assuming Backup is defined elsewhere)
+from __future__ import annotations
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from src.core.backups.models import Backup  # adjust import path as needed
+
def create_backup(brain_id: str) -> Backup:

Or if Backup doesn't exist yet:
+from __future__ import annotations
+
+
def create_backup(brain_id: str) -> Backup:

🧰 Tools
🪛 Ruff (0.14.14)
13-13: Undefined name Backup
(F821)
🤖 Prompt for AI Agents
In src/core/backups/backup_creator.py at line 13, the function create_backup declares a return type Backup that is neither defined nor imported in this module. Fix it by importing Backup from its defining module, declaring Backup locally if that was intended, or enabling postponed evaluation of annotations by adding from __future__ import annotations (optionally with the forward-reference string "Backup"). Update the create_backup signature accordingly so the module no longer raises NameError on import.
        token_details.append(
            token_detail_from_token_counts(
                kg_agent.input_tokens,
                kg_agent.output_tokens,
                kg_agent.cached_tokens,
                kg_agent.reasoning_tokens,
            )
        )
        kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)
Token accounting captured before agent execution returns zero tokens.
token_detail_from_token_counts is called immediately after KGAgent instantiation but before run_graph_consolidator_operator executes. At this point kg_agent.input_tokens, output_tokens, etc. are all still at their initialized value of 0, so the captured token details will always be empty.
Move the token collection after the agent operation completes:
🐛 Proposed fix
     for task in tasks:
         kg_agent = KGAgent(
             llm_adapter=llm_small_adapter,
             cache_adapter=cache_adapter,
             kg=graph_adapter,
             vector_store=vector_store_adapter,
             embeddings=embeddings_adapter,
             database_desc=_neo4j_client.graphdb_description,
         )
+        kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)
         token_details.append(
             token_detail_from_token_counts(
                 kg_agent.input_tokens,
                 kg_agent.output_tokens,
                 kg_agent.cached_tokens,
                 kg_agent.reasoning_tokens,
             )
         )
-        kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id)
🤖 Prompt for AI Agents
In src/core/layers/graph_consolidation/graph_consolidation.py around lines 137-145, the token snapshot is taken too early: token_detail_from_token_counts(kg_agent.input_tokens, kg_agent.output_tokens, kg_agent.cached_tokens, kg_agent.reasoning_tokens) is invoked before kg_agent.run_graph_consolidator_operator(...) runs, so all counts are zero. Move the token_details.append call to immediately after kg_agent.run_graph_consolidator_operator(task, brain_id=brain_id) so it reads the populated input_tokens, output_tokens, cached_tokens, and reasoning_tokens values updated by the operation.
    def get_context(
        self, context_depth: int = 3
    ) -> Tuple[Node, list[dict], list[dict]]:
        """
        Retrieve contextual information related to the target entity up to a specified depth.

        Returns the target node, its neighborhood structure, a list of unique textual descriptions from the node and predicate metadata, and a structured natural language representation of the neighborhood.

        @param context_depth The depth of the neighborhood context to retrieve around the target entity.

        @returns A tuple containing:
        - The target node object.
        - A list representing the neighborhood of the target node.
        - A list of unique descriptive texts collected from the nodes and predicates.
        - A structured list suitable for natural language processing, representing relationships and descriptions within the neighborhood.
        """

        text_contexts = set()
        natural_language_web = list()

        target_embedding = embeddings_adapter.embed_text(self.target)
        target_node_vs = vector_store_adapter.search_vectors(
            target_embedding.embeddings, store="nodes", brain_id=self.brain_id
        )

        if not target_node_vs:
            return (None, [])

        target_node_v = target_node_vs[0]
        target_node_id = target_node_v.metadata.get("uuid")

        target_node = graph_adapter.get_by_uuid(target_node_id, brain_id=self.brain_id)

        if not target_node:
            return (None, [])
Return tuple size is inconsistent with callers.
On a miss, you return a 2‑tuple, but callers unpack 4 values; the annotation also only covers 3. This will raise at runtime. Return a consistent 4‑tuple and make the type explicit.
🐛 Suggested fix
-from typing import Tuple
+from typing import Optional, Tuple
@@
-    def get_context(
-        self, context_depth: int = 3
-    ) -> Tuple[Node, list[dict], list[dict]]:
+    def get_context(
+        self, context_depth: int = 3
+    ) -> Tuple[Optional[Node], list[dict], list[str], list[dict]]:
@@
-        if not target_node_vs:
-            return (None, [])
+        if not target_node_vs:
+            return (None, [], [], [])
@@
-        target_node_id = target_node_v.metadata.get("uuid")
+        target_node_id = target_node_v.metadata.get("uuid")
+        if not target_node_id:
+            return (None, [], [], [])
@@
-        if not target_node:
-            return (None, [])
+        if not target_node:
+            return (None, [], [], [])

🤖 Prompt for AI Agents
In src/core/search/entity_context.py around lines 32-66, the get_context method returns inconsistent tuple sizes and has an incorrect type annotation. Change its signature to return Tuple[Optional[Node], list[dict], list[str], list[dict]] (importing Optional if needed), and update all early-return paths (e.g., where it currently returns (None, []) when target_node_vs or target_node is missing) to return a consistent four-element tuple like (None, [], [], []). Ensure the final successful return also matches this 4-tuple shape so callers that unpack four values do not fail.
        event_names_query = "MATCH (n) WHERE labels(n) CONTAINS 'EVENT' RETURN n.name"

        labels_result = self.driver.execute_query(labels_query, database_=brain_id)
        relationships_result = self.driver.execute_query(
            relationships_query, database_=brain_id
        )
        event_names_result = self.driver.execute_query(
            event_names_query, database_=brain_id
        )

        labels = [record["label"] for record in labels_result.records]
        relationships = [
            record["relationshipType"] for record in relationships_result.records
        ]
        event_names = [record["name"] for record in event_names_result.records]
Cypher syntax error and record key mismatch in get_schema.
- Line 1412: labels(n) CONTAINS 'EVENT' is invalid Cypher. labels(n) returns a list, and CONTAINS is for string matching. Use 'EVENT' IN labels(n) instead.
- Line 1426: The query returns n.name but the code accesses record["name"]. The alias should match.
🐛 Proposed fix
- event_names_query = "MATCH (n) WHERE labels(n) CONTAINS 'EVENT' RETURN n.name"
+ event_names_query = "MATCH (n) WHERE 'EVENT' IN labels(n) RETURN n.name as name"
...
- event_names = [record["name"] for record in event_names_result.records]
+ event_names = [record["name"] for record in event_names_result.records]
🤖 Prompt for AI Agents
In src/lib/neo4j/client.py around lines 1412-1426, in get_schema, change the event_names_query Cypher to use list membership ('EVENT' IN labels(n)) instead of CONTAINS, and make the returned field match the record key: update the query to RETURN n.name AS name (so record["name"] works) or adjust the record access to the actual alias. Specifically, fix event_names_query and the list comprehension that builds event_names so the Cypher uses 'EVENT' IN labels(n) and returns an aliased name matching record["name"].
@retrieve_router.get("/context")
async def get_context(request):
    """
    Get the context of an entity.
    Handle an HTTP request to retrieve an entity's contextual information.

    Parameters:
        request: The incoming FastAPI request containing the parameters (query string or body) used to identify the target entity and any context options.

    Returns:
        The HTTP response payload to be returned to the client containing the entity context.
    """
Incomplete endpoint implementation.
The get_context endpoint has a docstring but no implementation body. It will implicitly return None, which is likely not the intended behavior. Either implement the endpoint or mark it as not implemented.
🐛 Proposed fix - mark as not implemented
 @retrieve_router.get("/context")
 async def get_context(request):
     """
     Handle an HTTP request to retrieve an entity's contextual information.
     ...
     """
+    raise NotImplementedError("This endpoint is not yet implemented")
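An alternative sketch that keeps the HTTP contract explicit by returning a proper 501 (the agent prompt below suggests the same approach):

from fastapi import HTTPException

@retrieve_router.get("/context")
async def get_context(request):
    """Handle an HTTP request to retrieve an entity's contextual information."""
    raise HTTPException(status_code=501, detail="get_context is not implemented")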
🤖 Prompt for AI Agents
In src/services/api/routes/retrieve.py around lines 131-142, the get_context endpoint currently only has a docstring and returns None. Either implement its logic or explicitly mark it as unimplemented: raise fastapi.HTTPException(status_code=501, detail="get_context not implemented") (or return a fastapi.responses.JSONResponse with status 501) from the get_context function to avoid implicit None responses, and add the necessary import (HTTPException or JSONResponse) at the top of the module.