Add support for include_directive in C#596

Open
gkorland wants to merge 2 commits into staging from backend/add-include-directive-support

Conversation


@gkorland gkorland commented Mar 10, 2026

Migrated from falkordb/code-graph-backend#57

Summary

Add support for processing include_directive in C files, creating edges between files.

Changes:

  • Added process_include_directive method to C analyzer
  • Modified first pass to process include directive nodes
  • Added test case for include directive relationship tracking
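
The kind of parsing a process_include_directive step performs can be sketched without Tree-sitter; the helper name and string handling below are illustrative only, not the PR's actual AST-based code:

```python
import os
from typing import Optional

def extract_include_target(include_line: str) -> Optional[str]:
    """Return the header path from a C include directive, or None.

    Handles both quoted ("myheader.h") and angle-bracket (<stdio.h>) forms.
    """
    text = include_line.strip()
    if not text.startswith('#include'):
        return None
    rest = text[len('#include'):].strip()
    if len(rest) >= 2 and rest[0] == '"' and rest[-1] == '"':
        return rest[1:-1]
    if len(rest) >= 2 and rest[0] == '<' and rest[-1] == '>':
        return rest[1:-1]
    return None

# Both include styles yield a header name whose extension can be split off.
target = extract_include_target('#include "myheader.h"')   # 'myheader.h'
ext = os.path.splitext(target)[1]                          # '.h'
```

The extracted name and extension are what an INCLUDES edge between File entities would be built from.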

Resolves #544


Originally authored by @gkorland in falkordb/code-graph-backend#57

Summary by CodeRabbit

  • New Features

    • Implemented comprehensive C code analysis with extraction of function definitions, parameter details, struct definitions, and include directives. Supports dependency tracking and call relationship resolution across C files.
  • Tests

    • Added test coverage for include directive handling and file dependency verification.

Migrated from FalkorDB/code-graph-backend PR #57.
Original issue: FalkorDB/code-graph-backend#46
Resolves #544

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

vercel bot commented Mar 10, 2026

The latest updates on your projects:

Project     Deployment   Updated (UTC)
code-graph  Error        Mar 10, 2026 9:01am


coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Replaced stubbed C analyzer with a fully functional Tree-sitter-based implementation that extracts function definitions, struct definitions, include directives, and parameter details. The analyzer uses two-pass processing: first pass identifies definitions and includes, second pass resolves function call relationships.
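
The two-pass flow described above can be sketched in miniature. This toy version uses regexes over strings rather than the analyzer's Tree-sitter AST, and all names are illustrative; it only shows why the first pass must finish before call resolution starts:

```python
import re

def two_pass_analyze(files: dict) -> tuple:
    """Toy two-pass analysis: pass 1 collects definitions, pass 2 resolves calls."""
    definitions = {}   # function name -> defining file
    calls = []         # (caller file, callee name)

    # First pass: record every function definition before resolving anything,
    # so calls to functions defined later (or in other files) still resolve.
    for path, src in files.items():
        for match in re.finditer(r'\b(\w+)\s*\([^)]*\)\s*\{', src):
            definitions[match.group(1)] = path

    # Second pass: resolve call sites against the complete definition table.
    for path, src in files.items():
        for match in re.finditer(r'\b(\w+)\s*\(\s*\)\s*;', src):
            callee = match.group(1)
            if callee in definitions:
                calls.append((path, callee))
    return definitions, calls

files = {
    'main.c': 'int main() { helper(); return 0; }',
    'util.c': 'void helper() { }',
}
defs, calls = two_pass_analyze(files)
# helper is defined in util.c, and main.c's call site resolves to it.
```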

Changes

Cohort / File(s) Summary
C Analyzer Implementation
api/analyzers/c/analyzer.py
Complete CAnalyzer implementation with 11 new methods supporting Tree-sitter parsing, two-pass analysis (first_pass for definitions/includes, second_pass for call relationships), and entity extraction for functions, structs, and include directives with corresponding graph edges.
Test Updates
tests/test_c_analyzer.py
Updated public API invocation from analyze_local_folder() to analyze() and added test coverage for include_directive edge creation with header file validation.
Configuration
pyproject.toml
Minor project configuration updates.

Sequence Diagram

sequenceDiagram
    participant User
    participant CAnalyzer
    participant Parser as Tree-sitter Parser
    participant Graph
    participant FileSystem

    User->>CAnalyzer: analyze(path, graph)
    CAnalyzer->>FileSystem: Read .c/.h files
    FileSystem-->>CAnalyzer: File contents

    rect rgba(100, 150, 200, 0.5)
    Note over CAnalyzer,Graph: First Pass: Definitions & Includes
    CAnalyzer->>Parser: Parse C code
    Parser-->>CAnalyzer: AST
    CAnalyzer->>CAnalyzer: Extract functions, structs, includes
    CAnalyzer->>Graph: Add Function/Struct/File entities
    CAnalyzer->>Graph: Create DEFINES edges
    CAnalyzer->>Graph: Create INCLUDES edges
    end

    rect rgba(200, 150, 100, 0.5)
    Note over CAnalyzer,Graph: Second Pass: Call Relationships
    CAnalyzer->>Parser: Re-parse for function calls
    Parser-->>CAnalyzer: AST
    CAnalyzer->>CAnalyzer: Identify call sites
    CAnalyzer->>Graph: Create CALLS edges
    end

    CAnalyzer-->>User: Graph populated

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A hop through the C-code, with Tree-sitter we dance,
Functions and structs in a two-pass advance,
Include directives link files with care,
The graph springs to life through the forest so fair! 🌲✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Out of Scope Changes check (❓ Inconclusive): Changes include a complete CAnalyzer implementation replacing stub code, which extends beyond the narrow include_directive requirement. However, this appears to be preparatory refactoring needed to support include processing and maintain code functionality. Resolution: clarify whether the extensive CAnalyzer refactoring (function definitions, struct processing, two-pass analysis) was required for include_directive support or should have been a separate PR.

✅ Passed checks (4 passed)

  • Description check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title clearly summarizes the main change: adding support for include_directive processing in the C analyzer, which aligns with the primary objective of creating file-to-file include edges.
  • Linked Issues check (✅ Passed): The PR implements all coding requirements from issue #544: detect include_directive nodes in C files and create edges between files. The new process_include_directive method, integrated first_pass logic, and test validation confirm completion of the stated objectives.
  • Docstring Coverage (✅ Passed): Docstring coverage is 83.33%, which meets the required threshold of 80.00%.
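
A docstring-coverage figure like the one above can be reproduced in principle with a small counter over the AST. This sketch uses Python's ast module and is not CodeRabbit's actual implementation:

```python
import ast

def docstring_coverage(source: str) -> float:
    """Fraction of the module plus its functions/classes that carry a docstring."""
    tree = ast.parse(source)
    nodes = [tree] + [n for n in ast.walk(tree)
                      if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef,
                                        ast.ClassDef))]
    documented = sum(1 for n in nodes if ast.get_docstring(n) is not None)
    return documented / len(nodes)

sample = '''
"""Module docstring."""

def documented():
    """Has a docstring."""

def undocumented():
    pass
'''
# 2 of 3 nodes (module + documented) carry docstrings.
coverage = docstring_coverage(sample)
```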



import os
import logging
from pathlib import Path

from ..utils import *
from ...entities import *

logger = logging.getLogger('code_graph')

class CAnalyzer(AbstractAnalyzer):

        logger.info(f"Processing {path}")

        # Create file entity
        file = File(os.path.dirname(path), path.name, path.suffix)
Copilot AI left a comment

Pull request overview

Adds C #include directive handling intended to create INCLUDES edges between files in the graph, along with a unit test update to validate the relationship.

Changes:

  • Replaced the (previously commented) C analyzer implementation with a concrete analyzer that attempts to parse C AST and create graph nodes/edges.
  • Added process_include_directive and wired it into the C analyzer “first pass”.
  • Updated tests/test_c_analyzer.py to assert INCLUDES edges for an included header.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.

File Description
api/analyzers/c/analyzer.py Introduces a C analyzer implementation that attempts to create File/Function/Struct nodes and INCLUDES edges from #include.
tests/test_c_analyzer.py Adds assertions for INCLUDES neighbors and updates the test to call an analyze(...) method.


Comment on lines +90 to +145
def process_function_definition_node(self, node: Node, path: Path,
                                     source_code: str) -> Optional[Function]:
    """
    Processes a function definition node to extract function details.

    Args:
        node (Node): The AST node representing a function definition.
        path (Path): The file path where the function is defined.

    Returns:
        Optional[Function]: A Function object containing details about the function, or None if the function name cannot be determined.
    """

    # Extract function name
    res = find_child_of_type(node, 'function_declarator')
    if res is None:
        return None

    function_declarator = res[0]

    res = find_child_of_type(function_declarator, 'identifier')
    if res is None:
        return None

    identifier = res[0]
    function_name = identifier.text.decode('utf-8')
    logger.info(f"Function declaration: {function_name}")

    # Extract function return type
    res = find_child_of_type(node, 'primitive_type')
    ret_type = 'Unknown'
    if res is not None:
        ret_type = res[0]
        ret_type = ret_type.text.decode('utf-8')

    # Extract function parameters
    args = []
    res = find_child_of_type(function_declarator, 'parameter_list')
    if res is not None:
        parameters = res[0]

        # Extract arguments and their types
        for child in parameters.children:
            if child.type == 'parameter_declaration':
                arg = self.process_parameter_declaration(child)
                args.append(arg)

    # Extract function definition line numbers
    start_line = node.start_point[0]
    end_line = node.end_point[0]

    # Create Function object
    docs = ''
    src = source_code[node.start_byte:node.end_byte]
    f = Function(str(path), function_name, docs, ret_type, src, start_line, end_line)

Comment on lines +394 to +400
# Create file entity
file = File(os.path.dirname(path), path.name, path.suffix)
graph.add_file(file)

# Parse file
source_code = f.read()
tree = self.parser.parse(source_code)
Comment on lines +63 to +70
# Test for include_directive edge creation
included_file = g.get_file('', 'myheader.h', '.h')
self.assertIsNotNone(included_file)

includes = g.get_neighbors([f.id], rel='INCLUDES')
self.assertEqual(len(includes), 3)
included_files = [node['properties']['name'] for node in includes['nodes']]
self.assertIn('myheader.h', included_files)
if entity is not None:
    # Add Function object to the graph
    try:
        graph.add_function(entity)
Comment on lines +525 to +528
# Create missing function
# Assuming this is a call to a native function e.g. 'printf'
callee_f = Function('/', callee_name, None, None, None, 0, 0)
graph.add_function(callee_f)
Comment on lines +399 to +406
source_code = f.read()
tree = self.parser.parse(source_code)
try:
    source_code = source_code.decode('utf-8')
except Exception as e:
    logger.error(f"Failed decoding source code: {e}")
    source_code = ''

Comment on lines +348 to +356
if len(splitted) < 2:
    logger.warning("Include path has no extension: %s", included_file_path)
    return

# Create file entity for the included file
path = os.path.dirname(normalized_path)
name = os.path.basename(normalized_path)
ext = splitted[1]
included_file = File(path, name, ext)
Comment on lines +352 to +360
# Create file entity for the included file
path = os.path.dirname(normalized_path)
name = os.path.basename(normalized_path)
ext = splitted[1]
included_file = File(path, name, ext)
graph.add_file(included_file)

# Connect the parent file to the included file
graph.connect_entities('INCLUDES', parent.id, included_file.id)

g = Graph("c")
analyzer.analyze_local_folder(path, g)
analyzer.analyze(path, g)
Comment on lines +19 to +21
def __init__(self) -> None:
    self.parser = Parser(C_LANGUAGE)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
api/analyzers/c/analyzer.py (3)

19-20: ⚠️ Potential issue | 🟠 Major

Call the base initializer.

Skipping AbstractAnalyzer.__init__ leaves self.language unset and bypasses the shared analyzer setup.

♻️ Proposed fix
 class CAnalyzer(AbstractAnalyzer):
     def __init__(self) -> None:
-        self.parser = Parser(C_LANGUAGE)
+        super().__init__(C_LANGUAGE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 19 - 20, The C analyzer's __init__
is not calling the base class initializer, so shared setup (like self.language)
is skipped; update the C analyzer __init__ to call AbstractAnalyzer.__init__ (or
super().__init__()) before setting self.parser = Parser(C_LANGUAGE), or
explicitly set self.language = C_LANGUAGE if the base init requires
parameters—ensure you invoke the base initializer (super().__init__()) in the
__init__ method where Parser(C_LANGUAGE) is currently assigned.

394-405: ⚠️ Potential issue | 🔴 Critical

Parse first, then construct File(path, tree) from bytes.

File is created before the tree exists, and f is typed as io.TextIOWrapper, so f.read() returns str. Line 402 then always falls into the exception path because str.decode() does not exist, which wipes every extracted function body.

♻️ Proposed fix
-        # Create file entity
-        file = File(os.path.dirname(path), path.name, path.suffix)
-        graph.add_file(file)
-
         # Parse file
-        source_code = f.read()
-        tree = self.parser.parse(source_code)
-        try:
-            source_code = source_code.decode('utf-8')
-        except Exception as e:
-            logger.error(f"Failed decoding source code: {e}")
-            source_code = ''
+        source_bytes = f.buffer.read()
+        tree = self.parser.parse(source_bytes)
+        source_code = source_bytes.decode('utf-8', errors='replace')
+
+        # Create file entity
+        file = File(path, tree)
+        graph.add_file(file)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 394 - 405, The code creates a File
object before parsing and incorrectly calls str.decode(), which always raises
for io.TextIOWrapper reads and clears source_code; fix by first reading and
parsing the source (use self.parser.parse on the string returned by f.read()),
then construct and add the File entity (File(...)) afterwards using the original
path and parsed tree as needed, and remove the decode() call — handle bytes only
if f is opened in binary mode (decode then), otherwise treat f.read() as str and
pass it directly to parser.parse and downstream logic (refer to File,
graph.add_file, self.parser.parse, and source_code variables).

3-5: ⚠️ Potential issue | 🟠 Major

Replace the wildcard imports.

This file now trips Ruff F403/F405, and the * imports hide the actual symbols the analyzer depends on.

♻️ Proposed fix
-from ..utils import *
+from ..utils import find_child_of_type
 from pathlib import Path
-from ...entities import *
+from ...entities import File, Function, Struct
 from ...graph import Graph
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 3 - 5, Replace the wildcard imports
in analyzer.py: remove "from ..utils import *" and "from ...entities import *"
and instead import only the specific functions/classes the module uses (inspect
usages of symbols in analyzer.py such as any utility helpers or entity classes
referenced) and list them explicitly (e.g., from ..utils import func_a,
helper_b; from ...entities import EntityX, EntityY). Ensure any renamed or
aliased imports match existing references in functions like the analyzer class
or helper functions, and update any __all__ or re-exports if present.
🧹 Nitpick comments (1)
tests/test_c_analyzer.py (1)

63-70: Please add the new coverage in pytest style.

This extends a unittest.TestCase suite, but backend tests under tests/ are expected to use pytest.

As per coding guidelines, "Backend tests should use pytest and be organized in tests/ directory".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_c_analyzer.py` around lines 63 - 70, Convert the unittest-style
assertions in tests/test_c_analyzer.py into a pytest-style test function: create
a standalone function (e.g., test_include_directive_edge_creation) that uses the
existing fixture/object g, replace self.assertIsNotNone(included_file) with
assert included_file is not None, replace self.assertEqual(len(includes), 3)
with assert len(includes) == 3, and replace self.assertIn('myheader.h',
included_files) with assert 'myheader.h' in included_files; remove any
unittest.TestCase class wrapper and self usage so the test runs under pytest.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db7fbc7c-059d-4357-be05-71bacb011725

📥 Commits

Reviewing files that changed from the base of the PR and between 887b82f and 996c299.

📒 Files selected for processing (2)
  • api/analyzers/c/analyzer.py
  • tests/test_c_analyzer.py

Comment on lines +341 to +357
    # Normalize and validate path
    normalized_path = os.path.normpath(included_file_path)
except UnicodeDecodeError as e:
    logger.error("Failed to decode include path in %s: %s", path, e)
    return

splitted = os.path.splitext(normalized_path)
if len(splitted) < 2:
    logger.warning("Include path has no extension: %s", included_file_path)
    return

# Create file entity for the included file
path = os.path.dirname(normalized_path)
name = os.path.basename(normalized_path)
ext = splitted[1]
included_file = File(path, name, ext)
graph.add_file(included_file)

⚠️ Potential issue | 🟠 Major

Resolve include targets relative to the including file.

os.path.dirname(normalized_path) only reflects the raw include text, so #include "myheader.h" inside src/foo.c is recorded as a top-level myheader.h node instead of src/myheader.h. That can collapse distinct headers onto the same node and point INCLUDES at a synthetic file instead of the header that was actually analyzed. Resolve path.parent / normalized_path first and then reuse/look up the real file entity.
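
A minimal illustration of the difference, using the example paths above (src/foo.c including myheader.h):

```python
import os
from pathlib import Path

# Raw include text, as it appears in `#include "myheader.h"` inside src/foo.c.
include_text = 'myheader.h'
including_file = Path('src/foo.c')

# Using only the include text loses the including file's directory,
# so the header is recorded at the top level.
top_level_dir = os.path.dirname(include_text)   # ''

# Resolving against the including file's parent keeps headers distinct,
# e.g. src/myheader.h on POSIX systems.
resolved = including_file.parent / include_text
```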

🧰 Tools
🪛 Ruff (0.15.6)

[error] 356-356: File may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 341 - 357, The code creates a
synthetic File node from the raw include text (using
os.path.dirname(normalized_path)) instead of resolving the include relative to
the including file; change the logic to first resolve the include against the
including file's directory (e.g. resolved = Path(parent_of_including_file) /
normalized_path), then check the graph for an existing File entity for that
resolved path and reuse it (lookup via whatever graph method exists, e.g.
graph.get_file_by_path/find_file), and only create a new File and call
graph.add_file(File(...)) if no real entity is found; update references to use
included_file = existing_or_new_file so INCLUDES point to the actual analyzed
header rather than a top-level synthetic node.

Comment on lines +408 to +419
query = C_LANGUAGE.query("(function_definition) @function")
captures = query.captures(tree.root_node)
# captures: {'function':
#   [<Node type=function_definition, start_point=(0, 0), end_point=(7, 1)>,
#    <Node type=function_definition, start_point=(15, 0), end_point=(18, 1)>]
# }

if 'function' in captures:
    functions = captures['function']
    for node in functions:
        self.process_function_definition(file, node, path, graph, source_code)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, check the file exists and get its structure
git ls-files | grep -E "api/analyzers/c/analyzer\.py"

Repository: FalkorDB/code-graph

Length of output: 90


🏁 Script executed:

# Read the file to examine the code around lines 408-419 and 511-519
wc -l api/analyzers/c/analyzer.py

Repository: FalkorDB/code-graph

Length of output: 94


🏁 Script executed:

# Get context around lines 408-419
sed -n '400,425p' api/analyzers/c/analyzer.py | cat -n

Repository: FalkorDB/code-graph

Length of output: 1261


🏁 Script executed:

# Get context around lines 511-519
sed -n '500,530p' api/analyzers/c/analyzer.py | cat -n

Repository: FalkorDB/code-graph

Length of output: 1642


🏁 Script executed:

# Get a broader view to find first_pass and second_pass function names
grep -n "def first_pass\|def second_pass" api/analyzers/c/analyzer.py

Repository: FalkorDB/code-graph

Length of output: 232


🏁 Script executed:

# Check how tree-sitter is imported and which version
grep -n "import.*tree_sitter\|from tree_sitter" api/analyzers/c/analyzer.py

Repository: FalkorDB/code-graph

Length of output: 143


🏁 Script executed:

# Check if there's any documentation or setup file indicating tree-sitter version
grep -r "tree.sitter\|tree-sitter" --include="*.py" --include="*.txt" --include="*.md" | head -20

Repository: FalkorDB/code-graph

Length of output: 1063


🌐 Web query:

tree-sitter python bindings query.captures() return type

💡 Result:

In tree-sitter Python bindings (py-tree-sitter), there is no query.captures method directly on the Query class. The Query class defines patterns but does not execute them to retrieve captures. To get captures, create a QueryCursor from the Query and call captures on it: query_cursor.captures(node). The return type of QueryCursor.captures(node) is a dict whose keys are capture names (strings, e.g., "function.def") and whose values are lists of captured Node objects (list[tree_sitter.Node]). Example usage:

from tree_sitter import Language, Parser, Query, QueryCursor
# ... setup language, parser, tree, query ...
query_cursor = QueryCursor(query)
captures = query_cursor.captures(tree.root_node)  # dict[str, list[Node]]
for capture_name, nodes in captures.items():
    for node in nodes:
        print(node)

Note: QueryCursor.matches(node) returns grouped matches as list[tuple[int, dict[str, list[Node]]]].


🏁 Script executed:

# Check the base analyzer.py for tree-sitter query wrapper
sed -n '1,100p' api/analyzers/analyzer.py | cat -n

Repository: FalkorDB/code-graph

Length of output: 3847


🏁 Script executed:

# Search for query wrapper methods in base analyzer
grep -n "def.*query\|\.captures" api/analyzers/analyzer.py

Repository: FalkorDB/code-graph

Length of output: 102


🏁 Script executed:

# Get more context around the query method if it exists
grep -A 10 "def.*query" api/analyzers/analyzer.py

Repository: FalkorDB/code-graph

Length of output: 45


Both first_pass() and second_pass() use an incorrect tree-sitter API that will fail at runtime.

query.captures() does not exist on Query objects. The correct API is to create a QueryCursor from the query and call captures() on that, which returns a dict keyed by capture name (as shown in the base AbstractAnalyzer._captures() method at lines 15–19 of api/analyzers/analyzer.py). Both methods should use the inherited _captures() helper or explicitly create a QueryCursor. The second_pass code will fail immediately when it tries to index results with [0], treating captures as tuples rather than accessing the dict.
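
The dict shape that QueryCursor.captures() returns can be illustrated with stand-in objects; FakeNode below is a stub for this sketch, not part of py-tree-sitter, since building the real dict requires a compiled grammar:

```python
# Stand-in for the dict that QueryCursor.captures(root_node) returns in
# py-tree-sitter: capture name -> list of captured nodes. Real code would do:
#   cursor = QueryCursor(query)
#   captures = cursor.captures(tree.root_node)
class FakeNode:
    def __init__(self, type_, start_point):
        self.type = type_
        self.start_point = start_point

captures = {
    'function': [FakeNode('function_definition', (0, 0)),
                 FakeNode('function_definition', (15, 0))],
}

# Iterate the dict entries instead of indexing tuple-style with [0]:
starts = [node.start_point[0] for node in captures.get('function', [])]
```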

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/c/analyzer.py` around lines 408 - 419, first_pass() and
second_pass() incorrectly call query.captures() on a Query (which doesn't exist)
and treat the result as a tuple; replace those calls with the existing helper or
a QueryCursor: use self._captures(tree, query) (the inherited helper that
returns a dict keyed by capture name) when invoking
C_LANGUAGE.query("(function_definition) `@function`") and iterate
captures['function'] to call process_function_definition(file, node, path,
graph, source_code); in second_pass() also remove any indexing like [0] on
capture results and instead access the dict entries returned by _captures() or
use a QueryCursor created from the Query and call
cursor.captures(tree.root_node) before processing nodes.


g = Graph("c")
analyzer.analyze_local_folder(path, g)
analyzer.analyze(path, g)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -e
sed -n '1,240p' api/analyzers/source_analyzer.py
if [ -f api/__init__.py ]; then
  sed -n '1,120p' api/__init__.py
fi
rg -n -C2 "def analyze\(|def analyze_local_folder\(|analyzers\s*:|'\.c'|'\.h'" api

Repository: FalkorDB/code-graph

Length of output: 12620


🏁 Script executed:

cat -n tests/test_c_analyzer.py | head -50

Repository: FalkorDB/code-graph

Length of output: 2080


Test cannot run: SourceAnalyzer has no analyze() method.

Line 24 calls analyzer.analyze(path, g), but SourceAnalyzer exposes only analyze_local_folder(), analyze_files(), analyze_sources(), and analyze_local_repository(). The test will fail with AttributeError before reaching any analyzer code.

Additionally, .c and .h are disabled (commented out) in the analyzers dict, so even if the method existed, CAnalyzer would not be invoked.

Either add an analyze() method to SourceAnalyzer, use an existing public method like analyze_local_folder(), or enable .c/.h in the dispatcher if they should be supported.
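
One possible shape for the suggested wrapper, sketched against a stub; the real SourceAnalyzer methods, signatures, and dispatch criteria may differ:

```python
from pathlib import Path

class SourceAnalyzer:
    """Stub standing in for the real SourceAnalyzer; only the dispatch
    logic of the suggested analyze() wrapper is sketched here."""

    def analyze_local_folder(self, path, graph):
        # Placeholder for the existing folder-level analysis entry point.
        return ('folder', path)

    def analyze_files(self, path, graph):
        # Placeholder for the existing file-level analysis entry point.
        return ('file', path)

    def analyze(self, path, graph):
        # Suggested public wrapper: delegate based on whether path is a directory.
        if Path(path).is_dir():
            return self.analyze_local_folder(path, graph)
        return self.analyze_files(path, graph)

result = SourceAnalyzer().analyze('.', None)   # '.' is a directory
```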

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_c_analyzer.py` at line 24, The test calls analyzer.analyze(path,
g) but SourceAnalyzer does not implement analyze(); either add a public
analyze(self, path, g) wrapper on SourceAnalyzer that delegates to the
appropriate existing method (for example call analyze_local_folder(path, g) or
analyze_sources(path, g) based on input) or update the test to call an existing
method such as analyze_local_folder(path, g); additionally ensure the analyzers
dispatch map enables CAnalyzer by uncommenting the ".c" and ".h" entries in the
analyzers dict so C files are actually handled when the wrapper or chosen method
delegates to CAnalyzer.



Development

Successfully merging this pull request may close these issues.

Add support for include_directive in C

2 participants