@tanii1125 tanii1125 commented Jan 16, 2026

Closes #237

📝 Description

Modified the repository indexing flow to be read-only during static analysis.
Replaced outdated Tree-sitter query usage and removed unnecessary debug logic. Also fixed incorrect failed and pending indexing states: HTML error responses returned from Supabase were being parsed incorrectly, which led to inconsistent node values.

🔧 Changes Made

1. Modified repository indexing to operate in read-only mode
2. Replaced old Tree-sitter module query usage
3. Fixed incorrect handling of failed and pending states triggered by Supabase HTML error responses
4. Corrected invalid node value parsing and status transitions

📷 Screenshots or Visual Changes (if applicable)

Before changes:

image

Indexing failed due to HTML error responses being incorrectly parsed, causing repository mutation, broken node values, and repositories getting stuck in failed or pending states.

After changes:

image

Repository indexes successfully with correct status handling and clean node extraction.

🤝 Collaboration

Collaborated with: None

✅ Checklist

  • I have read the contributing guidelines.
  • I have validated the fix through existing indexing flows.
  • I have added or updated logs/comments where applicable.
  • Any dependent changes have been merged and published in downstream modules.

Summary by CodeRabbit

  • New Features

    • Added a static-only analysis mode to optionally skip dependency scanning.
  • Bug Fixes

    • Repository analysis now returns standardized error responses on failure.
    • Improved backend response handling with safe JSON parsing and clear fallback for malformed responses.
  • Refactor

    • Unified symbol-querying logic across analyzers for more consistent extraction.
    • Dependency extraction workflow temporarily disabled (read-only/short-circuited).



coderabbitai bot commented Jan 16, 2026

Walkthrough

Adds a default read-only static_only mode to skip dependency analysis, consolidates tree-sitter query capture handling into internal _run_query helpers in Java and Python analyzers, and adds error handling plus safer backend response parsing.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Language Analyzer Query Consolidation: backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py, backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py | Add internal _run_query(self, query, node) and import QueryCursor; replace direct query.captures(...) usage with _run_query(...) across symbol extraction paths; short-circuit dependency extraction (commented/no-op + early return). |
| Static-Only Analysis Mode: backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py | SourceAnalyzer.__init__(static_only: bool = True) added; dependency collection in first_pass and second_pass invocation in analyze_sources are conditional on static_only, enabling read-only indexing by default. |
| API Error Handling: backend/app/database/falkor/code-graph-backend/api/index.py | Wrap analyze_repo() in try/except; on exception log and return {"status":"error","error": <message>} with HTTP 500; success response unchanged. |
| Service Layer Response Handling: backend/app/services/codegraph/repo_service.py | Read response text then parse with json.loads (with parse-error handling) and standardize non-200 error messages to "Backend returned status {response.status}". |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as index.py
    participant SourceAnalyzer
    participant Analyzer as LanguageAnalyzer
    participant Service as RepoService

    Client->>API: POST /analyze_repo
    rect rgba(100,150,200,0.5)
    API->>SourceAnalyzer: __init__(static_only=True)
    SourceAnalyzer->>SourceAnalyzer: store flag
    API->>SourceAnalyzer: analyze_sources(repo_path)
    end

    rect rgba(150,100,200,0.5)
    Note over SourceAnalyzer,Analyzer: First pass: symbol extraction always runs
    SourceAnalyzer->>Analyzer: add_symbols(file)
    Analyzer->>Analyzer: _run_query(query, node) → captures
    end

    alt static_only = False
        rect rgba(200,150,100,0.5)
        Note over SourceAnalyzer,Analyzer: Dependency analysis and second pass occur
        SourceAnalyzer->>Analyzer: add_dependencies(file)
        Analyzer->>Analyzer: dependency resolution (may be no-op / short-circuited)
        SourceAnalyzer->>SourceAnalyzer: second_pass()
        end
    else static_only = True
        Note over SourceAnalyzer: Dependency & second pass skipped
    end

    rect rgba(100,200,150,0.5)
    SourceAnalyzer->>API: analysis results
    API->>Service: send results
    Service->>API: parse text → json.loads (handle parse errors)
    API->>Client: {"status":"success","node_count":N,"edge_count":M} or error
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #138: Similar refactor adding _run_query and replacing direct Tree-sitter captures; touches the same analyzer files.

Suggested reviewers

  • chandansgowda

Poem

🐰 I hopped through trees and gathered every capture bright,

Left venvs untouched beneath the pale moonlight.
Queries now queued in a neat little row,
Errors caught gently — indexing stays slow and low.
Read-only hops keep the repo cozy tonight. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.32% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Bugfix/read only repo indexing' directly addresses the main objective from issue #237 to make repository indexing read-only and avoid dependency installs. |
| Linked Issues check | ✅ Passed | The changes implement the core requirements from issue #237: introduce static_only mode [SourceAnalyzer], disable dependency resolution [JavaAnalyzer, PythonAnalyzer], and add error handling [index.py, repo_service.py]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to making indexing read-only and handling errors properly; no unrelated modifications detected. |


🧹 Recent nitpick comments
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (2)

22-30: LGTM!

The _run_query helper correctly aggregates Tree-sitter query captures into a dictionary keyed by capture name. This consolidates query execution logic and aligns with the similar pattern in the Python analyzer.

Consider adding type hints for clarity:

✨ Optional: Add type hints
-    def _run_query(self, query, node):
+    def _run_query(self, query: Query, node: Node) -> dict[str, list[Node]]:
         cursor = QueryCursor(query)
-        captures = {}
+        captures: dict[str, list[Node]] = {}

Note: You'll need to import Query from tree_sitter if not already imported.


32-50: Read-only mode implementation looks correct.

The method is now a no-op as intended per issue #237. The unused path and files parameters (flagged by static analysis) are kept for API compatibility with the abstract method signature.

Consider using explicit unused markers if your linter supports them, or add a brief inline comment:

def add_dependencies(self, path: Path, files: list[Path]):
    # Read-only mode: dependency extraction disabled for static analysis
    _ = path, files  # Unused but required by interface
    return

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 895e877 and 5236304.

📒 Files selected for processing (1)
  • backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py
🧰 Additional context used
🪛 Ruff (0.14.11)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py

32-32: Unused method argument: path

(ARG002)


32-32: Unused method argument: files

(ARG002)

🔇 Additional comments (3)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)

11-11: LGTM!

The import is now correctly consolidated, addressing the previous duplicate Node import issue.


80-101: LGTM!

The variable name mismatches from the previous review have been fixed (interfaces_captures, base_class_captures, extends_captures), and the trailing ? has been removed from the extends_query. The symbol extraction logic is now correct and consistent.


102-115: LGTM!

The method/constructor symbol extraction correctly uses _run_query for both call references and parameters. The sequential reuse of the captures variable is appropriate here since the queries are processed independently.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In
`@backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py`:
- Around line 11-12: Remove the duplicate import of Node from tree_sitter in
analyzer.py by consolidating the two import lines into a single import that
includes Language, QueryCursor, Query, and Node (so only one "Node" is
imported); update the import statement that currently lists "from tree_sitter
import Language, Node" and "from tree_sitter import QueryCursor, Query, Node" to
a single line referencing Language, QueryCursor, Query, and Node to eliminate
the duplicate symbol.
- Around line 84-102: The issue: query results are stored to the local variable
captures but the code later reads undefined names interfaces_captures,
base_class_captures, and extends_captures causing NameError; fix by using the
same variable names returned from _run_query (or assign captures to those
specific names) so the checks and loops reference the actual results—for
example, for interfaces_query, base_class_query, and extends_query ensure you
either keep using captures everywhere or immediately assign captures to
interfaces_captures/base_class_captures/extends_captures before using them;
update the checks (e.g., if 'interface' in ...) and loops that call
entity.add_symbol("implement_interface", ...), entity.add_symbol("base_class",
...), and entity.add_symbol("extend_interface", ...) to reference the corrected
query result variables.

In
`@backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py`:
- Around line 12-13: The import list duplicates Node; remove the duplicate by
consolidating the imports so Node is only imported once—e.g., keep "from
tree_sitter import Language, Node" and change the second line to "from
tree_sitter import QueryCursor, Query" (or combine all four as "from tree_sitter
import Language, Node, QueryCursor, Query") so that Node is not imported twice;
update imports around the symbols Language, Node, QueryCursor, and Query
accordingly.
🧹 Nitpick comments (1)
backend/app/services/codegraph/repo_service.py (1)

117-123: Robust JSON parsing with graceful fallback.

The change from response.json() to response.text() with manual JSON parsing handles the case where the backend returns HTML error responses instead of JSON. Consider narrowing the exception type for clarity.

♻️ Optional: Use specific exception type
                        raw_text= await response.text()
                        try:
                            import json
                            data = json.loads(raw_text) if raw_text else {}
-                        except Exception:
+                        except json.JSONDecodeError:
                            logger.debug("Failed to parse backend JSON")
                            data= {}
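
The parsing fallback discussed in this nitpick can be sketched in isolation. This is a minimal sketch, not the code in repo_service.py: the function names parse_backend_body and error_message are hypothetical, and the isinstance guard is an extra safety check that does not appear in the diff above.

```python
import json
import logging

logger = logging.getLogger("repo_service_sketch")  # hypothetical logger name


def parse_backend_body(raw_text: str) -> dict:
    """Parse a backend response body, falling back to {} on non-JSON input."""
    if not raw_text:
        return {}
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        # An HTML error page (for example from a proxy) ends up here.
        logger.debug("Failed to parse backend JSON")
        return {}
    # Extra guard (an assumption, not in the PR): only dict payloads are accepted.
    return data if isinstance(data, dict) else {}


def error_message(status: int) -> str:
    """Build the standardized non-200 message without echoing the body."""
    return f"Backend returned status {status}"


print(parse_backend_body("<html><body>502 Bad Gateway</body></html>"))
print(parse_backend_body('{"status": "success"}'))
print(error_message(502))
```

Catching json.JSONDecodeError rather than Exception keeps unrelated failures (such as a cancelled request) from being silently swallowed.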
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa66519 and 03d85d4.

📒 Files selected for processing (5)
  • backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py
  • backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py
  • backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py
  • backend/app/database/falkor/code-graph-backend/api/index.py
  • backend/app/services/codegraph/repo_service.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/database/falkor/code-graph-backend/api/index.py (3)
backend/app/database/falkor/code-graph-backend/api/project.py (4)
  • Project (40-113)
  • from_git_repository (51-61)
  • analyze_sources (80-95)
  • process_git_history (97-113)
backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py (1)
  • analyze_sources (167-174)
backend/app/database/falkor/code-graph-backend/api/graph.py (1)
  • stats (592-609)
backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py (4)
backend/app/database/falkor/code-graph-backend/api/entities/file.py (1)
  • File (7-36)
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (1)
  • add_dependencies (32-57)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (1)
  • add_dependencies (33-51)
backend/app/database/falkor/code-graph-backend/api/analyzers/analyzer.py (1)
  • add_dependencies (57-66)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (2)
  • _run_query (22-30)
  • add_dependencies (32-57)
backend/app/database/falkor/code-graph-backend/api/analyzers/analyzer.py (1)
  • add_dependencies (57-66)
backend/app/database/falkor/code-graph-backend/api/entities/entity.py (1)
  • add_symbol (12-15)
🪛 Ruff (0.14.11)
backend/app/services/codegraph/repo_service.py

121-121: Do not catch blind exception: Exception

(BLE001)

backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py

13-13: Redefinition of unused Node from line 12: Node redefined here

Remove definition: Node

(F811)


32-32: Unused method argument: path

(ARG002)


32-32: Unused method argument: files

(ARG002)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py

12-12: Redefinition of unused Node from line 11: Node redefined here

Remove definition: Node

(F811)


33-33: Unused method argument: path

(ARG002)


33-33: Unused method argument: files

(ARG002)


87-87: interfaces_captures may be undefined, or defined from star imports

(F405)


88-88: interfaces_captures may be undefined, or defined from star imports

(F405)


93-93: base_class_captures may be undefined, or defined from star imports

(F405)


94-94: base_class_captures may be undefined, or defined from star imports

(F405)


100-100: extends_captures may be undefined, or defined from star imports

(F405)


101-101: extends_captures may be undefined, or defined from star imports

(F405)

🔇 Additional comments (11)
backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py (3)

34-36: LGTM! Good default for read-only mode.

The static_only flag with a default of True ensures that repository indexing operates in read-only mode by default, which aligns with the PR objective to prevent dependency installs and repository mutation during static analysis.


86-88: LGTM! Dependency analysis correctly gated.

The conditional check ensures dependency extraction only runs when static_only=False, preventing filesystem mutations during static-only analysis.


171-174: LGTM! Second pass analysis correctly gated.

Skipping the second pass (which uses language servers and resolves symbols) when static_only=True is appropriate for read-only indexing that focuses on AST-only analysis.
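
The gating described in the three comments above can be illustrated with a small stand-in class. This is a sketch under stated assumptions: SketchAnalyzer and its step names are hypothetical, the real SourceAnalyzer does far more work, and the ordering of steps inside first_pass is a guess.

```python
class SketchAnalyzer:
    """Hypothetical stand-in showing how a static_only flag can gate passes."""

    def __init__(self, static_only: bool = True) -> None:
        # Default True means read-only indexing unless the caller opts out.
        self.static_only = static_only
        self.steps: list[str] = []

    def first_pass(self, files: list[str]) -> None:
        if not self.static_only:
            # Dependency collection may touch the filesystem, so it is gated.
            self.steps.append("add_dependencies")
        for name in files:
            # Symbol extraction from the AST always runs.
            self.steps.append(f"add_symbols:{name}")

    def second_pass(self, files: list[str]) -> None:
        self.steps.append("second_pass")

    def analyze_sources(self, files: list[str]) -> list[str]:
        self.first_pass(files)
        if not self.static_only:
            self.second_pass(files)
        return self.steps


print(SketchAnalyzer().analyze_sources(["a.py"]))
print(SketchAnalyzer(static_only=False).analyze_sources(["a.py"]))
```

With the default flag, only the symbol-extraction step is recorded; passing static_only=False restores the dependency and second-pass steps.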

backend/app/services/codegraph/repo_service.py (1)

143-143: LGTM! Cleaner error messaging.

Standardizing the error message to just the status code avoids exposing potentially sensitive HTML content from failed backend responses.

backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (3)

22-30: LGTM! Clean helper for tree-sitter query captures.

The _run_query helper provides a consistent interface for collecting captures from tree-sitter queries, returning a dictionary keyed by capture name. This mirrors the same pattern in JavaAnalyzer, promoting code consistency across analyzers.
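
The "dictionary keyed by capture name" shape can be sketched without Tree-sitter itself. The body of _run_query is not shown in this thread, so the following is only an illustration: merge_captures is a hypothetical name, and the input is assumed to be a sequence of (pattern_index, {capture_name: [nodes]}) pairs, with plain strings standing in for nodes.

```python
from collections import defaultdict
from typing import Any, Iterable


def merge_captures(
    matches: Iterable[tuple[int, dict[str, list[Any]]]],
) -> dict[str, list[Any]]:
    """Merge per-match capture dicts into one dict keyed by capture name."""
    merged: dict[str, list[Any]] = defaultdict(list)
    for _pattern_index, captures in matches:
        for name, nodes in captures.items():
            merged[name].extend(nodes)
    return dict(merged)


# Stand-in data: strings replace Tree-sitter nodes for illustration.
sample_matches = [
    (0, {"interface": ["Runnable"]}),
    (0, {"interface": ["Closeable"]}),
    (1, {"base_class": ["Base"]}),
]
captures = merge_captures(sample_matches)

# Callers check for the capture key before iterating, as the review notes.
if "interface" in captures:
    for node in captures["interface"]:
        print("implement_interface", node)
```

Returning a plain dict means a missing capture name is simply absent, which is why the membership check before iterating matters.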


32-57: LGTM! Read-only dependency handling.

Commenting out the dependency installation logic and returning early ensures the analyzer doesn't mutate the filesystem, aligning with the read-only indexing objective.


88-107: LGTM! Symbol extraction refactored to use _run_query.

The refactored symbol extraction correctly uses the new _run_query helper and properly checks for capture keys before iterating.

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)

23-31: LGTM! Consistent _run_query helper.

The helper is identical to PythonAnalyzer._run_query, providing consistent query-capture handling across analyzers.


33-51: LGTM! Read-only dependency handling.

The dependency extraction logic is properly disabled for read-only mode.


104-116: LGTM! Correct variable usage for method/constructor symbols.

This section correctly uses captures variable consistently after _run_query calls.

backend/app/database/falkor/code-graph-backend/api/index.py (1)

408-428: LGTM! Proper error handling for repository analysis.

Wrapping the analysis flow in a try/except ensures that failures return a structured 500 response with error details instead of crashing. Using logger.exception captures the full traceback for debugging while the response provides a summary to the caller.
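
The response contract described above can be sketched as a framework-free function. This is an illustration only: analyze_repo_endpoint and its run_analysis parameter are hypothetical, and the real handler in index.py builds its response through the web framework rather than returning a tuple.

```python
import logging

logger = logging.getLogger("index_sketch")  # hypothetical logger name


def analyze_repo_endpoint(run_analysis):
    """Return (body, http_status); run_analysis is a hypothetical callable
    that returns (node_count, edge_count) or raises on failure."""
    try:
        node_count, edge_count = run_analysis()
    except Exception as exc:
        # logger.exception records the full traceback server-side,
        # while the caller only sees a short summary.
        logger.exception("Repository analysis failed")
        return {"status": "error", "error": str(exc)}, 500
    return {
        "status": "success",
        "node_count": node_count,
        "edge_count": edge_count,
    }, 200


def failing_analysis():
    raise RuntimeError("clone failed")


print(analyze_repo_endpoint(lambda: (3, 2)))
print(analyze_repo_endpoint(failing_analysis))
```

Because the error path always returns JSON, a client that parses the body never receives an HTML error page from this handler.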


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py`:
- Line 96: The extends_query string in analyzer.py currently includes a trailing
'?' quantifier ("(extends_interfaces (type_list (type_identifier) `@type`))?")
which is redundant and inconsistent with other queries; update the extends_query
assignment to remove the trailing '?' so it becomes "(extends_interfaces
(type_list (type_identifier) `@type`))", leaving the rest of the analyzer.py logic
unchanged and referencing the existing extends_query variable.
♻️ Duplicate comments (1)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (1)

80-101: Variable name mismatch issue is now fixed.

The previous critical NameError issue has been correctly addressed - interfaces_captures, base_class_captures, and extends_captures are now properly assigned from _run_query and consistently referenced.
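
The corrected pattern, where each query result is bound to the same name that is later read, can be sketched as follows. This is a hypothetical illustration: run_query stands in for _run_query, the canned results replace real Tree-sitter captures, and the capture keys "base_class" and "type" are guesses (only "interface" is named in this thread).

```python
def run_query(query_name: str) -> dict[str, list[str]]:
    """Hypothetical stand-in for _run_query, returning canned captures."""
    fake_results = {
        "interfaces": {"interface": ["Runnable", "Closeable"]},
        "base_class": {"base_class": ["AbstractTask"]},
        "extends": {},  # no matches for this query
    }
    return fake_results[query_name]


symbols: list[tuple[str, str]] = []

# Each result is assigned to the exact name the following check reads,
# which is what removes the NameError from the earlier revision.
interfaces_captures = run_query("interfaces")
if "interface" in interfaces_captures:
    for name in interfaces_captures["interface"]:
        symbols.append(("implement_interface", name))

base_class_captures = run_query("base_class")
if "base_class" in base_class_captures:
    for name in base_class_captures["base_class"]:
        symbols.append(("base_class", name))

extends_captures = run_query("extends")
if "type" in extends_captures:
    for name in extends_captures["type"]:
        symbols.append(("extend_interface", name))

print(symbols)
```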

🧹 Nitpick comments (1)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (1)

32-50: LGTM - Read-only mode implemented correctly.

The method now returns early with dependency extraction disabled, aligning with the PR objective to make indexing read-only. The commented code preserves the original logic for potential future use.

The unused path and files arguments are expected since the method is stubbed out. If you want to silence linter warnings, you could prefix them with underscores, but this is optional given the commented implementation remains.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 659dc11 and 895e877.

📒 Files selected for processing (1)
  • backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (2)
  • _run_query (21-29)
  • add_dependencies (31-56)
backend/app/database/falkor/code-graph-backend/api/analyzers/analyzer.py (1)
  • add_dependencies (57-66)
backend/app/database/falkor/code-graph-backend/api/entities/entity.py (1)
  • add_symbol (12-15)
🪛 Ruff (0.14.11)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py

32-32: Unused method argument: path

(ARG002)


32-32: Unused method argument: files

(ARG002)

🔇 Additional comments (3)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)

11-11: LGTM!

The import consolidation correctly addresses the previous duplicate Node import issue and adds QueryCursor for the new _run_query helper.


22-30: LGTM!

The _run_query helper is correctly implemented and consistent with the Python analyzer's implementation. This consolidates the tree-sitter query execution pattern cleanly.


102-115: LGTM!

The method invocation and parameter extraction logic correctly uses _run_query and properly checks for capture keys before processing.

Development

Successfully merging this pull request may close these issues.

BUG: Make repository indexing read-only (avoid dependency installs during static analysis)

1 participant