Bugfix/read only repo indexing #246

tanii1125 · 2026-01-16T10:14:13Z

Closes #237

📝 Description

Made the repository indexing flow read-only during static analysis.
Replaced outdated Tree-sitter query usage and removed unnecessary debug logic. Also fixed incorrect failed and pending indexing states: Supabase was returning HTML error pages that were handled as if they were valid responses, which led to inconsistent node values.

🔧 Changes Made

1. Modified repository indexing to operate in read-only mode
2. Replaced old Tree-sitter module query usage
3. Fixed incorrect handling of failed and pending states triggered by Supabase HTML error responses
4. Corrected invalid node value parsing and status transitions

📷 Screenshots or Visual Changes (if applicable)

Before Changes-

Indexing failed due to HTML error responses being incorrectly parsed, causing repository mutation, broken node values, and repositories getting stuck in failed or pending states.

After Changes-

Repository indexes successfully with correct status handling and clean node extraction.

🤝 Collaboration

Collaborated with: None

✅ Checklist

I have read the contributing guidelines.
I have validated the fix through existing indexing flows.
I have added or updated logs/comments where applicable.
Any dependent changes have been merged and published in downstream modules.

Summary by CodeRabbit

New Features
- Added a static-only analysis mode to optionally skip dependency scanning.
Bug Fixes
- Repository analysis now returns standardized error responses on failure.
- Improved backend response handling with safe JSON parsing and clear fallback for malformed responses.
Refactor
- Unified symbol-querying logic across analyzers for more consistent extraction.
- Dependency extraction workflow temporarily disabled (read-only/short-circuited).

coderabbitai · 2026-01-16T10:14:25Z

📝 Walkthrough

Adds a default read-only static_only mode to skip dependency analysis, consolidates tree-sitter query capture handling into internal _run_query helpers in Java and Python analyzers, and adds error handling plus safer backend response parsing.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Language Analyzer Query Consolidation `backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py`, `backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py` | Add internal `_run_query(self, query, node)` and import `QueryCursor`; replace direct `query.captures(...)` usage with `_run_query(...)` across symbol extraction paths; short-circuit dependency extraction (commented/no-op + early return). |
| Static-Only Analysis Mode `backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py` | `SourceAnalyzer.__init__(static_only: bool = True)` added; dependency collection in `first_pass` and `second_pass` invocation in `analyze_sources` are conditional on `static_only`, enabling read-only indexing by default. |
| API Error Handling `backend/app/database/falkor/code-graph-backend/api/index.py` | Wrap `analyze_repo()` in try/except; on exception log and return `{"status":"error","error": <message>}` with HTTP 500; success response unchanged. |
| Service Layer Response Handling `backend/app/services/codegraph/repo_service.py` | Read response text then parse with `json.loads` (with parse-error handling) and standardize non-200 error messages to `"Backend returned status {response.status}"`. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as index.py
    participant SourceAnalyzer
    participant Analyzer as LanguageAnalyzer
    participant Service as RepoService

    Client->>API: POST /analyze_repo
    rect rgba(100,150,200,0.5)
    API->>SourceAnalyzer: __init__(static_only=True)
    SourceAnalyzer->>SourceAnalyzer: store flag
    API->>SourceAnalyzer: analyze_sources(repo_path)
    end

    rect rgba(150,100,200,0.5)
    Note over SourceAnalyzer,Analyzer: First pass: symbol extraction always runs
    SourceAnalyzer->>Analyzer: add_symbols(file)
    Analyzer->>Analyzer: _run_query(query, node) → captures
    end

    alt static_only = False
        rect rgba(200,150,100,0.5)
        Note over SourceAnalyzer,Analyzer: Dependency analysis and second pass occur
        SourceAnalyzer->>Analyzer: add_dependencies(file)
        Analyzer->>Analyzer: dependency resolution (may be no-op / short-circuited)
        SourceAnalyzer->>SourceAnalyzer: second_pass()
        end
    else static_only = True
        Note over SourceAnalyzer: Dependency & second pass skipped
    end

    rect rgba(100,200,150,0.5)
    SourceAnalyzer->>API: analysis results
    API->>Service: send results
    Service->>API: parse text → json.loads (handle parse errors)
    API->>Client: {"status":"success","node_count":N,"edge_count":M} or error
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

PR #138: Similar refactor adding _run_query and replacing direct Tree-sitter captures; touches the same analyzer files.

Suggested reviewers

chandansgowda

Poem

🐰 I hopped through trees and gathered every capture bright,

Left venvs untouched beneath the pale moonlight.
Queries now queued in a neat little row,
Errors caught gently — indexing stays slow and low.
Read-only hops keep the repo cozy tonight. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.32%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Bugfix/read only repo indexing' directly addresses the main objective from issue `#237` to make repository indexing read-only and avoid dependency installs. |
| Linked Issues check | ✅ Passed | The changes implement the core requirements from issue `#237`: introduce static_only mode [SourceAnalyzer], disable dependency resolution [JavaAnalyzer, PythonAnalyzer], and add error handling [index.py, repo_service.py]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to making indexing read-only and handling errors properly; no unrelated modifications detected. |

🧹 Recent nitpick comments

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (2)
22-30: LGTM!

The _run_query helper correctly aggregates Tree-sitter query captures into a dictionary keyed by capture name. This consolidates query execution logic and aligns with the similar pattern in the Python analyzer.

Consider adding type hints for clarity:
✨ Optional: Add type hints
-    def _run_query(self, query, node):
+    def _run_query(self, query: Query, node: Node) -> dict[str, list[Node]]:
         cursor = QueryCursor(query)
-        captures = {}
+        captures: dict[str, list[Node]] = {}
Note: You'll need to import Query from tree_sitter if not already imported.
32-50: Read-only mode implementation looks correct.

The method is now a no-op as intended per issue #237. The unused path and files parameters (flagged by static analysis) are kept for API compatibility with the abstract method signature.

Consider using explicit unused markers if your linter supports them, or add a brief inline comment:
def add_dependencies(self, path: Path, files: list[Path]):
    # Read-only mode: dependency extraction disabled for static analysis
    _ = path, files  # Unused but required by interface
    return

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 895e877 and 5236304.

📒 Files selected for processing (1)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py

🧰 Additional context used

🪛 Ruff (0.14.11)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py

32-32: Unused method argument: path

(ARG002)

32-32: Unused method argument: files

(ARG002)

🔇 Additional comments (3)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)

11-11: LGTM!

The import is now correctly consolidated, addressing the previous duplicate Node import issue.

80-101: LGTM!

The variable name mismatches from the previous review have been fixed (interfaces_captures, base_class_captures, extends_captures), and the trailing ? has been removed from the extends_query. The symbol extraction logic is now correct and consistent.

102-115: LGTM!

The method/constructor symbol extraction correctly uses _run_query for both call references and parameters. The sequential reuse of the captures variable is appropriate here since the queries are processed independently.

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In
`@backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py`:
- Around line 11-12: Remove the duplicate import of Node from tree_sitter in
analyzer.py by consolidating the two import lines into a single import that
includes Language, QueryCursor, Query, and Node (so only one "Node" is
imported); update the import statement that currently lists "from tree_sitter
import Language, Node" and "from tree_sitter import QueryCursor, Query, Node" to
a single line referencing Language, QueryCursor, Query, and Node to eliminate
the duplicate symbol.
- Around line 84-102: The issue: query results are stored to the local variable
captures but the code later reads undefined names interfaces_captures,
base_class_captures, and extends_captures causing NameError; fix by using the
same variable names returned from _run_query (or assign captures to those
specific names) so the checks and loops reference the actual results—for
example, for interfaces_query, base_class_query, and extends_query ensure you
either keep using captures everywhere or immediately assign captures to
interfaces_captures/base_class_captures/extends_captures before using them;
update the checks (e.g., if 'interface' in ...) and loops that call
entity.add_symbol("implement_interface", ...), entity.add_symbol("base_class",
...), and entity.add_symbol("extend_interface", ...) to reference the corrected
query result variables.

In
`@backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py`:
- Around line 12-13: The import list duplicates Node; remove the duplicate by
consolidating the imports so Node is only imported once—e.g., keep "from
tree_sitter import Language, Node" and change the second line to "from
tree_sitter import QueryCursor, Query" (or combine all four as "from tree_sitter
import Language, Node, QueryCursor, Query") so that Node is not imported twice;
update imports around the symbols Language, Node, QueryCursor, and Query
accordingly.

🧹 Nitpick comments (1)

backend/app/services/codegraph/repo_service.py (1)
117-123: Robust JSON parsing with graceful fallback.

The change from response.json() to response.text() with manual JSON parsing handles the case where the backend returns HTML error responses instead of JSON. Consider narrowing the exception type for clarity.
♻️ Optional: Use specific exception type
                        raw_text= await response.text()
                        try:
                            import json
                            data = json.loads(raw_text) if raw_text else {}
-                        except Exception:
+                        except json.JSONDecodeError:
                            logger.debug("Failed to parse backend JSON")
                            data= {}
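
The parsing fallback discussed in this nitpick can be sketched on its own. This is a minimal illustration, not the code in `repo_service.py`: the function names `parse_backend_body` and `error_message` are hypothetical, and treating a non-object JSON body as empty is an added assumption.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_backend_body(raw_text: str) -> dict:
    """Parse a backend response body, falling back to {} for non-JSON input.

    Hypothetical helper: the name and the dict-only rule are assumptions.
    """
    if not raw_text:
        return {}
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        # An HTML error page (for example from a proxy) ends up here.
        logger.debug("Failed to parse backend JSON")
        return {}
    # Valid JSON that is not an object (a list, a number) is treated as empty.
    return data if isinstance(data, dict) else {}


def error_message(status: int) -> str:
    """Build the standardized non-200 message quoted in the walkthrough."""
    return f"Backend returned status {status}"
```

Catching `json.JSONDecodeError` rather than `Exception` keeps unrelated failures, such as a cancelled request, from being silently swallowed as "malformed response".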

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa66519 and 03d85d4.

📒 Files selected for processing (5)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py
backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py
backend/app/database/falkor/code-graph-backend/api/index.py
backend/app/services/codegraph/repo_service.py

🧰 Additional context used

🧬 Code graph analysis (3)

backend/app/database/falkor/code-graph-backend/api/index.py (3)

backend/app/database/falkor/code-graph-backend/api/project.py (4)

Project (40-113)

from_git_repository (51-61)

analyze_sources (80-95)

process_git_history (97-113)

backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py (1)

analyze_sources (167-174)

backend/app/database/falkor/code-graph-backend/api/graph.py (1)

stats (592-609)

backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py (4)

backend/app/database/falkor/code-graph-backend/api/entities/file.py (1)

File (7-36)

backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (1)

add_dependencies (32-57)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (1)

add_dependencies (33-51)

backend/app/database/falkor/code-graph-backend/api/analyzers/analyzer.py (1)

add_dependencies (57-66)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)

backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (2)

_run_query (22-30)

add_dependencies (32-57)

backend/app/database/falkor/code-graph-backend/api/analyzers/analyzer.py (1)

add_dependencies (57-66)

backend/app/database/falkor/code-graph-backend/api/entities/entity.py (1)

add_symbol (12-15)

🪛 Ruff (0.14.11)

backend/app/services/codegraph/repo_service.py

121-121: Do not catch blind exception: Exception

(BLE001)

backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py

13-13: Redefinition of unused Node from line 12: Node redefined here

Remove definition: Node

(F811)

32-32: Unused method argument: path

(ARG002)

32-32: Unused method argument: files

(ARG002)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py

12-12: Redefinition of unused Node from line 11: Node redefined here

Remove definition: Node

(F811)

33-33: Unused method argument: path

(ARG002)

33-33: Unused method argument: files

(ARG002)

87-87: interfaces_captures may be undefined, or defined from star imports

(F405)

88-88: interfaces_captures may be undefined, or defined from star imports

(F405)

93-93: base_class_captures may be undefined, or defined from star imports

(F405)

94-94: base_class_captures may be undefined, or defined from star imports

(F405)

100-100: extends_captures may be undefined, or defined from star imports

(F405)

101-101: extends_captures may be undefined, or defined from star imports

(F405)

🔇 Additional comments (11)

backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py (3)

34-36: LGTM! Good default for read-only mode.

The static_only flag with a default of True ensures that repository indexing operates in read-only mode by default, which aligns with the PR objective to prevent dependency installs and repository mutation during static analysis.

86-88: LGTM! Dependency analysis correctly gated.

The conditional check ensures dependency extraction only runs when static_only=False, preventing filesystem mutations during static-only analysis.

171-174: LGTM! Second pass analysis correctly gated.

Skipping the second pass (which uses language servers and resolves symbols) when static_only=True is appropriate for read-only indexing that focuses on AST-only analysis.
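
The gating described in these three comments can be illustrated with a small stand-in. This is a sketch under assumptions: `SketchSourceAnalyzer` is a hypothetical class that only records which phases run, and the real `SourceAnalyzer` methods take different arguments and do actual work.

```python
class SketchSourceAnalyzer:
    """Hypothetical stand-in for SourceAnalyzer; only the gating is modeled."""

    def __init__(self, static_only: bool = True) -> None:
        self.static_only = static_only
        self.steps: list[str] = []  # records which phases ran, in order

    def first_pass(self, files: list[str]) -> None:
        # Dependency collection is skipped entirely in static-only mode.
        if not self.static_only:
            self.steps.append("add_dependencies")
        # Symbol extraction always runs.
        for name in files:
            self.steps.append(f"add_symbols:{name}")

    def second_pass(self) -> None:
        self.steps.append("second_pass")

    def analyze_sources(self, files: list[str]) -> list[str]:
        self.first_pass(files)
        # The second pass only runs when static-only mode is switched off.
        if not self.static_only:
            self.second_pass()
        return self.steps
```

With the default `static_only=True`, a call such as `SketchSourceAnalyzer().analyze_sources(["a.py"])` records only the symbol-extraction step, which matches the read-only default the review describes.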

backend/app/services/codegraph/repo_service.py (1)

143-143: LGTM! Cleaner error messaging.

Standardizing the error message to just the status code avoids exposing potentially sensitive HTML content from failed backend responses.

backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (3)

22-30: LGTM! Clean helper for tree-sitter query captures.

The _run_query helper provides a consistent interface for collecting captures from tree-sitter queries, returning a dictionary keyed by capture name. This mirrors the same pattern in JavaAnalyzer, promoting code consistency across analyzers.
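
The "dictionary keyed by capture name" shape can be shown without Tree-sitter at all. The sketch below is not the PR's `_run_query`, which goes through `QueryCursor`; it assumes the captures arrive as plain `(node, capture_name)` pairs, and `group_captures` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_captures(pairs: Iterable[tuple[Any, str]]) -> dict[str, list[Any]]:
    """Group (node, capture_name) pairs into {capture_name: [nodes, ...]}.

    Assumption: input is a flat sequence of pairs; real Tree-sitter
    bindings may already return a grouped structure.
    """
    grouped: dict[str, list[Any]] = defaultdict(list)
    for node, name in pairs:
        grouped[name].append(node)
    # Return a plain dict so a missing key reads as "no captures".
    return dict(grouped)
```

Returning a plain `dict` is what makes checks like `if 'interface' in captures` safe: a capture name that never matched is simply absent instead of being created as an empty list.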

32-57: LGTM! Read-only dependency handling.

Commenting out the dependency installation logic and returning early ensures the analyzer doesn't mutate the filesystem, aligning with the read-only indexing objective.

88-107: LGTM! Symbol extraction refactored to use _run_query.

The refactored symbol extraction correctly uses the new _run_query helper and properly checks for capture keys before iterating.

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)

23-31: LGTM! Consistent _run_query helper.

The helper is identical to PythonAnalyzer._run_query, providing consistent query-capture handling across analyzers.

33-51: LGTM! Read-only dependency handling.

The dependency extraction logic is properly disabled for read-only mode.

104-116: LGTM! Correct variable usage for method/constructor symbols.

This section correctly uses captures variable consistently after _run_query calls.

backend/app/database/falkor/code-graph-backend/api/index.py (1)

408-428: LGTM! Proper error handling for repository analysis.

Wrapping the analysis flow in a try/except ensures that failures return a structured 500 response with error details instead of crashing. Using logger.exception captures the full traceback for debugging while the response provides a summary to the caller.
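
A framework-free sketch of this pattern follows. It assumes the analysis step can be passed in as a callable returning the stats dict; `run_analysis` is a hypothetical name, and the real handler in `index.py` builds an HTTP response rather than returning a tuple.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def run_analysis(analyze: Callable[[], dict]) -> tuple[dict, int]:
    """Run an analysis callable and map the outcome to (payload, HTTP status)."""
    try:
        stats = analyze()
    except Exception as exc:
        # logger.exception records the full traceback for debugging,
        # while the caller only receives a short summary.
        logger.exception("Repository analysis failed")
        return {"status": "error", "error": str(exc)}, 500
    return {"status": "success", **stats}, 200
```

Catching the broad `Exception` is deliberate at this boundary: any failure inside the analysis should become a structured 500 response instead of an unhandled crash.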

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In
`@backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py`:
- Line 96: The extends_query string in analyzer.py currently includes a trailing
'?' quantifier ("(extends_interfaces (type_list (type_identifier) `@type`))?")
which is redundant and inconsistent with other queries; update the extends_query
assignment to remove the trailing '?' so it becomes "(extends_interfaces
(type_list (type_identifier) `@type`))", leaving the rest of the analyzer.py logic
unchanged and referencing the existing extends_query variable.

♻️ Duplicate comments (1)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (1)

80-101: Variable name mismatch issue is now fixed.

The previous critical NameError issue has been correctly addressed - interfaces_captures, base_class_captures, and extends_captures are now properly assigned from _run_query and consistently referenced.

🧹 Nitpick comments (1)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (1)

32-50: LGTM - Read-only mode implemented correctly.

The method now returns early with dependency extraction disabled, aligning with the PR objective to make indexing read-only. The commented code preserves the original logic for potential future use.

The unused path and files arguments are expected since the method is stubbed out. If you want to silence linter warnings, you could prefix them with underscores, but this is optional given the commented implementation remains.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 659dc11 and 895e877.

📒 Files selected for processing (1)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py

🧰 Additional context used

🧬 Code graph analysis (1)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)

backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (2)

_run_query (21-29)

add_dependencies (31-56)

backend/app/database/falkor/code-graph-backend/api/analyzers/analyzer.py (1)

add_dependencies (57-66)

backend/app/database/falkor/code-graph-backend/api/entities/entity.py (1)

add_symbol (12-15)

🪛 Ruff (0.14.11)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py

32-32: Unused method argument: path

(ARG002)

32-32: Unused method argument: files

(ARG002)

🔇 Additional comments (3)

backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)

11-11: LGTM!

The import consolidation correctly addresses the previous duplicate Node import issue and adds QueryCursor for the new _run_query helper.

22-30: LGTM!

The _run_query helper is correctly implemented and consistent with the Python analyzer's implementation. This consolidates the tree-sitter query execution pattern cleanly.

102-115: LGTM!

The method invocation and parameter extraction logic correctly uses _run_query and properly checks for capture keys before processing.

tanii1125 added 3 commits January 16, 2026 15:07

fix(indexing): make repository indexing read-only

7ffd6ef

chore: remove redundant debug logic from indexing fix

f3b7f12

chore: remove redundant debug logic from indexing fix

03d85d4

coderabbitai bot reviewed Jan 16, 2026

tanii1125 added 2 commits January 16, 2026 18:40

Removed import and NameError

659dc11

Fixed NameError

895e877

coderabbitai bot reviewed Jan 16, 2026

Fix query syntax for extends_interfaces in analyzer.py

5236304
