Bugfix/read only repo indexing #246
📝 Walkthrough
Adds a default read-only static_only mode to skip dependency analysis, consolidates tree-sitter query capture handling into internal _run_query helpers in Java and Python analyzers, and adds error handling plus safer backend response parsing.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant API as index.py
participant SourceAnalyzer
participant Analyzer as LanguageAnalyzer
participant Service as RepoService
Client->>API: POST /analyze_repo
rect rgba(100,150,200,0.5)
API->>SourceAnalyzer: __init__(static_only=True)
SourceAnalyzer->>SourceAnalyzer: store flag
API->>SourceAnalyzer: analyze_sources(repo_path)
end
rect rgba(150,100,200,0.5)
Note over SourceAnalyzer,Analyzer: First pass: symbol extraction always runs
SourceAnalyzer->>Analyzer: add_symbols(file)
Analyzer->>Analyzer: _run_query(query, node) → captures
end
alt static_only = False
rect rgba(200,150,100,0.5)
Note over SourceAnalyzer,Analyzer: Dependency analysis and second pass occur
SourceAnalyzer->>Analyzer: add_dependencies(file)
Analyzer->>Analyzer: dependency resolution (may be no-op / short-circuited)
SourceAnalyzer->>SourceAnalyzer: second_pass()
end
else static_only = True
Note over SourceAnalyzer: Dependency & second pass skipped
end
rect rgba(100,200,150,0.5)
SourceAnalyzer->>API: analysis results
API->>Service: send results
Service->>API: parse text → json.loads (handle parse errors)
API->>Client: {"status":"success","node_count":N,"edge_count":M} or error
end
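The gating shown in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SketchSourceAnalyzer` and its `steps` list are hypothetical stand-ins, and only the `static_only` flag and the phase names come from the walkthrough.

```python
from dataclasses import dataclass, field


@dataclass
class SketchSourceAnalyzer:
    """Hypothetical stand-in for SourceAnalyzer; only the gating logic is modeled."""

    static_only: bool = True  # read-only by default, as described in the walkthrough
    steps: list = field(default_factory=list)  # records which phases ran

    def analyze_sources(self, files):
        # First pass: symbol extraction always runs.
        for name in files:
            self.steps.append(f"add_symbols:{name}")

        if self.static_only:
            # Read-only mode: skip dependency analysis and the second pass.
            return self.steps

        for name in files:
            self.steps.append(f"add_dependencies:{name}")
        self.steps.append("second_pass")
        return self.steps


read_only = SketchSourceAnalyzer().analyze_sources(["A.java"])
full = SketchSourceAnalyzer(static_only=False).analyze_sources(["A.java"])
```

With the default flag only the symbol pass is recorded; passing `static_only=False` also records the dependency step and the second pass.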
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In
`@backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py`:
- Around line 11-12: Remove the duplicate import of Node from tree_sitter in
analyzer.py by consolidating the two import lines into a single import that
includes Language, QueryCursor, Query, and Node (so only one "Node" is
imported); update the import statement that currently lists "from tree_sitter
import Language, Node" and "from tree_sitter import QueryCursor, Query, Node" to
a single line referencing Language, QueryCursor, Query, and Node to eliminate
the duplicate symbol.
- Around line 84-102: The issue: query results are stored to the local variable
captures but the code later reads undefined names interfaces_captures,
base_class_captures, and extends_captures causing NameError; fix by using the
same variable names returned from _run_query (or assign captures to those
specific names) so the checks and loops reference the actual results—for
example, for interfaces_query, base_class_query, and extends_query ensure you
either keep using captures everywhere or immediately assign captures to
interfaces_captures/base_class_captures/extends_captures before using them;
update the checks (e.g., if 'interface' in ...) and loops that call
entity.add_symbol("implement_interface", ...), entity.add_symbol("base_class",
...), and entity.add_symbol("extend_interface", ...) to reference the corrected
query result variables.
In
`@backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py`:
- Around line 12-13: The import list duplicates Node; remove the duplicate by
consolidating the imports so Node is only imported once—e.g., keep "from
tree_sitter import Language, Node" and change the second line to "from
tree_sitter import QueryCursor, Query" (or combine all four as "from tree_sitter
import Language, Node, QueryCursor, Query") so that Node is not imported twice;
update imports around the symbols Language, Node, QueryCursor, and Query
accordingly.
🧹 Nitpick comments (1)
backend/app/services/codegraph/repo_service.py (1)
117-123: Robust JSON parsing with graceful fallback.
The change from response.json() to response.text() with manual JSON parsing handles the case where the backend returns HTML error responses instead of JSON. Consider narrowing the exception type for clarity.
♻️ Optional: Use specific exception type
  raw_text = await response.text()
  try:
      import json
      data = json.loads(raw_text) if raw_text else {}
- except Exception:
+ except json.JSONDecodeError:
      logger.debug("Failed to parse backend JSON")
      data = {}
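As a standalone illustration of the narrowed exception handling suggested above, here is a minimal sketch. The function name `parse_backend_payload` is hypothetical, and the final `isinstance` guard is an added assumption (that callers expect a dict), not something the review asks for.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_backend_payload(raw_text: str) -> dict:
    """Return the decoded JSON object, or {} when the body is empty or not JSON."""
    if not raw_text:
        return {}
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        # For example, an HTML error page returned in place of JSON.
        logger.debug("Failed to parse backend JSON")
        return {}
    # Assumption: callers expect a dict, so other valid JSON values fall back to {}.
    return data if isinstance(data, dict) else {}
```

Catching only `json.JSONDecodeError` keeps unrelated failures (for example a cancelled request) visible instead of silently turning them into an empty payload.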
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py
backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py
backend/app/database/falkor/code-graph-backend/api/index.py
backend/app/services/codegraph/repo_service.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/database/falkor/code-graph-backend/api/index.py (3)
backend/app/database/falkor/code-graph-backend/api/project.py (4)
Project(40-113), from_git_repository(51-61), analyze_sources(80-95), process_git_history(97-113)
backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py (1)
analyze_sources(167-174)
backend/app/database/falkor/code-graph-backend/api/graph.py (1)
stats(592-609)
backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py (4)
backend/app/database/falkor/code-graph-backend/api/entities/file.py (1)
File(7-36)
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (1)
add_dependencies(32-57)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (1)
add_dependencies(33-51)
backend/app/database/falkor/code-graph-backend/api/analyzers/analyzer.py (1)
add_dependencies(57-66)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (2)
_run_query(22-30), add_dependencies(32-57)
backend/app/database/falkor/code-graph-backend/api/analyzers/analyzer.py (1)
add_dependencies(57-66)
backend/app/database/falkor/code-graph-backend/api/entities/entity.py (1)
add_symbol(12-15)
🪛 Ruff (0.14.11)
backend/app/services/codegraph/repo_service.py
121-121: Do not catch blind exception: Exception
(BLE001)
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py
13-13: Redefinition of unused Node from line 12: Node redefined here
Remove definition: Node
(F811)
32-32: Unused method argument: path
(ARG002)
32-32: Unused method argument: files
(ARG002)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py
12-12: Redefinition of unused Node from line 11: Node redefined here
Remove definition: Node
(F811)
33-33: Unused method argument: path
(ARG002)
33-33: Unused method argument: files
(ARG002)
87-87: interfaces_captures may be undefined, or defined from star imports
(F405)
88-88: interfaces_captures may be undefined, or defined from star imports
(F405)
93-93: base_class_captures may be undefined, or defined from star imports
(F405)
94-94: base_class_captures may be undefined, or defined from star imports
(F405)
100-100: extends_captures may be undefined, or defined from star imports
(F405)
101-101: extends_captures may be undefined, or defined from star imports
(F405)
🔇 Additional comments (11)
backend/app/database/falkor/code-graph-backend/api/analyzers/source_analyzer.py (3)
34-36: LGTM! Good default for read-only mode.
The static_only flag with a default of True ensures that repository indexing operates in read-only mode by default, which aligns with the PR objective to prevent dependency installs and repository mutation during static analysis.
86-88: LGTM! Dependency analysis correctly gated.
The conditional check ensures dependency extraction only runs when static_only=False, preventing filesystem mutations during static-only analysis.
171-174: LGTM! Second pass analysis correctly gated.
Skipping the second pass (which uses language servers and resolves symbols) when static_only=True is appropriate for read-only indexing that focuses on AST-only analysis.
backend/app/services/codegraph/repo_service.py (1)
143-143: LGTM! Cleaner error messaging.
Standardizing the error message to just the status code avoids exposing potentially sensitive HTML content from failed backend responses.
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (3)
22-30: LGTM! Clean helper for tree-sitter query captures.
The _run_query helper provides a consistent interface for collecting captures from tree-sitter queries, returning a dictionary keyed by capture name. This mirrors the same pattern in JavaAnalyzer, promoting code consistency across analyzers.
32-57: LGTM! Read-only dependency handling.
Commenting out the dependency installation logic and returning early ensures the analyzer doesn't mutate the filesystem, aligning with the read-only indexing objective.
88-107: LGTM! Symbol extraction refactored to use _run_query.
The refactored symbol extraction correctly uses the new _run_query helper and properly checks for capture keys before iterating.
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)
23-31: LGTM! Consistent _run_query helper.
The helper is identical to PythonAnalyzer._run_query, providing consistent query-capture handling across analyzers.
33-51: LGTM! Read-only dependency handling.
The dependency extraction logic is properly disabled for read-only mode.
104-116: LGTM! Correct variable usage for method/constructor symbols.
This section correctly uses the captures variable consistently after _run_query calls.
backend/app/database/falkor/code-graph-backend/api/index.py (1)
408-428: LGTM! Proper error handling for repository analysis.
Wrapping the analysis flow in a try/except ensures that failures return a structured 500 response with error details instead of crashing. Using logger.exception captures the full traceback for debugging while the response provides a summary to the caller.
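The error-handling pattern described for index.py might look roughly like the sketch below. It is framework-free on purpose: `analyze_repo_safely` and the `analyze` callable are hypothetical, the success body mirrors the shape shown in the sequence diagram, and the error body's field names are an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def analyze_repo_safely(analyze, repo_path):
    """Run analyze(repo_path) and return a (body, status_code) pair.

    `analyze` is a hypothetical callable returning (node_count, edge_count).
    """
    try:
        node_count, edge_count = analyze(repo_path)
    except Exception as exc:  # boundary handler: report any failure as a 500
        # logger.exception records the full traceback alongside the message.
        logger.exception("Repository analysis failed for %s", repo_path)
        return {"status": "error", "error": str(exc)}, 500
    return {
        "status": "success",
        "node_count": node_count,
        "edge_count": edge_count,
    }, 200
```

A broad `except Exception` is usually discouraged, but at a request boundary it is a common choice because the alternative is an unstructured crash; the traceback still reaches the logs.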
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py`:
- Line 96: The extends_query string in analyzer.py currently includes a trailing
'?' quantifier ("(extends_interfaces (type_list (type_identifier) `@type`))?")
which is redundant and inconsistent with other queries; update the extends_query
assignment to remove the trailing '?' so it becomes "(extends_interfaces
(type_list (type_identifier) `@type`))", leaving the rest of the analyzer.py logic
unchanged and referencing the existing extends_query variable.
♻️ Duplicate comments (1)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (1)
80-101: Variable name mismatch issue is now fixed.
The previous critical NameError issue has been correctly addressed: interfaces_captures, base_class_captures, and extends_captures are now properly assigned from _run_query and consistently referenced.
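The shape of that fix can be illustrated with a small sketch. Everything here is a stand-in: `SketchEntity` and `run_query_stub` are hypothetical replacements for the real Entity and _run_query, the canned results are invented, and the capture key names other than "interface" and "type" are assumptions.

```python
class SketchEntity:
    """Hypothetical stand-in for Entity; records (key, symbol) pairs."""

    def __init__(self):
        self.symbols = []

    def add_symbol(self, key, symbol):
        self.symbols.append((key, symbol))


def run_query_stub(query):
    """Hypothetical replacement for _run_query: returns captures keyed by name."""
    canned = {
        "interfaces": {"interface": ["Runnable", "Closeable"]},
        "base_class": {"base_class": ["AbstractTask"]},  # key name is an assumption
        "extends": {},  # no match: the capture key is simply absent
    }
    return canned[query]


entity = SketchEntity()

# Each result is bound to the same name that the later check and loop read.
interfaces_captures = run_query_stub("interfaces")
if "interface" in interfaces_captures:
    for name in interfaces_captures["interface"]:
        entity.add_symbol("implement_interface", name)

base_class_captures = run_query_stub("base_class")
if "base_class" in base_class_captures:
    for name in base_class_captures["base_class"]:
        entity.add_symbol("base_class", name)

extends_captures = run_query_stub("extends")
if "type" in extends_captures:
    for name in extends_captures["type"]:
        entity.add_symbol("extend_interface", name)
```

The membership check before each loop matters because a query with no matches yields a dict without that capture key.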
🧹 Nitpick comments (1)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (1)
32-50: LGTM - Read-only mode implemented correctly.
The method now returns early with dependency extraction disabled, aligning with the PR objective to make indexing read-only. The commented code preserves the original logic for potential future use.
The unused path and files arguments are expected since the method is stubbed out. If you want to silence linter warnings, you could prefix them with underscores, but this is optional given the commented implementation remains.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)
backend/app/database/falkor/code-graph-backend/api/analyzers/python/analyzer.py (2)
_run_query(21-29), add_dependencies(31-56)
backend/app/database/falkor/code-graph-backend/api/analyzers/analyzer.py (1)
add_dependencies(57-66)
backend/app/database/falkor/code-graph-backend/api/entities/entity.py (1)
add_symbol(12-15)
🪛 Ruff (0.14.11)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py
32-32: Unused method argument: path
(ARG002)
32-32: Unused method argument: files
(ARG002)
🔇 Additional comments (3)
backend/app/database/falkor/code-graph-backend/api/analyzers/java/analyzer.py (3)
11-11: LGTM!
The import consolidation correctly addresses the previous duplicate Node import issue and adds QueryCursor for the new _run_query helper.
22-30: LGTM!
The _run_query helper is correctly implemented and consistent with the Python analyzer's implementation. This consolidates the tree-sitter query execution pattern cleanly.
102-115: LGTM!
The method invocation and parameter extraction logic correctly uses _run_query and properly checks for capture keys before processing.
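The PR's _run_query body is not shown in this conversation, so the following is only a sketch of the "dictionary keyed by capture name" idea the review describes. It assumes capture results may arrive either as a mapping or as a sequence of (node, capture_name) pairs, and it uses plain strings where real code would hold tree-sitter nodes; `group_captures` is a hypothetical name.

```python
from collections import defaultdict


def group_captures(captures):
    """Return captures as {capture_name: [node, ...]}.

    Accepts either a mapping (copied into a plain dict) or an iterable of
    (node, capture_name) pairs.
    """
    if isinstance(captures, dict):
        return {name: list(nodes) for name, nodes in captures.items()}
    grouped = defaultdict(list)
    for node, name in captures:
        grouped[name].append(node)
    return dict(grouped)
```

Returning a plain dict rather than the defaultdict keeps `"name" in result` checks honest: looking up a missing key will not silently create it.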
Closes #237
📝 Description
Modified the repository indexing flow to be read-only during static analysis.
Replaced outdated Tree-sitter query usage, removed unnecessary debug logic, and fixed incorrect failed and pending indexing states. These states were caused by HTML error responses returned from Supabase being incorrectly parsed, which led to inconsistent node values.
🔧 Changes Made
1. Modified repository indexing to operate in read-only mode
2. Replaced old Tree-sitter module query usage
3. Fixed incorrect handling of failed and pending states triggered by Supabase HTML error responses
4. Corrected invalid node value parsing and status transitions
📷 Screenshots or Visual Changes (if applicable)
Before changes:
Indexing failed because HTML error responses were incorrectly parsed, causing repository mutation, broken node values, and repositories getting stuck in failed or pending states.
After changes:
The repository indexes successfully, with correct status handling and clean node extraction.
🤝 Collaboration
Collaborated with: None
✅ Checklist