feat: add /api/retrieve endpoint for pure vector search#496
MuLeiSY2021 wants to merge 1 commit into AsyncFuncAI:main
Conversation
Add a new POST /api/retrieve endpoint that performs semantic search
over a repository's indexed code chunks using FAISS, returning the
most relevant source code snippets without calling any LLM.
This enables external tools (e.g. MCP servers, IDE plugins) to
leverage deepwiki-open's RAG vector index as a code search backend,
without requiring LLM API keys or incurring generation costs.
Request: { repo_url, query, type?, token?, top_k? }
Response: { query, total_chunks, results: [{ text, file_path, is_code, token_count }] }
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
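As a rough illustration of the request shape described above (a sketch: the repository URL is a placeholder, and only `repo_url` and `query` are required per the schema):

```python
# Minimal client-side sketch for the proposed /api/retrieve endpoint.
# Payload field names come from the PR description; URLs are placeholders.
import json

def build_retrieve_request(repo_url, query, doc_type=None, token=None, top_k=None):
    """Assemble a /api/retrieve payload; optional fields are omitted when unset."""
    payload = {"repo_url": repo_url, "query": query}
    if doc_type is not None:
        payload["type"] = doc_type  # schema field is named "type"
    if token is not None:
        payload["token"] = token
    if top_k is not None:
        payload["top_k"] = top_k
    return payload

payload = build_retrieve_request(
    "https://github.com/example/repo",  # placeholder repo
    "authentication middleware",
    top_k=5,
)
print(json.dumps(payload))
```

This payload would then be POSTed to `/api/retrieve`; the response carries `query`, `total_chunks`, and a `results` list as documented above.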
Summary of Changes

Gemini Code Assist: This pull request enhances deepwiki-open by introducing a dedicated API endpoint for direct vector-based code retrieval. It broadens the utility of the existing RAG index, allowing external applications to perform efficient code searches independently of LLM generation, thereby expanding integration possibilities and reducing resource usage.
Code Review
This pull request introduces a new /api/retrieve endpoint for pure RAG retrieval, which processes repository data, generates embeddings, and uses FAISS to retrieve relevant code chunks. The review identifies several areas for improvement:

- a potential security risk from exposing raw exception details in 500 errors
- local imports that should move to the top of the file for readability and consistency
- a method invoked via `RAG.__new__(RAG)` that should be refactored into a static method
- an unused `weakref` import and a redundant variable in the `query_embedder` function
- API design: defining a Pydantic `BaseModel` for the response
```python
    raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
    logger.error(f"Retrieve error: {str(e)}")
    raise HTTPException(status_code=500, detail=str(e))
```
Exposing raw exception details (str(e)) to the client in a 500 error can be a security risk, as it might leak internal implementation details, file paths, or library information that could be exploited. It's safer to return a generic error message for 500-level errors.
```diff
-raise HTTPException(status_code=500, detail=str(e))
+raise HTTPException(status_code=500, detail="An internal server error occurred during retrieval.")
```
```python
from api.rag import RAG
from api.data_pipeline import DatabaseManager
from api.tools.embedder import get_embedder
from api.config import configs, get_embedder_type
from adalflow.components.retriever.faiss_retriever import FAISSRetriever
```
These imports are local to the retrieve function. It's generally better to place imports at the top of the file. This improves readability, makes dependencies clear, and allows for import errors to be caught at application startup rather than at runtime when the endpoint is first called. Other endpoints in this file follow this convention, so for consistency it would be good to move these.
```python
rag_instance = RAG.__new__(RAG)
valid_docs = rag_instance._validate_and_filter_embeddings(transformed_docs)
```
Using RAG.__new__(RAG) to create an uninitialized instance just to call _validate_and_filter_embeddings is unconventional and can be confusing. The _validate_and_filter_embeddings method in the RAG class does not use any instance-specific state, so it could be refactored into a @staticmethod.
This would allow you to call it directly on the class (RAG._validate_and_filter_embeddings(...)), which is cleaner and avoids misusing __new__. This would require a change in api/rag.py.
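A minimal sketch of the suggested refactor (the validation body here is a trivial stand-in; the real embedding-validation logic lives in api/rag.py):

```python
# Sketch only: the actual _validate_and_filter_embeddings logic is elided;
# a simple None-filter stands in to show the @staticmethod shape.
class RAG:
    @staticmethod
    def _validate_and_filter_embeddings(transformed_docs):
        # ...real embedding validation would go here...
        return [doc for doc in transformed_docs if doc is not None]

# The call site no longer needs an uninitialized instance:
valid_docs = RAG._validate_and_filter_embeddings(["doc_a", None, "doc_b"])
print(valid_docs)
```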
```python
import weakref
embedder_ref = embedder

def query_embedder(query):
    if isinstance(query, list):
        query = query[0]
    return embedder_ref(input=query)
```
The weakref module is imported but not used, and embedder_ref is a redundant variable. The query_embedder function can be simplified by removing them and using embedder directly from the outer scope.
```python
def query_embedder(query):
    if isinstance(query, list):
        query = query[0]
    return embedder(input=query)
```

```python
return {
    "query": request.query,
    "total_chunks": len(valid_docs),
    "results": [
        {
            "text": doc.text,
            "file_path": doc.meta_data.get("file_path", ""),
            "is_code": doc.meta_data.get("is_code", False),
            "token_count": doc.meta_data.get("token_count", 0),
        }
        for doc in docs
    ]
}
```
This endpoint returns a raw dictionary. For better type safety, automatic serialization, and API documentation, it's a good practice in FastAPI to define a Pydantic BaseModel for the response and use it in the endpoint decorator with response_model.
You could define models like this above the endpoint definition:

```python
from typing import List

class RetrieveResult(BaseModel):
    text: str
    file_path: str
    is_code: bool
    token_count: int

class RetrieveResponse(BaseModel):
    query: str
    total_chunks: int
    results: List[RetrieveResult]
```

Then use it in the endpoint decorator: `@app.post("/api/retrieve", response_model=RetrieveResponse)`. The return statement would then need to return an instance of RetrieveResponse.
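Concretely, the typed return could look like this (a self-contained sketch assuming pydantic; the sample values mirror the PR's example response):

```python
from typing import List
from pydantic import BaseModel

class RetrieveResult(BaseModel):
    text: str
    file_path: str
    is_code: bool
    token_count: int

class RetrieveResponse(BaseModel):
    query: str
    total_chunks: int
    results: List[RetrieveResult]

# FastAPI serializes this automatically when the endpoint declares
# response_model=RetrieveResponse.
response = RetrieveResponse(
    query="authentication middleware",
    total_chunks=1,
    results=[
        RetrieveResult(
            text="func AuthMiddleware(...) { ... }",
            file_path="middleware/auth.go",
            is_code=True,
            token_count=350,
        )
    ],
)
print(response.results[0].file_path)
```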
Summary
POST /api/retrieve endpoint that performs pure FAISS vector similarity search over indexed repository code chunks without calling any LLM API
Response:

```json
{
  "query": "authentication middleware",
  "total_chunks": 500,
  "results": [
    {
      "text": "func AuthMiddleware(...) { ... }",
      "file_path": "middleware/auth.go",
      "is_code": true,
      "token_count": 350
    }
  ]
}
```

Motivation
Currently deepwiki-open's RAG retrieval is tightly coupled with LLM generation in /chat/completions/stream. This PR separates the retrieval step into its own endpoint, which lets external tools (e.g. MCP servers, IDE plugins) use the vector index as a code search backend without requiring LLM API keys or incurring generation costs.

Test plan
🤖 Generated with Claude Code