
feat: add /api/retrieve endpoint for pure vector search #496

Open
MuLeiSY2021 wants to merge 1 commit into AsyncFuncAI:main from MuLeiSY2021:feat/api-retrieve-endpoint

Conversation

@MuLeiSY2021

Summary

  • Add POST /api/retrieve endpoint that performs pure FAISS vector similarity search over indexed repository code chunks without calling any LLM
  • Enables external tools (MCP servers, IDE plugins, CLI tools) to leverage deepwiki-open's RAG index as a code search backend
  • No LLM API key required — only uses the existing embedding model for query vectorization

API

POST /api/retrieve
{
  "repo_url": "https://github.com/user/repo",
  "query": "authentication middleware",
  "type": "github",
  "top_k": 5
}

Response:

{
  "query": "authentication middleware",
  "total_chunks": 500,
  "results": [
    {
      "text": "func AuthMiddleware(...) { ... }",
      "file_path": "middleware/auth.go",
      "is_code": true,
      "token_count": 350
    }
  ]
}
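As a usage sketch, an external tool (MCP server, CLI, IDE plugin) could call the endpoint like this. The host and port are assumptions for a local deployment; adjust them to wherever your deepwiki-open API server runs:

```python
import json
import urllib.request

# Hypothetical local deployment URL; point this at your deepwiki-open server.
API_URL = "http://localhost:8001/api/retrieve"

payload = {
    "repo_url": "https://github.com/user/repo",
    "query": "authentication middleware",
    "type": "github",
    "top_k": 5,
}

def retrieve(url: str = API_URL, body: dict = payload) -> dict:
    """POST the retrieval request and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

No LLM API key appears anywhere in the request; only the server-side embedding model is exercised.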

Motivation

Currently deepwiki-open's RAG retrieval is tightly coupled with LLM generation in /chat/completions/stream. This PR separates the retrieval step into its own endpoint, which:

  1. Enables MCP integration — external tools like deepwiki-mcp can provide deepwiki's code search as context to Claude Code, Cursor, etc.
  2. Reduces cost — retrieval-only queries don't consume LLM tokens
  3. Enables new use cases — code search APIs, IDE extensions, CI/CD integrations

Test plan

  • Tested locally with indexed repositories
  • Verified embedding validation filters inconsistent vector dimensions
  • Verified Ollama embedder compatibility path
  • Unit tests (TODO)

🤖 Generated with Claude Code

Add a new POST /api/retrieve endpoint that performs semantic search
over a repository's indexed code chunks using FAISS, returning the
most relevant source code snippets without calling any LLM.

This enables external tools (e.g. MCP servers, IDE plugins) to
leverage deepwiki-open's RAG vector index as a code search backend,
without requiring LLM API keys or incurring generation costs.

Request: { repo_url, query, type?, token?, top_k? }
Response: { query, total_chunks, results: [{ text, file_path, is_code, token_count }] }

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly enhances the deepwiki-open project by introducing a dedicated API endpoint for direct vector-based code retrieval. This change broadens the utility of the existing RAG index, allowing external applications to perform efficient code searches independently of LLM generation, thereby expanding integration possibilities and optimizing resource usage.

Highlights

  • New API Endpoint: Introduced a POST /api/retrieve endpoint for pure FAISS vector similarity search over indexed repository code chunks, without involving any LLM.
  • External Tool Integration: Enabled external tools, such as MCP servers, IDE plugins, and CLI tools, to leverage deepwiki-open's RAG index as a code search backend.
  • Cost Efficiency: The new retrieval-only endpoint does not require an LLM API key, reducing operational costs by not consuming LLM tokens.
  • Decoupling Retrieval: Decoupled the retrieval step from LLM generation, allowing for more flexible and independent use of the RAG capabilities.


@gemini-code-assist left a comment


Code Review

This pull request introduces a new /api/retrieve endpoint for pure RAG retrieval, which processes repository data, generates embeddings, and uses FAISS to retrieve relevant code chunks. The review identifies several areas for improvement, including a potential security risk by exposing raw exception details in 500 errors, the need to move local imports to the top of the file for better readability and consistency, refactoring a method that uses RAG.__new__(RAG) into a static method, simplifying the query_embedder function by removing unused weakref and redundant variables, and improving API design by using a Pydantic BaseModel for the response.

    raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
    logger.error(f"Retrieve error: {str(e)}")
    raise HTTPException(status_code=500, detail=str(e))

Severity: high

Exposing raw exception details (str(e)) to the client in a 500 error can be a security risk, as it might leak internal implementation details, file paths, or library information that could be exploited. It's safer to return a generic error message for 500-level errors.

Suggested change
raise HTTPException(status_code=500, detail=str(e))
raise HTTPException(status_code=500, detail="An internal server error occurred during retrieval.")
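The pattern behind this suggestion is to log full details server-side while returning a generic message to the client. A minimal stdlib-only sketch of that mapping (the function name and signature are illustrative, not from the PR):

```python
import logging

logger = logging.getLogger("api.retrieve")

def error_response(exc: Exception) -> tuple[int, str]:
    """Map an exception to an HTTP (status_code, detail) pair.

    Client-input errors (ValueError) may safely echo their message;
    anything else is logged with its traceback server-side and returns
    a generic detail string, so internal paths and library details are
    never leaked to the caller.
    """
    if isinstance(exc, ValueError):
        return 400, str(exc)
    logger.exception("Retrieve error", exc_info=exc)
    return 500, "An internal server error occurred during retrieval."
```

In the endpoint, the `except Exception` branch would call `logger.exception(...)` and raise `HTTPException` with the generic detail.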

Comment on lines +551 to +555
from api.rag import RAG
from api.data_pipeline import DatabaseManager
from api.tools.embedder import get_embedder
from api.config import configs, get_embedder_type
from adalflow.components.retriever.faiss_retriever import FAISSRetriever

Severity: medium

These imports are local to the retrieve function. It's generally better to place imports at the top of the file. This improves readability, makes dependencies clear, and allows for import errors to be caught at application startup rather than at runtime when the endpoint is first called. Other endpoints in this file follow this convention, so for consistency it would be good to move these.

Comment on lines +570 to +571
rag_instance = RAG.__new__(RAG)
valid_docs = rag_instance._validate_and_filter_embeddings(transformed_docs)

Severity: medium

Using RAG.__new__(RAG) to create an uninitialized instance just to call _validate_and_filter_embeddings is unconventional and can be confusing. The _validate_and_filter_embeddings method in the RAG class does not use any instance-specific state, so it could be refactored into a @staticmethod.

This would allow you to call it directly on the class (RAG._validate_and_filter_embeddings(...)), which is cleaner and avoids misusing __new__. This would require a change in api/rag.py.
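A minimal sketch of the suggested refactor. The filtering logic shown here (keep only documents whose embedding matches the majority dimension) is a hypothetical abbreviation; the real `_validate_and_filter_embeddings` in api/rag.py does more:

```python
from collections import Counter

class RAG:
    @staticmethod
    def _validate_and_filter_embeddings(documents):
        """Drop documents whose embedding dimension differs from the majority.

        Abbreviated sketch: the real method also handles missing vectors
        and logs what it filters out.
        """
        dims = Counter(
            len(doc.vector) for doc in documents if getattr(doc, "vector", None)
        )
        if not dims:
            return []
        target_dim = dims.most_common(1)[0][0]
        return [
            doc for doc in documents
            if getattr(doc, "vector", None) and len(doc.vector) == target_dim
        ]
```

The call site in the endpoint then becomes `valid_docs = RAG._validate_and_filter_embeddings(transformed_docs)`, with no `RAG.__new__(RAG)`.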

Comment on lines +578 to +583
import weakref
embedder_ref = embedder
def query_embedder(query):
    if isinstance(query, list):
        query = query[0]
    return embedder_ref(input=query)

Severity: medium

The weakref module is imported but not used, and embedder_ref is a redundant variable. The query_embedder function can be simplified by removing them and using embedder directly from the outer scope.

            def query_embedder(query):
                if isinstance(query, list):
                    query = query[0]
                return embedder(input=query)

Comment on lines +600 to +612
return {
    "query": request.query,
    "total_chunks": len(valid_docs),
    "results": [
        {
            "text": doc.text,
            "file_path": doc.meta_data.get("file_path", ""),
            "is_code": doc.meta_data.get("is_code", False),
            "token_count": doc.meta_data.get("token_count", 0),
        }
        for doc in docs
    ]
}

Severity: medium

This endpoint returns a raw dictionary. For better type safety, automatic serialization, and API documentation, it's a good practice in FastAPI to define a Pydantic BaseModel for the response and use it in the endpoint decorator with response_model.

You could define models like this above the endpoint definition:

from typing import List

class RetrieveResult(BaseModel):
    text: str
    file_path: str
    is_code: bool
    token_count: int

class RetrieveResponse(BaseModel):
    query: str
    total_chunks: int
    results: List[RetrieveResult]

And then use it in the endpoint: @app.post("/api/retrieve", response_model=RetrieveResponse). The return statement would then need to return an instance of RetrieveResponse.
