Vectorless RAG (Retrieval-Augmented Generation)

Cost-Effective, Fast, and Infrastructure-Free RAG

🚀 The Problem: Vector DB Overhead

Traditional RAG pipelines require converting text into embeddings and storing them in a Vector Database (Pinecone, Weaviate, etc.). While powerful, this introduces:

Cost: Embedding APIs and specialized database hosting.
Latency: Heavy indexing and network calls to remote vector stores.
Complexity: Managing infrastructure, vector synchronization, and dimensionality.

💡 The Solution: Vectorless RAG

This project demonstrates that for many use cases (small to medium document sets), you don't need a vector database. By combining BM25 sparse retrieval with a neural cross-encoder reranker, the pipeline achieves high precision with no vector store to host and no embedding API to pay for.

Technical Architecture

Ingestion: PDFs and text files are split into chunks with a sliding window (300 words per chunk, 50-word overlap).
Retrieval (BM25): Fast keyword-based search using BM25, a ranking function that refines TF-IDF with term-frequency saturation and document-length normalization.
Neural Reranking: The top 20 BM25 results are rescored with the cross-encoder/ms-marco-MiniLM-L-6-v2 model, which reads each query-chunk pair jointly to capture semantic relevance that keyword matching misses.
Synthesis (Groq): The final top 5 chunks are passed to a Groq-hosted LLM (Llama 3 / Mixtral), which generates the answer.
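
The ingestion and retrieval stages can be sketched in plain Python. This is an illustrative, standard-library-only sketch and not the project's actual code, which uses the rank-bm25 package. The names `chunk_words`, `BM25`, `retrieve`, and `rerank_fn` are hypothetical, the `k1` and `b` values are common defaults chosen here as an assumption, and the IDF uses the non-negative `log(1 + ...)` variant.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=300, overlap=50):
    """Split text into overlapping word windows (size and overlap in words)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end
            break
    return chunks


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 index over a list of text chunks."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in tokens]
        self.doc_len = [len(t) for t in tokens]
        self.avgdl = (sum(self.doc_len) / len(docs)) if docs else 1.0
        self.avgdl = self.avgdl or 1.0  # avoid division by zero on empty docs
        n = len(docs)
        df = Counter(term for t in tokens for term in set(t))
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        """BM25 score of document i for the query."""
        tf, dl = self.tf[i], self.doc_len[i]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )

    def top_k(self, query, k=20):
        """Return up to k (doc_index, score) pairs, best first."""
        scored = [(i, self.score(query, i)) for i in range(len(self.tf))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def retrieve(query, chunks, rerank_fn=None, first_k=20, final_k=5):
    """BM25 first pass, then an optional rerank with rerank_fn(query, chunk)."""
    index = BM25(chunks)
    candidates = [chunks[i] for i, _ in index.top_k(query, first_k)]
    if rerank_fn is not None:
        candidates.sort(key=lambda c: rerank_fn(query, c), reverse=True)
    return candidates[:final_k]
```

In this sketch, `rerank_fn` is a placeholder for the cross-encoder stage. With sentence-transformers installed, it could wrap `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict([(query, chunk)])`, although scoring all 20 pairs in a single `predict` call would be faster than one call per chunk.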

🛠️ Tech Stack

Retrieval: BM25 (rank-bm25)
Reranking: Sentence-Transformers (Cross-Encoder)
LLM: Groq (Llama-3.3-70b-versatile, Mixtral-8x7b)
UI: Streamlit with custom CSS & Three.js animations
Evaluation: RAGAS, Recall@k, and Precision@k

📊 Evaluation & Metrics

This project includes a built-in evaluation suite to measure:

Retrieval Quality: Recall@5 and Precision@5 benchmarks.
Generation Quality: Faithfulness and Relevancy scoring using LLM-as-a-judge.
RAGAS Integration: Industry-standard metrics for context recall and answer precision.
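
For reference, the two retrieval metrics can be computed as in the sketch below. It is a minimal illustration rather than the project's evaluation suite: the function names and chunk IDs are hypothetical, chunk IDs are assumed to be unique, and Precision@k divides by k even when fewer than k chunks are returned (one common convention).

```python
def precision_at_k(retrieved, relevant, k=5):
    """Fraction of the top-k retrieved chunk IDs that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k=5):
    """Fraction of all relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        return 0.0  # no ground truth for this query
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / len(relevant)


# Hypothetical example: 2 of the 3 relevant chunks show up in the top 5.
retrieved_ids = ["c3", "c7", "c1", "c9", "c4"]
relevant_ids = {"c1", "c3", "c8"}
p5 = precision_at_k(retrieved_ids, relevant_ids, k=5)  # 2 hits / 5 results
r5 = recall_at_k(retrieved_ids, relevant_ids, k=5)     # 2 hits / 3 relevant
```

Averaging these values over a set of labeled queries gives benchmark-level Recall@5 and Precision@5.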

⚡ Setup & Installation

# 1. Clone & Setup
git clone https://github.com/addy-2709genius/vectorless-rag.git
cd vectorless-rag
python3 -m venv venv
source venv/bin/activate

# 2. Install Dependencies
pip install -r requirements.txt
python3 -c "import nltk; nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('punkt'); nltk.download('punkt_tab')"

# 3. Run Application
streamlit run app.py

👨‍💻 Author

Gopi Raman Thakur
Full-Stack AI Engineer

Note: A Groq API Key is required for the LLM synthesis step. You can get one for free at console.groq.com.
