Vectorless RAG (Retrieval-Augmented Generation)

Cost-Effective, Fast, and Infrastructure-Free RAG

Python 3.10+ | Streamlit 1.32+ | Groq LLM | Evaluation Ready

🚀 The Problem: Vector DB Overhead

Traditional RAG pipelines require converting text into embeddings and storing them in a Vector Database (Pinecone, Weaviate, etc.). While powerful, this introduces:

  • Cost: Embedding APIs and specialized database hosting.
  • Latency: Heavy indexing and network calls to remote vector stores.
  • Complexity: Managing infrastructure, vector synchronization, and dimensionality.

💡 The Solution: Vectorless RAG

This project demonstrates that for many use cases (small to medium document sets), you don't need a vector database. Combining BM25 sparse retrieval with a neural cross-encoder reranker gives high-precision results with no embedding API to pay for and no vector store to host.

Technical Architecture

  1. Ingestion: PDFs and text files are split into chunks with a sliding window (300 words per chunk, 50-word overlap).
  2. Retrieval (BM25): Fast keyword-based search using BM25, a refinement of TF-IDF that adds term-frequency saturation and document-length normalization.
  3. Neural Reranking: The top 20 BM25 results are reranked with the cross-encoder/ms-marco-MiniLM-L-6-v2 model, which scores each query-chunk pair jointly to capture semantic relevance that keyword matching misses.
  4. Synthesis (Groq): The final top 5 chunks are passed to a Groq-hosted LLM (Llama 3 / Mixtral), which generates the answer.
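
The retrieval stages above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: `chunk_words`, `bm25_scores`, and `retrieve_and_rerank` are hypothetical names, the BM25 scorer is a hand-rolled Okapi variant with a non-negative IDF (the project itself uses Rank-BM25), tokenization is plain lowercased whitespace splitting, and the reranker is passed in as a callable so the sketch runs without downloading a model.

```python
import math
from collections import Counter
from typing import Callable


def chunk_words(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (illustrative variant)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(query.lower().split())
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def retrieve_and_rerank(
    query: str,
    chunks: list[str],
    rerank_score: Callable[[str, str], float],  # assumed stand-in for the cross-encoder
    first_k: int = 20,
    final_k: int = 5,
) -> list[str]:
    """Stage 1: keep the BM25 top `first_k`. Stage 2: rerank them, keep `final_k`."""
    scores = bm25_scores(query, chunks)
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    candidates = order[:first_k]
    reranked = sorted(candidates, key=lambda i: rerank_score(query, chunks[i]), reverse=True)
    return [chunks[i] for i in reranked[:final_k]]
```

In the real pipeline, `rerank_score` would presumably wrap the cross-encoder, for example by calling `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict(...)` from Sentence-Transformers on `(query, chunk)` pairs. Scoring all 20 candidates in one batched `predict` call is usually faster than one call per chunk.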

🛠️ Tech Stack

  • Retrieval: BM25 (Rank-BM25)
  • Reranking: Sentence-Transformers (Cross-Encoder)
  • LLM: Groq (Llama-3.3-70b-versatile, Mixtral-8x7b)
  • UI: Streamlit with custom CSS & Three.js animations
  • Evaluation: RAGAS, Recall@k, and Precision@k

📊 Evaluation & Metrics

This project includes a built-in evaluation suite to measure:

  • Retrieval Quality: Recall@5 and Precision@5 benchmarks.
  • Generation Quality: Faithfulness and Relevancy scoring using LLM-as-a-judge.
  • RAGAS Integration: Industry-standard metrics for context recall and answer precision.
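
For reference, Recall@k and Precision@k can be computed as in the sketch below. The assumptions here are not taken from the project: the function names and chunk IDs are illustrative, relevance labels are assumed to be available as a set of chunk IDs per question, and Precision@k divides by k even when fewer than k chunks are returned (one common convention). The project's evaluation suite may define these differently.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant chunks."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # nothing to find, so recall is reported as 0 by convention here
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical example: ranked chunk IDs returned for one question,
# and the hand-labelled set of chunks that actually contain the answer.
retrieved_ids = ["c3", "c7", "c1", "c9", "c4"]
relevant_ids = {"c1", "c3", "c8"}

p5 = precision_at_k(retrieved_ids, relevant_ids, k=5)  # 2 of 5 slots are relevant
r5 = recall_at_k(retrieved_ids, relevant_ids, k=5)     # 2 of 3 relevant chunks found
```

Averaging these per-question values over a labelled question set gives the benchmark figures; a high Recall@5 with a low Precision@5 suggests the right chunks are being found but diluted by irrelevant ones.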

⚡ Setup & Installation

# 1. Clone & Setup
git clone https://github.com/addy-2709genius/vectorless-rag.git
cd vectorless-rag
python3 -m venv venv
source venv/bin/activate

# 2. Install Dependencies
pip install -r requirements.txt
python3 -c "import nltk; nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('punkt'); nltk.download('punkt_tab')"

# 3. Run Application
streamlit run app.py

👨‍💻 Author

Gopi Raman Thakur
Full-Stack AI Engineer


Note: A Groq API Key is required for the LLM synthesis step. You can get one for free at console.groq.com.