A modular RAG (Retrieval-Augmented Generation) chatbot with a FastAPI backend and a Streamlit frontend. Its plugin architecture makes it easy to swap LLMs, embeddings, vector stores, and document loaders.
The system is built with a modular, factory-based architecture:
- Backend (FastAPI): RESTful API handling all business logic
- Frontend (Streamlit): User-friendly chat interface
- Core Interfaces: Abstract base classes for all components
- Implementations: Pluggable providers for LLMs, embeddings, vector stores, and loaders
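The factory idea can be illustrated with a minimal sketch. This is not the project's actual code: the `create` method, the `EchoLLM` class, and the internal `_providers` dictionary are assumptions made for illustration; only the `register_provider` name appears elsewhere in this README.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class LLMProvider(ABC):
    """Stand-in for the abstract interface (the real one lives in backend/core/)."""

    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str:
        ...


class LLMFactory:
    """Hypothetical registry that maps a provider name to a provider class."""

    _providers: Dict[str, Type[LLMProvider]] = {}

    @classmethod
    def register_provider(cls, name: str, provider: Type[LLMProvider]) -> None:
        cls._providers[name] = provider

    @classmethod
    def create(cls, name: str, **kwargs) -> LLMProvider:
        # Assumed helper: look up the class by name and instantiate it.
        try:
            provider_cls = cls._providers[name]
        except KeyError:
            known = ", ".join(sorted(cls._providers)) or "none"
            raise ValueError(f"Unknown provider {name!r} (registered: {known})") from None
        return provider_cls(**kwargs)


class EchoLLM(LLMProvider):
    """Toy provider used only to exercise the registry."""

    def __init__(self, model: str = "echo"):
        self.model = model

    def generate(self, prompt: str, **kwargs) -> str:
        return f"[{self.model}] {prompt}"


LLMFactory.register_provider("echo", EchoLLM)
llm = LLMFactory.create("echo", model="demo")
print(llm.generate("hello"))  # prints "[demo] hello"
```

Because callers depend only on the abstract interface and a provider name, swapping a provider comes down to registering a different class under that name.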
rag_chatbot/
├── backend/
│ ├── core/ # Abstract interfaces
│ ├── implementations/ # Concrete implementations
│ │ ├── loaders/ # Document loaders (PDF, DOCX, CSV)
│ │ ├── embeddings/ # Embedding providers (Ollama, OpenAI, HF)
│ │ ├── vector_stores/ # Vector stores (FAISS, Chroma, Pinecone)
│ │ └── llms/ # LLM providers (Ollama, OpenAI, Anthropic)
│ ├── api/ # API routes and models
│ ├── utils/ # Utilities
│ ├── config.py # Configuration
│ ├── main.py # FastAPI app
│ ├── run.py # Server entry point
│ └── requirements.txt # Backend dependencies
├── frontend/
│ ├── components/ # UI components
│ ├── services/ # Communication with backend
│ └── app.py # Streamlit app
├── .env.example
├── .gitignore
├── LICENSE
└── README.md
# Clone the repository
git clone <repository_url>
cd rag_chatbot
# Install dependencies
pip install -r backend/requirements.txt
# Copy environment variables
cp .env.example .env
Edit .env (in the project root) with your settings:
- For local LLMs: Ensure Ollama is running
- For cloud LLMs: Add your API keys (OpenAI, Anthropic, etc.)
# Using uvicorn from project root (recommended)
uvicorn backend.main:app --reload --host 127.0.0.1 --port 8000
# Or via python module
python -m backend.main
# Or via run script
python backend/run.py
# In a new terminal
streamlit run frontend/app.py
- Create a new file in backend/implementations/llms/:
from typing import Generator

from ...core.llm import LLMProvider


class MyLLM(LLMProvider):
    def __init__(self, model: str, **kwargs):
        self.model = model
        # Your initialization

    def generate(self, prompt: str, **kwargs) -> str:
        # Your implementation
        pass

    def stream(self, prompt: str, **kwargs) -> Generator[str, None, None]:
        # Your streaming implementation
        pass

    def get_model_name(self) -> str:
        return self.model
- Register it in backend/main.py:
from .implementations.llms.my_llm import MyLLM

LLMFactory.register_provider("my_provider", MyLLM)
- Create a new file in backend/implementations/loaders/:
from typing import List, BinaryIO

from ...core.document_processor import DocumentLoader, Document


class MyLoader(DocumentLoader):
    supported_extensions = ['.xyz']

    def load(self, file: BinaryIO, filename: str) -> List[Document]:
        # Your loading logic
        pass

    def supports_file_type(self, filename: str) -> bool:
        return any(filename.lower().endswith(ext) for ext in self.supported_extensions)
- Register it in backend/main.py:
from .implementations.loaders.my_loader import MyLoader

DocumentProcessorFactory.register_loader(MyLoader())
- Create a new file in backend/implementations/embeddings/:
from typing import List

from ...core.embeddings import EmbeddingProvider


class MyEmbeddings(EmbeddingProvider):
    def __init__(self, model: str, **kwargs):
        self.model = model

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Batch embedding logic
        pass

    def embed_query(self, text: str) -> List[float]:
        # Single embedding logic
        pass

    def get_dimension(self) -> int:
        # Return embedding dimension
        pass
- Register it in backend/main.py:
from .implementations.embeddings.my_embeddings import MyEmbeddings

EmbeddingFactory.register_provider("my_embeddings", MyEmbeddings)
- Create a new file in backend/implementations/vector_stores/:
from typing import List, Tuple

from ...core.vector_store import VectorStore
from ...core.document_processor import Document


class MyVectorStore(VectorStore):
    def __init__(self, dimension: int):
        self.dimension = dimension

    def add_documents(self, documents: List[Document], embeddings: List[List[float]]):
        pass

    def similarity_search(self, query_embedding: List[float], k: int = 4) -> List[Tuple[Document, float]]:
        pass

    def clear(self):
        pass

    def get_count(self) -> int:
        pass
- Register it in backend/main.py:
from .implementations.vector_stores.my_store import MyVectorStore

VectorStoreFactory.register_store("my_store", MyVectorStore)
- Modular Architecture: Easy to extend and customize
- Multiple LLM Support: Ollama, OpenAI, Anthropic (easily extensible)
- Multiple Embedding Providers: Ollama, OpenAI, HuggingFace
- Vector Store Options: FAISS, Chroma, Pinecone
- Document Loaders: PDF (easily add DOCX, CSV, TXT, etc.)
- Streaming Responses: Real-time chat experience
- Session Management: Multiple chat sessions
- RAG Toggle: Switch between RAG and normal chat
- Configurable Chunking: Adjust chunk size and overlap
- RESTful API: Well-documented endpoints
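To make the chunk size and overlap settings concrete, here is a minimal sketch of overlapping character-based chunking. It is an illustration only: the project's actual splitter may work on tokens, sentences, or pages, and the function name `chunk_text` and its defaults (1000 and 200) are assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# prints ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap keeps more context across chunk boundaries at the cost of more chunks to embed and store.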
- POST /api/sessions: Create a new chat session
- POST /api/sessions/{session_id}/upload: Upload a document
- POST /api/sessions/{session_id}/process: Process document into vector store
- POST /api/chat: Send a message (with streaming support)
- GET /api/config: Get current configuration
- PUT /api/config: Update configuration
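As a rough sketch of how a client might address these endpoints, the snippet below builds requests with the standard library. The endpoint paths come from the list above, and the base URL matches the uvicorn command shown earlier. The JSON field names (`session_id`, `message`) and the `build_request` helper are assumptions; check the API models in backend/api/ for the real request schema.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # host and port from the uvicorn command


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) an HTTP request, with an optional JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


create_session = build_request("POST", "/api/sessions")

# Field names below are assumed, not taken from the project.
chat = build_request(
    "POST",
    "/api/chat",
    {"session_id": "<session id>", "message": "What does the document say?"},
)

print(chat.get_method(), chat.full_url)
```

With the backend running, each request can be sent with `urllib.request.urlopen(request)`.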
You can configure the system through:
- Environment Variables (.env file)
- Runtime API Calls (PUT /api/config)
- Streamlit UI (Configuration sidebar)
Available settings:
- LLM provider and model
- Embedding provider and model
- Vector store type
- Chunk size and overlap
- API keys for cloud providers
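One way such settings could be read from environment variables is sketched below. Every variable name (`LLM_PROVIDER`, `CHUNK_SIZE`, and so on) and every default value here is a placeholder; the names the project actually reads are listed in .env.example and backend/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    llm_model: str
    embedding_provider: str
    vector_store: str
    chunk_size: int
    chunk_overlap: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # All keys and defaults are hypothetical placeholders.
        settings = cls(
            llm_provider=env.get("LLM_PROVIDER", "ollama"),
            llm_model=env.get("LLM_MODEL", "llama3"),
            embedding_provider=env.get("EMBEDDING_PROVIDER", "ollama"),
            vector_store=env.get("VECTOR_STORE", "faiss"),
            chunk_size=int(env.get("CHUNK_SIZE", "1000")),
            chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        )
        if settings.chunk_overlap >= settings.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
        return settings


# Passing a plain dict keeps the example independent of the real environment.
settings = Settings.from_env({"LLM_PROVIDER": "openai", "CHUNK_SIZE": "500"})
print(settings.llm_provider, settings.chunk_size, settings.chunk_overlap)
# prints "openai 500 200"
```

Validating values at load time surfaces a bad chunk configuration at startup rather than during document processing.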
- Never commit your .env file
- Use environment variables for API keys
- Consider authentication for production deployments
- Validate and sanitize file uploads
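The upload validation point could look roughly like the sketch below, which strips directory components from the filename and enforces an extension allow-list and a size limit. The allowed extensions, the 20 MB limit, and both function names are assumptions, not the project's actual policy.

```python
import os
import re

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".csv"}  # assumption: mirror the registered loaders
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: 20 MB cap


def sanitize_filename(filename: str) -> str:
    """Return a bare filename that is safe to join onto an upload directory."""
    # Drop any directory part, handling both / and \ separators.
    name = filename.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace characters outside a conservative set and strip leading dots.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    if not name:
        raise ValueError("empty filename after sanitizing")
    return name


def validate_upload(filename: str, size: int) -> str:
    """Validate extension and size; return the sanitized filename."""
    safe = sanitize_filename(filename)
    ext = os.path.splitext(safe)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size <= 0 or size > MAX_UPLOAD_BYTES:
        raise ValueError("file is empty or exceeds the size limit")
    return safe


print(validate_upload("../../etc/report.PDF", 1024))  # prints "report.PDF"
```

Extension checks alone do not prove a file's content type, so a production deployment would likely also inspect the file contents.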
To add support for new providers:
- Implement the appropriate interface from backend/core/
- Add your implementation to backend/implementations/
- Register it in backend/main.py
- Update this README
MIT License - feel free to use in your projects!
Backend won't start:
- Ensure all dependencies are installed
- Check that port 8000 is available
- Verify Ollama is running (if using local LLMs)
Streamlit can't connect:
- Ensure backend is running (http://localhost:8000/health should return status)
- Check CORS settings in backend/main.py
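The health check can also be scripted. The sketch below only tests whether the /health endpoint answers with HTTP 200; it makes no assumption about the response body, and the function name `backend_is_healthy` is illustrative.

```python
from urllib.request import urlopen


def backend_is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts all derive from OSError.
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_is_healthy())
```

If this returns False while the backend process is running, the host or port in the frontend's configuration is the next thing to check.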
Document processing fails:
- Verify the document format is supported
- Check chunk size settings
- Ensure embedding provider is configured correctly
Consider adding:
- SQL database support for document metadata
- More document formats (DOCX, TXT, CSV, JSON)
- Persistent storage for vector databases
- User authentication
- Multi-user support
- Document management UI
- Advanced RAG techniques (hybrid search, re-ranking)