🤖 RAG Chatbot

A production-style conversational AI chatbot built with LangGraph, RAG, and Streamlit — featuring five backend variants, from a simple LLM chat all the way to MCP-powered tool-using agents with persistent memory.

Python · LangGraph · Streamlit · OpenAI · SQLite


📌 What This Project Is

This repo is a fully functional chatbot application built in progressive layers. Each backend adds a new capability on top of the previous one — from a bare-bones LangGraph conversational agent to a RAG-powered document chatbot with MCP tool integration, SQLite persistence, real-time streaming, and threaded async UI.

Every backend has a paired Streamlit frontend so you can run and interact with each variant independently.


πŸ—οΈ Architecture Overview

The project follows a clean backend / frontend separation:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Streamlit Frontend                      β”‚
β”‚  (streamlit_frontend*.py / streamlit_rag_frontend.py)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      β”‚ calls
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    LangGraph Backend                        β”‚
β”‚  (langgraph_backend.py / langgraph_*_backend.py)            β”‚
β”‚                                                             β”‚
β”‚   StateGraph ──► LLM Node ──► Tool/RAG/MCP Node            β”‚
β”‚                      β”‚                                      β”‚
β”‚              SqliteSaver (chatbot.db)                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ—‚οΈ File Structure

rag-chatbot/
β”‚
β”œβ”€β”€ πŸ”§ Backends (LangGraph agents)
β”‚   β”œβ”€β”€ langgraph_backend.py              # Variant 1: Base conversational agent
β”‚   β”œβ”€β”€ langgraph_tool_backend.py         # Variant 2: Agent with custom tools
β”‚   β”œβ”€β”€ langraph_rag_backend.py           # Variant 3: RAG over documents
β”‚   β”œβ”€β”€ langgraph_database_backend.py     # Variant 4: Persistent memory (SQLite)
β”‚   └── langgraph_mcp_backend.py          # Variant 5: MCP tool integration
β”‚
β”œβ”€β”€ πŸ–₯️ Frontends (Streamlit UIs)
β”‚   β”œβ”€β”€ streamlit_frontend.py             # UI for base agent
β”‚   β”œβ”€β”€ streamlit_frontend_tool.py        # UI for tool-using agent
β”‚   β”œβ”€β”€ streamlit_rag_frontend.py         # UI for RAG chatbot
β”‚   β”œβ”€β”€ streamlit_frontend_database.py    # UI for persistent chatbot
β”‚   β”œβ”€β”€ streamlit_frontend_mcp.py         # UI for MCP agent
β”‚   β”œβ”€β”€ streamlit_frontend_streaming.py   # UI with token-by-token streaming
β”‚   └── streamlit_frontend_threading.py   # UI with threaded async execution
β”‚
β”œβ”€β”€ πŸ—„οΈ Database
β”‚   β”œβ”€β”€ chatbot.db                        # SQLite checkpoint store
β”‚   β”œβ”€β”€ chatbot.db-shm                    # Shared memory file (WAL mode)
β”‚   └── chatbot.db-wal                    # Write-ahead log (WAL mode)
β”‚
β”œβ”€β”€ requirements.txt
└── .gitignore

🔄 Backend Variants

Variant 1 — Base LangGraph Agent

Files: langgraph_backend.py + streamlit_frontend.py

A clean conversational agent built with LangGraph's MessagesState. Handles multi-turn chat with full message history maintained in graph state.

Stack: StateGraph · ChatOpenAI · MessagesState · MemorySaver
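
The state-passing shape of this variant can be sketched in plain Python (illustrative names only; the real backend uses StateGraph, ChatOpenAI, and MemorySaver rather than these stand-ins):

```python
# Conceptual sketch of a LangGraph-style chat node: each node receives the
# state, calls the model on the full history, and returns the updated state.

def chat_node(state, llm):
    # One graph node: call the model, append its reply to the history
    reply = llm(state["messages"])
    return {"messages": state["messages"] + [("assistant", reply)]}

def run_turn(state, user_text, llm):
    # Mirror of invoking the graph: add the user message, then run the node
    state = {"messages": state["messages"] + [("user", user_text)]}
    return chat_node(state, llm)

def echo_llm(messages):
    # Stand-in for ChatOpenAI: just echoes the latest user message
    return f"You said: {messages[-1][1]}"

state = {"messages": []}
state = run_turn(state, "hello", echo_llm)
state = run_turn(state, "how are you?", echo_llm)
print(len(state["messages"]))  # 4: two user turns plus two replies
```

Because the whole history lives in the state that flows through the graph, multi-turn context comes for free.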


Variant 2 — Tool-Using Agent

Files: langgraph_tool_backend.py + streamlit_frontend_tool.py

Extends the base agent with custom tool bindings. The LLM can decide when to invoke tools (web search, calculators, APIs, etc.) using a ReAct-style loop.

Stack: ToolNode · bind_tools · conditional edges
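
The ReAct-style loop can be illustrated without the real ToolNode API (everything below is a plain-Python stand-in): the "model" either requests a tool call or emits a final answer, and the loop dispatches registered tools until an answer arrives.

```python
# Illustrative ReAct-style agent loop with a fake model and one toy tool.

TOOLS = {"add": lambda a, b: a + b}    # stand-in for real web-search/API tools

def fake_llm(history):
    # Pretend model: requests the add tool once, then answers with its result
    results = [v for kind, v in history if kind == "tool_result"]
    if not results:
        return ("tool_call", ("add", (2, 3)))
    return ("final", f"The sum is {results[-1]}")

def react_loop(question, llm, tools, max_steps=5):
    history = [("user", question)]
    for _ in range(max_steps):
        kind, payload = llm(history)
        if kind == "final":            # conditional edge: stop, or...
            return payload
        name, args = payload           # ...route to the requested tool
        history.append(("tool_result", tools[name](*args)))
    raise RuntimeError("agent never produced a final answer")

print(react_loop("What is 2 + 3?", fake_llm, TOOLS))  # The sum is 5
```

In the real backend, bind_tools tells the model which tools exist, and a conditional edge routes to ToolNode whenever the model's reply contains tool calls.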


Variant 3 — RAG Chatbot

Files: langraph_rag_backend.py + streamlit_rag_frontend.py

The core feature of this repo. Ingests documents, builds a FAISS vector index, and grounds every LLM response in retrieved context. Answers questions based on your own knowledge base.

User Query → Embed → FAISS Similarity Search → Retrieve Chunks → LLM → Answer

Stack: PyPDFLoader · RecursiveCharacterTextSplitter · OpenAIEmbeddings · FAISS · RetrievalQA
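
The retrieval flow above can be sketched end to end with stdlib stand-ins: character chunking (for RecursiveCharacterTextSplitter), bag-of-words vectors (for OpenAIEmbeddings), and brute-force cosine search (for FAISS). Names and parameters here are illustrative, not the repo's actual code.

```python
# Toy retrieval pipeline: chunk -> "embed" -> rank by cosine similarity.
import math
import re
from collections import Counter

def chunk(text, size=200, overlap=50):
    # Overlapping fixed-size character windows
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(text):
    # Toy "embedding": term-frequency vector over lowercased words
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    # Rank every chunk against the query vector and keep the top k
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

print(chunk("abcdefghij", size=4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']

docs = [
    "LangGraph orchestrates the agent graph.",
    "FAISS indexes embeddings for fast similarity search.",
    "Streamlit renders the chat interface.",
]
print(retrieve("Which library indexes embeddings?", docs)[0])
```

The retrieved chunks are then stuffed into the LLM prompt so the answer is grounded in your documents rather than the model's parametric memory.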


Variant 4 — Persistent Memory (Database)

Files: langgraph_database_backend.py + streamlit_frontend_database.py

Adds SQLite-backed checkpointing so conversation state survives across sessions. Each conversation is tracked by a thread_id and can be resumed exactly where it left off.

Stack: SqliteSaver · chatbot.db (WAL mode) · thread_id config · state resumption
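
Conceptually, SqliteSaver keys state by thread_id and serializes it into SQLite so a later session can pick up the same conversation. A minimal stdlib sketch of that idea (table and function names are illustrative, not SqliteSaver's schema):

```python
# Thread-scoped checkpointing with stdlib sqlite3 + json.
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # the real backend points at chatbot.db
conn.execute("CREATE TABLE checkpoints (thread_id TEXT PRIMARY KEY, state TEXT)")

def save(thread_id, state):
    # Upsert the latest state blob for this conversation thread
    conn.execute("INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
                 (thread_id, json.dumps(state)))
    conn.commit()

def load(thread_id):
    # Resume where the thread left off, or start fresh if it's unknown
    row = conn.execute("SELECT state FROM checkpoints WHERE thread_id = ?",
                       (thread_id,)).fetchone()
    return json.loads(row[0]) if row else {"messages": []}

save("thread-1", {"messages": ["hi", "hello!"]})
resumed = load("thread-1")            # a "new session" picks up the history
resumed["messages"].append("what next?")
save("thread-1", resumed)
print(len(load("thread-1")["messages"]))  # 3
```

In LangGraph itself, the thread_id travels in the config passed to the graph invocation, and the checkpointer handles the save/load transparently on every step.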


Variant 5 — MCP Agent

Files: langgraph_mcp_backend.py + streamlit_frontend_mcp.py

The most advanced variant. Integrates the Model Context Protocol (MCP) — allowing the agent to connect to external MCP servers and use their tools dynamically. This enables filesystem access, API calls, browser automation, and more through a standardised protocol.

Stack: MCP client · dynamic tool discovery · LangGraph agent loop


🖥️ Frontend Variants

| Frontend | Description |
| --- | --- |
| streamlit_frontend.py | Standard chat UI — input, output, message history |
| streamlit_frontend_tool.py | Shows tool calls and results inline in the chat |
| streamlit_rag_frontend.py | RAG UI with document upload and source attribution |
| streamlit_frontend_database.py | Session selector — load/resume past conversations |
| streamlit_frontend_mcp.py | MCP-connected UI with live tool status |
| streamlit_frontend_streaming.py | Real-time token-by-token streaming output |
| streamlit_frontend_threading.py | Async execution via threading — prevents UI blocking |

βš™οΈ Setup & Installation

1. Clone the repo

git clone https://github.com/codeantik/rag-chatbot.git
cd rag-chatbot

2. Create a virtual environment

python -m venv venv
source venv/bin/activate        # macOS/Linux
venv\Scripts\activate           # Windows

3. Install dependencies

pip install -r requirements.txt

4. Set up environment variables

Create a .env file in the root:

OPENAI_API_KEY=your_openai_api_key_here
LANGCHAIN_API_KEY=your_langsmith_api_key_here   # optional, for tracing
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=rag-chatbot

▶️ Running the App

Each variant is run by pairing its backend with its frontend. Pick the variant you want:

# Variant 1 — Base chatbot
streamlit run streamlit_frontend.py

# Variant 2 — Tool-using agent
streamlit run streamlit_frontend_tool.py

# Variant 3 — RAG chatbot
streamlit run streamlit_rag_frontend.py

# Variant 4 — Persistent memory chatbot
streamlit run streamlit_frontend_database.py

# Variant 5 — MCP agent
streamlit run streamlit_frontend_mcp.py

# Streaming UI
streamlit run streamlit_frontend_streaming.py

# Threaded async UI
streamlit run streamlit_frontend_threading.py

The corresponding backend is imported automatically by each frontend — no need to run them separately.


πŸ—„οΈ Database & Persistence

The chatbot.db SQLite file stores LangGraph checkpoints using WAL (Write-Ahead Logging) mode for concurrent read safety. The three files serve these roles:

File Role
chatbot.db Main checkpoint database
chatbot.db-shm Shared memory index for WAL
chatbot.db-wal Pending write buffer

To reset all conversation history:

rm chatbot.db chatbot.db-shm chatbot.db-wal
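
If you want to see where the -shm and -wal sidecars come from, stdlib sqlite3 is enough to reproduce the behaviour on a throwaway database (this demo uses a temp path, not the repo's chatbot.db):

```python
# Switching a SQLite database to WAL mode creates the -wal and -shm sidecar
# files alongside it once a write transaction has run.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.execute("CREATE TABLE msgs (text)")
conn.execute("INSERT INTO msgs VALUES ('hello')")
conn.commit()                          # the write lands in demo.db-wal first

wal_exists = os.path.exists(path + "-wal")
shm_exists = os.path.exists(path + "-shm")
print(mode, wal_exists, shm_exists)    # wal True True
conn.close()  # closing checkpoints the WAL back into the main database file
```

This is why deleting chatbot.db alone is not enough for a clean reset: stale -wal and -shm files must go with it.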

🔑 Key Dependencies

| Package | Purpose |
| --- | --- |
| langgraph | Graph-based agent orchestration |
| langchain | Chains, prompts, retrievers |
| langchain-openai | OpenAI model integration |
| langchain-community | FAISS, document loaders, tools |
| streamlit | Interactive web UI |
| faiss-cpu | Vector similarity search for RAG |
| pypdf | PDF document loading |
| openai | Direct OpenAI API access |
| python-dotenv | Environment variable management |

πŸ—ΊοΈ Roadmap

  • Base LangGraph conversational agent
  • Tool-using ReAct agent
  • RAG over documents (FAISS)
  • SQLite persistent memory
  • MCP tool integration
  • Token streaming UI
  • Threaded async UI
  • Multi-document upload support
  • User authentication & multi-user sessions
  • Deploy to Streamlit Cloud / Docker

πŸ‘¨β€πŸ’» Author

Ankit Singh β€” Full Stack Developer & Agentic AI Specialist

GitHub LinkedIn



📄 License

This project is open source and available under the MIT License.
