A production-ready RAG (Retrieval-Augmented Generation) chatbot that answers customer questions using a restaurant's own knowledge base. Built with FastAPI, ChromaDB, and Groq (Llama 3.3 70B).
A customer visits a restaurant's website, clicks the chat bubble, and asks:
"Do you guys deliver to G-9?" "What's your cheapest biryani?" "Are you halal?" "How much does it cost?" (follow-up — the bot remembers context)
The chatbot retrieves relevant information from the restaurant's own documents (menu, policies, FAQ) and generates a natural, accurate answer — no hallucination, no made-up facts.
If the answer isn't in the knowledge base, it gracefully says so and redirects to WhatsApp.
Customers can also build an order directly in the chat and checkout via WhatsApp — no payment integration needed.
The system has two phases:
Documents → Chunking → Embedding → Vector Database
- Collect — text files from `knowledge_base/` (menu, policies, FAQ, about)
- Chunk — split documents into meaningful sections (menu categories, individual Q&A pairs, policy sections)
- Embed — convert each chunk into a 384-dimensional vector using all-MiniLM-L6-v2
- Store — save vectors + text in ChromaDB for fast similarity search
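A minimal sketch of the indexing phase, assuming ChromaDB's persistent client and its default embedding function (the sample chunk and storage path are illustrative; the real chunks come from `chunker.py`):

```python
import chromadb

# Illustrative chunk; the real ones are produced by the chunking step.
chunks = [
    {"id": "menu_001", "text": "BIRYANI\nChicken Biryani - Rs. 450 ...",
     "source": "menu.txt", "section": "BIRYANI"},
]

# ChromaDB's default embedding function is all-MiniLM-L6-v2 (via ONNX runtime,
# no PyTorch), so collection.add() embeds each chunk automatically.
client = chromadb.PersistentClient(path="chroma_db")  # storage path is an assumption
collection = client.get_or_create_collection("karachi_bites")

collection.add(
    ids=[c["id"] for c in chunks],
    documents=[c["text"] for c in chunks],
    metadatas=[{"source": c["source"], "section": c["section"]} for c in chunks],
)
```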
Question → Embed → Search → Build Prompt (+ history) → LLM → Answer
- Embed the customer's question into a vector
- Search ChromaDB for the 3 most similar chunks
- Build a prompt with the retrieved context + conversation history + system instructions
- Generate an answer using Llama 3.3 70B via Groq
- Return the answer through the REST API
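Sketched end to end with the real `chromadb` and `groq` client libraries (the prompt wording, function name, and variable names are illustrative, not the exact code in `chatbot_server.py`):

```python
import os
import chromadb
from groq import Groq

db = chromadb.PersistentClient(path="chroma_db")
collection = db.get_or_create_collection("karachi_bites")
llm = Groq(api_key=os.environ["GROQ_API_KEY"])

def answer(question: str, history: list[dict]) -> str:
    # Steps 1-2: embed the question and fetch the 3 most similar chunks.
    results = collection.query(query_texts=[question], n_results=3)
    context = "\n\n".join(results["documents"][0])

    # Step 3: system instructions + retrieved context + recent history + question.
    messages = [
        {"role": "system",
         "content": f"Answer using only this context:\n{context}"},
        *history,
        {"role": "user", "content": question},
    ]

    # Step 4: generate with Llama 3.3 70B on Groq.
    response = llm.chat.completions.create(
        model="llama-3.3-70b-versatile", messages=messages,
    )
    return response.choices[0].message.content
```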
Each conversation is tracked server-side with a session ID. The last 10 messages are sent to the LLM as context, so the bot can handle follow-ups like "How much does it cost?" after asking about biryani. Sessions auto-expire after 30 minutes of inactivity.
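One way such a session store can be implemented (a sketch matching the stated behavior of a 10-message window and 30-minute TTL; all names here are assumptions, not the actual code):

```python
import time
import uuid
from collections import deque

SESSION_TTL = 30 * 60   # expire after 30 minutes of inactivity
HISTORY_LIMIT = 10      # last 10 messages are sent to the LLM

sessions: dict[str, dict] = {}  # in-memory, so it resets on restart

def get_session(conversation_id: str | None) -> tuple[str, deque]:
    now = time.time()
    # Drop sessions that have been idle longer than the TTL.
    for sid in [s for s, v in sessions.items() if now - v["last_seen"] > SESSION_TTL]:
        del sessions[sid]
    if conversation_id is None or conversation_id not in sessions:
        conversation_id = str(uuid.uuid4())
        # deque(maxlen=...) silently discards the oldest messages.
        sessions[conversation_id] = {"history": deque(maxlen=HISTORY_LIMIT),
                                     "last_seen": now}
    sessions[conversation_id]["last_seen"] = now
    return conversation_id, sessions[conversation_id]["history"]
```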
When the bot lists menu items, clickable "+ Add" buttons appear next to each item. Customers build a cart, then checkout — the system generates a pre-filled WhatsApp message with their order, name, phone, and delivery address, and opens it in a new tab.
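The checkout itself is just a `wa.me` deep link with the order URL-encoded into the `text` parameter; a sketch (the restaurant number and field names are placeholders):

```python
from urllib.parse import quote

RESTAURANT_WHATSAPP = "923001234567"  # placeholder number

def whatsapp_checkout_link(cart: list[dict], name: str,
                           phone: str, address: str) -> str:
    lines = [f"{i['quantity']}x {i['item']}" for i in cart]
    text = ("New order:\n" + "\n".join(lines) +
            f"\n\nName: {name}\nPhone: {phone}\nAddress: {address}")
    # wa.me opens WhatsApp with the message pre-filled for the customer to send.
    return f"https://wa.me/{RESTAURANT_WHATSAPP}?text={quote(text)}"
```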
| Component | Technology | Why |
|---|---|---|
| Backend | FastAPI | Async, fast, auto-generates API docs |
| Vector DB | ChromaDB | Free, local, built-in embeddings |
| Embeddings | all-MiniLM-L6-v2 (via ChromaDB) | Lightweight, no PyTorch needed |
| LLM | Llama 3.3 70B via Groq | Free API, very fast inference |
| Frontend | Vanilla HTML/CSS/JS | Zero dependencies, embeddable anywhere |
| Deployment | Railway | Simple, auto-deploy from GitHub |
```
├── knowledge_base/        # Client's documents (the data source)
│   ├── menu.txt
│   ├── about.txt
│   ├── policies.txt
│   └── faq.txt
├── chatbot_server.py      # FastAPI server (retrieval + LLM)
├── chat_widget.html       # Embeddable frontend chat widget
├── test_chatbot.py        # Test script for API endpoints
├── setup_db.py            # Standalone DB builder (optional)
├── chunker.py             # Document chunking logic (educational)
├── embedder.py            # Embedding logic (educational)
├── requirements.txt
├── Procfile               # Railway deployment config
└── .gitignore
```
```bash
git clone https://github.com/YOUR_USERNAME/karachi-bites-chatbot.git
cd karachi-bites-chatbot
pip install -r requirements.txt
```

Go to console.groq.com, sign up, and create an API key.

```bash
# Linux/Mac
export GROQ_API_KEY="gsk_your_key_here"
```

```powershell
# Windows PowerShell
$env:GROQ_API_KEY = "gsk_your_key_here"
```

```bash
uvicorn chatbot_server:app --reload --port 8000
```

The server auto-builds the vector database from `knowledge_base/` on first startup.

```bash
python test_chatbot.py
```

Or open `chat_widget.html` in your browser and chat directly.
Send a customer message, get an AI-generated answer. Supports multi-turn conversations via `conversation_id`.
Request:
```json
{
  "message": "What time do you close on Friday?",
  "conversation_id": null
}
```

Response:

```json
{
  "answer": "We're open until 1:00 AM on Fridays! ...",
  "sources": [
    {"source": "about.txt", "section": "HOURS", "chunk_id": "about_002"}
  ],
  "conversation_id": "a1b2c3d4-..."
}
```

Send the returned `conversation_id` in subsequent requests to maintain context.
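A multi-turn exchange from Python, assuming the server runs locally and the chat endpoint is `/chat` (verify the exact path in FastAPI's auto-generated docs at `/docs`):

```python
import requests

API = "http://localhost:8000"

first = requests.post(f"{API}/chat", json={
    "message": "What's your cheapest biryani?", "conversation_id": None,
}).json()
print(first["answer"])

# Reuse conversation_id so the follow-up can resolve "it" from context.
followup = requests.post(f"{API}/chat", json={
    "message": "How much does it cost?",
    "conversation_id": first["conversation_id"],
}).json()
print(followup["answer"], followup["sources"])
```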
| Endpoint | Method | Description |
|---|---|---|
| `/cart/add` | POST | Add an item to cart — `{ conversation_id, item, quantity }` |
| `/cart/remove` | POST | Remove an item — `{ conversation_id, item }` |
| `/cart/{conversation_id}` | GET | Get current cart contents and total |
| `/cart/checkout` | POST | Generate WhatsApp order link — `{ conversation_id, name, phone, address }` |
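The full ordering flow against these endpoints, sketched with `requests` (payload fields follow the table above; the item name and customer details are made up):

```python
import requests

API = "http://localhost:8000"
cid = "a1b2c3d4-..."  # conversation_id from a previous chat response

requests.post(f"{API}/cart/add", json={
    "conversation_id": cid, "item": "Chicken Biryani", "quantity": 2,
})

cart = requests.get(f"{API}/cart/{cid}").json()
print(cart)  # current items and total

checkout = requests.post(f"{API}/cart/checkout", json={
    "conversation_id": cid, "name": "Ali", "phone": "0300-1234567",
    "address": "House 12, G-9, Islamabad",
}).json()
# Expected to contain the pre-filled WhatsApp link to open in a new tab.
```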
Health check endpoint.
```json
{
  "status": "ok",
  "collection": "karachi_bites",
  "chunks_in_db": 30,
  "llm": "llama-3.3-70b-versatile"
}
```

The chat widget is a self-contained HTML/CSS/JS component. To embed it on any website:
- Copy the widget CSS and JS from `chat_widget.html`
- Change `API_URL` to your deployed server: `const API_URL = "https://your-app.up.railway.app";`
- Paste into the client's website HTML
Features:
- Floating chat bubble with open/close animation
- Multi-turn conversation memory (follow-up questions work)
- "+ Add" buttons on menu items in bot responses for quick ordering
- Cart panel with item management (view, remove, totals)
- WhatsApp checkout — generates a pre-filled order message
- Typing indicator while waiting for response
- Mobile responsive
- Graceful error handling (falls back to WhatsApp contact)
- Customizable colors via CSS variables
This is designed to be reused. To build a chatbot for a different client:
- Replace the knowledge base — swap the files in `knowledge_base/` with the new client's documents
- Update the system prompt — change the restaurant name, tone, and fallback contact in `SYSTEM_PROMPT` inside `chatbot_server.py` (see the sketch below)
- Customize the widget — update colors (`--kb-primary`), name, avatar, and greeting in `chat_widget.html`
- Deploy — push to GitHub, deploy on Railway/Render
The entire pipeline (chunking, embedding, retrieval, generation) works automatically with any text-based knowledge base.
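For the system-prompt step above, the edit might look like this (illustrative wording only; the actual constant in `chatbot_server.py` will differ):

```python
# Edit the name, tone, and fallback contact for each new client.
SYSTEM_PROMPT = """You are the friendly assistant for Karachi Bites.
Answer only from the provided context. If the answer isn't in the
context, say so and direct the customer to our WhatsApp (placeholder
number): +92 300 1234567."""
```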
- Push to GitHub
- Connect repo on railway.app
- Add environment variable: `GROQ_API_KEY`
- Deploy — the server auto-builds the DB on first startup
- Push to GitHub
- Create Web Service on render.com
- Build command: `pip install -r requirements.txt`
- Start command: `uvicorn chatbot_server:app --host 0.0.0.0 --port $PORT`
- Add environment variable: `GROQ_API_KEY`
Current limitations:
- In-memory session storage (resets on server restart — fine for single-instance deployments)
- Small embedding model occasionally retrieves imperfect chunks
- No admin dashboard for document management
Planned improvements:
- Admin dashboard for uploading/managing documents
- Reranker for better retrieval accuracy
- Support for PDF and DOCX document ingestion
- Streaming responses for better UX
- Persistent session storage (Redis) for multi-instance deployments
MIT