A Streamlit-based AI application that allows users to upload PDFs and chat with them using Retrieval-Augmented Generation (RAG) powered by LangChain, ChromaDB, HuggingFace embeddings, and Groq LLM.
- 📂 Upload PDF files
- 🧠 Ask questions from PDF content
- 🔍 Retrieval-Augmented Generation (RAG)
- ⚡ Fast embeddings using HuggingFace (all-MiniLM-L6-v2)
- 🤖 LLM responses using Groq (LLaMA 3.1)
- 📚 Page number references in answers
- 🌐 Simple Streamlit UI
- Frontend: Streamlit
- Backend: Python
- LLM: Groq API (LLaMA 3.1 8B Instant)
- Embeddings: HuggingFace Sentence Transformers
- Vector DB: ChromaDB
- Framework: LangChain
- PDF Processing: PyPDF
chat_with_pdf/
│── app.py # Streamlit frontend
│── rag_engine.py # RAG pipeline (ingestion + QA)
│── requirements.txt # Dependencies
│── .env # API keys
│── temp.pdf # Temporary uploaded file
│── chroma_db/ # Vector database storage
git clone https://github.com/your-username/chat-with-pdf.git
cd chat-with-pdf
Python 3.10 is recommended.
python -m venv venv
venv\Scripts\activate # Windows
pip install --upgrade pip
pip install -r requirements.txt
Create a .env file:
GROQ_API_KEY=your_groq_api_key_here
streamlit run app.py
- User uploads a PDF
- PDF is split into chunks
- Chunks are embedded using HuggingFace embeddings
- Stored in Chroma vector database
- User asks a question
- Relevant chunks are retrieved
- Groq LLM generates answer using retrieved context
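
The steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in rag_engine.py: a bag-of-words counter stands in for the HuggingFace embeddings, a plain list stands in for ChromaDB, and the final prompt is only built, not sent to Groq. All function names (`split_into_chunks`, `embed`, `retrieve`, `build_prompt`) and the chunk sizes are assumptions made for this example.

```python
import math
import re
from collections import Counter


def split_into_chunks(pages, chunk_size=200, overlap=50):
    """Split each page into overlapping character windows, keeping the page number."""
    chunks = []
    step = chunk_size - overlap
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, max(len(text), 1), step):
            piece = text[start:start + chunk_size].strip()
            if piece:
                chunks.append({"page": page_number, "text": piece})
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, index, k=2):
    """Return the k stored chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Assemble the context-plus-question prompt an LLM would receive."""
    context = "\n".join(f"[page {hit['page']}] {hit['text']}" for hit in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical two-page document, already extracted to text.
pages = [
    "Solar panels convert sunlight into electricity using photovoltaic cells.",
    "Wind turbines generate power when moving air spins their blades.",
]

# "Store" step: keep each chunk together with its vector.
index = [{**chunk, "vector": embed(chunk["text"])} for chunk in split_into_chunks(pages)]

question = "How do solar panels make electricity?"
hits = retrieve(question, index, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Carrying the page number alongside each chunk is what makes page references in answers possible: the retrieved context already says which page it came from.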
- “What is this document about?”
- “Summarize page 2”
- “What are the key points?”
- “Explain section 3 in simple terms”
- Use Python 3.10 for best compatibility
- Avoid Python 3.11+ due to ChromaDB build issues
- Pin the LangChain packages to stable, mutually compatible versions
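
The Python version advice above could be checked at startup with a small guard. This is a sketch only: the function name `python_version_status` and the "untested" label for versions below 3.10 are assumptions, since the notes only cover 3.10 and 3.11+.

```python
import sys


def python_version_status(version_info=sys.version_info):
    """Classify an interpreter version against the notes in this README."""
    major_minor = (version_info[0], version_info[1])
    if major_minor == (3, 10):
        return "recommended"
    if major_minor >= (3, 11):
        return "may hit ChromaDB build issues"
    # Versions below 3.10 are not discussed in the notes (assumption).
    return "untested"


print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: {python_version_status()}")
```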
- 📄 Multi-PDF chat support
- ⚡ Streaming responses (ChatGPT-like typing effect)
- 📌 Highlight answers in PDF
- 💾 Persistent chat memory per file
Built by Deeksha Shakyawal for learning RAG and real-world LLM apps.