A Retrieval-Augmented Generation (RAG) pipeline that answers technical support questions about a biomedical platform using only local resources. Documents are ingested, chunked, and embedded into a ChromaDB vector store; user queries are matched against this store and answered by a local LLM served through LMStudio.
```
┌──────────────┐     ┌────────────────┐     ┌────────────┐     ┌───────────┐
│  Documents   │────▶│   Ingest &     │────▶│  ChromaDB  │────▶│  Gradio   │
│ (.pdf .docx  │     │  Chunk & Embed │     │   Vector   │     │  Chat UI  │
│  .txt .md)   │     │  (ingest.py)   │     │   Store    │     │ (app.py)  │
└──────────────┘     └────────────────┘     └────────────┘     └─────┬─────┘
                                                                     │
                    ┌────────────────────────────────────────────────┘
                    │ query
                    ▼
             ┌───────────┐     ┌─────────────┐
             │ Retriever │────▶│  LMStudio   │
             │  (top-K)  │     │  local LLM  │
             └───────────┘     └─────────────┘
```
Key design goal: after the first run (which downloads the embedding model), the system operates fully offline — no calls to HuggingFace Hub or any external service.
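The offline guarantee is usually enforced through environment variables that the HuggingFace stack honors. As a sketch of what `src/embeddings.py` likely sets up (the exact variable handling in the project may differ), assuming the cache directory from `HF_CACHE_DIR`:

```python
import os

def enable_offline_mode(cache_dir: str = "./models") -> None:
    """Point the HuggingFace stack at the local cache and forbid network access.

    Must run before importing sentence-transformers/transformers, because
    those libraries read these variables at import time.
    """
    os.environ["HF_HOME"] = cache_dir           # where cached weights live
    os.environ["HF_HUB_OFFLINE"] = "1"          # no calls to HuggingFace Hub
    os.environ["TRANSFORMERS_OFFLINE"] = "1"    # transformers: cache only

enable_offline_mode()
```

With these set, a missing model fails fast with a clear error instead of silently attempting a download.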
```
BioRAG-from-scratch/
├── .env                    # Configuration (ports, model names, paths)
├── requirements.txt        # Python dependencies
├── run_ingest.py           # Entry point: ingest documents into vector store
├── run_test_retriever.py   # Entry point: test retriever with a sample query
├── data/
│   └── documentation/      # Place your source documents here
├── models/                 # Auto-populated: cached embedding weights
├── vector_store/           # Auto-populated: ChromaDB persistence
├── src/
│   ├── __init__.py
│   ├── app.py              # Gradio chat interface (entry point)
│   ├── embeddings.py       # Shared embedding loader + offline mode
│   ├── ingest.py           # Document loading, chunking, indexing
│   ├── llm_client.py       # LMStudio / OpenAI-compatible LLM client
│   └── retriever.py        # Vector similarity search
└── tests/
    ├── __init__.py
    └── validate.py         # Import & configuration smoke tests
```
- Python 3.10 – 3.12 (tested; 3.13+ may work but is not verified)
- LMStudio running locally with a loaded model (default endpoint: `http://localhost:1234/v1`)
- Internet access only for the first run (to download the `all-MiniLM-L6-v2` embedding model)
1. Clone or unzip the project and `cd` into it:

   ```
   cd BioRAG-from-scratch
   ```

2. Create and activate a virtual environment (recommended):

   ```
   python -m venv .venv
   source .venv/bin/activate   # Linux / macOS
   .venv\Scripts\activate      # Windows
   ```

3. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

4. Review `.env` and adjust if needed:

   ```
   LMSTUDIO_BASE_URL=http://localhost:1234/v1
   LMSTUDIO_MODEL=local-model
   EMBEDDING_MODEL=all-MiniLM-L6-v2
   CHROMA_PERSIST_DIR=./vector_store
   HF_CACHE_DIR=./models
   CHUNK_SIZE=800
   CHUNK_OVERLAP=200
   TOP_K=4
   ```
| Variable | Purpose |
|---|---|
| `LMSTUDIO_BASE_URL` | LMStudio server address |
| `LMSTUDIO_MODEL` | Model identifier loaded in LMStudio |
| `EMBEDDING_MODEL` | Sentence-transformers model for embeddings |
| `HF_CACHE_DIR` | Local directory for cached model weights |
| `CHROMA_PERSIST_DIR` | Local directory for the vector store |
| `CHUNK_SIZE` / `CHUNK_OVERLAP` | Text chunking parameters (characters) |
| `TOP_K` | Number of chunks retrieved per query |
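The project presumably reads these values with a dotenv-style loader. To illustrate the precedence (value from `.env` if present, otherwise the documented default), here is a minimal stdlib-only reader — `load_env` is a hypothetical helper for illustration, not the project's actual code:

```python
from pathlib import Path

def load_env(path: str = ".env") -> dict:
    """Minimal .env reader: KEY=VALUE lines, '#' starts a comment."""
    values = {}
    p = Path(path)
    if p.exists():
        for line in p.read_text().splitlines():
            line = line.split("#", 1)[0].strip()
            if "=" in line:
                key, _, val = line.partition("=")
                values[key.strip()] = val.strip()
    return values

cfg = load_env()
chunk_size = int(cfg.get("CHUNK_SIZE", 800))  # falls back to documented default
top_k = int(cfg.get("TOP_K", 4))
```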
Place your PDF, DOCX, TXT, and/or Markdown files into `data/documentation/`. Subdirectories are supported.
Run the ingestion pipeline to load, chunk, embed, and persist your documents:
```
python run_ingest.py
```

On the first run this downloads the embedding model (~80 MB) into `./models/`. Every subsequent run loads the model from disk with no network access.
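The `CHUNK_SIZE` / `CHUNK_OVERLAP` settings describe a sliding character window: each chunk shares its last `CHUNK_OVERLAP` characters with the next one, so no sentence is lost at a boundary. The project uses LangChain's `RecursiveCharacterTextSplitter` for this (see `tests/validate.py`), but the basic idea can be sketched in a few lines:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Character-window chunking: each chunk starts (chunk_size - overlap)
    characters after the previous one, so neighbours share `overlap` chars."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

# A 2000-character document with the default settings yields 3 chunks
# (starting at offsets 0, 600, and 1200).
chunks = chunk_text("x" * 2000, chunk_size=800, overlap=200)
```

The real splitter is smarter: it prefers to break on paragraph and sentence boundaries before falling back to raw character positions.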
Open LMStudio, load a model (e.g. Mistral, Llama, Phi), and start the local server on port 1234 (default).
```
python -m src.app
```

Open the URL printed in the terminal (usually http://127.0.0.1:7860).
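Behind the chat UI, `src/llm_client.py` talks to LMStudio's OpenAI-compatible API, which boils down to a JSON POST to `/chat/completions`. A stdlib-only sketch of that exchange (the function names and prompt format here are assumptions, not the project's actual code):

```python
import json
from urllib import request

def build_payload(question: str, context: str, model: str = "local-model") -> dict:
    """OpenAI-style chat payload: retrieved chunks go into the system prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only the provided context.\n\n" + context},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,  # low temperature for factual support answers
    }

def ask_llm(question: str, context: str,
            base_url: str = "http://localhost:1234/v1") -> str:
    """POST to LMStudio's OpenAI-compatible endpoint and return the answer."""
    req = request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_payload(question, context)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the same request works unchanged against any server that implements that API.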
Verify that all dependencies are installed correctly and the project modules load without errors:
```
python -m tests.validate
```

Expected output:
```
1. Checking core imports …
   ✔ langchain_community.document_loaders
   ✔ langchain_community.embeddings.HuggingFaceEmbeddings
   ✔ langchain_community.vectorstores.Chroma
   ✔ langchain_text_splitters.RecursiveCharacterTextSplitter
   ...
All checks passed ✓
```
After ingesting documents you can test the retriever directly:
```
python run_test_retriever.py
```

After the first successful run, confirm that no network calls are made by disconnecting from the network (disable Wi-Fi / unplug Ethernet) and running the same command again:

```
python run_test_retriever.py
```

It should load the model and return results exactly as before.
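Under the hood, retrieval is a nearest-neighbour search over the stored embeddings. ChromaDB handles indexing and search for you, but the core ranking step — score every stored vector against the query by cosine similarity and keep the top K — can be sketched as:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Indices of the k stored vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy example: vector 2 points exactly along the query, vector 0 nearly so.
hits = top_k([1.0, 0.0], [[1.0, 0.1], [0.0, 1.0], [0.9, 0.0]], k=2)
```

In practice ChromaDB uses an approximate index rather than this exhaustive scan, so search stays fast as the document set grows; `TOP_K` in `.env` controls the `k` used per query.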
| Symptom | Likely cause | Fix |
|---|---|---|
| `ModuleNotFoundError: No module named 'langchain'` | Old dependency installed | Run `pip install -r requirements.txt` in a clean venv |
| `ModuleNotFoundError: No module named 'docx2txt'` | Wrong docx package | `pip install docx2txt` (not `python-docx`) |
| Connection refused on launch | LMStudio not running | Start LMStudio and load a model |
| `OSError: [Errno 28] No space left on device` | Disk full during model download | Free space; `./models/` needs ~80 MB |
| Model re-downloads every time | `HF_CACHE_DIR` not set or pointing to empty dir | Check `.env`: `HF_CACHE_DIR=./models` |
This project is provided as-is for educational purposes.