NeuroBAU/Bio-rag
BioRAG — Biomedical Platform Support (RAG from Scratch)

A Retrieval-Augmented Generation (RAG) pipeline that answers technical support questions about a biomedical platform using only local resources. Documents are ingested, chunked, and embedded into a ChromaDB vector store; user queries are matched against this store and answered by a local LLM served through LMStudio.

Architecture

┌──────────────┐     ┌───────────────┐     ┌────────────┐     ┌───────────┐
│  Documents   │────▶│  Ingest &     │────▶│  ChromaDB  │────▶│  Gradio   │
│  (.pdf .docx │     │  Chunk & Embed│     │  Vector    │     │  Chat UI  │
│   .txt .md)  │     │  (ingest.py)  │     │  Store     │     │  (app.py) │
└──────────────┘     └───────────────┘     └────────────┘     └─────┬─────┘
                                                                    │
                           ┌────────────────────────────────────────┘
                           │  query
                           ▼
                     ┌───────────┐     ┌─────────────┐
                     │ Retriever │────▶│ LMStudio    │
                     │ (top-K)   │     │ local LLM   │
                     └───────────┘     └─────────────┘

Key design goal: after the first run (which downloads the embedding model), the system operates fully offline — no calls to HuggingFace Hub or any external service.
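Offline operation can be enforced with Hugging Face's standard environment variables, set before any model is loaded. A minimal sketch of the idea (the project's actual mechanism lives in `src/embeddings.py`; the variable names below are standard `huggingface_hub`/`transformers` settings, not project-specific ones):

```python
import os

# Standard Hugging Face switches: use only the local cache, never the Hub.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"
# Point the cache at the project-local directory (matches HF_CACHE_DIR in .env).
os.environ["HF_HOME"] = "./models"

# Any model load after this point must resolve from ./models or fail fast,
# instead of silently reaching out to the network.
```

Setting these before the embedding model is imported is what makes the "disconnect and re-run" check in the Testing section meaningful.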

Project Structure

BioRAG-from-scratch/
├── .env                        # Configuration (ports, model names, paths)
├── requirements.txt            # Python dependencies
├── run_ingest.py               # Entry point: ingest documents into vector store
├── run_test_retriever.py       # Entry point: test retriever with a sample query
├── data/
│   └── documentation/          # Place your source documents here
├── models/                     # Auto-populated: cached embedding weights
├── vector_store/               # Auto-populated: ChromaDB persistence
├── src/
│   ├── __init__.py
│   ├── app.py                  # Gradio chat interface (entry point)
│   ├── embeddings.py           # Shared embedding loader + offline mode
│   ├── ingest.py               # Document loading, chunking, indexing
│   ├── llm_client.py           # LMStudio / OpenAI-compatible LLM client
│   └── retriever.py            # Vector similarity search
└── tests/
    ├── __init__.py
    └── validate.py             # Import & configuration smoke tests

Prerequisites

  • Python 3.10 – 3.12 (tested; 3.13+ may work but is not verified)
  • LMStudio running locally with a loaded model (default endpoint: http://localhost:1234/v1)
  • Internet access only for the first run (to download the all-MiniLM-L6-v2 embedding model)

Installation

  1. Clone or unzip the project and cd into it:

    cd BioRAG-from-scratch
  2. Create and activate a virtual environment (recommended):

    python -m venv .venv
    source .venv/bin/activate        # Linux / macOS
    .venv\Scripts\activate           # Windows
  3. Install dependencies:

    pip install -r requirements.txt
  4. Review .env and adjust if needed:

    LMSTUDIO_BASE_URL=http://localhost:1234/v1
    LMSTUDIO_MODEL=local-model
    EMBEDDING_MODEL=all-MiniLM-L6-v2
    CHROMA_PERSIST_DIR=./vector_store
    HF_CACHE_DIR=./models
    CHUNK_SIZE=800
    CHUNK_OVERLAP=200
    TOP_K=4
    | Variable | Purpose |
    | --- | --- |
    | `LMSTUDIO_BASE_URL` | LMStudio server address |
    | `LMSTUDIO_MODEL` | Model identifier loaded in LMStudio |
    | `EMBEDDING_MODEL` | Sentence-transformers model for embeddings |
    | `HF_CACHE_DIR` | Local directory for cached model weights |
    | `CHROMA_PERSIST_DIR` | Local directory for the vector store |
    | `CHUNK_SIZE` / `CHUNK_OVERLAP` | Text chunking parameters (characters) |
    | `TOP_K` | Number of chunks retrieved per query |
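To make `CHUNK_SIZE` / `CHUNK_OVERLAP` concrete: with a size of 800 and an overlap of 200, each chunk starts 600 characters after the previous one, so adjacent chunks share 200 characters of context. A dependency-free sketch of the sliding window (the real pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers splitting at paragraph and sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows with overlap."""
    step = chunk_size - chunk_overlap  # stride between chunk starts
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text("a" * 2000)
# 3 chunks of 800 chars, starting at 0, 600, 1200;
# each pair of adjacent chunks shares 200 characters.
```

Larger overlap reduces the chance that an answer straddles a chunk boundary, at the cost of storing and searching more redundant text.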

Usage

Step 1 — Add your documents

Place your PDF, DOCX, TXT, and/or Markdown files into data/documentation/. Subdirectories are supported.

Step 2 — Ingest documents

Run the ingestion pipeline to load, chunk, embed, and persist your documents:

python run_ingest.py

On the first run this will download the embedding model (~80 MB) into ./models/. Every subsequent run loads the model from disk with no network access.

Step 3 — Start LMStudio

Open LMStudio, load a model (e.g. Mistral, Llama, Phi), and start the local server on port 1234 (default).
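LMStudio exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so any OpenAI-style client can talk to it. A standard-library sketch of the request the app sends (the project's actual client lives in `src/llm_client.py`; the prompt wording here is illustrative):

```python
import json
import urllib.request

def build_chat_request(question: str, context: str,
                       base_url: str = "http://localhost:1234/v1",
                       model: str = "local-model") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the LMStudio server."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
        "temperature": 0.1,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Sending it (requires LMStudio to be running):
# with urllib.request.urlopen(build_chat_request("How do I export data?", ctx)) as r:
#     answer = json.load(r)["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI wire format, swapping LMStudio for another local server (or a hosted one) only requires changing `LMSTUDIO_BASE_URL` in `.env`.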

Step 4 — Launch the chat interface

python -m src.app

Open the URL printed in the terminal (usually http://127.0.0.1:7860).

Testing

Smoke tests (imports & configuration)

Verify that all dependencies are installed correctly and the project modules load without errors:

python -m tests.validate

Expected output:

1. Checking core imports …
  ✔ langchain_community.document_loaders
  ✔ langchain_community.embeddings.HuggingFaceEmbeddings
  ✔ langchain_community.vectorstores.Chroma
  ✔ langchain_text_splitters.RecursiveCharacterTextSplitter
  ...

All checks passed ✓

Manual retriever test

After ingesting documents you can test the retriever directly:

python run_test_retriever.py
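Under the hood, retrieval is a nearest-neighbour search: the query is embedded, compared against every stored chunk vector by cosine similarity, and the `TOP_K` best chunks are returned. A dependency-free sketch of that core operation (ChromaDB does the same thing at scale with an index rather than a linear scan):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return indices of the k chunk vectors most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# top_k([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]], k=2)  # → [0, 2]
```

If the retriever returns irrelevant chunks, the usual levers are `TOP_K`, the chunking parameters, and the quality of the source documents themselves.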

Verifying offline mode

After the first successful run, confirm that no network calls are made by disconnecting from the network (disable Wi-Fi / unplug Ethernet) and running:

python run_test_retriever.py

It should load the model and return results exactly as before.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `ModuleNotFoundError: No module named 'langchain'` | Old dependency installed | Run `pip install -r requirements.txt` in a clean venv |
| `ModuleNotFoundError: No module named 'docx2txt'` | Wrong docx package | `pip install docx2txt` (not `python-docx`) |
| Connection refused on launch | LMStudio not running | Start LMStudio and load a model |
| `OSError: [Errno 28] No space left on device` | Disk full during model download | Free space; `./models/` needs ~80 MB |
| Model re-downloads every time | `HF_CACHE_DIR` not set or pointing to empty dir | Check `.env`: `HF_CACHE_DIR=./models` |

License

This project is provided as-is for educational purposes.
