PDF Upload & Processing
- PDF Upload (FastAPI /upload endpoint; sketch below)
- File validation and UUID generation
- Save file and store metadata in MongoDB
- Queue background processing task
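
A minimal sketch of what the upload endpoint might look like. The `pdf_rag` database, `files` collection, `uploads` directory, and the `process_pdf` entry point are illustrative assumptions, not the project's actual names:

```python
import uuid
from pathlib import Path

from fastapi import BackgroundTasks, FastAPI, File, HTTPException, UploadFile
from pymongo import MongoClient

app = FastAPI()
files = MongoClient("mongodb://localhost:27017")["pdf_rag"]["files"]  # assumed DB/collection names
UPLOAD_DIR = Path("uploads")
UPLOAD_DIR.mkdir(exist_ok=True)


def process_pdf(file_id: str, path: str) -> None:
    """Hypothetical entry point for the processing pipeline (extraction, OCR, chunking, indexing)."""


@app.post("/upload")
async def upload_pdf(file: UploadFile = File(...), background_tasks: BackgroundTasks = None):
    # Validate the upload before accepting it
    if not file.filename or not file.filename.lower().endswith(".pdf"):
        raise HTTPException(status_code=400, detail="Only PDF files are accepted")

    # Generate a UUID so the file can be tracked through every pipeline stage
    file_id = str(uuid.uuid4())
    dest = UPLOAD_DIR / f"{file_id}.pdf"
    dest.write_bytes(await file.read())

    # Store metadata and an initial status in MongoDB
    files.insert_one({"file_id": file_id, "original_name": file.filename, "status": "processing"})

    # Queue the heavy processing work so the request returns immediately
    background_tasks.add_task(process_pdf, file_id, str(dest))
    return {"file_id": file_id, "status": "processing"}
```
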
- PDF Processing Pipeline (extraction and chunking sketch below)
- Extract text, images, and tables using PyMuPDF, Camelot, PDFPlumber
- Generate markdown with image placeholders
- Run OCR on images using PaddleOCR
- Clean and normalize markdown content
- Process diagrams using LLaVA vision model
- Split content into chunks and create vector documents
- Generate document summary using Ollama LLM
- Store documents in ChromaDB vector store
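
As a rough illustration of the extraction and chunking stages, here is a condensed sketch using PyMuPDF; the placeholder format and the fixed-size chunker are simplifications of whatever the pipeline actually does:

```python
import fitz  # PyMuPDF


def extract_markdown(pdf_path: str) -> tuple[str, list[bytes]]:
    """Pull page text and embedded images, leaving image placeholders in the markdown."""
    doc = fitz.open(pdf_path)
    parts, images = [], []
    for page_index, page in enumerate(doc):
        parts.append(page.get_text("text"))
        for img_index, img in enumerate(page.get_images(full=True)):
            xref = img[0]
            images.append(doc.extract_image(xref)["image"])
            # Placeholder that the OCR / vision stages later replace with extracted text
            parts.append(f"![image_{page_index}_{img_index}](pending)")
    return "\n\n".join(parts), images


def chunk_markdown(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap; the real pipeline may split on structure instead."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```
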
Vector Storage & Indexing
- Vector Store (ChromaDB; indexing sketch below)
- Persistent storage with document embeddings
- Metadata tracking (file_id, chunk_index, source_file)
- Embeddings generated by nomic-embed-text model
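
The storage step might look roughly like this, assuming a local persistent client and a `pdf_documents` collection name (both assumptions):

```python
import chromadb
import ollama

# Persistent local store; the path and collection name are assumptions
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("pdf_documents")


def index_chunks(file_id: str, source_file: str, chunks: list[str]) -> None:
    """Embed each chunk with nomic-embed-text via Ollama and store it with tracking metadata."""
    for i, chunk in enumerate(chunks):
        embedding = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
        collection.add(
            ids=[f"{file_id}_{i}"],
            embeddings=[embedding],
            documents=[chunk],
            metadatas=[{"file_id": file_id, "chunk_index": i, "source_file": source_file}],
        )
```
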
- RAG Pipeline (LlamaIndex; setup sketch below)
- Create index from ChromaDB documents
- Configure with Ollama LLM and embeddings
- Set up query engine for similarity search
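
A sketch of how the LlamaIndex side could be wired to the same Chroma collection and the local Ollama models; the path, collection name, and `similarity_top_k` value are illustrative:

```python
import chromadb
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.chroma import ChromaVectorStore

# Point LlamaIndex at the local Ollama models
Settings.llm = Ollama(model="llama3.2:3b", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Wrap the existing Chroma collection as a LlamaIndex vector store
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("pdf_documents")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Build an index over the stored embeddings and expose a similarity-search query engine
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine(similarity_top_k=5)
```
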
Question Answering
- Question Processing (FastAPI /ask endpoint; sketch below)
- Receive question and query vector store
- Retrieve top-k similar chunks
- Generate answer using Ollama LLM with retrieved context
- Return structured response with sources and confidence score
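
Building on the `app` and `query_engine` objects from the sketches above, the endpoint might look like this; deriving the confidence value from the mean retrieval score is an assumption:

```python
from pydantic import BaseModel


class AskRequest(BaseModel):
    question: str


@app.post("/ask")
def ask(request: AskRequest):
    # Similarity search over ChromaDB plus answer generation with the Ollama LLM
    response = query_engine.query(request.question)

    # Expose the retrieved chunks and their similarity scores as sources
    sources = [
        {
            "source_file": node.node.metadata.get("source_file"),
            "chunk_index": node.node.metadata.get("chunk_index"),
            "score": node.score,
        }
        for node in response.source_nodes
    ]
    # Assumption: use the mean retrieval score as a rough confidence value
    scores = [s["score"] for s in sources if s["score"] is not None]
    confidence = sum(scores) / len(scores) if scores else 0.0
    return {"answer": str(response), "sources": sources, "confidence": confidence}
```
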
Supporting Services
- Core Services
- Ollama Service: LLM (llama3.2:3b), embeddings (nomic-embed-text), vision (llava:7b); wrapper sketch below
- MongoDB: File metadata and status tracking
- OCR Service: PaddleOCR for image text extraction
- Markdown Cleaner: Content normalization
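
For reference, a thin wrapper around the three Ollama models could look like this; the function names are illustrative, not the project's actual service API:

```python
import ollama

LLM_MODEL = "llama3.2:3b"
EMBED_MODEL = "nomic-embed-text"
VISION_MODEL = "llava:7b"


def generate(prompt: str) -> str:
    """Answer and summary generation with the chat model."""
    reply = ollama.chat(model=LLM_MODEL, messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]


def embed(text: str) -> list[float]:
    """Embedding generation for vector storage and retrieval."""
    return ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]


def describe_image(image_path: str, prompt: str = "Describe this diagram.") -> str:
    """Vision call used for diagram understanding."""
    reply = ollama.chat(
        model=VISION_MODEL,
        messages=[{"role": "user", "content": prompt, "images": [image_path]}],
    )
    return reply["message"]["content"]
```
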
Data Flow: PDF -> Text/Image Extraction -> Markdown -> OCR -> Cleaning -> Chunking -> Vector Store -> Question -> Similarity Search -> LLM Response
