roworu/local_rag

to start:

docker compose up

access web ui:

http://127.0.0.1:8000/

how it works:

(schema diagram)

PDF Upload & Processing

  1. PDF Upload (FastAPI /upload endpoint)

    • File validation and UUID generation
    • Save file and store metadata in MongoDB
    • Queue background processing task
  2. PDF Processing Pipeline

    • Extract text, images, and tables using PyMuPDF, Camelot, PDFPlumber
    • Generate markdown with image placeholders
    • Run OCR on images using PaddleOCR
    • Clean and normalize markdown content
    • Process diagrams using LLaVA vision model
    • Split content into chunks and create vector documents
    • Generate document summary using Ollama LLM
    • Store documents in ChromaDB vector store
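The upload step from item 1 above (file validation, UUID generation, metadata record) can be sketched in plain Python. This is a minimal illustration and not the repository's actual code: the function names, the `status` value, and the `size_bytes` and `uploaded_at` fields are assumptions, and the real service would write the record to MongoDB instead of returning a dict.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path

PDF_HEADER = b"%PDF-"  # a well-formed PDF begins with this header


def validate_pdf(filename: str, content: bytes) -> None:
    """Raise ValueError if the upload does not look like a PDF."""
    if Path(filename).suffix.lower() != ".pdf":
        raise ValueError("only .pdf files are accepted")
    if not content.startswith(PDF_HEADER):
        raise ValueError("file content is not a PDF")


def build_metadata(filename: str, content: bytes) -> dict:
    """Validate the upload and build the record that would be stored."""
    validate_pdf(filename, content)
    return {
        "file_id": str(uuid.uuid4()),   # unique id for this upload
        "source_file": filename,
        "size_bytes": len(content),     # hypothetical field
        "status": "queued",             # hypothetical status value
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    print(build_metadata("report.pdf", b"%PDF-1.7\n..."))
```

Checking the header bytes as well as the extension catches renamed non-PDF files before they reach the processing pipeline.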

Vector Storage & Indexing

  1. Vector Store (ChromaDB)

    • Persistent storage with document embeddings
    • Metadata tracking (file_id, chunk_index, source_file)
    • Embeddings generated by nomic-embed-text model
  2. RAG Pipeline (LlamaIndex)

    • Create index from ChromaDB documents
    • Configure with Ollama LLM and embeddings
    • Set up query engine for similarity search
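To make the metadata fields above concrete, here is a small sketch of how cleaned text could be split into overlapping chunks, each tagged with `file_id`, `chunk_index`, and `source_file`. The character-based splitter, the default sizes, and the function names are assumptions for illustration; the project itself relies on LlamaIndex and ChromaDB, whose splitting and storage calls are not shown here.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def build_documents(file_id: str, source_file: str, text: str,
                    chunk_size: int = 500, overlap: int = 50) -> list[dict]:
    """Attach the tracking metadata to every chunk."""
    return [
        {
            "text": chunk,
            "metadata": {
                "file_id": file_id,
                "chunk_index": index,
                "source_file": source_file,
            },
        }
        for index, chunk in enumerate(split_into_chunks(text, chunk_size, overlap))
    ]


if __name__ == "__main__":
    for doc in build_documents("f1", "a.pdf", "abcdefghij", chunk_size=4, overlap=1):
        print(doc)
```

Keeping `chunk_index` alongside `file_id` makes it possible to point an answer back to the exact part of the source file it came from.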

Question Answering

  1. Question Processing (FastAPI /ask endpoint)
    • Receive question and query vector store
    • Retrieve top-k similar chunks
    • Generate answer using Ollama LLM with retrieved context
    • Return structured response with sources and confidence score
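The question-answering steps above can be illustrated with a dependency-free sketch: rank chunks by cosine similarity, build a prompt from the top-k hits, and wrap the answer in a structured response. Everything here is an assumption made for illustration, including the function names, the prompt wording, and using the best similarity score as the confidence value. In the project, embeddings come from nomic-embed-text and the search is handled by ChromaDB and LlamaIndex.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[dict], k: int = 3) -> list[tuple[float, dict]]:
    """Return the k chunks most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, retrieved: list[tuple[float, dict]]) -> str:
    """Put the retrieved chunk texts in front of the question."""
    context = "\n\n".join(chunk["text"] for _, chunk in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def build_response(answer: str, retrieved: list[tuple[float, dict]]) -> dict:
    """Structured response; confidence = best similarity (an assumption)."""
    return {
        "answer": answer,
        "sources": [chunk["source_file"] for _, chunk in retrieved],
        "confidence": max((score for score, _ in retrieved), default=0.0),
    }


if __name__ == "__main__":
    chunks = [
        {"text": "cats", "embedding": [1.0, 0.0], "source_file": "a.pdf"},
        {"text": "dogs", "embedding": [0.0, 1.0], "source_file": "b.pdf"},
    ]
    hits = top_k([1.0, 0.0], chunks, k=1)
    print(build_prompt("What purrs?", hits))
    print(build_response("(LLM answer goes here)", hits))
```

The LLM call itself is left out; the string returned by `build_prompt` is what would be sent to the model served by Ollama.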

Supporting Services

  1. Core Services
    • Ollama Service: LLM (llama3.2:3b), embeddings (nomic-embed-text), vision (llava:7b)
    • MongoDB: File metadata and status tracking
    • OCR Service: PaddleOCR for image text extraction
    • Markdown Cleaner: Content normalization
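As an illustration of what content normalization in the Markdown Cleaner might involve, the sketch below applies a few common cleanup rules to extracted markdown. The specific rules and the `clean_markdown` name are assumptions, not the repository's implementation. In particular, rejoining hyphenated line breaks can merge words that are legitimately hyphenated, and stripping trailing spaces removes markdown hard line breaks.

```python
import re


def clean_markdown(text: str) -> str:
    """Apply a few simple normalization rules to extracted markdown."""
    # Use Unix line endings throughout.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words that were hyphenated across a line break ("extrac-\ntion").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop trailing spaces and tabs on every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Collapse runs of blank lines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # End the document with exactly one newline.
    return text.strip() + "\n"


if __name__ == "__main__":
    raw = "# Title  \r\n\r\n\r\n\r\nextrac-\ntion done\t\n"
    print(repr(clean_markdown(raw)))
```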

Data Flow: PDF -> Text/Image Extraction -> Markdown -> OCR -> Cleaning -> Chunking -> Vector Store -> Question -> Similarity Search -> LLM Response
