paddleocr-vl

Here are 6 public repositories matching this topic...

PaddlePaddle / PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

ocr pdf-parser kie document-translation rag chineseocr ai4science pp-ocr document-parsing pp-structure pdf-extractor-rag pdf2markdown paddleocr-vl

Updated Dec 8, 2025
Python

magicyuan876 / mineru-tianshu

Star

pdf-converter markitdown mineru mcp-server deepseek-ocr paddleocr-vl

Updated Dec 10, 2025
Python

generalMG / DocumentParser

Star

Python data pipeline for arXiv metadata: SQLAlchemy + Alembic schema, PostgreSQL storage, PDF download tracking, and optional PaddleOCR processing.

sqlalchemy postgresql alembic document-parser paddleocr-vl

Updated Dec 9, 2025
Python

EricRollei / PDF-Tools

Star

Advanced PDF processing, OCR, and AI vision analysis nodes for ComfyUI. Extract images from PDFs, perform multilingual OCR with Surya, detect objects with Florence-2, and analyze document layouts.