Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
-
Updated
Dec 8, 2025 - Python
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
天枢 - 企业级 AI 一站式数据预处理平台 | PDF/Office转Markdown | 支持MCP协议AI助手集成 | Vue3+FastAPI全栈方案 | 文档解析 | 多模态信息提取
Python data pipeline for arXiv metadata: SQLAlchemy + Alembic schema, PostgreSQL storage, PDF download tracking, and optional PaddleOCR processing.
Advanced PDF processing, OCR, and AI vision analysis nodes for ComfyUI. Extract images from PDFs, perform multilingual OCR with Surya, detect objects with Florence-2, and analyze document layouts.
Истории автомобилей
Native Swift + MLX port of Paddle0CR-VL, a 0.9B doc parsing VLM for OCR, tables, formulas, and charts on Apple Silicon
Add a description, image, and links to the paddleocr-vl topic page so that developers can more easily learn about it.
To associate your repository with the paddleocr-vl topic, visit your repo's landing page and select "manage topics."