Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Updated Jan 23, 2026 - Python
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
A Repo For Document AI
A curated list of resources for the Document Understanding (DU) topic
PDF-to-Markdown conversion using vision LLMs, preserving tables, layouts, and structure
PDF translation that preserves layout, formulas, and structure, suited to scientific and technical documents
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
Algorithms, papers, datasets, performance comparisons for Document AI.
ParseBench - A Document Parsing Benchmark for AI Agents
Conversion from Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and autonomous Excel reading/writing by AI agents via CLI and MCP integration.
ReadingBank: A Benchmark Dataset for Reading Order Detection
Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI 2023)
The AI Document Assistant for PSPDFKit demo shows how to interact with PDFs using AI-powered natural-language commands, integrated with PSPDFKit for Web.
[CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
A Model Context Protocol (MCP) server implementation that integrates with the Nutrient Document Web Service (DWS) Processor API, providing powerful PDF processing capabilities for AI assistants.
This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-documentai-toolbox
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"
Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.
This repository includes all computer vision, audio, document AI, and multimodal projects.