InvoiceAI is a practical AI-driven backend project built to automatically extract key invoice fields such as invoice number, date, total amount, and customer details.
The objective was to explore OCR (Optical Character Recognition) and AI Agents (LangChain + LLMs) in real-world document understanding workflows.
- Invoice Text Extraction using
pdf2image+pytesseract - AI-Powered Field Extraction using
LangChainand Hugging Face model (flan-t5-small) - REST API built with FastAPI
- Asynchronous processing ready for Docker deployment
- Auto JSON structuring for clean, structured invoice data output
flowchart TD
A[Client: Upload invoice] --> B[FastAPI /upload endpoint]
B --> C{File type}
C -->|PDF| D[pdf2image → images]
C -->|Image| D
D --> E["Tesseract OCR (pytesseract) → raw_text"]
E --> F["ai_parser.extract_invoice_fields(raw_text)"]
F --> G{Parser mode}
G -->|Local LLM| H["HuggingFace pipeline (flan-t5-base / small model)"]
G -->|OpenAI| I["OpenAI API (gpt-3.5-turbo)"]
H --> J[raw_model_output]
I --> J
J --> K["Post-process (optional): regex / heuristics → structured fields"]
K --> L["Save to SQLite (InvoiceText table)"]
L --> M["Response: {id, filename, extracted_text, structured_data}"]
M --> N["Client receives result"]
| Layer | Technology |
|---|---|
| Backend Framework | FastAPI |
| OCR Engine | pytesseract |
| AI Model | LangChain + HuggingFace (flan-t5-small) |
| File Handling | pdf2image |
| Containerization | Docker |
| Language | Python 3.10+ |
git clone https://github.com/vickytilotia/InvoiceAI.git
cd InvoiceAIpython -m venv venv
venv\Scripts\activate # On Windows
source venv/bin/activate # On Linux/Macpip install -r requirements.txtuvicorn app.main:app --reloadThe server will be live at 👉 http://127.0.0.1:8000
docker build -t invoice-ai .docker run -d -p 8000:8000 invoice-aiAccess the API docs at: http://localhost:8000/docs
POST /upload
Upload an invoice PDF to extract structured fields.
Example Response:
{
"invoice_number": "123100401",
"invoice_date": "2024-03-01",
"total_amount": "251.12",
"currency": "EUR",
"vendor": "CPB Software GmbH"
}GET /invoices
Returns a list of stored invoices with id, filename, created_at.
GET /invoices/{id}
Returns single invoice details including extracted text and stored parsed result.
- PDF uploaded → converted into images
- OCR extracts visible text using Tesseract
- LLM processes the text → predicts key fields
- Result structured into JSON format
- Returned via REST API
✅ Successfully integrated OCR with AI model inference.
✅ Demonstrated how LangChain can structure unstructured text intelligently.
✅ Explored a modular and production-ready architecture with FastAPI and Docker.
This project lays a strong foundation for advanced document understanding pipelines like automated invoice validation, expense tracking, or ERP integration.
Vivek Kumar
💼 Backend Developer | 🧠 Exploring AI + Automation