A collection of tooling to facilitate scanning receipts, extracting useful data, archiving the assets, and importing the results into Plain Text Accounting systems.
accipiō : (Classical Latin) [akˈkɪ.pi.oː] to receive, accept
acceptarius : (Latin) allotment-holding : (Medieval) receipt book
---
config:
layout: elk
look: handDrawn
theme: redux-dark
---
flowchart LR
A["Ingest/Scan"]
B["ID (Store)"]
C["Traditional OCR"]
D["Regex Extract"]
E["Rules"]
F["Review/Edit"]
G["Export"]
L1["LLM Vision"]
L2["LLM Extract"]
L3["Retrain"]
A --> B --> C & L1 --> D & L2 --> F --> G
F --> E & L3
E --> D
L3 --> L2
style L1 stroke-dasharray: 5
style L2 stroke-dasharray: 5
style L3 stroke-dasharray: 5
- Scan or import scanned receipts, individually or in bulk.
- Store identifiable scanned assets using git-annex or pluggable backends (LFS? WebDAV?).
- Optionally extract data via OCR using local LLM tooling (Ollama or pluggable remote tooling).
- Optionally automatically process data into structured transaction info (via local LLM tooling or pattern matching).
- Facilitate either manual data entry or automatic data extraction, with review and a chance to edit.
- Optionally use the final reviewed data to update regex rules or retrain the LLM to improve future extractions.
- Export extracted data as transaction(s) via CSV? JSON? (or possibly directly to a journal for HLedger, Ledger CLI, Beancount, etc.).
- Automate as many steps as possible to make it easy to handle receipts (and possibly invoices, etc.) in bulk.
- Disable all LLM-related features by default, require explicit opt-in to use them, and remain fully functional without them.
- Use only local-first, privacy-preserving tooling by default, even where LLMs may be involved.
- Facilitate human review/approval and fully featured editing for any non-deterministic steps, such as LLM- or OCR-based metadata extraction.
- Allow re-processing data from initial assets in the event of improved tooling (better OCR, more journal import rules, etc.).
- Avoid lock-in to any particular PTA solution (pair with HLedger, Ledger CLI, Beancount, or similar journal tools).
- Avoid dictating the entire accounting workflow. People already have their own data handling; we just want to mix in digitized assets.
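
To make the deterministic path through the pipeline concrete (OCR text, then regex extraction, then export), here is a minimal Python sketch. Nothing in it reflects the project's actual code: the `Receipt` fields, the `extract` and `to_journal` helpers, the two patterns, the account names, and the `USD` commodity are all hypothetical, chosen only for illustration. The output is an hledger/Ledger-style entry; Beancount uses a different syntax and would need its own renderer.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class Receipt:
    """Hypothetical structured result of the extraction step."""
    when: date
    payee: str
    total: Decimal
    asset_key: str  # hypothetical identifier of the archived scan


# Illustrative patterns only; real receipts vary widely.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
TOTAL_RE = re.compile(
    r"^\s*TOTAL\s+\$?(\d+\.\d{2})\s*$", re.IGNORECASE | re.MULTILINE
)


def extract(ocr_text: str, asset_key: str) -> Optional[Receipt]:
    """Regex-based extraction. Returns None when a field is missing or
    invalid, so the receipt can fall back to manual entry during review."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    date_match = DATE_RE.search(ocr_text)
    total_match = TOTAL_RE.search(ocr_text)
    if not lines or not date_match or not total_match:
        return None
    year, month, day = (int(part) for part in date_match.groups())
    try:
        when = date(year, month, day)
    except ValueError:  # e.g. an OCR misread such as month 13
        return None
    return Receipt(
        when=when,
        payee=lines[0],  # assumption: first non-empty line names the merchant
        total=Decimal(total_match.group(1)),
        asset_key=asset_key,
    )


def to_journal(receipt: Receipt, expense: str = "expenses:unknown",
               source: str = "assets:checking") -> str:
    """Render a reviewed receipt as an hledger/Ledger-style journal entry."""
    return (
        f"{receipt.when.isoformat()} {receipt.payee}\n"
        f"    ; receipt: {receipt.asset_key}\n"
        f"    {expense}    {receipt.total} USD\n"
        f"    {source}\n"
    )


if __name__ == "__main__":
    sample = "Corner Grocery\n2024-03-09\nMilk 2.50\nTOTAL $12.34\n"
    parsed = extract(sample, asset_key="sha256-abc")
    if parsed is not None:
        print(to_journal(parsed))
```

Returning `None` instead of guessing keeps the non-deterministic cases in front of a human reviewer, and carrying the asset key into the journal comment keeps each transaction traceable to its archived scan so it can be re-processed later.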