alerque/acceptarium

acceptarium

A collection of tooling to facilitate scanning receipts, extracting useful data, archiving the assets, and importing the results into Plain Text Accounting systems.

accipiō : (Classical Latin) [akˈkɪ.pi.oː] to receive, accept

acceptarius : (Latin) allotment-holding : (Medieval) receipt book


Overview

---
config:
  layout: elk
  look: handDrawn
  theme: redux-dark
---
flowchart LR
  A["Ingest/Scan"]
  B["ID (Store)"]
  C["Traditional OCR"]
  D["Regex Extract"]
  E["Rules"]
  F["Review/Edit"]
  G["Export"]
  L1["LLM Vision"]
  L2["LLM Extract"]
  L3["Retrain"]
  A --> B --> C & L1 --> D & L2 --> F --> G
  F --> E & L3
  E --> D
  L3 --> L2
  style L1 stroke-dasharray: 5
  style L2 stroke-dasharray: 5
  style L3 stroke-dasharray: 5
  1. Scan or import scanned receipts, individually or in bulk.
  2. Store identifiable scanned assets using Git Annex or pluggable backends (LFS? WebDAV?).
  3. Optionally extract text from scans via traditional OCR or local LLM vision tooling (Ollama or pluggable remote tooling).
  4. Optionally process the extracted data automatically into structured transaction info (via pattern matching or local LLM tooling).
  5. Facilitate either manual data entry or automatic data extraction, with review and a chance to edit.
  6. Optionally use the reviewed final data to update regex rules or retrain the LLM to improve future extractions.
  7. Export extracted data as transaction(s) via CSV? JSON? (or possibly directly to a journal for HLedger, Ledger CLI, Beancount, etc.).
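
As a rough illustration of the deterministic path through steps 3, 4 and 7 (OCR text in, regex extraction, CSV out), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Receipt` record, the `extract` and `to_csv` helpers, and the two patterns are hypothetical and not part of this project, and real receipts would need far more robust rules.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass
from decimal import Decimal
from typing import Iterable, Optional


@dataclass
class Receipt:
    # Hypothetical record shape; the project does not define one here.
    date: Optional[str]
    payee: Optional[str]
    total: Optional[Decimal]


# Illustrative rules only: an ISO date anywhere in the text, and a line
# that starts with TOTAL (so SUBTOTAL lines are not matched).
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
TOTAL_RE = re.compile(
    r"^[ \t]*TOTAL[ \t:$]*(\d+\.\d{2})[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract(ocr_text: str) -> Receipt:
    """Pull a date, payee and total out of raw OCR text, leaving gaps as None."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    date = DATE_RE.search(ocr_text)
    total = TOTAL_RE.search(ocr_text)
    return Receipt(
        date=date.group(1) if date else None,
        # Assumption: the first non-empty line names the vendor.
        payee=lines[0] if lines else None,
        total=Decimal(total.group(1)) if total else None,
    )


def to_csv(receipts: Iterable[Receipt]) -> str:
    """Serialize receipts as CSV; missing fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["date", "payee", "total"], lineterminator="\n"
    )
    writer.writeheader()
    for receipt in receipts:
        writer.writerow(asdict(receipt))
    return buf.getvalue()
```

Fields that no rule matches stay `None` rather than being guessed, which leaves an obvious gap for the review step to fill.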

Goals

  • Automate as many steps as possible to make it easy to handle receipts (and possibly invoices, etc.) in bulk.
  • Disable all LLM-related features by default, require explicit opt-in to use them, and remain fully functional without them.
  • Use only local-first, privacy-preserving tooling by default, even where LLMs may be involved.
  • Facilitate human review/approval and fully featured editing for any non-deterministic steps like LLM- or OCR-based metadata extraction.
  • Allow re-processing data from initial assets in the event of improved tooling (better OCR, more journal import rules, etc.).
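
One way to picture the opt-in goal is a settings object whose defaults keep every LLM feature off, with the deterministic extractor always available as a fallback. The sketch below is an assumption, not the project's configuration: `Settings`, `llm_enabled`, `llm_endpoint` and `choose_extractor` are hypothetical names. The only external fact it relies on is that Ollama listens on `http://localhost:11434` by default.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Extractor = Callable[[str], dict]


@dataclass(frozen=True)
class Settings:
    # Hypothetical option names, not the project's actual configuration.
    llm_enabled: bool = False  # LLM features stay off unless explicitly enabled
    llm_endpoint: str = "http://localhost:11434"  # Ollama's default local address


def choose_extractor(
    settings: Settings,
    regex_extractor: Extractor,
    llm_extractor: Optional[Extractor] = None,
) -> Extractor:
    """Return the LLM extractor only when it is both enabled and available."""
    if settings.llm_enabled and llm_extractor is not None:
        return llm_extractor
    # Deterministic fallback keeps the tool functional without any LLM.
    return regex_extractor
```

With default settings, `choose_extractor(Settings(), regex_fn, llm_fn)` returns `regex_fn`; the LLM path is used only after a user sets `llm_enabled=True`.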

Non-goals

  • Locking users in to any particular PTA solution; the tooling should pair with HLedger, Ledger CLI, Beancount, or similar journal tools.
  • Dictating the entire accounting workflow. People already have their own data handling; we just want to mix in digitized assets.
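
To show how staying independent of any one PTA tool might look at the export step, the following Python sketch renders the same transaction in two journal syntaxes: one accepted by both Ledger CLI and hledger, and one for Beancount. The function names, the `EXPORTERS` table, and the account names in the usage note are hypothetical. Beancount additionally expects the accounts to have been opened elsewhere in the journal, and a payee containing double quotes would need escaping, which this sketch does not do.

```python
from decimal import Decimal


def to_ledger(date: str, payee: str, total: Decimal, currency: str,
              expense: str, source: str) -> str:
    """Render a two-posting transaction for Ledger CLI or hledger.

    The second posting has no amount, so the tool infers the balancing value.
    """
    return (
        f"{date} {payee}\n"
        f"    {expense}    {total} {currency}\n"
        f"    {source}\n"
    )


def to_beancount(date: str, payee: str, total: Decimal, currency: str,
                 expense: str, source: str) -> str:
    """Render the same transaction in Beancount syntax (payee, empty narration)."""
    return (
        f'{date} * "{payee}" ""\n'
        f"  {expense}  {total} {currency}\n"
        f"  {source}\n"
    )


# Hypothetical registry: adding a format means adding one function here.
EXPORTERS = {
    "ledger": to_ledger,
    "hledger": to_ledger,
    "beancount": to_beancount,
}
```

For example, `EXPORTERS["beancount"]("2024-03-05", "ACME MARKET", Decimal("12.50"), "USD", "Expenses:Groceries", "Assets:Checking")` yields a Beancount entry, while the `"hledger"` key with lowercase account names such as `expenses:groceries` yields a journal entry for hledger or Ledger CLI.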
