Logseq LaTeX Formula OCR Plugin

Convert images of LaTeX formulas from the clipboard into LaTeX code in Logseq, using OCR providers such as Hugging Face Transformers, Google Gemini, or a local Pix2Text server.

Features

  • Formula OCR: Convert images of LaTeX formulas into editable LaTeX code.
  • Table OCR: Convert images of tables into Markdown tables.
  • Multiple OCR Providers: Choose from several backends:
    • Google Gemini: High-quality formula and table recognition.
    • OpenAI Compatible: Connect to any OpenAI-compatible API (e.g., Local LLMs, Groq, OpenRouter).
    • Pix2Text (Local): A private, offline-first OCR server.
    • Hugging Face API: Cloud-based processing using the Nougat model.
    • Docker (Self-hosted): Run the Nougat OCR model in a local Docker container.

Commands

  • /display-formula-ocr: Insert LaTeX code on a new line
  • /inline-formula-ocr: Insert LaTeX code within a paragraph
  • /table-ocr: Insert a Markdown table from an image. Currently works best with the Gemini provider.
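
The difference between the display and inline commands can be illustrated with a minimal sketch. It assumes that display math is written as `$$...$$` and inline math as `$...$`, the delimiters Logseq renders; `wrapFormula` and `FormulaMode` are hypothetical names for illustration and are not taken from the plugin source.

```typescript
// Hypothetical helper (not from the plugin source): wraps recognized LaTeX
// in the delimiters assumed for each command.
type FormulaMode = "display" | "inline";

function wrapFormula(latex: string, mode: FormulaMode): string {
  // Trim whitespace the OCR provider may return around the formula.
  const body = latex.trim();
  // Assumption: display math uses $$...$$, inline math uses $...$.
  return mode === "display" ? "$$" + body + "$$" : "$" + body + "$";
}
```

With this sketch, `wrapFormula("E = mc^2", "inline")` would be inserted inside a paragraph, while the `"display"` variant would go on its own line.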

Notes:

  • The clipboard must contain an image of a LaTeX formula (or of a table, for /table-ocr)
  • Initial use may be slow due to model loading
  • With the free Hugging Face plan you can make about 30k calls per month
  • The Google Gemini API has a free tier with usage limits. Check the official pricing page for details.

Installation Options

  1. Manual + Gemini (Recommended)

    • Requirements: Google Gemini API Key
    • Download the zip file from releases and unzip it.
    • Enable developer mode: Logseq > Settings > Advanced > Developer mode
    • Import Plugin: Logseq > Plugins > Load unpacked plugin and point to the unzipped folder.
    • Go to plugin settings, select "Gemini" as the OCR Provider.
    • Paste your Google Gemini API Key in the API Key setting field.
  2. Manual + OpenAI Compatible

    • Requirements: An OpenAI-compatible API (e.g., OpenAI, Groq, Local LLM)
    • Download the zip file from releases and unzip it.
    • Enable developer mode: Logseq > Settings > Advanced > Developer mode
    • Import Plugin: Logseq > Plugins > Load unpacked plugin and point to the unzipped folder.
    • Go to plugin settings, select "OpenAI Compatible" as the OCR Provider.
    • Enter your API Key in the API Key field.
    • Enter your API Endpoint in the API Endpoint field (e.g., https://api.openai.com/v1 or http://localhost:11434/v1).
    • (Optional) Set the Model Name (default: gpt-4o).
  3. Manual + Pix2Text (Offline)

    • Install Pix2Text Python package
    • Start the server, e.g., p2t serve -l en -H 0.0.0.0 -p 8503
    • Download the zip file from releases and unzip it.
    • Enable developer mode: Logseq > Settings > Advanced > Developer mode
    • Import Plugin: Logseq > Plugins > Load unpacked plugin and point to the unzipped folder.
    • In the plugin settings, select "Local" as the OCR Provider and set the API Endpoint to the appropriate IP address and port (default is http://0.0.0.0:8503)
  4. Manual + Hugging Face

    • Requirements: Node.js, Yarn, Parcel, Hugging Face User Access Token
    • Clone repo: git clone https://github.com/olmobaldoni/logseq-formula-ocr-plugin.git
    • Install dependencies: cd logseq-formula-ocr-plugin && yarn && yarn build
    • Enable developer mode: Logseq > Settings > Advanced > Developer mode
    • Import Plugin: Logseq > Plugins > Load unpacked plugin and point to the cloned repo
  5. Marketplace + Hugging Face

  6. Marketplace + Docker

    • Requirements: Docker
    • Search for LaTeX Formula OCR in the Logseq marketplace and install directly
    • Pull image: docker pull olmobaldoni/nougat-ocr-api:latest
    • Run container: docker run -d -p 80:80 olmobaldoni/nougat-ocr-api:latest

Note: For more information on how to use the other local API visit: https://github.com/olmobaldoni/LaTex-Formula-OCR-API
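
For the OpenAI Compatible option, it may help to see how the API Key, API Endpoint, and Model Name settings could map onto a request. The sketch below follows the public OpenAI Chat Completions format for image input; it is not the plugin's actual implementation. The function name `buildOcrRequest`, the prompt text, and the PNG MIME type are assumptions made for illustration.

```typescript
// Hypothetical shape for a prepared HTTP request (not from the plugin source).
interface OcrRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildOcrRequest(
  endpoint: string,     // e.g. https://api.openai.com/v1 or http://localhost:11434/v1
  apiKey: string,
  imageBase64: string,  // clipboard image, base64-encoded (PNG assumed)
  model: string = "gpt-4o"
): OcrRequest {
  // Drop trailing slashes so the path is joined cleanly.
  const base = endpoint.replace(/\/+$/, "");
  return {
    url: base + "/chat/completions",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer " + apiKey,
    },
    body: JSON.stringify({
      model,
      messages: [
        {
          role: "user",
          content: [
            // The prompt wording is an assumption, not the plugin's prompt.
            { type: "text", text: "Return only the LaTeX code for the formula in this image." },
            { type: "image_url", image_url: { url: "data:image/png;base64," + imageBase64 } },
          ],
        },
      ],
    }),
  };
}
```

The result can be sent with `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`; in the Chat Completions format the recognized text is returned in `choices[0].message.content`.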

Settings

Demo

  • Demo 1
  • Demo 2

Known Issues

The Hugging Face API may truncate responses (see Issue #2 and Issue #487).

Note: The Docker or Local (Pix2Text) method is recommended for full functionality.

Credits

This plugin is based on nougat-latex-base, a version of facebook/nougat-base fine-tuned on im2latex-100k by NormXU.

Pix2Text: Used for the local OCR server.

Google Gemini: Used as one of the OCR providers.

This plugin was also inspired by xxchan's logseq-ocr plugin.

License

MIT