add text output format #20

Adityav369 · 2025-12-03T18:56:26Z

No description provided.

jazzberry-ai · 2025-12-03T18:56:30Z

Bug Report

Name: Missing Tesseract dependency for OCR functionality
Severity: High
Example test case:

1. Ingest a document with images.
2. Call db.retrieve_chunks with use_colpali=True and output_format="text".
3. Observe that the OCR conversion fails because Tesseract is not installed.
Description: The output_format="text" option in retrieve_chunks and retrieve_chunks_grouped relies on Tesseract for OCR conversion. Tesseract is not included in the base Docker image, and the self-hosting guide does not mention installing it on Windows. As a result, OCR conversion will fail in Docker and may fail on Windows, which breaks the feature for those users. The documentation should state that Tesseract is required, and the Docker image should include it by default.
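One way to surface this failure early is a preflight check that looks for the Tesseract binary before any OCR conversion is attempted. The sketch below is illustrative only: `find_tesseract`, `require_tesseract`, and `TesseractMissingError` are hypothetical names that do not come from this PR, and it assumes the executable is named `tesseract` and is expected on PATH.

```python
import shutil


class TesseractMissingError(RuntimeError):
    """Hypothetical error type: the tesseract binary could not be located."""


def find_tesseract(which=shutil.which):
    """Return the path to the tesseract executable, or None if it is not on PATH.

    `which` is injectable so the lookup can be exercised without Tesseract installed.
    """
    return which("tesseract")


def require_tesseract(which=shutil.which):
    """Return the tesseract path, or raise with an actionable message if it is missing."""
    path = find_tesseract(which)
    if path is None:
        raise TesseractMissingError(
            "Tesseract is not installed or not on PATH; "
            'output_format="text" needs it for OCR conversion.'
        )
    return path
```

Calling `require_tesseract()` at startup, or just before the text conversion path runs, would turn a failure deep inside OCR into a clear configuration error. Whether the project prefers failing fast or falling back to another output format is a design choice this sketch does not settle.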


add text output format

3336d0c

Adityav369 merged commit ad9fa73 into main Dec 3, 2025
7 checks passed

mintlify bot deployed to staging December 3, 2025 18:57