Merged

Changes from all commits
33 changes: 33 additions & 0 deletions concepts/colpali.mdx
@@ -5,11 +5,11 @@

## Introduction

Up to now, we've seen RAG techniques that **i)** parse a given document, **ii)** convert it to text, and **iii)** embed the text for retrieval. These techniques have been particularly text-heavy. Embedding models expect text in, knowledge graphs expect text in, and parsers break down when provided with documents that aren't text-dominant. This motivates the question:

> When was the last time you looked at a document and only saw text?

Most business documents, research papers, reports, and presentations we encounter daily are rich visual experiences: tables organizing crucial data, charts illuminating trends, infographics explaining complex concepts, and visual layouts that guide our understanding. These visual elements aren't just decorative—they're fundamental to how information is communicated.

However, most RAG systems treat these elements as second-class citizens. They are either ignored, or captioned and embedded as text. This leads to poor retrieval performance, especially for tasks that require visual reasoning.

@@ -26,7 +26,7 @@
## How does it work?

### Embedding Process
The embedding process for ColPali borrows heavily from models like CLIP. That is, the vision encoder part of the model (as seen in the diagram above) is trained via a technique called **Contrastive Learning**. As we've discussed in previous explainers, an encoder is a function (usually a neural network or a transformer) that maps a given input to a fixed-length vector. Contrastive learning is a technique that allows us to train two encoders of different input types (such as image and text) to produce vectors in the "same embedding space". That is, the embedding of the word "dog" would be very close to the embedding of an image of a dog. The way we can achieve this is simple in theory:

1) Take a large dataset of image and text pairs.
2) Pass the image and text through the vision and text encoders respectively.
@@ -40,7 +40,7 @@
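
To make the contrastive objective concrete, here is a minimal NumPy sketch of a CLIP-style symmetric InfoNCE loss over a batch of paired image/text embeddings. This is an illustrative toy under the stated assumptions, not ColPali's actual training code:

```python
import numpy as np

def clip_style_loss(img_emb: np.ndarray, txt_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss. Both inputs are (batch, dim) and assumed
    L2-normalized, so row i of img_emb is the pair of row i of txt_emb."""
    logits = (img_emb @ txt_emb.T) / temperature  # (batch, batch) similarity matrix
    n = logits.shape[0]

    def cross_entropy(l: np.ndarray) -> float:
        # Row-wise softmax cross-entropy with the diagonal as the target:
        # each embedding should rank its own pair above all in-batch negatives.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```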

### Retrieval Process

The retrieval process for ColPali borrows from late-interaction reranking techniques such as [ColBERT](https://arxiv.org/abs/2004.12832). The idea is that instead of directly embedding an image or an entire block of text, we embed individual patches and tokens. Then, instead of using the regular dot product or cosine similarity, we employ a slightly different scoring function: it looks at the most similar patches and tokens, and sums those similarities to obtain a final score.
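
To make the scoring concrete, here is a minimal NumPy sketch of that late-interaction (MaxSim) score. It assumes the query-token and document-patch embeddings have already been computed and L2-normalized:

```python
import numpy as np

def late_interaction_score(query_tokens: np.ndarray, doc_patches: np.ndarray) -> float:
    """MaxSim scoring: for each query token, find its most similar document
    patch, then sum those per-token maxima into a single relevance score.
    Shapes: query_tokens (n_tokens, dim), doc_patches (n_patches, dim)."""
    sims = query_tokens @ doc_patches.T   # (n_tokens, n_patches) cosine similarities
    return float(sims.max(axis=1).sum())  # best patch per token, summed
```

Ranking then amounts to computing this score against every candidate page and sorting, which is why the patch embeddings can be precomputed once at ingestion time.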

![ColBERT Architecture](/assets/colbert.png)

@@ -50,7 +50,7 @@

## How to use ColPali?

With Morphik, using ColPali is as simple as adding a single `true/false` parameter to the `ingest_file` function and the `query` function. Here is what an example ingestion pathway looks like:

```python
from morphik import Morphik
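
# (the middle of this example is collapsed in the diff; a minimal
# reconstruction, assuming a default client and an illustrative file name)
db = Morphik()
db.ingest_file("financial_report.pdf", use_colpali=True)

# Query with ColPali-backed retrieval: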
db.query("At what time-step did we see the highest GDP growth rate?", use_colpali=True)
```

So instead of having to implement the ColPali pipeline from scratch, you can use Morphik to do it for you in a single line of code!

## Controlling Output Format

When retrieving ColPali chunks (which are page images), you can control how the images are returned using the `output_format` parameter:

```python
# Return as base64-encoded data (default)
chunks = db.retrieve_chunks("quarterly results", use_colpali=True)

# Return as presigned URLs (useful for web UIs)
chunks = db.retrieve_chunks("quarterly results", use_colpali=True, output_format="url")

# Convert images to markdown text via OCR
chunks = db.retrieve_chunks("quarterly results", use_colpali=True, output_format="text")
```

The three output formats are:
- **`"base64"`** (default): Returns base64-encoded image data
- **`"url"`**: Returns presigned HTTPS URLs, convenient for LLMs and UIs that accept remote image URLs
- **`"text"`**: Converts page images to markdown text via OCR

### Choosing Between Formats

**base64 vs url**: Both formats pass images to LLMs for visual understanding and produce similar inference results. However, `url` is lighter on network transfer since only the URL is sent to your application (the LLM fetches the image directly). This can result in faster response times, especially when working with multiple images.
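
For instance, here is a minimal sketch of passing URL-format chunks to a vision-capable model. It assumes an OpenAI client, the `db` client from the earlier examples, and that each image chunk's `content` holds the presigned URL:

```python
from openai import OpenAI

client = OpenAI()
chunks = db.retrieve_chunks("quarterly results", use_colpali=True, output_format="url")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the quarterly results."},
            # The model fetches each image from its presigned URL, so only
            # these short URL strings travel through your application.
            *[{"type": "image_url", "image_url": {"url": chunk.content}} for chunk in chunks],
        ],
    }],
)
print(response.choices[0].message.content)
```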

**When to use text**: Passing images to LLMs for inference can be slow and consume significant context tokens. Use `output_format="text"` when:

- You need **faster inference** speeds
- Your documents are **primarily text-based** (reports, articles, contracts)
- You're hitting **context length limits**

<Note>
If you're experiencing context limit issues with image-based retrieval, it may be because images aren't being passed correctly to the model. See [Generating Completions with Retrieved Chunks](/cookbooks/generating-completions-with-retrieved-chunks) for examples of properly passing images (both base64 and URLs) to vision-capable models like GPT-4o.
</Note>

28 changes: 24 additions & 4 deletions python-sdk/retrieve_chunks.mdx
@@ -1,6 +1,6 @@
---
title: "retrieve_chunks"
description: "Retrieve relevant chunks from Morphik"
---

<Tabs>
@@ -45,7 +45,10 @@
- `use_colpali` (bool, optional): Whether to use ColPali-style embedding model to retrieve the chunks (only works for documents ingested with `use_colpali=True`). Defaults to True.
- `folder_name` (str | List[str], optional): Optional folder scope. Accepts a single folder name or a list of folder names.
- `padding` (int, optional): Number of additional chunks/pages to retrieve before and after matched chunks (ColPali only). Defaults to 0.
- `output_format` (str, optional): Controls how image chunks are returned. Set to `"url"` to receive presigned URLs; omit or set to `"base64"` (default) to receive base64 content.
- `output_format` (str, optional): Controls how image chunks are returned:
  - `"base64"` (default): Returns base64-encoded image data
  - `"url"`: Returns presigned HTTPS URLs
  - `"text"`: Converts images to markdown text via OCR
- `query_image` (str, optional): Base64-encoded image for reverse image search. Mutually exclusive with `query`. Requires `use_colpali=True`.
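
A minimal call sketch combining several of these parameters (the query and folder name are illustrative, and `db` is assumed to be an authenticated Morphik client):

```python
chunks = db.retrieve_chunks(
    "what drove Q3 revenue growth?",
    use_colpali=True,                # required for image/page retrieval
    folder_name="finance-reports",   # hypothetical folder scope
    padding=1,                       # one extra page before/after each match
    output_format="url",             # presigned URLs instead of base64
)
```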

## Metadata Filters
@@ -126,7 +129,7 @@

The `FinalChunkResult` objects returned by this method have the following properties:

- `content` (str | PILImage): Chunk content (text or image)
- `score` (float): Relevance score
- `document_id` (str): Parent document ID
- `chunk_number` (int): Chunk sequence number
@@ -135,13 +138,30 @@
- `filename` (Optional[str]): Original filename
- `download_url` (Optional[str]): URL to download full document
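
For example, a quick pass over results (a sketch that assumes `chunks` came from a prior `retrieve_chunks` call):

```python
for chunk in chunks:
    print(f"{chunk.document_id}#{chunk.chunk_number}  score={chunk.score:.3f}  file={chunk.filename}")
```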

## Image URL output
## Output Format Options

- When `output_format="url"` is provided, image chunks are returned as presigned HTTPS URLs in `content`. This is convenient for UIs and LLMs that accept remote image URLs (e.g., via `image_url`).
- When `output_format` is omitted or set to `"base64"` (default), image chunks are returned as base64 data (the SDK attempts to decode these into a `PIL.Image` for `FinalChunkResult.content`).
- **`"base64"` (default)**: Image chunks are returned as base64 data (the SDK attempts to decode these into a `PIL.Image` for `FinalChunkResult.content`).
- **`"url"`**: Image chunks are returned as presigned HTTPS URLs in `content`. This is convenient for UIs and LLMs that accept remote image URLs (e.g., via `image_url`).
- **`"text"`**: Image chunks are converted to markdown text via OCR. Use this when you need faster inference or when documents are mostly text-based.
- Text chunks are unaffected by `output_format` and are always returned as strings.
- The `download_url` field may be populated for image chunks. When using `output_format="url"`, it will typically match `content` for those chunks.

### When to Use Each Format

| Format | Best For |
|--------|----------|
| `base64` | Direct image processing, local applications |
| `url` | Web UIs, LLMs with vision capabilities (lighter on network) |
| `text` | Faster inference, text-heavy documents, context length concerns |

<Note>
**base64 vs url**: Both formats pass images to LLMs for visual understanding and produce similar results. However, `url` is lighter on network transfer since only the URL is sent to your application (the LLM fetches the image directly). This can result in faster response times, especially with multiple images.

**When to use text**: Passing images to LLMs for inference can be slow and consume significant context tokens. Use `output_format="text"` when you need faster inference speeds or when your documents are primarily text-based.

If you're hitting context limits with images, it may be because they aren't being passed correctly to the model. See [Generating Completions with Retrieved Chunks](/cookbooks/generating-completions-with-retrieved-chunks) for examples of properly passing images (both base64 and URLs) to vision-capable models like GPT-4o.
</Note>

Tip: To download the original raw file for a document, use [`get_document_download_url`](./get_document_download_url).

## Reverse Image Search
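
A minimal sketch (the file name is illustrative; per the parameter docs above, `query_image` is base64-encoded, mutually exclusive with `query`, and requires `use_colpali=True`):

```python
import base64

with open("sample_chart.png", "rb") as f:
    query_image = base64.b64encode(f.read()).decode("utf-8")

# Find pages that visually match the query image
chunks = db.retrieve_chunks(query_image=query_image, use_colpali=True)
```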
5 changes: 4 additions & 1 deletion python-sdk/retrieve_chunks_grouped.mdx
@@ -1,5 +1,5 @@
---
title: "retrieve_chunks_grouped"
description: "Retrieve relevant chunks with grouping for UI display"
---

@@ -53,11 +53,14 @@
- `k` (int, optional): Number of results. Defaults to 4.
- `min_score` (float, optional): Minimum similarity threshold. Defaults to 0.0.
- `use_colpali` (bool, optional): Whether to use ColPali-style embedding model. Defaults to True.
- `use_reranking` (bool, optional): Override workspace reranking configuration for this request.
- `folder_name` (str | List[str], optional): Optional folder scope (single name or list of names)
- `end_user_id` (str, optional): Optional end-user scope
- `padding` (int, optional): Number of additional chunks/pages to retrieve before and after matched chunks. Defaults to 0.
- `output_format` (str, optional): Controls how image chunks are returned. Set to `"url"` for presigned URLs or `"base64"` (default) for base64 content.
- `output_format` (str, optional): Controls how image chunks are returned:
  - `"base64"` (default): Returns base64-encoded image data
  - `"url"`: Returns presigned HTTPS URLs
  - `"text"`: Converts images to markdown text via OCR (faster inference, best for text-heavy documents)
- `graph_name` (str, optional): Name of the graph to use for knowledge graph-enhanced retrieval
- `hop_depth` (int, optional): Number of relationship hops to traverse in the graph. Defaults to 1.
- `include_paths` (bool, optional): Whether to include relationship paths in the response. Defaults to False.
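
A minimal call sketch (the query, folder, and graph names are illustrative, and `db` is assumed to be a Morphik client):

```python
result = db.retrieve_chunks_grouped(
    "which suppliers caused shipment delays?",
    k=6,
    padding=1,                   # pull neighboring pages into each group
    output_format="url",
    graph_name="supply-chain",   # hypothetical knowledge graph
    hop_depth=2,                 # traverse up to 2 relationship hops
    include_paths=True,          # include relationship paths in the response
)
```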
@@ -182,7 +185,7 @@

- This method is similar to [`retrieve_chunks`](./retrieve_chunks) but provides additional grouping for UI display.
- The `chunks` list provides backward compatibility with flat chunk lists.
- The `groups` list organizes results with their padding context, ideal for building search result UIs.
- When `padding` is specified, surrounding chunks are included in `padding_chunks` for each group.
- Knowledge graph parameters (`graph_name`, `hop_depth`, `include_paths`) enable graph-enhanced retrieval.

7 changes: 3 additions & 4 deletions self-hosting.mdx
@@ -1,9 +1,9 @@
---
title: "Installation"
description: "Install Morphik on your own infrastructure"
---

For users who need to run Morphik on their own infrastructure, we provide two installation options: Direct Installation and Docker.

<Tabs>
<Tab title="Self Host - Direct Installation (Advanced)">
@@ -103,22 +103,21 @@
<Tab title="macOS">
```bash
# Install via Homebrew
-brew install poppler tesseract libmagic
+brew install poppler libmagic
```
</Tab>
<Tab title="Ubuntu/Debian">
```bash
# Install via apt
sudo apt-get update
-sudo apt-get install -y poppler-utils tesseract-ocr libmagic-dev
+sudo apt-get install -y poppler-utils libmagic-dev
```
</Tab>
<Tab title="Windows">
For Windows, you may need to install these dependencies manually:

1. **Poppler**: Download from [poppler for Windows](https://github.com/oschwartz10612/poppler-windows/releases/)
-2. **Tesseract**: Download the installer from [UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki)
-3. **libmagic**: This is included in the python-magic-bin package which will be installed with pip
+2. **libmagic**: This is included in the python-magic-bin package which will be installed with pip
</Tab>
</Tabs>
If you encounter database initialization issues within Docker, you may need to manually initialize the schema: