feat(ollama): add dimensions parameter to OllamaDocumentEmbedder and OllamaTextEmbedder #3322

Merged: bogdankostic merged 5 commits into deepset-ai:main from ShubhamGond105:feature/ollama-embedder-dimensions on May 21, 2026

@ShubhamGond105
Contributor

Related Issues

Proposed Changes:

Added dimensions: int | None = None parameter to both OllamaDocumentEmbedder
and OllamaTextEmbedder.

The Ollama SDK (>= 0.6.2) supports a dimensions parameter on client.embed()
that enables server-side embedding truncation via Matryoshka Representation Learning (MRL).
This parameter is a top-level argument of the request payload and cannot be passed
via generation_kwargs / options.
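
To illustrate why the parameter cannot ride along in generation_kwargs, here is a minimal sketch of how the keyword arguments for a client.embed() call might be assembled. The helper name build_embed_kwargs is hypothetical and is not taken from this PR; it only shows dimensions sitting beside options rather than inside it.

```python
from __future__ import annotations

from typing import Any


def build_embed_kwargs(
    model: str,
    texts: list[str],
    generation_kwargs: dict[str, Any] | None = None,
    dimensions: int | None = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for a client.embed() call (hypothetical helper)."""
    kwargs: dict[str, Any] = {"model": model, "input": texts}
    if generation_kwargs:
        # Model options such as temperature travel inside the nested "options" field.
        kwargs["options"] = dict(generation_kwargs)
    if dimensions is not None:
        # dimensions is a sibling of "options" at the top level, not a key inside it.
        kwargs["dimensions"] = dimensions
    return kwargs


example = build_embed_kwargs(
    "nomic-embed-text", ["hello"], {"temperature": 0.0}, dimensions=512
)
```

Whether the merged code omits the argument when it is None or forwards None explicitly is not shown in this conversation; the sketch assumes omission.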

With this change, users can now do:

embedder = OllamaDocumentEmbedder(model="nomic-embed-text", dimensions=512)

instead of manually truncating and re-normalizing vectors client-side.
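
For comparison, the client-side workaround mentioned above could look roughly like the following. This is a sketch under the usual MRL assumption that the leading components carry the most information; the function name truncate_and_normalize is hypothetical.

```python
import math


def truncate_and_normalize(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` components, then rescale to unit L2 norm."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # A zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


shortened = truncate_and_normalize([3.0, 4.0, 12.0], 2)
```

Passing dimensions to the embedder moves this step to the server, so downstream code receives vectors of the requested size directly.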

How did you test it?

  • Added unit tests for both OllamaDocumentEmbedder and OllamaTextEmbedder covering:
    • dimensions defaults to None
    • dimensions is stored on the instance
    • dimensions is forwarded to the sync client
    • dimensions is forwarded to the async client
    • dimensions survives serialization via default_to_dict / default_from_dict
  • All 20 unit tests pass locally
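
The default and forwarding checks listed above can be sketched with unittest.mock. StubTextEmbedder is a hypothetical stand-in, not the Haystack class, so the example runs without Haystack or an Ollama server; the real tests in the PR may be structured differently.

```python
from __future__ import annotations

from unittest.mock import MagicMock


class StubTextEmbedder:
    """Hypothetical stand-in for an embedder; not the Haystack implementation."""

    def __init__(self, client, model: str = "nomic-embed-text", dimensions: int | None = None):
        self._client = client
        self.model = model
        self.dimensions = dimensions

    def run(self, text: str) -> dict:
        # Forward dimensions as a top-level keyword argument to the client.
        response = self._client.embed(model=self.model, input=text, dimensions=self.dimensions)
        return {"embedding": response["embeddings"][0]}


def test_dimensions_defaults_to_none():
    assert StubTextEmbedder(client=MagicMock()).dimensions is None


def test_dimensions_forwarded_to_client():
    client = MagicMock()
    client.embed.return_value = {"embeddings": [[0.1, 0.2, 0.3]]}
    embedder = StubTextEmbedder(client=client, dimensions=512)
    result = embedder.run("hello")
    assert result["embedding"] == [0.1, 0.2, 0.3]
    # The mock records the call, so the forwarded value can be inspected.
    assert client.embed.call_args.kwargs["dimensions"] == 512
```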

Notes for the reviewer

  • dimensions=None preserves existing behaviour — fully backward compatible
  • Both sync (run) and async (run_async) paths updated in both embedders
  • Also updated text_embedder.py to use the newer client.embed() API
    instead of the deprecated client.embeddings() for consistency
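
Switching from client.embeddings() to client.embed() also changes the response shape the text embedder has to unpack. The sketch below uses a stub client so it runs without a server; StubClient and embed_text are hypothetical names, and the two response shapes (a flat embedding vector for the deprecated call, a list of vectors under embeddings for the newer one) are assumptions based on the Ollama SDK rather than code from this PR.

```python
from __future__ import annotations


class StubClient:
    """Hypothetical stand-in mimicking the two response shapes; no server required."""

    def embeddings(self, model: str, prompt: str) -> dict:
        # Deprecated call: one prompt in, one flat vector out.
        return {"embedding": [0.1, 0.2, 0.3]}

    def embed(self, model: str, input: str | list[str], dimensions: int | None = None) -> dict:
        # Newer call: accepts one string or a list and returns a list of vectors.
        texts = [input] if isinstance(input, str) else input
        full = [0.1, 0.2, 0.3]
        # The stub simply slices; how a real server shortens vectors is up to the server.
        return {"embeddings": [full[:dimensions] for _ in texts]}


def embed_text(client: StubClient, text: str, dimensions: int | None = None) -> list[float]:
    """Embed a single string through embed() and unwrap the first vector."""
    response = client.embed(model="nomic-embed-text", input=text, dimensions=dimensions)
    return response["embeddings"][0]


old_style = StubClient().embeddings(model="nomic-embed-text", prompt="hello")["embedding"]
new_style = embed_text(StubClient(), "hello")
shorter = embed_text(StubClient(), "hello", dimensions=2)
```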

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: feat:

@ShubhamGond105 requested a review from a team as a code owner on May 15, 2026 19:59
@ShubhamGond105 requested review from bogdankostic and removed request for a team on May 15, 2026 19:59
@github-actions (bot) added the integration:chroma, integration:ollama, and type:documentation labels on May 15, 2026
@ShubhamGond105
Contributor Author

The chroma test failures are caused by an extra commit (63e66be)
that was accidentally included when I synced my fork.
That commit is not part of my changes.

The ollama test failures are integration tests that require
a running Ollama server which is not available in CI.

My actual changes only touch the ollama embedder files and their tests.

@ShubhamGond105 force-pushed the feature/ollama-embedder-dimensions branch 2 times, most recently from 5a3b834 to d50a95f, on May 16, 2026 15:43
@ShubhamGond105 force-pushed the feature/ollama-embedder-dimensions branch from d50a95f to 4b0367b on May 16, 2026 18:24
Contributor

@bogdankostic bogdankostic left a comment

Hi @ShubhamGond105, thank you for this PR! The linter check is currently failing, please make sure to run the linter as described in our contributing guidelines.

@bogdankostic bogdankostic self-assigned this May 19, 2026
@github-actions
Contributor

github-actions Bot commented May 19, 2026

Coverage report (ollama)

Columns: File, Statements, Missing, Coverage, Coverage (new stmts), Lines missing
Files under integrations/ollama/src/haystack_integrations/components/embedders/ollama:
  • document_embedder.py
  • text_embedder.py
Project Total

This report was generated by python-coverage-comment-action

@ShubhamGond105
Contributor Author

Hi @bogdankostic,

I have fixed the lint errors — all imports have been moved to the top level as required. All checks are now passing.

Please let me know if there is anything else to address.

Thank you!

Contributor

@bogdankostic bogdankostic left a comment

Thanks for fixing the lint errors! The PR is almost good to go, I just added a couple of minor comments.

The desired number of dimensions in the embedding output. Only supported by models
that implement Matryoshka Representation Learning (MRL), such as nomic-embed-text-v1.5,
mxbai-embed-large, and qwen3-embedding. If None (default), the full vector is returned.
Requires ollama-python >= 0.6.2.
Contributor

Let's remove this line from the docstring and instead bump the version of ollama-python in pyproject.toml

Comment on lines +11 to +19
Computes the embeddings of a string using embedding models compatible with the Ollama Library.

It uses embedding models compatible with the Ollama Library.

Usage example:
```python
from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder

embedder = OllamaTextEmbedder()
result = embedder.run(text="What do llamas say once you have thanked them? No probllama!")
print(result['embedding'])
```
Contributor

Let's revert the extra indent introduced here.

The desired number of dimensions in the embedding output. Only supported by models
that implement Matryoshka Representation Learning (MRL), such as nomic-embed-text-v1.5,
mxbai-embed-large, and qwen3-embedding. If None (default), the full vector is returned.
Requires ollama-python >= 0.6.2.
Contributor

Let's remove this line from the docstring and instead bump the version of ollama-python in pyproject.toml

assert embedder.dimensions == 512

def test_dimensions_passed_to_embed_client(self):

Contributor

Let's remove these empty lines in tests between the test name and test code.

mock_response = {"embeddings": [[0.1, 0.2, 0.3]]}
embedder._async_client.embed = AsyncMock(return_value=mock_response)

asyncio.run(embedder._embed_batch_async(["hello"], batch_size=32))
Contributor

For async tests, let's use the @pytest.mark.asyncio decorator, change def to async def and call async methods using await.
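
A rough sketch of what that style of async test could look like is shown here. StubAsyncEmbedder and its embed_batch_async method are hypothetical stand-ins so the example does not depend on Haystack; the AsyncMock usage and the mocked response mirror the snippet quoted above.

```python
from unittest.mock import AsyncMock

import pytest


class StubAsyncEmbedder:
    """Hypothetical stand-in for an embedder with an async batch method."""

    def __init__(self, async_client, dimensions=None):
        self._async_client = async_client
        self.dimensions = dimensions

    async def embed_batch_async(self, texts):
        response = await self._async_client.embed(
            model="nomic-embed-text", input=texts, dimensions=self.dimensions
        )
        return response["embeddings"]


@pytest.mark.asyncio
async def test_dimensions_passed_to_async_client():
    client = AsyncMock()
    client.embed.return_value = {"embeddings": [[0.1, 0.2, 0.3]]}
    embedder = StubAsyncEmbedder(client, dimensions=512)
    # Await the coroutine directly instead of wrapping it in asyncio.run().
    embeddings = await embedder.embed_batch_async(["hello"])
    assert embeddings == [[0.1, 0.2, 0.3]]
    client.embed.assert_awaited_once_with(
        model="nomic-embed-text", input=["hello"], dimensions=512
    )
```

Running this under pytest requires the pytest-asyncio plugin, which provides the asyncio marker.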

@ShubhamGond105
Contributor Author

Hi @bogdankostic,

I have addressed all the feedback points:

  • Removed Requires ollama-python >= 0.6.2. from both docstrings
  • Bumped ollama>=0.5.0 to ollama>=0.6.2 in pyproject.toml
  • Reverted the extra indentation in text_embedder.py class docstring
  • Removed empty lines between test name and test code
  • Fixed async test to use @pytest.mark.asyncio with async def and await

All checks are now passing. Please take a look when you get a chance!

Thank you!

@bogdankostic changed the title from "feat: add dimensions parameter to OllamaDocumentEmbedder and OllamaTextEmbedder" to "feat(ollama): add dimensions parameter to OllamaDocumentEmbedder and OllamaTextEmbedder" on May 21, 2026
Contributor

@bogdankostic bogdankostic left a comment

Thanks for addressing the feedback @ShubhamGond105! :)

@bogdankostic bogdankostic merged commit dbb62df into deepset-ai:main May 21, 2026
13 checks passed
Labels

integration:chroma integration:ollama type:documentation Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

Add dimensions parameter to OllamaDocumentEmbedder and OllamaTextEmbedder

2 participants