fix: partially support image-text-to-text models on HuggingFaceLocalChatGenerator#10928

Open
srini047 wants to merge 1 commit into deepset-ai:main from srini047:hf-image-text-to-text

@srini047 srini047 commented Mar 26, 2026

Related Issues

Proposed Changes:

Currently, Haystack's HuggingFaceLocalChatGenerator does not support the image-text-to-text task. This PR adds support for it.

How did you test it?

Added unit tests that use mocks and validated that they pass.

Notes to the reviewer

If you want me to revert the changes to `test_init_text2text_generation_raises_error`, let me know.

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@srini047 srini047 requested a review from a team as a code owner March 26, 2026 06:57
@srini047 srini047 requested review from anakin87 and removed request for a team March 26, 2026 06:57
vercel bot commented Mar 26, 2026

@srini047 is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Mar 26, 2026
@srini047 srini047 force-pushed the hf-image-text-to-text branch from 8030e14 to 4d345be Compare March 26, 2026 07:00
@srini047 srini047 changed the title from "fix: add support for image-text-to-text generation for hf" to "feat: add support for image-text-to-text generation for hf" Mar 26, 2026
@anakin87 anakin87 changed the title from "feat: add support for image-text-to-text generation for hf" to "fix: partially support image-text-to-text models on HuggingFaceLocalChatGenerator" Mar 26, 2026
@anakin87 anakin87 left a comment

Thank you for the contribution!

See #10926 (comment)

Your fix looks good.

I left a few comments.

- `text-generation`: Supported by decoder models, like GPT.
- `text2text-generation`: Deprecated as of Transformers v5; use `text-generation` instead.
Previously supported by encoder–decoder models such as T5.
- `image-text-to-text`: Supported by vision-language models, like Qwen2-VL.

Suggested change
- `image-text-to-text`: Supported by vision-language models, like Qwen2-VL.
- `image-text-to-text`: Supported by vision-language models.

This would need fewer updates in the future.
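
To make the task list in the docstring above concrete, here is a minimal sketch of how a task name could be checked against it. This is not the Haystack implementation: `validate_task`, `parse_major`, and the two constants are hypothetical names introduced for illustration, and the only behavior taken from the docstring is that `text2text-generation` is deprecated as of Transformers v5.

```python
# Task names taken from the docstring under review.
SUPPORTED_TASKS = ("text-generation", "image-text-to-text")
# Per the docstring, this task is deprecated as of Transformers v5.
DEPRECATED_IN_V5 = ("text2text-generation",)


def parse_major(version: str) -> int:
    """Return the major component of a version string such as '5.0.0'."""
    return int(version.split(".")[0])


def validate_task(task: str, transformers_version: str) -> str:
    """Hypothetical helper: return the task if usable, else raise ValueError."""
    if task in SUPPORTED_TASKS:
        return task
    if task in DEPRECATED_IN_V5:
        # Assumption: the deprecated task is still accepted on versions before v5.
        if parse_major(transformers_version) >= 5:
            raise ValueError(
                f"Task '{task}' is deprecated as of Transformers v5; "
                "use 'text-generation' instead."
            )
        return task
    raise ValueError(
        f"Task '{task}' is not supported. Supported tasks: {', '.join(SUPPORTED_TASKS)}."
    )
```

With this sketch, `validate_task("image-text-to-text", "5.0.0")` returns the task name unchanged, while `validate_task("text2text-generation", "5.0.0")` raises a `ValueError`.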

Comment on lines +168 to +170
@patch("haystack.components.generators.chat.hugging_face_local.transformers")
def test_init_text2text_generation_raises_error(self, version_mock):
    version_mock.__version__ = "5.0.0"
@anakin87 anakin87 Mar 26, 2026

This patch should not be needed, since we run tests with Transformers >=5.0.0.

Comment on lines +856 to +857
@patch("haystack.components.generators.chat.hugging_face_local.pipeline")
def test_run_image_text_to_text(self, pipeline_init_mock, model_info_mock, chat_messages):

1. The `test_run_image_text_to_text` test can be removed.
2. I'd add a very simple test like this: if the initialization is successful, we are OK (it's currently failing on main)

llm = HuggingFaceLocalChatGenerator(model="Qwen/Qwen3.5-0.8B")
assert llm

Labels

topic:tests type:documentation Improvements on the docs

Development

Successfully merging this pull request may close these issues.

Add image-text-to-text task support to HuggingFaceLocalChatGenerator

2 participants