@anakin87 anakin87 commented Jan 14, 2026

Related Issues

I investigated Haystack Docker builds in relation to deepset-ai/hayhooks#199

I found several optimization opportunities.

Proposed Changes:

  • add a stable tag for v2.Y.Z versions (needed for hayhooks#199, "Use latest stable Haystack in Hayhooks Docker image")
  • stop installing xpdf and related libraries (only needed in Haystack 1.x; see #4564, "fix: provide a fallback for PyMuPDF"), which reduces the image size from 303 MB to 248 MB
  • revisit the two-stage Docker build:
    • stop installing low-level system libraries that are no longer needed
    • use the uv pip interface to install packages
    • this seems to reduce image build time in CI from ~7 minutes to ~2 minutes
  • make sure that the Docker build workflow runs only when relevant files change (not for docs, CI workflows other than the Docker one, ...)
  • update the Docker README, removing 1.x references
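
As a rough illustration of the workflow-trigger point in the list above, a path filter could look like the sketch below. The directory and file names are assumptions made for the example and are not taken from this PR's diff.

```yaml
# Hypothetical trigger block: paths are illustrative, not copied from the PR.
on:
  pull_request:
    paths:
      # Anything under the Docker directory (Dockerfiles, build config).
      - "docker/**"
      # The Docker workflow itself, but no other CI workflow.
      - ".github/workflows/docker_build.yml"
```

With a `paths` filter, the workflow runs only if at least one changed file matches one of the patterns, so docs-only or unrelated CI changes would not trigger a build.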

How did you test it?

I used this PR itself to test the workflow, with some workarounds. See, for example, https://github.com/deepset-ai/haystack/actions/runs/21027111342/job/60454125178

The only aspect I could not fully test is the stable tag: it requires the next Haystack release to be published.
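
For readers unfamiliar with conditional tags, here is a minimal sketch of how a stable tag could be attached only to final v2.Y.Z releases. It assumes the tags are computed with docker/metadata-action, that release candidates carry an `-rc` suffix, and it uses a placeholder image name; none of this is confirmed by the PR, which may implement the tag differently.

```yaml
# Hypothetical step: the action choice and image name are assumptions.
- name: Compute image tags
  id: meta
  uses: docker/metadata-action@v5
  with:
    images: example-org/example-image
    tags: |
      # Version tag derived from the pushed git tag (e.g. v2.Y.Z -> 2.Y.Z).
      type=semver,pattern={{version}}
      # Add "stable" only for v2.* git tags that are not release candidates.
      type=raw,value=stable,enable=${{ startsWith(github.ref, 'refs/tags/v2.') && !contains(github.ref, '-rc') }}
```

Because the `stable` tag is only enabled on a release tag push, it cannot be exercised from a pull request run, which matches the testing limitation noted above.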

I also tested the built image locally, both in isolation and with Hayhooks. It seems to work as before.

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@anakin87 anakin87 changed the title ci: docker refactor - WIP ci: refactor/optimize Docker builds Jan 14, 2026
@anakin87 anakin87 marked this pull request as ready for review January 15, 2026 10:27
@anakin87 anakin87 requested review from a team as code owners January 15, 2026 10:27
@anakin87 anakin87 requested review from davidsbatista and removed request for a team January 15, 2026 10:27
@anakin87 anakin87 requested a review from mpangrazzi January 15, 2026 10:27
Contributor

@mpangrazzi mpangrazzi left a comment

Looks good!

I was also thinking, as a future improvement, about using the GitHub Actions cache when building the image.

- name: Build base images
  # ...
  with:
    # ...
    set: |
      *.cache-from=type=gha
      *.cache-to=type=gha,mode=max

This will store Docker build layers in the GitHub Actions cache (we should have 10 GB of free cache), so on subsequent builds unchanged layers are pulled from the cache instead of being rebuilt.

Note: no need to do it now, it's just an idea and definitely not tested. It may speed up the build phase quite a bit though.

@anakin87
Member Author

I was also thinking, as a future improvement, about using the GitHub Actions cache when building the image.
...

This will store Docker build layers in the GitHub Actions cache (we should have 10 GB of free cache), so on subsequent builds unchanged layers are pulled from the cache instead of being rebuilt.

Note: no need to do it now, it's just an idea and definitely not tested. It may speed up the build phase quite a bit though.

Great idea! I'll see if it's feasible.

@coveralls
Collaborator

coveralls commented Jan 16, 2026

Pull Request Test Coverage Report for Build 21062222416

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 92.225%

Totals Coverage Status
Change from base Build 21061072654: 0.0%
Covered Lines: 14413
Relevant Lines: 15628

💛 - Coveralls

@anakin87
Member Author

I made some attempts with caching, but the image ended up not being rebuilt when important files changed, which is not desired.

As you said, caching probably needs further investigation. Skipping it for now.

@anakin87 anakin87 merged commit 0cdccae into main Jan 16, 2026
11 checks passed
@anakin87 anakin87 deleted the docker-refactor branch January 16, 2026 09:46