feat: specialized python playwright images #251

Merged
vladfrangu merged 27 commits into master from feat/specialized-python-playwright-images
Feb 17, 2026

@vladfrangu
Member

Closes #214

@vladfrangu vladfrangu requested review from B4nan and vdusek January 15, 2026 10:43
@github-actions github-actions Bot added this to the 132nd sprint - Tooling team milestone Jan 15, 2026
@github-actions github-actions Bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Jan 15, 2026
@B4nan B4nan requested a review from Copilot January 15, 2026 13:57
Contributor

Copilot AI left a comment

Pull request overview

This pull request introduces specialized Python Playwright Docker images with browser-specific variants. The main python-playwright image now includes all browsers plus Google Chrome, while new specialized images (python-playwright-chrome, python-playwright-firefox, python-playwright-webkit, python-playwright-camoufox) contain only their respective browsers to reduce image sizes. Similar enhancements are applied to Node.js Playwright images.

Changes:

  • Created browser-specific Docker images for both Python and Node.js Playwright (chrome, firefox, webkit, camoufox variants)
  • Updated the main python-playwright and node-playwright images to include Google Chrome alongside Playwright browsers
  • Enhanced CI/CD workflows and build matrix to support the new specialized images

Reviewed changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| python-playwright/Dockerfile | Updated to install Google Chrome and configure certificate handling for all browsers |
| python-playwright/src/main.py | Enhanced test suite to validate both Playwright browsers and Google Chrome |
| python-playwright-chrome/Dockerfile | New specialized image containing only Chromium and Google Chrome |
| python-playwright-firefox/Dockerfile | New specialized image containing only Firefox browser |
| python-playwright-webkit/Dockerfile | New specialized image containing only WebKit browser |
| python-playwright-camoufox/Dockerfile | New specialized image containing only Camoufox browser |
| node-playwright/Dockerfile | Updated to use Ubuntu Noble base and install Google Chrome |
| node-playwright-chrome/Dockerfile | New specialized image for Chrome/Chromium only |
| node-playwright-firefox/Dockerfile | New specialized image for Firefox only |
| node-playwright-webkit/Dockerfile | New specialized image for WebKit only |
| node-playwright-camoufox/Dockerfile | New specialized image for Camoufox only |
| Makefile | Updated test targets and version numbers for all new images |
| .github/workflows/release-python-playwright.yaml | Extended to build and publish all Python Playwright image variants |
| .github/actions/version-matrix/src/matrices/python/playwright.ts | Updated matrix generation to include all image variants and Camoufox version |

Comment thread node-playwright/Dockerfile Outdated
Comment thread node-playwright/Dockerfile Outdated
Comment thread node-playwright-chrome/Dockerfile Outdated
Comment thread node-playwright-firefox/Dockerfile Outdated
Comment thread Makefile Outdated
Comment thread python-playwright-chrome/Dockerfile
Comment thread python-playwright-chrome/Dockerfile Outdated
setuptools \
wheel \
apify~=${APIFY_VERSION} \
playwright~=${PLAYWRIGHT_VERSION} \
Contributor

We seem to be installing Playwright multiple times:

First, the Playwright (Python package) is installed on line 85:

python3 -m pip install --user playwright~=${PLAYWRIGHT_VERSION}

Then it's installed again on line 122:

RUN pip install --upgrade playwright~=${PLAYWRIGHT_VERSION}

And finally, it's installed once more in the user-facing Actor template images (https://github.com/apify/actor-templates/blob/master/templates/python-playwright/Dockerfile#L21):

pip install -r requirements.txt

where requirements.txt contains just playwright (without version).

This should be reduced to only 1 installation.

Also, keep in mind that python3 -m pip install --user playwright and pip install playwright install Playwright into different locations, which is probably not what we want.
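To make the "different locations" point concrete, here is a minimal sketch that prints the two directories involved for the current interpreter. It assumes a standard CPython layout; the exact paths depend on the base image and on whether a distro-patched pip is in use, so treat the output as indicative only. The helper name `install_locations` is made up for this illustration.

```python
import site
import sysconfig


def install_locations():
    """Return the directories a plain install and a --user install would target.

    Assumption: the interpreter uses its default install scheme. Distro-patched
    Pythons (for example Debian's) may redirect plain installs elsewhere.
    """
    return {
        # Where `pip install <pkg>` normally puts pure-Python packages.
        "system": sysconfig.get_paths()["purelib"],
        # Where `pip install --user <pkg>` puts them (under the user's home).
        "user": site.getusersitepackages(),
    }


for scheme, path in install_locations().items():
    print(f"{scheme:>6}: {path}")
```

Running this as root inside an image would typically show the user directory under `/root/.local`, which is why a package installed with `--user` and the same package installed without it can end up side by side.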


Summary:

  • There is a triple installation of playwright.
  • There is a duplicated installation of apify.

I know that at least part of this duplication existed before this PR, but it would be good to address it now. My suggestion would be:

  • Install playwright (and selenium) only in the base image, since Playwright is needed there for browser installation.
  • Install the apify package only in the child images, as users may want to control its version (and are doing it now), and it isn't required in the base image itself.

This should avoid redundant installs.

Member Author

Let's go through them one by one:

The first install is there to replicate Node's npx playwright install(-deps). Not sure if pip by default has something similar or not. But we also don't want that to linger in the image.

The second install (playwright + apify) is an annoying remnant from the Node images, where we also install them (mostly so that the image just runs if you execute it without changes), plus it is needed for tests on our end 🙃

The third install, in user images, is an issue in Node too, but there we document that you should either use * (which tells the package manager to use whatever is already installed) or manually pin to the same version as your Docker image. I also don't see how we would solve that and keep our tests functional. In fact, I would personally keep it as is instead of trying to change things (last time I tried that I broke the images 🙃 and had to revert twice).
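For reference, the `~=` specifier used in the Dockerfile snippet above is PEP 440's "compatible release" operator: `~=1.49.0` means `>=1.49.0, ==1.49.*`. The sketch below shows how a user-side pin could be checked against the version baked into the image. It is a simplified illustration that handles only plain numeric versions (no pre-releases or local versions); the function names and the version numbers are hypothetical, and real tooling should rely on pip itself or the `packaging` library.

```python
def parse_release(version):
    """Turn '1.49.0' into (1, 49, 0). Only plain numeric versions are supported."""
    return tuple(int(part) for part in version.split("."))


def pad(parts, width):
    """Right-pad a release tuple with zeros so 1.49 compares equal to 1.49.0."""
    return parts + (0,) * (width - len(parts))


def is_compatible(installed, pinned):
    """Return True if `installed` satisfies `~=pinned` (PEP 440 compatible release)."""
    pinned_parts = parse_release(pinned)
    if len(pinned_parts) < 2:
        raise ValueError("~= needs at least two release segments, e.g. 1.49")
    installed_parts = parse_release(installed)
    width = max(len(pinned_parts), len(installed_parts))
    prefix = pinned_parts[:-1]  # every segment except the last must match exactly
    return (
        pad(installed_parts, width)[: len(prefix)] == prefix
        and pad(installed_parts, width) >= pad(pinned_parts, width)
    )


# Hypothetical versions, for illustration only.
print(is_compatible("1.49.3", "1.49.0"))  # True: same 1.49.* series, not older
print(is_compatible("1.50.0", "1.49.0"))  # False: outside the 1.49.* series
```

An unpinned `playwright` line in requirements.txt places no such constraint, so whether pip keeps the preinstalled copy depends on what is already present in the location it inspects.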

Contributor

> Not sure if pip by default has something similar or not.

I would say it's pipx, but I'm not sure we want to use it in a Dockerfile. We don't want another layer of isolation; Docker already provides one. It would also clash with the pip installs in the child images.


In my opinion, we should:

  • Avoid using the --user flag with pip install in Dockerfiles.
  • Get rid of redundant installations.

This means installing Playwright, the required browser and its dependencies only in the base images, for example:

pip install --no-cache-dir "playwright~=${PLAYWRIGHT_VERSION}"
python -m playwright install-deps chromium

Then, install the apify package only in the child Dockerfiles.

This should result in minimal image size, no path inconsistencies, and no re-installs.
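One way to verify the "no re-installs" goal inside a built image is to look for distributions that show up in more than one directory on the import path. The following is a sketch using only the standard library's `importlib.metadata`; the helper name `find_duplicate_installs` is invented for this example, and it only detects copies that are visible on the searched path.

```python
import re
from collections import defaultdict
from importlib import metadata


def find_duplicate_installs(search_path=None):
    """Return {name: [(location, version), ...]} for packages installed more than once.

    `search_path` defaults to sys.path (the behaviour of metadata.distributions()).
    """
    kwargs = {} if search_path is None else {"path": list(search_path)}
    locations = defaultdict(set)
    for dist in metadata.distributions(**kwargs):
        name = dist.metadata["Name"]
        if not name:
            continue  # skip broken or partial metadata directories
        # Normalize so that 'Foo_Bar' and 'foo-bar' count as the same project.
        normalized = re.sub(r"[-_.]+", "-", name).lower()
        locations[normalized].add((str(dist.locate_file("")), dist.version))
    return {name: sorted(found) for name, found in locations.items() if len(found) > 1}


for name, copies in find_duplicate_installs().items():
    print(name)
    for location, version in copies:
        print(f"  {version} in {location}")
```

If a package appears both in the user site directory and in the system site-packages, this prints both copies with their versions, which makes path inconsistencies of the kind described above easy to spot.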


Could I ask for your opinions @Pijukatel and @janbuchar? We should also take into account the future uv-based templates (apify/actor-templates#350).

Member Author

Look, from my end I am all for dropping the ahead-of-time dependency installs (maybe we could even push for that on the JS side too), but right now there is also the issue of version clashes for Playwright (as we can see in CI here, older Playwright versions on Python do NOT have deps for Debian 13, but I am also willing to just drop them entirely).

But I also don't see the benefits of installing playwright on the base image if it gets reinstalled after on the user side?

Contributor

> But I also don't see the benefits of installing playwright on the base image if it gets reinstalled after on the user side?

The crawlee-python templates actually use voodoo to avoid installing a different version of playwright than what's in the image. I think this makes sense because it makes the "user-provided" layers considerably smaller.

But in general, I also don't see much value in preinstalling crawlee and SDK.

@B4nan B4nan requested a review from Pijukatel February 3, 2026 14:34
@Pijukatel

So, how is this going to be used once merged? Do we have to modify the templates in the template repo and in Crawlee?

@vladfrangu
Member Author

> So, how is this going to be used once merged? Do we have to modify the templates in the template repo and in Crawlee?

Ideally yes! Similar to how we have for node

@Pijukatel Pijukatel requested review from Pijukatel and removed request for Pijukatel February 5, 2026 08:37

@Pijukatel Pijukatel left a comment

> Ideally yes! Similar to how we have for node

Then I would merge it and fix any remaining issues discovered when we try to use it.

Comment thread python-playwright-camoufox/Dockerfile Outdated
@vladfrangu vladfrangu requested a review from vdusek February 10, 2026 11:25
Contributor

@vdusek vdusek left a comment

LGTM, thanks 🚀

@B4nan
Member

B4nan commented Feb 10, 2026

Here is what Claude thinks about the PR. I haven't checked it in detail; often those things are not valid, but on the other hand usually at least something is, so please take a look at those points:

⏺ PR #251 Review: Specialized Python Playwright Images

  Author: @vladfrangu | Status: Open (2 approvals, most recent from @vdusek LGTM)
  Closes: #214 | +1676 / -548 across 60 files

  ---
  Summary

  This PR creates browser-specific Python Playwright Docker images (python-playwright-chrome, python-playwright-firefox, python-playwright-webkit, python-playwright-camoufox) alongside the existing "all browsers" python-playwright image. It also:
  - Removes pre-installed apify SDK from all Python images (users install it themselves)
  - Adds a Firefox intermediate certificate management system (pre-downloaded zip + fallback download)
  - Adds hourly update-certificates.yaml workflow
  - Updates Node.js Playwright images with consistent patterns (env vars moved up, noble base)
  - Renames Makefile test targets to be prefixed with node- / python-
  - Replaces real browser validation tests with dummy "warning" placeholder code

  ---
  Issues Found

  1. Copy-paste bug in node-playwright-firefox/Dockerfile

  The comment says "for chrome" but the commands install Firefox:
  # Install playwright browser dependencies and browsers for chrome  <-- WRONG
  && npx playwright install-deps firefox \
  && npx playwright install firefox \

  2. register_intermediate_certs.sh duplicated 6 times

  Identical 56-line script copied to node-playwright/, node-playwright-camoufox/, node-playwright-firefox/, python-playwright/, python-playwright-camoufox/, python-playwright-firefox/. Any future fix needs to be applied in all 6 places. Consider a shared certificates/register_intermediate_certs.sh that gets copied into the Docker context by the Makefile/CI.

  3. src/main.py duplicated across all Python images

  The same dummy "WARNING: replace this file" code is copy-pasted into 7 Python images. Same maintenance concern.

  4. All real Python test validation removed

  The old python-playwright/src/main.py actually launched all browsers and verified navigation worked. Now it just prints environment variables. Same for python-selenium and python. The Node images still have real browser tests (chrome_test.js, firefox_test.js, etc.), but Python images have zero runtime validation. How do you verify the images actually work during CI builds?

  5. Unnecessary packages in Chrome-only images

  python-playwright-chrome/Dockerfile installs jq and p11-kit with the comment "needed for the intermediate certificates to work in Firefox" — but this image only contains Chrome/Chromium and no Firefox. These packages add unnecessary bloat.

  6. CI matrix explosion

  The playwright matrix now iterates over imageNames (5 images) inside the existing python-version x playwright-version loop. With 5 Python versions, 5 Playwright versions, and 5 images, that's ~125 matrix entries (up from ~25). This is a significant CI cost increase. Consider whether specialized images really need to be tested with all 5 Playwright versions or just the latest.

  7. Hourly certificate update seems excessive

  update-certificates.yaml runs cron: "0 * * * *" (every hour). Intermediate certificates don't change that frequently. A daily schedule would reduce commit noise and workflow minutes significantly while still keeping certificates fresh.

  8. Binary file committed to git

  certificates/firefox-certificates.zip is a binary tracked in git and updated hourly by CI. Over time this will bloat the repo history. Consider storing it as a GitHub release artifact or in an external cache instead.

  9. start_xvfb_and_run_cmd.sh files are dead code

  All new Python images include a start_xvfb_and_run_cmd.sh that just does exit 0. If it's not used, it shouldn't be shipped.

  10. Dead code: Python 3.9 camoufox check

  In playwright.ts, there's a check if (imageName.includes('camoufox') && pythonVersion === '3.9'). But supportedPythonVersions based on the cache data is 3.10,3.11,3.12,3.13,3.14 — 3.9 isn't in the list, making this a no-op guard.

  ---
  Minor / Nits

  - node-playwright/Dockerfile still has the old typo comment # Tell Node.js this is a production environemnt (line unchanged from before)
  - python-playwright/xvfb-entrypoint.sh adds echo "Running on architecture: $(uname -m)" that no other image has — inconsistency
  - Base image upgrade from jammy to noble/trixie is a significant change that should be called out in the PR description

  ---
  What's Good

  - Clean separation of browser-specific images will reduce image sizes for users
  - Removing pre-installed apify SDK is the right call — users should manage their own dependencies
  - Certificate pre-download with fallback is more robust than download-at-build-time
  - Consistent Dockerfile patterns across all images (env vars at top, SHELL instruction, etc.)
  - Makefile helper functions (copy-firefox-certs, cleanup-firefox-certs) are a nice touch

  ---
  Verdict

  The overall architecture is sound. The main concerns are the missing Python test validation (regression in CI confidence), the CI matrix cost explosion, and the significant code duplication that will be a maintenance burden. The copy-paste comment bug in the Firefox Dockerfile should be fixed before merge.
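The matrix arithmetic from issue 6 of the review can be reproduced with a short sketch. The Python versions are the ones listed in issue 10 and the image names come from the PR summary; the Playwright version labels are placeholders, and the "latest only for specialized images" rule is just the alternative the review floats, not what `playwright.ts` necessarily does.

```python
from itertools import product

PYTHON_VERSIONS = ["3.10", "3.11", "3.12", "3.13", "3.14"]
# Placeholder labels, oldest to newest; the real list is not shown in this thread.
PLAYWRIGHT_VERSIONS = ["pw-1", "pw-2", "pw-3", "pw-4", "pw-5"]
IMAGE_NAMES = [
    "python-playwright",
    "python-playwright-chrome",
    "python-playwright-firefox",
    "python-playwright-webkit",
    "python-playwright-camoufox",
]


def full_matrix():
    """Every image built for every Python x Playwright combination."""
    return list(product(IMAGE_NAMES, PYTHON_VERSIONS, PLAYWRIGHT_VERSIONS))


def reduced_matrix(main_image="python-playwright"):
    """Main image gets every combination; specialized images only the newest Playwright."""
    latest = PLAYWRIGHT_VERSIONS[-1]
    return [
        (image, python, playwright)
        for image, python, playwright in full_matrix()
        if image == main_image or playwright == latest
    ]


print(len(full_matrix()))     # 125
print(len(reduced_matrix()))  # 45
```

Under these assumptions the reduced variant yields 25 entries for the main image plus 4 x 5 for the specialized ones.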

@vladfrangu
Member Author

Handled those + self reviews from Claude, feel free to take one last human look (esp @vdusek pls)

Comment thread python-playwright-camoufox/src/__main__.py
Contributor

@vdusek vdusek left a comment

Okay, a few more things:

  1. Python images run as root. Unlike the Node images, which create myuser, switch to it with USER myuser, and then install browsers as that user, the Python images create myuser but never switch to it (USER myuser is missing before WORKDIR). The WORKDIR /usr/src/app is owned by myuser via chown, but pip install and browser install run as root. The APIFY_DEFAULT_BROWSER_PATH for camoufox points to /root/.cache/camoufox/camoufox-bin, which won't be accessible if the container later runs as non-root.

  2. Selenium test was gutted. The python-selenium/src/main.py no longer tests Selenium at all - it just prints environment variables and a warning. The previous version actually launched Chrome and Firefox via Selenium and verified page loading. This means the CI "test" step for python-selenium no longer validates the image works.
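For context, the kind of runtime validation being asked for (here and in issue 4 of the automated review) could look roughly like the sketch below for the Playwright images. It uses Playwright's synchronous Python API; the target URL, the browser list, and the function names are assumptions for illustration, and this is not the test that was eventually restored in the repository.

```python
def check_browser(playwright, browser_name, url="https://example.com"):
    """Launch one browser headlessly, load a page, and return its title."""
    browser_type = getattr(playwright, browser_name)  # chromium, firefox or webkit
    browser = browser_type.launch(headless=True)
    try:
        page = browser.new_page()
        page.goto(url)
        return page.title()
    finally:
        # Always close the browser, even if navigation fails.
        browser.close()


def main(browser_names=("chromium", "firefox", "webkit")):
    """Fail loudly if any expected browser cannot load a page."""
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        for name in browser_names:
            title = check_browser(playwright, name)
            if not title:
                raise SystemExit(f"{name}: page loaded without a title")
            print(f"{name}: OK ({title!r})")
```

A specialized image would call `main()` with only its own browser, for example `main(("firefox",))`, so that a missing or broken browser fails the CI test step instead of passing silently.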

Not sure about these, just for consideration:

  1. python -m pip install playwright && python -m playwright install-deps firefox && python -m pip uninstall -y playwright - Installing playwright just to get OS deps, then uninstalling it, then reinstalling it later. This is wasteful (extra layer, download twice). Could use --dry-run or just keep playwright installed from the first step.

  2. python-playwright-chrome Dockerfile missing copy-firefox-certs in Makefile. The test-python-playwright-chrome Make target doesn't call copy-firefox-certs or cleanup-firefox-certs, but neither does the Dockerfile reference firefox-certs. This is correct since Chrome doesn't need Firefox certs, but inconsistent with the CI workflow, which copies certs to ALL image folders unconditionally (lines 481-489 of the workflow diff).

@vladfrangu
Member Author

I have intentionally not done 1 because it breaks templates downstream due to permission errors. If you want, we can do it afterwards, but I will need help from you Python peeps...
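The permission concern can be probed from inside a container with a few lines of Python. This is only a diagnostic sketch: the default path is the one quoted in the review, the helper name `describe_access` is made up, and the result depends entirely on which user the container runs as.

```python
import os
import sys


def describe_access(path):
    """Report whether the current user can see and execute `path`.

    Note: os.path.exists() also returns False when a parent directory
    (such as /root) cannot be traversed by the current user.
    """
    return {
        "path": path,
        "exists": os.path.exists(path),
        "executable": os.path.isfile(path) and os.access(path, os.X_OK),
    }


# Path taken from the review comment; pass another path as the first argument.
target = sys.argv[1] if len(sys.argv) > 1 else "/root/.cache/camoufox/camoufox-bin"
print(describe_access(target))
```

Running it once as root and once as a non-root user (for example via `docker run --user`) shows whether a browser installed under `/root` stays reachable after switching users.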

  1. Will bring them back. I don't know why my Claude didn't when I told it to bring the tests back 🙃

  2. Intentional: we use the latest Playwright because older Playwright versions don't support newer OS releases, so they fail to install deps. It sucks, but the alternative is using an npx-esque Python command.

  3. Intentional: Chrome does not use it, and I really don't want to bother with filtering it at the workflow level, because if we forget to update it in the future, stuff might break.

Contributor

@vdusek vdusek left a comment

> I have intentionally not done 1 because it breaks templates downstream due to permission errors. If you want, we can do it afterwards, but I will need help from you Python peeps...

> Will bring them back. I don't know why my Claude didn't when I told it to bring the tests back 🙃

> Intentional: we use the latest Playwright because older Playwright versions don't support newer OS releases, so they fail to install deps. It sucks, but the alternative is using an npx-esque Python command.

> Intentional: Chrome does not use it, and I really don't want to bother with filtering it at the workflow level, because if we forget to update it in the future, stuff might break.

Perfect, got it.

So please resolve 2) and open an issue for 1), and let's resolve it later.

thanks.

Otherwise LGTM 🚀

Member

@B4nan B4nan left a comment

(No more comments from my end)

@vladfrangu vladfrangu merged commit 672e16b into master Feb 17, 2026
63 checks passed
@vladfrangu vladfrangu deleted the feat/specialized-python-playwright-images branch February 17, 2026 15:09
Labels

t-tooling Issues with this label are in the ownership of the tooling team.

Development

Successfully merging this pull request may close these issues.

Split Python Playwright docker images similarly to Node base images

7 participants