feat: specialized python playwright images#251
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request introduces specialized Python Playwright Docker images with browser-specific variants. The main python-playwright image now includes all browsers plus Google Chrome, while new specialized images (python-playwright-chrome, python-playwright-firefox, python-playwright-webkit, python-playwright-camoufox) contain only their respective browsers to reduce image sizes. Similar enhancements are applied to Node.js Playwright images.
Changes:
- Created browser-specific Docker images for both Python and Node.js Playwright (chrome, firefox, webkit, camoufox variants)
- Updated the main python-playwright and node-playwright images to include Google Chrome alongside Playwright browsers
- Enhanced CI/CD workflows and build matrix to support the new specialized images
Reviewed changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| python-playwright/Dockerfile | Updated to install Google Chrome and configure certificate handling for all browsers |
| python-playwright/src/main.py | Enhanced test suite to validate both Playwright browsers and Google Chrome |
| python-playwright-chrome/Dockerfile | New specialized image containing only Chromium and Google Chrome |
| python-playwright-firefox/Dockerfile | New specialized image containing only Firefox browser |
| python-playwright-webkit/Dockerfile | New specialized image containing only WebKit browser |
| python-playwright-camoufox/Dockerfile | New specialized image containing only Camoufox browser |
| node-playwright/Dockerfile | Updated to use Ubuntu Noble base and install Google Chrome |
| node-playwright-chrome/Dockerfile | New specialized image for Chrome/Chromium only |
| node-playwright-firefox/Dockerfile | New specialized image for Firefox only |
| node-playwright-webkit/Dockerfile | New specialized image for WebKit only |
| node-playwright-camoufox/Dockerfile | New specialized image for Camoufox only |
| Makefile | Updated test targets and version numbers for all new images |
| .github/workflows/release-python-playwright.yaml | Extended to build and publish all Python Playwright image variants |
| .github/actions/version-matrix/src/matrices/python/playwright.ts | Updated matrix generation to include all image variants and Camoufox version |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| setuptools \ | ||
| wheel \ | ||
| apify~=${APIFY_VERSION} \ | ||
| playwright~=${PLAYWRIGHT_VERSION} \ |
There was a problem hiding this comment.
We seem to be installing Playwright multiple times:
First, the Playwright (Python package) is installed on line 85:
python3 -m pip install --user playwright~=${PLAYWRIGHT_VERSION}
Then it's installed again on line 122:
RUN pip install --upgrade playwright~=${PLAYWRIGHT_VERSION}
And finally, it's installed once more in the user-facing Actor template images (https://github.com/apify/actor-templates/blob/master/templates/python-playwright/Dockerfile#L21):
pip install -r requirements.txt
where requirements.txt contains just playwright (without version).
This should be reduced to only 1 installation.
Also, keep in mind that python3 -m pip install --user playwright and pip install playwright install Playwright into different locations, which is probably not what we want.
Summary:
- There is a triple installation of
playwright. - There is a duplicated installation of
apify.
I know that at least part of this duplication existed before this PR, but it would be good to address it now. My suggestion would be:
- Install
playwright(andselenium) only in the base image, since Playwright is needed there for browser installation. - Install the
apifypackage only in the child images, as users may want to control its version (and are doing it now), and it isn't required in the base image itself.
This should avoid redundant installs.
There was a problem hiding this comment.
Lets go through them 1 by 1:
First install is to replicate node's npx playwright install(-deps). Not sure if pip by default has something similar or not. But we also don't want that to linger in the image
Second install (playwright+apify) are an annoying remnant from node images where we also install them (mostly so that the image will just run if you execute it without changes) + tests on our end 🙃
Third install in user images is an issue in node too, but there we document that you should either use * (which tells the package manager to use w/e is already installed) or manual pinning to the same version as your docker image. I also don't see how we would solve that and keep our tests functional. In fact, I would personally keep it as is instead of try to change things (last time I tried that I broke images 🙃 and had to revert twice)
There was a problem hiding this comment.
Not sure if pip by default has something similar or not.
I would say it's pipx, but I'm not sure we want to use it in Dockerfile - We don't want another layer of isolation, Docker already provides one. And it will again clash with the pip installs in the child images.
In my opinion, we should:
- Avoid using the
--userflag with pip install in Dockerfiles. - Get rid of redundant installations.
This means installing Playwright, the required browser and its dependencies only in the base images, for example:
pip install --no-cache-dir "playwright~=${PLAYWRIGHT_VERSION}"
python -m playwright install-deps chromium
Then, install the apify package only in the child Dockerfiles.
This should result in minimal image size, no path inconsistencies, and no re-installs.
Could I ask for your opinions @Pijukatel and @janbuchar? We should also take into account the future uv-based templates (apify/actor-templates#350).
There was a problem hiding this comment.
Look, from my end I am all for dropping installing dependencies ahead of time (maybe we could even convince it for JS side too) but rn there is also the issue of version clashes for playwright (like we can see in CI here, older playwright versions on python do NOT have deps for debian 13 - but i am also willing to just drop em entirely)
But I also don't see the benefits of installing playwright on the base image if it gets reinstalled after on the user side?
There was a problem hiding this comment.
But I also don't see the benefits of installing playwright on the base image if it gets reinstalled after on the user side?
The crawlee-python templates actually use voodoo to avoid installing a different version of playwright than what's in the image. I think this makes sense because it makes the "user-provided" layers considerably smaller.
But in general, I also don't see much value in preinstalling crawlee and SDK.
we build several hundred images at a time, lets be nicer to Firefox's CDN 😄
This reverts commit acdd5b1.
|
So, how is this going to be used once merged? Do we have to modify the templates in the template repo and in Crawlee? |
Ideally yes! Similar to how we have for node |
Pijukatel
left a comment
There was a problem hiding this comment.
Ideally yes! Similar to how we have for node
Then I would merge it and fix any remaining issues discovered when we try to use it.
|
Here is what claude thinks about the PR, havent checked it in detail, often those things are not valid, on the other hand, usually at least something is, so please take a look at those points: |
|
Handled those + self reviews from Claude, feel free to take one last human look (esp @vdusek pls) |
vdusek
left a comment
There was a problem hiding this comment.
Okay, I few more things:
-
Python images run as root. Unlike the Node images which create myuser and USER myuser then install browsers as that user, the Python images create myuser but never switch to it (USER myuser is missing before WORKDIR). The WORKDIR
/usr/src/appis owned by myuser via chown, but pip install and browser install run as root. TheAPIFY_DEFAULT_BROWSER_PATHfor camoufox points to/root/.cache/camoufox/camoufox-bin, which won't be accessible if the container later runs as non-root. -
Selenium test was gutted. The
python-selenium/src/main.pyno longer tests Selenium at all - it just prints environment variables and a warning. The previous version actually launched Chrome and Firefox via Selenium and verified page loading. This means the CI "test" step for python-selenium no longer validates the image works.
Not sure about these, just for consideration:
-
python -m pip install playwright && python -m playwright install-deps firefox && python -m pip uninstall -y playwright- Installing playwright just to get OS deps, then uninstalling it, then reinstalling it later. This is wasteful (extra layer, download twice). Could use--dry-runor just keep playwright installed from the first step. -
python-playwright-chromeDockerfile missingcopy-firefox-certsin Makefile. Thetest-python-playwright-chromeMake target doesn't call copy-firefox-certs or cleanup-firefox-certs, but neither does the Dockerfile reference firefox-certs. This is correct since Chrome doesn't need Firefox certs, but inconsistent with the CI workflow, which copies certs to ALL image folders unconditionally (lines 481-489 of the workflow diff).
|
i have intentionally not done 1 because it breaks templates downstream due to permission errors. If you want, we can do it after but I will need help from you python peeps...
|
vdusek
left a comment
There was a problem hiding this comment.
i have intentionally not done 1 because it breaks templates downstream due to permission errors. If you want, we can do it after but I will need help from you python peeps...
Will bring them back, idk why my claude didn't when i told it to bring tests back 🙃
Intentional, we use latest playwright because older playwright dont support newer os releases so they fail to install deps. It sucks but the alternative is using a npx esque python command
Intentional, chrome does not use it and I really dont wanna bother with filtering it on the workflow level bc if we forget to update it in the future stuff might break
Perfect, got it.
So please resolve 2) and open an issue for 1), and let's resolve it later.
thanks.
Otherwise LGTM 🚀
Closes #214