Reorganize dependency installation for better squashing
I'll leave it up to y'all to decide if the changes/risks here are worth
the reduction in image size. Thanks!
Reduced image size
┌─────────────────┬──────────┬─────────┬───────────────┐
│ Metric          │ Original │ New     │ Reduction     │
├─────────────────┼──────────┼─────────┼───────────────┤
│ Image Size      │ 60.1 GB  │ 48.2 GB │ 11.9 GB (20%) │
├─────────────────┼──────────┼─────────┼───────────────┤
│ Filesystem Size │ 49 GB    │ 44 GB   │ 5 GB (10%)    │
└─────────────────┴──────────┴─────────┴───────────────┘
Note: Image size includes all layers; filesystem size is the actual disk usage inside the container.
- Added --no-cache to uv pip install (Safe)
The cache is only useful for repeated installs in the same environment. In a Docker build nothing reuses it, and anything a RUN step writes to the cache is stored in that step's layer, so it only adds to the image size.
- Removed Intel MKL numpy (Less sure)
Removed the Intel MKL numpy install from Intel's Anaconda channel.
Intel's channel only has numpy 1.26.4 (numpy 1.x), but the base image has numpy 2.0.2.
Installing Intel's numpy would downgrade numpy and break packages compiled against the numpy 2.x ABI.
The base image's numpy 2.0.2 is built against OpenBLAS and is compatible with all installed packages.
- Removed preprocessing package (Less sure)
The package is unmaintained (last release in 2017, 7+ years ago) and requires nltk==3.2.4, which is incompatible with Python 3.11 because it relies on inspect.formatargspec (removed in Python 3.11).
As a result, the package cannot function on Python 3.11.
- Updated scikit-learn to 1.5.2 (Less sure)
Changed from scikit-learn==1.2.2 to scikit-learn==1.5.2.
scikit-learn 1.2.2 binary wheels were built against the numpy 1.x ABI and fail under numpy 2.x with "numpy.dtype size changed" errors.
scikit-learn 1.5.x is largely API-compatible with 1.2.x. The original pin was for eli5/learntools compatibility, and both should work with 1.5.x.
- Added uv cache cleanup to clean-layer.sh (Safe)
Added /root/.cache/uv/* to the cleanup script.
Previously the script only cleaned the pip cache, not the uv cache.
The cleanup script runs after package installs, and the cache is not needed at runtime.
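The clean-layer.sh change can be sketched as a small shell function. This is a hypothetical illustration only: the real script is not part of this diff, so the function name clean_package_caches and its cache_root parameter (defaulting to /root/.cache) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cache-cleanup portion of clean-layer.sh.
# The real script is not shown in this PR; clean_package_caches and the
# cache_root parameter are illustrative assumptions.

clean_package_caches() {
  # Default to root's cache directory, as used during the Docker build.
  local cache_root="${1:-/root/.cache}"

  # pip cache: what the script already cleaned before this change.
  rm -rf "${cache_root}"/pip/*
  # uv cache: the /root/.cache/uv/* entry added by this change.
  rm -rf "${cache_root}"/uv/*
  # Note: the * glob does not match hidden entries, so dotfiles directly
  # under each cache directory are left in place.
}

# Example, run inside the image after package installs:
#   clean_package_caches
```

Taking the cache root as a parameter lets the function be exercised against a scratch directory instead of the real /root/.cache; only the directory contents are removed, so the cache directories themselves remain.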
RUN uv pip install --no-cache --system -r /requirements.txt
 
 # Install manual packages:
 # b/183041606#comment5: the Kaggle data proxy doesn't support these APIs. If the library is missing, it falls back to using a regular BigQuery query to fetch data.
 RUN uv pip uninstall --system google-cloud-bigquery-storage
 
-# b/394382016: sigstore (dependency of kagglehub) requires a prerelease packages, installing separate.
-RUN uv pip install --system --force-reinstall --prerelease=allow "kagglehub[pandas-datasets,hf-datasets,signing]>=0.3.12"
-
 # uv cannot install this in requirements.txt without --no-build-isolation
 # to avoid affecting the larger build, we'll post-install it.
-RUN uv pip install --no-build-isolation --system "git+https://github.com/Kaggle/learntools"
+RUN uv pip install --no-cache --no-build-isolation --system "git+https://github.com/Kaggle/learntools"
 
 # newer daal4py requires tbb>=2022, but libpysal is downgrading it for some reason
-RUN uv pip install --system "tbb>=2022" "libpysal==4.9.2"
-
-# b/404590350: Ray and torchtune have conflicting tune cli, we will prioritize torchtune.
-# b/415358158: Gensim removed from Colab image to upgrade scipy
-# b/456239669: remove huggingface-hub pin when pytorch-lighting and transformer are compatible
-# b/315753846: Unpin translate package, currently conflicts with adk 1.17.0
-# b/468379293: Unpin Pandas once cuml/cudf are compatible, version 3.0 causes issues
-# b/468383498: numpy will auto-upgrade to 2.4.x, which causes issues with numerous packages
-# b/468367647: Unpin protobuf, version greater than v5.29.5 causes issues with numerous packages
-RUN uv pip install --system --force-reinstall --no-deps torchtune gensim "scipy<=1.15.3" "huggingface-hub==0.36.0" "google-cloud-translate==3.12.1" "numpy==2.0.2" "pandas==2.2.2"
-RUN uv pip install --system --force-reinstall "protobuf==5.29.5"
+RUN uv pip install --no-cache --system "tbb>=2022" "libpysal==4.9.2"