This issue is critical from an infrastructure perspective. Please let me know what you think:
Some of the docker images are unexpectedly large (e.g., 7-8 GB or more). Because of that:
- the system fails to execute these jobs because they hit the timeout
- even if we work around the timeout by pre-downloading the huge image, the cluster nodes fill up very quickly, rendering the system unusable
I think we should reconsider our policy on docker image sizes and set size restrictions. If we don't, we will experience a lot of failures and our storage resources will be consumed really fast. Also, if we run the system on a commercial cluster, the infrastructure cost will rise very quickly and it will be nearly impossible to scale.
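To make the idea of a size restriction concrete, here is a rough sketch of a check we could run before accepting an image. It is only an illustration: the 2 GiB limit (`MAX_IMAGE_BYTES`) and the helper names are my assumptions, not an agreed policy. It reads the size with `docker image inspect --format '{{.Size}}'`, which reports bytes.

```python
import subprocess

# Hypothetical limit for illustration only; the real value is a policy decision.
MAX_IMAGE_BYTES = 2 * 1024**3  # 2 GiB


def image_size_bytes(image: str) -> int:
    """Ask the local Docker daemon for the size of an image, in bytes."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def oversized(sizes: dict[str, int], limit: int = MAX_IMAGE_BYTES) -> list[str]:
    """Return the names of images whose size exceeds the limit, sorted by name."""
    return sorted(name for name, size in sizes.items() if size > limit)
```

For example, `oversized({name: image_size_bytes(name) for name in images})` would list the images to reject, where `images` is whatever list of image names the submission step already has.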
I wonder how images get so big. What kind of software requires so many GB, considering that a desktop OS loaded with heavy apps can be bundled in just 5 to 10 GB? Is it just software, or do we allow people to load images with data as well?
Putting any kind of data inside the images is a highly inefficient practice because it makes our system too expensive to run (I can explain this further, if you like).
What do you think we should do?