With Docker moving all of their official images to Alpine, base image size is a hot topic. Sure, having sane and minimal base images is important, but software supply chain hygiene is equally (if not more) important. Interested to understand why?

Among other things, it's important in a production container environment to have provenance (i.e. knowledge of where your container images came from). Using Dockerfiles is a great way to track and enforce provenance policies. Each Dockerfile has a FROM line which specifies its upstream image. In a production environment, we have three basic strategies that we can employ:

  1. Wild West: let people build their Dockerfiles using a FROM line that specifies any image on the Internet anywhere
  2. Black List: let people build their Dockerfiles using a FROM line that specifies any image except those from a few known bad places
  3. White List: let people build their Dockerfiles using a FROM line that specifies only known good images

My sysadmin genes twitch at the first two strategies, so let's go with number three. Strategy three seems to provide the best software supply chain hygiene for several reasons:

  1. We can scan and approve images as they come into the environment
  2. We can limit our attack surface from a content perspective
  3. We can limit the size of our on disk cache on each container host
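
To make strategy three concrete, here is a minimal sketch of a white list check on FROM lines. The `APPROVED_IMAGES` set and the function names are hypothetical, chosen only for illustration; a real environment would load its approved list from wherever policy lives.

```python
# Hypothetical white list; a real environment would load this from policy.
APPROVED_IMAGES = {
    "registry.access.redhat.com/rhel7",
    "registry.access.redhat.com/rhscl/python-34-rhel7",
}


def strip_tag(reference):
    """Drop a trailing :tag or @digest from an image reference."""
    name = reference.split("@", 1)[0]
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:  # a colon before the last "/" is a registry port
        name = name[: name.rindex(":")]
    return name


def from_images(dockerfile_text):
    """Return the upstream image named on each FROM line."""
    images = []
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0].upper() == "FROM":
            # Skip options such as --platform=... that may precede the image.
            args = [t for t in tokens[1:] if not t.startswith("--")]
            if args:
                images.append(strip_tag(args[0]))
    return images


def is_approved(dockerfile_text):
    """True only if there is at least one FROM line and all are white listed."""
    images = from_images(dockerfile_text)
    return bool(images) and all(image in APPROVED_IMAGES for image in images)


good = "FROM registry.access.redhat.com/rhel7:latest\nRUN yum -y install strace\n"
bad = "FROM alpine:latest\n"
print(is_approved(good), is_approved(bad))
```

This is only a sketch: it does not resolve ARG substitutions or multi-stage build aliases, and it trusts the image name rather than verifying content, so it complements scanning rather than replacing it.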

I addressed the first two points in my article on DCI, but if you can minimize the number of genesis images (i.e. core builds), you really should be able to get the size of the on disk Docker cache down to something like:

Core Build * N + Software Layers * M

Here N is the number of distinct core builds on the host and M is the number of images built on top of them.

Going back to my old college days in computer science: with a fairly low number of core builds (a small N), the first term stays roughly constant, so your on-disk image size comes down to approximately the size of the Software Layers themselves. This is kind of like Big O of M. If you let people pull images from anywhere on the Internet, then yes, you can (and likely will) have a problem with image sprawl, but if you practice good hygiene, you should be able to reduce your on-disk image size significantly. So, while small base images are useful for demos, having a wild number of base image permutations all over the container environment will actually expose you to more disk usage and a larger attack surface.
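
As a rough worked example of the formula above, the sketch below compares one shared core build against a sprawl of distinct base images. All of the numbers are hypothetical and picked only to show the shape of the result, and the function name is my own.

```python
def cache_size_mb(core_build_mb, n_core_builds, software_layers_mb, m_images):
    """Estimate the on-disk cache: Core Build * N + Software Layers * M."""
    return core_build_mb * n_core_builds + software_layers_mb * m_images


# Hypothetical host: 50 application images, each adding 90 MB of software layers.
# Hygienic case: every image is built on a single shared 200 MB core build.
hygienic = cache_size_mb(200, 1, 90, 50)

# Sprawl case: every image sits on its own distinct 100 MB base image.
sprawl = cache_size_mb(100, 50, 90, 50)

print("hygienic:", hygienic, "MB")  # 200 + 4500 = 4700 MB
print("sprawl:  ", sprawl, "MB")    # 5000 + 4500 = 9500 MB
```

Under these assumptions the shared core build, despite being twice the size of any single sprawl base, costs about half the disk, because its size is paid once rather than once per image.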

Supply Chain Hygiene

Notice in the output below that lighttpd-rhel7 and python-34-rhel7 are both Builder Images, which use the Source-to-Image (S2I) tooling to produce child images. This enforces hygiene and guarantees that the rhel7 base image will only ever be downloaded and cached once on any given Docker host.

docker run --privileged -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz images -i -t registry.access.redhat.com/rhel7:latest
└─6c3a84d798dc Virtual Size: 201.7 MB Tags: registry.access.redhat.com:443/rhel7/rhel:latest, registry.access.redhat.com/rhel7:latest
  ├─eb1c00f5bd90 Virtual Size: 0.0 B
  │ └─309edfe8c834 Virtual Size: 0.0 B
  │   └─e55678196329 Virtual Size: 10.6 MB
  │     └─c0cfe103f6a8 Virtual Size: 26.9 MB
  │       └─e2a8aa8fc36d Virtual Size: 0.0 B
  │         └─013730cb9ca4 Virtual Size: 0.0 B
  │           └─e87b59df8cb9 Virtual Size: 1.2 KB
  │             └─3d001d53fb17 Virtual Size: 1.2 KB
  │               └─ec15cf3e6a54 Virtual Size: 447.0 B
  │                 └─2562d5e87e75 Virtual Size: 447.0 B
  │                   └─5de8fddd4822 Virtual Size: 0.0 B
  │                     └─fbc86b0dcde9 Virtual Size: 0.0 B
  │                       └─20e1f1bdc9cf Virtual Size: 0.0 B Tags: lighttpd-rhel7:latest
  │                         └─99a442cabd6a Virtual Size: 298.0 B Tags: lighttpd-test-app:latest
  └─ce709b84e064 Virtual Size: 173.6 MB
    └─bae1743ada78 Virtual Size: 86.8 MB Tags: registry.access.redhat.com/rhscl/python-34-rhel7:latest
      └─cc9b43e09c6e Virtual Size: 743.5 KB Tags: python-34-rhel7-app:latest

Put differently, in a more quantitative way, if the Python image layer is 86MB and the underlying RHEL 7 base image is 200MB, then each of your hosts should have exactly 200MB of storage used up for the base image, no matter how many derived images it holds. Scaled across tens or hundreds of applications all standardized on a rhel7 base image, the 200MB for the base image "fades away". Since this base image should be shared across most or all derived images, its size has very little impact on your clusters. I would instead optimize for functionality (yum, strace, SystemTap) rather than image size. Another byproduct is that developers can offload the updating of things like glibc to the systems administrators.
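
To see how quickly the base image "fades away", here is a small sketch that spreads the 200MB base across a growing number of applications on one host. The function name and the application counts are assumptions for illustration; the 200MB and 86MB figures are the rounded sizes used above.

```python
def effective_mb_per_app(base_mb, app_layers_mb, apps_sharing_base):
    """Per-application disk cost when the base image is stored only once."""
    return app_layers_mb + base_mb / apps_sharing_base


# Hypothetical counts of applications sharing the rhel7 base on a single host.
for apps in (1, 10, 100):
    cost = effective_mb_per_app(200, 86, apps)
    print(f"{apps:>3} apps: {cost:.1f} MB per app")
```

With one application the base dominates the cost; with a hundred, it adds only about 2MB per application on top of the application's own layers.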

Moral of today's story: hygiene matters (a lot)... so build all (or most) of your images off of a core build. Core builds can limit your attack surface across your entire environment, provide your developers with increased flexibility, and give operations a more manageable container environment.