It’s been just over three years since Solomon Hykes presented the world with the (so far) most creative way to use the tar command: the Docker project. Not only does the project combine existing container technologies and make them easier to use, but its well-timed introduction also drove an unprecedented rate of adoption for a new technology.
Did people run containers before the Docker project? Yes, but it was harder to do so. The broader community was favoring LXC, and Red Hat was working on a libvirt-based model for Red Hat Enterprise Linux. With OpenShift 2, Red Hat had already been running containers in production for several years, both in an online PaaS and on-premises for enterprise customers. The pre-Docker model, however, was fundamentally different from what we are seeing today: rather than enabling completely independent runtimes inside the containers, the approach in OpenShift 2 and libvirt-lxc was to partition the host, reusing the software installed on the host machine. There were several issues with this model, the most prominent being complexity. Modern deployments are so complex that the process of recreating an application stack (from a Puppet manifest, for example) over and over again in dev / test / ops has become too fragile.
This mirrors the problem that we faced with the predominant operational model roughly 20 years ago, when we moved from compiling software on local machines to pre-built binary distribution with RPM. The issues we solved in the “olden days” were that the behavior of a locally compiled application depended on the state of the machine at build time, and that compiling everything locally carried considerable overhead. We needed binary distribution to achieve a predictable experience of the aggregate software stack.
Today, stacks are so complex and changes in software streams so frequent that the stack you build is neither what you test nor what you end up running in production. On top of this comes the demand for updating applications and systems in place. This brings us back to a situation where the behavior of a production software stack simply depends on too many variables.
So how do containers, specifically the packaging provided by the Docker project, marginalize, if not outright eliminate, these variables? By partitioning and aggregating, of course, which leads to a whole other set of challenges and solutions… but that’s for my next post.