Container Tidbits: When Should I Break My Application into Multiple Containers?

There is a lot of confusion around which pieces of your application you should break into multiple containers and why. I recently responded to this thread on the Docker user mailing list, which led me to write today's post. In this post I will examine an imaginary Java application that historically ran on a single Tomcat server and explain why I would break it apart into separate containers. In an attempt to make things interesting, I will also try to justify this action (i.e. breaking the application into separate containers) with data and (engineering) logic… as opposed to simply stating that "there is a principle" and that one must adhere to it all of the time.

Let’s take an example Java application made up of the following two components:

  1. A front-end application built on the Struts Web Framework
  2. A back-end REST API server built on Java EE

As mentioned, this application historically ran in a single Tomcat server, and the two components communicated over a REST-based API… so the question becomes:

Should I break this application into multiple containers?

Yes. I believe this application should be decomposed into two different Docker containers… but only after careful consideration.

Instead of breaking applications up into multiple containers "just because" or in an attempt to adhere to some newfangled principle (e.g. "run only one process per container"), I suggest we think through the engineering requirements and make an informed, intelligent decision. Whether or not an application should be broken into multiple containers, containerizing it should, at the very least, make your life easier by providing a simpler deployment strategy.

Let’s pause (briefly) on the analysis of our example application and do some design thinking:

  1. The JVM is multi-threaded, so you are not necessarily running multiple Unix/Linux processes. In fact, I think this is rightfully confusing to engineers coming from the Java world. Historically, Java developers actually liked having multiple applications running in the same JVM; doing so can, in practice, save quite a bit of memory at scale. Furthermore, web application servers like Tomcat were built from the ground up to support running multiple applications in a single JVM. That's actually one of the main differences between running a simple Java program and a Java EE application (…which can be made up of multiple threads and multiple different programs).
  2. In reality, many applications use multiple processes per container. The Apache Web Server's prefork and worker MPMs both use multiple processes within a container. Modern web servers built around event-driven programming or the reactor pattern (take, for example, Nginx) actually fire off multiple worker processes. I would argue that the whole idea of FastCGI, AIO, and the reactor pattern is to offload work to other processes (or threads) and let the kernel handle the I/O. The Linux kernel is quite good at scheduling sub-processes. Kubernetes/Swarm and individual Docker containers are not as good at this. Processes (and threads) are about kernel resource allocation; containers are about cluster resource allocation.
  3. The run one process per container "best practice" is widely cited as a "principle" but sounds a lot more like philosophy. As an engineer, I want to understand the technical components and make logical decisions. I would argue that this "best practice" is not even universally agreed upon, and that its over-application stems from a widespread lack of understanding of how Unix works.
  4. Linux containers have historically come in many forms, and many actually recommend running multiple processes per container. What makes Docker containers any different? Linux containers are essentially the clone() system call, SELinux, and cgroups; whether they are of an LXC or Docker (through libcontainer) variety is mostly irrelevant, as the Linux kernel itself "does" the process isolation.
  5. There are valid times when processes communicate over sockets, through files, over the network, etc., and each approach has its own pros and cons. A given application's approach to communication will (most certainly) have an impact on whether or not you want to break it into multiple containers.
  6. Separation of code, configuration, and data will also play into your ability to break your application into multiple containers (or not). If your application has good separation of code, configuration, and data, it should be easy to decompose. If your application is very old and not well understood, it may make (unwanted) changes to your file system and could be very difficult to decompose. Note that this is OK (!); you don't have to re-write your application to containerize it. You can still get the benefits of the Docker container format by putting your application into a single container. You will then be able to easily move it around (using a registry server) and deploy it (using docker run) – see the sketch after this list.
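To make that last point concrete, here is a minimal sketch of the single-container path. The image name, tag, and registry are hypothetical placeholders, and the Dockerfile is assumed to already exist:

    # Build the whole, un-decomposed application into a single image
    docker build -t registry.example.com/legacy-app:1.0 .

    # Move it around using a registry server
    docker push registry.example.com/legacy-app:1.0

    # Deploy it anywhere with a single command
    docker run -d --name legacy-app -p 8080:8080 registry.example.com/legacy-app:1.0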

OK, with some Unix / application 101 out of the way, let’s get back to analyzing our example Java application:

  1. The two Java components described above seem to be doing very different things. One component is a web front-end, the other is an API server. Since these components are doing different things (i.e. they are indeed different services), there is little chance of a performance benefit from running them in the same JVM (…though, of course, I can't be 100% certain without actually testing performance).
  2. These two applications communicate using a REST API (…instead of using sockets, shared memory or files, etc.).
  3. Generally, if an application contains an API layer and a front-end layer, it can be useful to scale these independently. For example, if the API is also consumed by a mobile application, it might be useful to scale it up and down with user load, while the web front-end may not need to be scaled at all. Conversely, if I scale the web front-end, I may also need to scale the API server portion, but I may only need one more API server for every five web front-ends. Long story short, scaling logic can be complicated, and being able to scale these tiers independently with something like Kubernetes could be very useful.

Based on these three observations alone, I would recommend breaking these two components into separate containers. I would also recommend using a container orchestration tool such as Kubernetes or OpenShift to wire the services together. I would not base this decision on "principles" or supposed "best practices"; I would base it on the application architecture and some form of informed reasoning.
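To illustrate the scaling point, once the two components run as separate Kubernetes Deployments, scaling each tier independently becomes a one-liner (the Deployment names below are hypothetical assumptions, not part of the example application):

    # Scale the web front-end up to meet user load
    kubectl scale deployment web-frontend --replicas=5

    # Scale the API tier on its own curve (e.g. one API server
    # for every five web front-ends, per the reasoning above)
    kubectl scale deployment api-server --replicas=1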

Here’s where things get wild… I submit (to you) a new “best practice”, drum roll please:

…yes, if your application/service has good separation of code, configuration, and data, installs cleanly (installer scripts can make this whole process difficult), and features a clean communication paradigm, then it does make sense to break the application up and allocate one service per container.

Fundamentally, I suggest that we all begin to think more rationally about how to put applications into containers. Let’s also realize that containerization is more than just philosophy, it’s about solving technical pain(s). I love containers and I want to use them (all of the time) – but I also want to do so in an intelligent and informed manner. I’m positive that people have opinions and ideas about containerizing applications – I encourage you to share your thoughts in the comments section below.

  1. Great topic. It’s important to ask “why” about these things. One process per container commandments have been on my mind, because I’m partial to having systemd available inside the container, to better fit already-packaged apps. Maybe I need to fight that urge, but maybe not!:)

    Of your three-point analysis of the example Java app, it looks like points 1 and 2 are reasons why you *could* separate the services, rather than reasons why you *should*. Point 3 is a good one, though the services have to “expect” to scale separately.

    I’m interested in reading more about this topic. Also on the service separation front, it’d be interesting to explore, in the kubernetes context, issues around services living in the same pod, or in separate pods.

    1. So, I actually agree with you 100%. I am a loud advocate of the right tool for the job. I agree, points 1 and 2 are what I would consider "valid" things to think about when evaluating whether you should break an application up or not. They are not strong reasons in and of themselves. Point 3 is a valid business/technical reason. I am also a big advocate of systemd in the container, because it will be years, and years, and years until most applications can easily be broken up into services/separate containers. I have even spoken with customers that are putting things like network analysis tools or Java application servers which require installer scripts into containers. Once you have the container bug, you realize that the Docker format makes your life easier, but you do NOT have to buy into microservices too.
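      For anyone curious, the systemd-in-a-container pattern looks roughly like this; the image name is a hypothetical placeholder, and the exact flags vary by Docker and host version:

        # Run systemd as PID 1 inside the container; systemd wants a
        # writable /run and /tmp and read-only access to the cgroup tree
        docker run -d --name legacy-app \
            --tmpfs /run --tmpfs /tmp \
            -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
            registry.example.com/rhel7-systemd-app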

      We had a discussion today about PHP and FPM. They talk over port 9000/tcp, but that is not a "reason" to break them into separate containers. They "may" need to scale (capacity) independently, but they may not. The Linux kernel is really good at scheduling processes very quickly to handle load (vertically). Kubernetes is really good at scheduling processes across a cluster (horizontally). We have to be careful to understand that and not just break the application up automatically because of some "principle" declared by the container gods.

      More things to evaluate would be:
      1. Horizontal Scale: How much does my application need to scale? Is it Google, where it must scale horizontally because CPU and memory requirements are such that they won't fit in a single computer? (Oh, Google, I remember your brutal distributed systems interviews.)
      2. Vertical Scale: Does your application instead need to scale up and down in microseconds, within small upper and lower bounds?
      3. Jitter: How quickly does your application need to scale? Obviously, most workloads do NOT need a separate container for each user request. Even Google's infrastructure typically creates a container for each user session, not each request. Lambda is a particular edge case where a container may or may not be created for each request. I am skeptical.

      The list goes on and on…..

      1. I thought about another reason to be careful. Languages that run on virtual machines or interpreters carry per-process runtime overhead. Imagine 10,000 Java applications. We can run them on 100 JVMs, or on 10,000 JVMs all running in separate containers. There are going to be wildly different CPU and memory requirements at scale; as a rough illustration, if each JVM carries even ~100MB of baseline overhead, 10,000 separate JVMs burn roughly 1TB of memory on runtime overhead alone, versus ~10GB for 100 shared JVMs. The overhead of 10K JVMs is not light, nor is it with Ruby, or PHP, or anything else.

        Some of this overhead can be mitigated temporally, if these services do not need to run all of the time, but again, we end up in deep analysis of when to break things up and when not to. I am thoroughly convinced that it is ridiculous to say that you should break up every application even if you can do it easily. There are a lot of variables to think about. Here’s to blazing a trail into the future!

  2. Great article! When migrating apps to containers, this all has to be evaluated. So many times I hear "one process per container," but as you stated, one size doesn't always fit all. Thanks for the tips.

  3. In my case, I also believe that the one-process-per-container rule sounds a bit like religion. I like having the ability to run several processes in my containers sometimes, and I also like to avoid the dreaded PID 1 problem, which leads to zombie processes in containers. As to the separation into multiple containers, I go by the oldest pattern: separate what varies from what stays the same, this time with the objective of optimizing the deployment strategy. I do not want moving parts that can fail at deployment time, so that sort of delineates the hinges. I isolate infrastructure and application into their own containers. I also use container inheritance, putting common elements that do not change often into a base container from which the churning application image inherits. So, I get deployment benefits, like no "install" code running during deployments and no changes to the infrastructure, at the expense of more containers and some orchestration complexity.

  4. Hey Scott, great post. I think one element that you’ve perhaps neglected to include is that of deployment cadence and component custodianship as a potential delineation: if you have one component (say the web UI) that changes with great frequency, and a backend API that has a much slower rate of change, then that can also be an indicator that they could be separated. This becomes even more the case when there are different teams responsible for each component, as then your delivery chain hygiene becomes even more critical.

    1. Thank you for your comment, and I agree. So, originally, I had purposefully ignored that reason, but I may go back and edit the article. Originally, I had several reasons:

      1. With Java, it's much less clear why this would be helpful in development. With PHP, Ruby, or Python, I think the reasoning is clearer. I figured I already had enough nuance with the discussion around Java and processes :-)

      2. I was thinking that if I got too deep into developer reasons, this article might be better placed on DeveloperBlog than RHELBlog 😉

      3. It’s a hair easier to infer scaling pain than developer pain from the original email that drove me to write this.

      All that said, I now realize there is barely ANY canonical literature on when to break an application into multiple containers. I have received several comments through Twitter, etc. bringing up this lack of developer reasoning :-) I am now convinced that I should edit for posterity's sake :-)

  5. An old network-admin motto says: never forget layer 1!
    Now, I add: never forget IPC (sockets, shared memory, and files)! The performance of APIs over the network may not be enough for you.
