From Checkpoint/Restore to Container Migration

The concept to save (i.e. checkpoint / dump) the state of a process, at a certain point in time, so that it may later be used to restore / restart the process (to the exact same state) has existed for many years. One of the most prominent motivations to develop and support checkpoint/restore functionality was to provide improved fault tolerance. For example, checkpoint/restore allows for processes to be restored from previously created checkpoints if, for one reason or another, these processes had been aborted.

Over the years there have been several different implementations of checkpoint/restore for Linux. Existing implementations of checkpoint/restore differ in terms of  “what level” (of the operating system) they are operating; the lowest level approaches focus on implementing checkpoint/restore directly in the kernel while other “higher level” approaches implement checkpoint/restore completely in user-space. While it would be difficult to unearth each and every approach /  implementation – it is likely fair to

Continue reading “From Checkpoint/Restore to Container Migration”