Graham Christensen

Optimising Docker Layers for Better Caching with Nix

posted on October 01 2018

Nix users value its isolated, repeatable builds and simple sharing of development environments. Nix makes it easy to go back in time and rebuild software from years ago without issue.

At the same time, the value of the container ecosystem is huge. Tying in to the schedulers, orchestration, and monitoring is very valuable.

Nix has been able to generate Docker images for several years now, however the typical approach to layering with Nix is to generate one fat image with all of the dependencies. This fat image offers no sharing, is slow to build, upload, and download.

In this post I talk about how I fix this problem and use Nix to automatically create multi-layered Docker images, allowing a high amount of caching between images.

Docker uses layers

Docker’s use of layering is well known, and its benefits are undeniable: sharing a “base” system is a simple abstraction which allows extending a well known image with your own code.

A Docker image is a sequence of layers, where each member is a filesystem diff, adding and removing files from its parent member:

Efficient layering is hard because there are no rules

When there are no restrictions on what a command will do, the only way to fully capture its effects is to snapshot the full filesystem.

Most package managers will write files to shared global directories like /usr, /bin, and /etc.

This means that the only way to represent the changes between installing package A and installing package B is to take a full snapshot of the filesystem.

As a user you might manually create rules to improve the behavior of the cache: add your code towards the end of a Dockerfile, or install common libraries in a single RUN instruction, even if you don’t want them all.

These rules make sense: If a Dockerfile adds code and then installs packages, Docker can’t cache the installation because it can’t know that the package installation isn’t influenced by the code addition. Docker also can’t know that installing package A has nothing to do with package B and the changes are separately cachable.

With restrictions, we can make better optimisations

Nix does have rules.

The most important and relevant rule when considering distribution and Docker layers is:

A package build can’t write to arbitrary places on the disk.

A build can only write to a specific directory known as $out, like /nix/store/ibfx7ryqnqf01qfzj4v7qhzhkd2v9mm7-file-5.34. When you add a new package to your system, you know it didn’t modify /etc or /bin.

How does file find its dependencies? It doesn’t – they are hard-coded:

$ ldd /nix/store/ibfx7ryqnqf01qfzj4v7qhzhkd2v9mm7-file-5.34/bin/file

This provides great, cache-friendly properties:

  1. You know exactly what path changed when you added file.
  2. You know exactly what paths file depends on.
  3. Once a path is created, it will never change again.

Think graphs, not layers

If you consider the properties Nix provides, you can see it already constructs a graph internally to represent software and its dependencies: it natively has a better understanding of the software than Docker is interested in.

Specifically, Nix uses a Directed Acyclic Graph to store build output, where each node is a specific, unique, and immutable path in /nix/store:

Or to use a real example, Nix itself can render a graph of a package’s dependencies:

Flattening Graphs to Layers

In a naive world we can simply walk the tree and create a layer out of each path:

and this image is valid: if you pulled any of these layers, you would automatically get all the layers below it, resulting in a complete set of dependencies.

Things get a bit more complicated for a graph with a wider graph, how do you flatten something like Bash:

If we had to flatten this to an ordered sequence, obviously bash-interactive-4.4-p23 is at the top, but does readline-7.0p5 come next? Why not bash-4.4p23?

It turns out we don’t have to solve this problem exactly, because I lied about how Docker represents layers.

How Docker really represents an Image

Docker’s layers are content addressable and aren’t required to explicitly reference a parent layer. This means a layer for readline-7.0p5 doesn’t have to mention that it has any relationship to ncurses-6.1 or glibc-2.27 at all.

Instead each image has a manifest which defines the order:

  "Layers": [

If you have only built Docker images using a Dockerfile, then you would expect the way we flatten our graph to be critically important. If we sometimes picked readline-7.0p5 to come first and other times picked bash-4.4p23 then we may never make cache hits.

However since the Image defines the order, we don’t have to solve this impossible problem: we can order the layers in any way we want and the layer cache will always hit.

Docker doesn’t support an infinite number of layers

Docker has a limit of 125 layers, but big packages with lots of dependencies can easily have more than 125 store paths.

It is important that we still successfully build an image if we go over this limit, but what do we do with the extra layers?

In the interest of shortness, let’s pretend Docker only lets you have four layers, and we want to fit five. Out of the Bash example, which two paths do we combine in to one layer?

Smushing Layers

I decided the best solution is to combine the layers which are less likely to be a cache hit with other software. Picking the most low level, fundamental paths and making them a separate layer means my next image will most likely also share some of those layers too.

Ideally it would end up with at least glibc, and ncurses in separate layers. Visually, it is hard to tell if either readline or bash-4.4p23 would be better served as an individual layer. One of them should be, certainly.

My actual solution

My prioritization algorithm is a simple graph-based popularity contest. The idea is to weight each node more heavily the deeper and more references they have.

Starting with the dependency graph of Bash from before,

we first duplicate nodes in the graph so each node is only pointed to once:

we then replace each leaf node with a counter, starting at 1:

each node whose only children are counters are then combined with their children, and their children’s counters summed, then incremented:

we then repeat the process:

we repeat this process until there is only one node:

and finally we sort the paths in each popularity bucket by name to ensure the list is consistently generated to get the paths ordered by cachability:

This solution has properly put foundational paths which are most commonly referred to at the top, improving its chances of cache hit. The algorithm has also put the likely-to-change application right at the bottom in case the last layers need to be combined.

Let’s consider a much larger image. In this image, we set the maximum number of layers to 120, but the image has 200 store paths. Under this design the 119 most fundamental store paths will have their own layers, and we store the remaining 81 paths together in the 120th layer.

With this new approach of automatically layering store paths I can now generate images with very efficient caching between different images.

For a practical example of a PHP application with a MySQL database.

First we build a MySQL image:

# mysql.nix
  pkgs = import (builtins.fetchTarball {
    url = "";
    sha256 = "05a3jjcqvcrylyy8gc79hlcp9ik9ljdbwf78hymi5b12zj2vyfh6";
  }) {};
in pkgs.dockerTools.buildLayeredImage {
  name = "mysql";
  tag = "latest";
  config.Cmd = [ "${pkgs.mysql}/bin/mysqld" ];
  maxLayers = 120;
$ nix-build ./mysql.nix
$ docker load < ./result

Then we build a PHP image:

# php.nix
  pkgs = import (builtins.fetchTarball {
    url = "";
    sha256 = "05a3jjcqvcrylyy8gc79hlcp9ik9ljdbwf78hymi5b12zj2vyfh6";
  }) {};
in pkgs.dockerTools.buildLayeredImage {
  name = "grahamc/php";
  tag = "latest";
  config.Cmd = [ "${pkgs.php}/bin/php" ];
  maxLayers = 120;
$ nix-build ./php.nix
$ docker load < ./result

and export the two image layers:

$ docker inspect mysql | jq -r '.[] | .RootFS.Layers | .[]' | sort > mysql
$ docker inspect php | jq -r '.[] | .RootFS.Layers | .[]' | sort > php

and look at this, the PHP and MySQL images share twenty layers:

$ comm -1 -2 php mysql

Where before you wouldn’t bother trying to have your application and database images share layers, with Nix the layer sharing is completely automatic.

The automatic splitting and prioritization has improved image push and fetch times by an order of magnitude. Having multiple images allows Docker to request more than one at a time.

Thank you Target for having sponsored this work with Tweag in NixOS/nixpkgs#47411.


Graham works on NixOS.