Docker build caching for .NET applications done right with dotnet-subset
Introduction
Containerization has become the industry standard for application packaging and deployment, and it is easy to understand why: isolation, reproducibility, ease of deployment, to name only a few of its advantages.
When authoring the `Dockerfile` of your application, you have two choices:
- Building the application outside the Dockerfile and then copying the artifacts as an image layer
- Building the application inside the Dockerfile by running `dotnet publish` for .NET applications
Building inside the `Dockerfile` has some advantages (deterministic builds, reproducibility on local dev machines, ...), but build speed isn't one of them. Fortunately, Docker offers some techniques to help us make the build faster, such as the build cache mechanism.
This feature is commonly used to optimize the dependency download/install step, so that it is re-executed in subsequent builds only if the dependencies changed, which occurs far less often than source code changes.
Leveraging the cache properly for dependencies is straightforward in some languages/frameworks, but it can be tricky to do right for medium to large .NET applications. The goal of this article is to explain why, and to describe a solution to this problem using the new open source .NET tool from Nimbleways created specifically for this use case: dotnet-subset.
The problem
Let's step back and understand why we need Docker build caching in the first place. This is what a simple Dockerfile for a small .NET Web API looks like:
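A minimal sketch of such a `Dockerfile` (the .NET version tags and the project name are illustrative):

```dockerfile
# publish stage: build the application with the full SDK image
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS publish
WORKDIR /source
COPY . .
RUN dotnet publish -c Release -o /app

# final stage: run the application on the smaller ASP.NET Core runtime image
FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "webapi.dll"]
```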
This `Dockerfile` contains two build stages:
- The `publish` stage, based on a full SDK image, that builds the application
- The `final` stage, based on the smaller ASP.NET Core Runtime image, that imports the artifacts from the previous stage and defines the entry point.
The first `docker build` executes all the instructions as there is no cache yet. On the second run, `COPY . .` will compare the checksums of the copied files with the ones from the previous build. If they match, the subsequent instructions in the same stage may benefit from the cache if the instruction itself didn't change. If the checksums don't match, all caches are invalidated and subsequent instructions will execute. You can learn more about the caching behavior from the Docker documentation.
In our case, that means that if no file in the project has changed, the `dotnet publish` output from the previous run will be reused and our docker build will be extremely fast. But what happens if we change a C# source file? Yes, you guessed right: the checksum changed, therefore `dotnet publish` will be re-executed, and that's fine because we do want our code changes to be included in the new image. However, `dotnet publish` also does an implicit restore. Do all the dependencies need to be redownloaded/reinstalled when only C# source code files were changed? Probably not.
That is why the official documentation provides a better `Dockerfile` for .NET applications.
The officially recommended solution
Below is the Dockerfile of a simple ASP.NET Core project, taken from the official documentation:
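The sketch below follows the shape of that sample (image tags and the `aspnetapp` project name may differ from the exact published version):

```dockerfile
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /source

# Copy the sln and csproj files, then restore as a distinct layer
COPY *.sln .
COPY aspnetapp/*.csproj ./aspnetapp/
RUN dotnet restore

# Copy everything else and publish without restoring again
COPY aspnetapp/. ./aspnetapp/
WORKDIR /source/aspnetapp
RUN dotnet publish -c Release -o /app --no-restore

FROM mcr.microsoft.com/dotnet/aspnet:6.0
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "aspnetapp.dll"]
```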
Also from the same documentation:
> In the preceding Dockerfile, the `*.csproj` files are copied and restored as distinct layers. When the `docker build` command builds an image, it uses a built-in cache. If the `*.csproj` files haven't changed since the `docker build` command last ran, the `dotnet restore` command doesn't need to run again. Instead, the built-in cache for the corresponding `dotnet restore` layer is reused.
Let's ignore the `sln` file copy step because it is not required and was done mainly for convenience.
This `Dockerfile` solves our previous problem by:
- copying the project descriptor (an MSBuild file with a `csproj` extension)
- running `dotnet restore`
- copying the remaining files
- running `dotnet publish`
Why copy only the `csproj`? It is where the NuGet dependencies are defined, and it is all that `dotnet restore` needs to do its job.
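For illustration, here is a minimal, hypothetical project file; the `PackageReference` items are what `dotnet restore` consumes:

```xml
<!-- webapi.csproj (illustrative): the NuGet dependencies live here -->
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!-- Each PackageReference is resolved and downloaded by 'dotnet restore' -->
    <PackageReference Include="Serilog.AspNetCore" Version="6.0.1" />
  </ItemGroup>
</Project>
```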
However, this solution suffers from some shortcomings:
Multi-project applications
Real-life .NET applications are often composed of multiple projects (i.e., multiple `csproj` files). The .NET team provides a `Dockerfile` example for this scenario:
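In the spirit of that example, a sketch using the `complexapp` layout (the `libbar` project is illustrative; `libfoo` appears later in this article):

```dockerfile
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /source

# Copy every project file by hand, preserving the folder structure
COPY complexapp/*.csproj ./complexapp/
COPY libfoo/*.csproj ./libfoo/
COPY libbar/*.csproj ./libbar/
RUN dotnet restore complexapp/complexapp.csproj

# Copy the remaining sources and publish
COPY . .
RUN dotnet publish complexapp/complexapp.csproj -c Release -o /app --no-restore
```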
The suggested solution is to copy all the `csproj` files manually while preserving the original folder structure. (In case you are wondering why globbing wasn't used to copy all the project files in one line while preserving the folder structure: Docker doesn't support it.)
It works in most cases, but it requires that for every project dependency change in `complexapp` or any of its transitive project dependencies, the `Dockerfile` must be updated. For sizeable applications, you may end up with a huge `Dockerfile` like this one.
Some people have complained about this "laborious solution", while others came up with some hacky commands to automate this operation to some extent.
Note that if you restore a project that has a missing project dependency, for example `libfoo` from our `complexapp`, the restore will just skip it and won't fail:
```
1>_GetAllRestoreProjectPathItems:
Skipping project "/root/project/libfoo/libfoo.csproj" because it was not found.
```
NuGet-specific files
There are a couple of files that can alter the `dotnet restore` behavior and thus should be copied along with the `csproj` files:
nuget.config
The `nuget.config` file contains parameters such as HTTP proxy, trusted package signers and remote package repositories (you can find the full list here). These parameters can be mandatory for a successful `dotnet restore`.
In our case, there are two caveats to be aware of:
- On case-sensitive file systems, as in Linux distributions, `dotnet` will check for these three casings in this order and use the first match: `nuget.config`, `NuGet.config` and `NuGet.Config`
- NuGet reads its configuration from multiple `nuget.config` files. It looks for the computer and user configs, as well as config files present in every folder between the project base and its drive root. Values from all these files are combined following a specific order to define the final settings, as sketched below.
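A hypothetical layout showing the per-folder lookup (user- and computer-level configs are combined on top of these; for scalar settings, the file closest to the project typically wins):

```
/                               # drive root: a nuget.config here would apply too
└── source/
    ├── nuget.config            # applies to everything under /source
    └── complexapp/
        ├── nuget.config        # closest to the project
        └── complexapp.csproj
```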
You know the drill now: all these files should be copied too for `dotnet restore` to behave as expected.
packages.lock.json
This is a lesser-known feature of .NET: you can create lock files for your NuGet dependencies (enabling them is shown after the list below). As to why and when it can be useful, check the official documentation.
NuGet looks for the first file in the project base folder that matches in this order (as defined in NuGet's source code):
- The value of the `NuGetLockFilePath` property defined in the `csproj` file, if it is not empty
- The file `packages.<project_name>.lock.json` if it exists, where `<project_name>` is the `csproj` file name without extension and with spaces replaced by underscores
- The file `packages.lock.json` if it exists
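Lock files are opt-in; a minimal sketch of enabling them for the `complexapp` project (paths as in the earlier example):

```shell
# Generate packages.lock.json next to the project file
dotnet restore complexapp/complexapp.csproj --use-lock-file
```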
Custom logic for defining dependencies
Last but not least, NuGet dependencies can technically be defined in any file, not only the `csproj` file.
MSBuild, which is both the build engine and the project file format, provides a way to include one MSBuild file in another. This is heavily used by the platform to abstract away all the build logic (that is the reason why `csproj` files are so minimalistic), and it can also be useful for developers to centralize project settings and/or define some common NuGet dependencies like Roslyn Analyzers, as in the sketch below.
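A hypothetical example: a shared `common.props` file (a made-up name) that a project imports, moving a dependency out of the `csproj`:

```xml
<!-- common.props (hypothetical): dependencies shared by several projects -->
<Project>
  <ItemGroup>
    <!-- A Roslyn analyzer that every project should reference -->
    <PackageReference Include="StyleCop.Analyzers" Version="1.1.118" PrivateAssets="all" />
  </ItemGroup>
</Project>
```

```xml
<!-- complexapp.csproj: pulls in the shared file, so restore now depends on it too -->
<Project Sdk="Microsoft.NET.Sdk">
  <Import Project="../common.props" />
  <PropertyGroup>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>
</Project>
```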
So if you don't want to miss a dependency during `dotnet restore`, you may need to copy all the files directly or transitively imported by any of the application projects.
To sum it up, optimizing the image build for `dotnet restore` is not as simple as copying the `.csproj` file first. There are a few edge cases that should be addressed and, most importantly, maintained over the project lifetime …
… or we can use `dotnet-subset` to handle it all for us 😃
The better solution
At Nimbleways, we don't settle for the "good enough" solution, we challenge ourselves to do things the right way. We weren't satisfied with the current solutions for doing `dotnet restore` caching in Docker properly, so we created `dotnet-subset` to achieve that.
What is dotnet-subset?
`dotnet-subset` is an open source .NET tool whose goal is to extract a subset of files from a root directory and copy it to a target directory. This subset is defined by the tool's arguments.
Project links:
- GitHub: https://github.com/nimbleways/dotnet-subset
- NuGet: https://www.nuget.org/packages/dotnet-subset
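To try it locally, it can be installed like any other .NET tool (assuming the package id matches the tool name):

```shell
dotnet tool install --global dotnet-subset
```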
Let's see it in action on the `complexapp` sample:
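A sketch of the invocation, followed by a hypothetical output tree (the exact file list depends on the sample's contents):

```shell
dotnet subset restore /source/complexapp/complexapp.csproj \
    --root-directory /source/ \
    --output /tmp/restore_subset/
```

```
/tmp/restore_subset/
├── nuget.config                  # if present under /source
├── complexapp/complexapp.csproj
├── libfoo/libfoo.csproj
└── libbar/libbar.csproj
```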
Breaking down the command line:
- `dotnet subset`: dotnet-subset is invoked as a sub-command of the `dotnet` CLI
- `restore`: the subset algorithm to use. `restore` is currently the only supported algorithm.
- `/source/complexapp/complexapp.csproj`: the project or solution that needs to be restored
- `--root-directory`: the directory from where the files will be copied
- `--output`: the directory where the files needed for the restore will be copied, preserving the original structure
The output directory `/tmp/restore_subset` contains only the files that can impact `dotnet restore`:
- `csproj` files of `complexapp` and of all the projects it depends on, directly or transitively
- MSBuild files located under the root directory that are imported by the copied `csproj` files
- Package lock files associated with the copied `csproj` files
- `nuget.config` files in the copied `csproj` files' directories and in all parent directories up to the root directory
Now that we have this superpower, how can we use it efficiently inside our `Dockerfile`?
Docker cache + dotnet-subset = 🚀
As explained before, `dotnet-subset` needs the whole application source code as input, which means we need a `COPY . .` before running it. That copy would invalidate the cache for the subsequent instructions in the same stage whenever any file changes, which is why we call `dotnet-subset` in its own stage and then import its output into the `build` stage before running `dotnet restore`.
Our previous `Dockerfile` now becomes:
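A sketch following that idea (SDK/runtime tags, stage names and project paths are illustrative):

```dockerfile
# Stage 1: compute the restore subset; it re-runs on any file change,
# but it is cheap and only its output matters for caching
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS prepare-restore-files
ENV PATH="${PATH}:/root/.dotnet/tools"
RUN dotnet tool install --global dotnet-subset
WORKDIR /source
COPY . .
RUN dotnet subset restore complexapp/complexapp.csproj \
    --root-directory /source/ --output /restore_subset/

# Stage 2: restore using only the subset, so the restore layer is
# invalidated only when a restore-relevant file changes
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /source
COPY --from=prepare-restore-files /restore_subset .
RUN dotnet restore complexapp/complexapp.csproj
COPY . .
RUN dotnet publish complexapp/complexapp.csproj -c Release -o /app --no-restore

# Stage 3: final runtime image
FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS final
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "complexapp.dll"]
```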
Tada! Fewer `COPY` instructions and more confidence in the reliability of our `Dockerfile`.
Do you remember the huge `Dockerfile` I mentioned earlier? Let's appreciate how much neater it became thanks to `dotnet-subset` (PR link):
Conclusion
Identifying all the files that impact `dotnet restore` is hard; maintaining that list is even harder. Miss one and, if you are lucky, you will merely degrade the Docker cache quality or fail the docker build; if you are not, the application may crash at runtime with an obscure error.
`dotnet-subset` helps you write optimized *and* reliable Dockerfiles without the maintenance cost.
The tool is still at an early stage, awaiting your feedback to steer it in the right direction and make it better!