Jakob's Website

Ccache is my favourite build system

To build C and C++ projects, you usually set up a build system like Make or CMake. Typically, these track dependencies between source files and compilation targets to be able to rebuild only targets whose transitive dependencies have been modified, based on modification time, thereby speeding up compilation over just compiling everything from scratch. In practice, you still perform some duplicate compilations, e.g. when doing a clean build, when building different versions of the same software, etc.

Ccache solves this problem by wrapping the compiler and caching its outputs. When ccache detects that the same input would be compiled again, all compiler output is taken from the cache instead.
“But ccache is not a build system” I hear you say.

The pitch

But if ccache is already taking care of recompiling only the necessary files, can’t we ditch the aforementioned build tools entirely? Yes, and the result is my absolute favourite build system:

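(find -name '*.c' | xargs --max-args=1 --max-procs=0 ccache gcc -c) && gcc *.o -o main

This bash script uses xargs to invoke ccache once per source file, in parallel (--max-procs=0 lets xargs run as many processes at once as it can; without it, xargs runs them one at a time). At the end, gcc is invoked to link all the object files together, a step that ccache cannot cache.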

If these files are compiled for the first time, they are all compiled in parallel. If you run the script again, all object files are quickly pulled from cache (in parallel). If you change a single .c file, only that one is recompiled. The rest is pulled from cache. If you change a single header, only the sources including it will be recompiled.

Unfortunately, ccache does not cache anything when multiple files are passed to one command, so the even simpler ccache gcc *.c -o main does not work as intended.
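
To make the rerun behaviour described above concrete, here is a toy sketch of the underlying idea: key the output on a hash of the command line plus the source content, and skip the compiler when that key has been seen before. This is not how ccache is implemented (its key also has to account for included headers, the compiler itself, and more); the names cached_compile and CACHE_DIR are made up for this illustration, and GNU sha256sum is assumed.

```shell
#!/bin/bash
# Toy cache, for illustration only. ccache's real cache key also accounts for
# included headers, the compiler itself, and more.
CACHE_DIR=${CACHE_DIR:-${TMPDIR:-/tmp}/toy-cache}   # hypothetical location

# Usage: cached_compile SRC OBJ COMPILER [FLAGS...]
# Runs "COMPILER FLAGS... SRC -o OBJ" only if this exact command line and
# source content have not been seen before; prints "hit" or "miss".
cached_compile() {
    local src=$1 obj=$2
    shift 2
    local key
    key=$( { printf '%s\n' "$@"; cat "$src"; } | sha256sum | cut -d' ' -f1 )
    mkdir -p "$CACHE_DIR"
    if [ -f "$CACHE_DIR/$key" ]; then
        cp "$CACHE_DIR/$key" "$obj"      # reuse the stored output
        echo hit
    else
        "$@" "$src" -o "$obj" || return  # really run the compiler
        cp "$obj" "$CACHE_DIR/$key"
        echo miss
    fi
}
```

Calling cached_compile a.c a.o gcc -O2 -c twice should print miss and then hit; editing a.c or changing a flag produces a new key and therefore another miss.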

Is this any good?

I really like it. I’ve been using it for all my smaller projects, because it is so simple. The question “How do I add this compile flag?” is easily answered.

Upsides:

Downsides:

Obviously, a traditional build system provides many more features than xargs + ccache, such as the ability to define compilation flags in a portable way, find libraries in your system, and declare complex dependency graphs, just to name a few. They should also perform strictly better when using ccache as well, due to them only invoking ccache for the needed files.

For small projects though, ccache is fast enough. To give you a rough idea, here are some basic hyperfine benchmarks based on my randomwalk project with 8 files.

# compile, link, full build, cmake build, cmake configure
hyperfine --warmup=10 --min-benchmarking-time=10 'xargs --max-args=1 --max-procs=50 ccache clang++ -Wall -std=c++26 -O2 -g -Ivendor/imgui -Ivendor/imgui/backends $(pkg-config --cflags sdl3) -c <sources.txt'  'clang++ -Wall -std=c++26 -O2 -g -fuse-ld=mold $(pkg-config --libs --cflags sdl3) -o main ./*.o' ./build.sh 'cmake --build build -j' 'gio trash build; mkdir -p build && pushd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=mold" -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=mold" -DCMAKE_VERBOSE_MAKEFILE=ON; popd'
Benchmark command (above) and results, reported as mean ± standard deviation:

benchmark         time
compile           6.1 ms ± 0.2 ms
link              27.8 ms ± 0.6 ms
full build        42.9 ms ± 0.8 ms
cmake build       39.6 ms ± 0.7 ms
cmake configure   259.5 ms ± 2.5 ms

Raw benchmark results (CSV):
command,mean,stddev,median,user,system,min,max
xargs --max-args=1 --max-procs=50 ccache clang++ -Wall -std=c++26 -O2 -g -Ivendor/imgui -Ivendor/imgui/backends $(pkg-config --cflags sdl3) -c <sources.txt,0.006066601464426004,0.00019500747852830948,0.006052739860000001,0.0111329819363762,0.018234544204702628,0.005603244360000001,0.00685912636
clang++ -Wall -std=c++26 -O2 -g -fuse-ld=mold $(pkg-config --libs --cflags sdl3) -o main ./*.o,0.027786096779540224,0.0005768461943693684,0.027717892860000003,0.007775711034482753,0.006169721149425283,0.026545809360000002,0.031001498360000004
./build.sh,0.04292396277176469,0.0008201344898828762,0.042812791360000005,0.019651706470588233,0.02745959843137255,0.041018994360000005,0.048181378360000006
cmake --build build -j,0.039590598263999986,0.000667556575374093,0.03950835536,0.020420704,0.018985583999999993,0.037859186360000005,0.04222505436000001
"gio trash build; mkdir -p build && pushd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_SHARED_LINKER_FLAGS=""-fuse-ld=mold"" -DCMAKE_EXE_LINKER_FLAGS=""-fuse-ld=mold"" -DCMAKE_VERBOSE_MAKEFILE=ON; popd",0.259492938991579,0.002473031131694414,0.25948769986,0.14207899052631573,0.10164378315789473,0.24995149636000003,0.26429419836

In my opinion, using only ccache delivers pretty compelling performance for how simple and flexible it is.

Prior art

The idea of caching compilation results is certainly not new. Zig’s compiler, as a recent example, provides an even simpler experience. You can simply pass all sources to zig as in zig cc a.c b.c and it will cache the compilation of all individual sources.

For simple enough projects, this obviates the need for a Makefile or other build system.
—Andrew Kelley, zig cc: a Powerful Drop-In Replacement for GCC/Clang

This is great! ccache is just slightly more compatible in my experience, especially with C++ compilers.

Further recommendations

use a fast linker

Incremental builds require linking anew. With the build script given above, even if nothing changed, the linker is invoked. Therefore, linking speed is critical for a quick build cycle. I am a happy user of mold. In GCC and Clang you can set it with -fuse-ld=mold.
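
If a build script has to run on machines where mold may not be installed, the linker flag can be chosen at run time. A minimal sketch, assuming that a mold binary on PATH means the compiler can use it (which depends on the compiler version); linker_flag is a name made up for this example:

```shell
#!/bin/bash
# Print -fuse-ld=mold if a mold binary is on PATH; otherwise print nothing,
# so that the compiler falls back to its default linker.
linker_flag() {
    if command -v mold >/dev/null 2>&1; then
        printf '%s\n' '-fuse-ld=mold'
    fi
}
```

The link step then becomes gcc $(linker_flag) ./*.o -o main, with the substitution deliberately left unquoted so that an empty result adds no argument.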

limit parallelism

I personally like not bothering with this in the spirit of: declare your dependencies (none) and let the scheduler do the rest. Still, it probably makes sense to limit xargs’s parallelism with e.g. --max-procs=50 to avoid excessive context switching when compiling larger projects.

I would still recommend keeping it quite a bit higher than your core count. Firstly, on cache-hits, ccache performs plenty of non-CPU work, so it can benefit from having more processes than cores. Secondly, even when actually compiling, having more processes can finish the build sooner, because tasks are completed more evenly.

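[Figure: scheduling of a.c, b.c and c.c on two cores (c0, c1). With max-procs=2, c0 runs a.c and then c.c while c1 runs b.c, finishing at 2 s. With max-procs=3, the three files are time-sliced across both cores, finishing at 1.5 s.]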
Suppose you have to compile three files, each taking 1 s to compile, on two cores. The figure shows two hypothetical schedules for this job: one where two and one where three concurrent processes (--max-procs) are allowed. With max-procs=2 the build takes 2 s. With max-procs=3 it can finish in ~1.5 s, disregarding context-switching overhead.
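
The two numbers can be reproduced with a bit of shell arithmetic. This is only the idealised model behind the example (equal-length jobs, perfectly fair time slicing, no switching cost), and the variable names are made up for illustration:

```shell
#!/bin/bash
njobs=3 job_ms=1000 cores=2

# max-procs=2: at most two compilations run at once, so the three jobs need
# ceil(3 / 2) = 2 rounds of 1 s each.
procs=2
batched_ms=$(( (njobs + procs - 1) / procs * job_ms ))

# max-procs=3: all three jobs start immediately and share the two cores,
# so 3 s of total CPU work are spread over 2 cores.
shared_ms=$(( njobs * job_ms / cores ))

echo "max-procs=2: ${batched_ms} ms, max-procs=3: ${shared_ms} ms"
# prints: max-procs=2: 2000 ms, max-procs=3: 1500 ms
```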

A more integrated solution could be smarter about how much parallelism to employ for cache retrieval and compilation respectively.
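
Until then, one simple option is to derive the limit from the core count instead of hard-coding it. A minimal sketch, assuming GNU nproc is available; the helper name max_procs and the default factor of 4 are arbitrary choices for this example, not measured optima:

```shell
#!/bin/bash
# Hypothetical helper: print a --max-procs value that is a multiple of the
# number of available cores. The default factor of 4 is an assumption.
max_procs() {
    local factor=${1:-4}
    echo $(( $(nproc) * factor ))
}
```

It would be used as xargs --max-args=1 --max-procs="$(max_procs)" ccache gcc -c, or with an explicit factor as in "$(max_procs 8)".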

use a copy-on-write compressed filesystem

This is my recommendation in general. Similar to other cache implementations, ccache has the ability to use a filesystem’s native copy-on-write to duplicate a file instantly without actually duplicating any data on disk. Ccache can also compress cached files to save space, though not when using copy-on-write, since ccache needs to provide the uncompressed content. To not prevent compression, copy-on-write is disabled by default in ccache. Enable it by putting the following in your ~/.config/ccache/ccache.conf:

file_clone = true

This disables ccache’s compression, but your filesystem can still compress the files transparently. These two filesystem features work well in conjunction to improve the speed and storage requirements of ccache. Btrfs has been working well for me.
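
If you set up several machines, the config change can be scripted. A minimal sketch that appends the setting only when the file does not already set file_clone (so an existing file_clone = false is left alone); enable_file_clone is a name made up for this example, and the default path is the one mentioned above:

```shell
#!/bin/bash
# Append "file_clone = true" to a ccache config file unless the file already
# contains a file_clone line. Pass a path to override the default location.
enable_file_clone() {
    local conf=${1:-$HOME/.config/ccache/ccache.conf}
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    if ! grep -Eq '^[[:space:]]*file_clone[[:space:]]*=' "$conf"; then
        echo 'file_clone = true' >> "$conf"
    fi
}
```

Afterwards, ccache --get-config file_clone should report the value ccache actually uses.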

throw it into build.sh

I usually create at least one build.sh file in each project directory where I save the compile command. The input files and compile flags vary depending on the project.

The example above is stripped down for clarity. A more realistic example looks like this:

#!/bin/bash
(find . -name '*.c' -print0 | xargs -0 --max-args=1 --max-procs=50 ccache gcc -Wall -O2 -g -c) && gcc -fuse-ld=mold ./*.o -o main
rm ./*.o
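
Since flags vary per project, it can help to keep them in arrays so that adding one is a one-line change. The following is a hypothetical variant of the script above, not a replacement for it; CC, LAUNCHER, CFLAGS, LDFLAGS and build are names chosen for this sketch:

```shell
#!/bin/bash
# Hypothetical variant of build.sh with the flags pulled out into arrays.
CC=${CC:-gcc}
LAUNCHER=${LAUNCHER:-ccache}
CFLAGS=(-Wall -O2 -g)        # add compile flags here
LDFLAGS=(-fuse-ld=mold)      # add linker flags here

build() {
    local status=0
    # Compile every .c file through the launcher, then link if all succeeded.
    find . -name '*.c' -print0 |
        xargs -0 --max-args=1 --max-procs=50 \
            "$LAUNCHER" "$CC" "${CFLAGS[@]}" -c &&
        "$CC" "${LDFLAGS[@]}" ./*.o -o main || status=$?
    rm -f ./*.o              # clean up object files either way
    return "$status"
}
```

Appending a final line that calls build turns this into a drop-in script; a different compiler can then be selected per run with CC=clang ./build.sh.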

Closing thoughts

I love the simplicity of the xargs + ccache build system. Ccache is powerful enough to avoid compilation in the majority of cases where it is possible. A build script is easy to maintain and it is trivial to modify your compile command.

It would be great to see ccache add support for multiple files to make this build system even simpler. Integrated multiple file support would even allow optimizing the parallelism issue hinted at above. I also considered implementing this as a small CLI atop ccache, with the same interface, that automatically separates files from flags. However, this would be a tough sell, considering you can simply use xargs instead.

I hope you enjoyed this blog post and give ccache a try :) Thanks a lot to Lars Quentin, Hossein Biniaz, and my other reviewers.