Build Cache Today and Tomorrow

Joseph Sibony
Published On:
October 19, 2023

It may seem counterintuitive, but sometimes not running something can help you be faster. It’s a guiding principle for how web browsers work: instead of re-loading every asset each time you visit a page, the browser serves cached elements it already has. Your page loads faster, the browser does less work, and the end result is the same.

It is always faster to NOT run something than to run something faster.

This is the same rule guiding caching in software development. In this case, caching offers several key benefits:

  1. It noticeably speeds up development.
  2. It lowers costs. This is especially true when talking about using cloud resources for computing, but it applies when completing builds on-prem. It even extends to CI server licenses, since each CI server can complete more builds in parallel.
  3. It encourages developers to adopt best practices frequently avoided due to long build times, such as working with feature branches and branching frequently, introducing small commits each time, frequently pulling changes from the repository, and more.
  4. From a CI perspective, shorter build times mean shifting left to build per commit, faster feedback to developers, better coping with crunch times (the more builds executed, the “hotter” and more efficient the cache is), faster bug resolution, shorter time to market, and more.

What current caching approaches for software builds are missing

Caching can happen on several levels, with different degrees of impact on your performance. The more granular the caching is (e.g., caching a few lines or a single script rather than only an entire section of a code base), the better your performance and the more your costs are reduced.

Therefore, when you think about caching build artifacts, it’s great to be able to cache and reuse final artifacts like executables and libraries, but you should think smaller. Your cache should also include intermediate artifacts – C++ obj files, debug information, and other aspects of your build including unit test outputs and custom build commands.

The burden of dependencies

One of the driving principles of caching is that to successfully reuse a cached artifact during compilation, the current task must have exactly the same relevant input dependencies as the task that produced the previously cached artifact.

To achieve this, each task that places an artifact in the cache must be associated with all the input dependencies that led to the creation of that artifact. This is usually done by creating a unique hash ID that represents all of the task’s dependencies, including input files, environment variables, command-line parameters, and other inputs. This hash ID represents all the dependencies that led to a specific artifact.

A task that is executed can reuse a previously cached artifact only if all of its relevant input dependencies hash to the same ID as a previously executed task.
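A minimal sketch of such a hash ID in Python (the function name, input shapes, and the choice of SHA-256 are illustrative assumptions, not a description of any particular tool’s implementation):

```python
import hashlib

def task_cache_key(input_files, env_vars, cmdline):
    """Combine every input dependency of a build task into one hash ID.

    input_files: {path: file content as bytes}
    env_vars:    {name: value}
    cmdline:     list of command-line arguments
    """
    h = hashlib.sha256()
    # Sort dict keys so the digest is deterministic regardless of order.
    for path in sorted(input_files):
        h.update(path.encode())
        h.update(input_files[path])
    for name in sorted(env_vars):
        h.update(f"{name}={env_vars[name]}".encode())
    for arg in cmdline:
        h.update(arg.encode())
    return h.hexdigest()

# Identical inputs yield the same key; any change yields a different one.
key_a = task_cache_key({"main.cpp": b"int main(){}"}, {"CC": "cl"}, ["/O2"])
key_b = task_cache_key({"main.cpp": b"int main(){}"}, {"CC": "cl"}, ["/O2"])
key_c = task_cache_key({"main.cpp": b"int main(){ }"}, {"CC": "cl"}, ["/O2"])
```

The key point is that the whole dependency set collapses into one index: a single whitespace change in `main.cpp` above produces a completely different key.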

Common approaches to this requirement create a burden on the user and the tool/compiler we are looking to cache:

  • User burden – to achieve caching, some build tools require users to explicitly declare in their build scripts all the input dependencies of each task the build will execute. This requires significant work to determine and write out the product’s entire dependency graph. Moreover, it demands deep knowledge of the product’s structure and dependencies, and developers who introduce new code must keep these declarations up to date. While this approach can be feasible for newly created products, applying it to large existing projects is difficult and error-prone, especially during ongoing development, which continuously requires these explicit declarations to be enforced.
  • Tool burden – some tools include a run-time service that provides these input dependencies, such as a compiler’s pre-processing function. Compilers with pre-processing capabilities let users determine the input dependencies of a compilation task. The problem with this approach is that it requires an implementation per compiler. Not all tools have pre-processing capabilities (imagine you’d like to cache unit tests), and running the pre-processor just to determine task dependencies, when performed for each compiled task, creates a compute burden on the build machine that extends build times. These limitations become much more complex when looking to share the cache between different hosts, such as a team of developers sharing cached artifacts or a group of continuous integration nodes.
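As an aside, the compiler pre-processing route typically produces Make-style dependency rules (this is what `gcc -MM` prints, for example). A small, hypothetical parser for that format hints at the per-compiler machinery the “tool burden” approach entails:

```python
def parse_make_deps(dep_text):
    """Parse a Make-style dependency rule such as the output of `gcc -MM`.

    Returns (target, [input dependencies]).
    """
    # Join backslash-continued lines into one logical rule.
    rule = dep_text.replace("\\\n", " ")
    target, _, deps = rule.partition(":")
    return target.strip(), deps.split()

text = "main.o: main.cpp \\\n  util.h \\\n  config.h\n"
target, deps = parse_make_deps(text)
# target is "main.o"; deps are ["main.cpp", "util.h", "config.h"]
```

Note that this only covers one compiler family’s output format; every other tool in the build would need its own equivalent, which is exactly the per-compiler burden described above.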

Incredibuild’s patented approach to caching

Incredibuild’s unique approach to build cache is built on the foundation of the platform’s robust, field-proven parallel distribution technology – low-level system hooks that are injected into processes, similar to how anti-virus software works. This allows Incredibuild to automatically map out every file read and every other input accessed by a process, making it seamless and generic to determine task dependencies and lifting the burden off both the user and the tool.

Here’s how it works:

  • A process is executed (let’s use a C++ compilation process as an example).
  • Incredibuild injects system hooks into the process, allowing it to monitor all the inputs used by the compilation process (files, folders, registry entries, environment variables, etc.).
  • While the compilation process is running, Incredibuild automatically determines all the inputs the process required and everything the process created (the outputs).
  • All inputs are hashed, creating a correlation between the inputs and the outputs they generated. Both the hash and the outputs are then placed in the cache, with the hash serving as the index ID of these specific task outputs.
  • The power of the cache is then amplified by sharing it between multiple users across the network, so they can reuse outputs created anywhere on the network instead of rebuilding them.
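The steps above can be sketched as a toy cache wrapper in Python. This is purely illustrative: in the real system the inputs are discovered automatically via system hooks, while here the caller hands them in explicitly, and all names are made up for the sketch.

```python
import hashlib

class BuildCache:
    """Toy shared cache keyed by a hash of every input a task touched."""

    def __init__(self):
        self.store = {}  # hash ID -> task outputs

    def key(self, inputs):
        # Hash all inputs (sorted for determinism) into one index ID.
        h = hashlib.sha256()
        for name in sorted(inputs):
            h.update(name.encode())
            h.update(inputs[name])
        return h.hexdigest()

    def run(self, inputs, task):
        k = self.key(inputs)
        if k in self.store:           # cache hit: skip execution entirely
            return self.store[k], True
        outputs = task(inputs)        # cache miss: execute and record
        self.store[k] = outputs
        return outputs, False

cache = BuildCache()
compile_task = lambda inp: {"main.obj": b"<object code>"}
_, hit1 = cache.run({"main.cpp": b"int main(){}"}, compile_task)  # miss
_, hit2 = cache.run({"main.cpp": b"int main(){}"}, compile_task)  # hit
```

Because the cache is keyed only by the hash of the inputs, sharing `store` across a network lets a second machine get the hit on the first machine’s work, which is the amplification step described above.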

Using system hooks to seamlessly track process dependencies for caching purposes is an entirely new approach to caching (the U.S. Patent Office recognized this approach as technologically innovative enough to deserve a patent).

How this looks in Visual Studio

Let’s use a Visual Studio C++ example to show how Incredibuild works. Caching is utilized at the ‘solution level’, the ‘project level’, and the ‘unit (file) level.’ This means that if nothing has changed in a project, the entire project’s artifact can be reused without rebuilding any of its units (in a large C++ application there are typically hundreds of units within many projects within a solution). More granular unit-level caching (in C++, caching each compiled unit’s “obj” file) means that even if the final artifact’s dependencies have changed due to a change in a single unit, only the unit that has changed needs to be re-executed, while the rest of the units’ outputs can be retrieved from the cache, resulting in much faster build times with less compute power.
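A hypothetical sketch of this unit-level granularity (all names are illustrative): on the second build, only the unit whose source changed is recompiled, while the other units’ objects come straight from the cache.

```python
import hashlib

def build_solution(units, cache):
    """Build a 'solution' of C++ units, compiling only the units whose
    source hash is not already in the cache (illustrative sketch)."""
    rebuilt = []
    objs = {}
    for name, source in units.items():
        k = hashlib.sha256(source.encode()).hexdigest()
        if k not in cache:
            cache[k] = f"{name}.obj"   # pretend compilation happened here
            rebuilt.append(name)
        objs[name] = cache[k]
    return objs, rebuilt

cache = {}
units = {"a": "void a();", "b": "void b();", "c": "void c();"}
_, first = build_solution(units, cache)   # cold cache: all three compile
units["b"] = "void b(int);"               # change a single unit
_, second = build_solution(units, cache)  # only "b" recompiles
```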


A vision for the future

This unique approach to generic, “behind-the-scenes” caching unlocks additional capabilities for the future of the dev ecosystem. The field is highly fragmented with many tools being used as part of large build execution – various compilers, build systems, software language tools, test frameworks and more. Incredibuild’s generic approach to caching will allow us to offer a single holistic caching solution for all the tools being used as part of a build that can benefit from caching services. This also lets our customers accelerate:

  • Additional software languages and compilers
  • Additional compute use cases like various testing frameworks, code analysis, and more 
  • Shorter build workloads of languages that compile faster, like C#
  • Sequential, non-parallel workloads
  • Users’ custom steps that are part of the CI process like custom build steps, build scripts, homegrown tools, and more

Having a holistic caching solution that benefits many different use cases means that every feature developed around the caching services serves all of those use cases. This approach opens up many use cases and justifies the large investment we envision for such a holistic, robust cache.

You can learn more about Incredibuild’s Build Cache here.