Build Cache Today and Tomorrow

Dori Exterman reading time: 6 minutes

October 19, 2023

It may seem counterintuitive, but sometimes not running something can help you be faster. It’s a guiding principle for how web browsers work. Instead of downloading every asset on a page each time, the browser reuses the elements it has already cached – no need to re-load them each time you visit a website or webpage. Your page loads faster, the browser does less work, and the end result is the same.

It is always faster to NOT run something than to run something faster.

This is the same rule guiding caching in software development. In this case, caching offers several key benefits:

It noticeably speeds up development.
It lowers costs. This is especially true when using cloud resources for computing, but it also applies to builds completed on-prem. It even extends to CI server licenses, since each CI server can complete more builds in parallel.
It encourages developers to adopt best practices frequently avoided due to long build times, such as working with feature branches and branching frequently, introducing small commits each time, frequently pulling changes from the repository, and more.
From a CI perspective, shorter build times mean shifting left to build per commit, faster response times to developers, better coping with crunch times (the more builds executed, the “hotter” and more efficient the cache is), faster bug resolution, shorter time to market, etc.

What current caching approaches for software builds are missing

Caching can happen on several levels, with different degrees of impact on your performance. The more granular the caching is (e.g., caching an entire section of a code base vs a few lines vs a script), the better your performance and the more costs are reduced.

Therefore, when you think about caching build artifacts, it’s great to be able to cache and reuse final artifacts like executables and libraries, but you should think smaller. Your cache should also include intermediate artifacts – C++ obj files, debug information, and other aspects of your build including unit test outputs and custom build commands.

The burden of dependencies

One of the driving principles of caching is that to successfully reuse a cached artifact during compilation, the current task must have exactly the same relevant input dependencies as the task that produced the cached output artifact.

To achieve this, each task that places an artifact in the cache must record all the input dependencies that led to the creation of that artifact. This is usually done by creating a unique hash ID that represents all the task dependencies, including input files, environment variables, command line parameters and other inputs. This hash ID represents all the dependencies that led to a specific artifact.

A task can reuse a previously cached artifact only if the hash of all its relevant input dependencies is identical to the hash of a previously executed task.
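To make the hash ID idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular build tool computes its keys: the function name task_key, the choice of SHA-256, the in-memory file contents and the example command line are all assumptions made for this example.

```python
import hashlib

def task_key(command, env, input_files):
    """Return one hex digest covering the command line, env vars and file contents.

    input_files maps a path to its contents as bytes (kept in memory here so
    the sketch stays self-contained).
    """
    h = hashlib.sha256()

    def feed(label, data):
        # Length-prefix every field so adjacent fields cannot run together.
        h.update(label)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)

    for arg in command:
        feed(b"arg", arg.encode())
    for name in sorted(env):
        feed(b"env-name", name.encode())
        feed(b"env-value", env[name].encode())
    for path in sorted(input_files):
        feed(b"file-path", path.encode())
        feed(b"file-data", input_files[path])
    return h.hexdigest()

# Hypothetical compilation task used only for illustration.
cmd = ["cl", "/c", "/O2", "main.cpp"]
env = {"INCLUDE": "C:/sdk/include"}
files = {
    "main.cpp": b'#include "util.h"\nint main() { return answer(); }\n',
    "util.h": b"int answer();\n",
}

key_a = task_key(cmd, env, files)
key_b = task_key(cmd, env, dict(files))                               # identical inputs
key_c = task_key(cmd, env, {**files, "util.h": b"long answer();\n"})  # header changed
key_d = task_key(["cl", "/c", "/Od", "main.cpp"], env, files)         # flag changed

print(key_a == key_b)  # identical inputs give the same key
print(key_a == key_c)  # a changed header gives a different key
print(key_a == key_d)  # a changed flag gives a different key
```

Sorting the environment variables and file paths before hashing keeps the key independent of the order in which the inputs happened to be collected.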

Common approaches to this requirement create a burden on the user and the tool/compiler we are looking to cache:

User burden – To achieve caching, some build tools require users to explicitly declare in their build scripts all the input dependencies of each task that the build will execute. This takes a lot of work: the dependencies of the entire product have to be determined and written down explicitly. Moreover, this approach requires deep knowledge of the product’s structure and dependencies, and developers who introduce new code must be made to keep the declarations up to date. While this approach can be feasible in newly created products, going through this process for large projects can be difficult and can lead to build errors, especially during ongoing development, where these explicit declarations need to be enforced continuously.
Tool burden – Some tools include a run-time service that provides these input dependencies, such as compilers’ pre-processing functions. Compilers that have pre-processing capabilities let users determine the input dependencies of a compilation task. The problem with this approach is that it requires an implementation per compiler. Not all tools have pre-processing capabilities (imagine you’d like to cache unit tests), and running the pre-processor just to determine task dependencies, when done for every compiled task, can create a compute burden on the build machine that extends build times. These limitations become much more complex when looking to share the cache between different hosts, such as a team of developers sharing cached artifacts between them or a group of continuous integration nodes.
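As a rough illustration of the tool-side approach, the Python sketch below walks quoted #include lines to find the files a compilation task depends on. It is a toy stand-in for a real pre-processor run, not a description of any compiler: the function name scan_dependencies and the in-memory file set are assumptions, and macros, conditional compilation, search paths and angle-bracket includes are all ignored.

```python
import re

# Matches lines such as: #include "header.h"
INCLUDE = re.compile(rb'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def scan_dependencies(path, files, seen=None):
    """Return the set of files reachable from path through quoted #include lines."""
    if seen is None:
        seen = set()
    if path in seen:
        return seen
    seen.add(path)
    for match in INCLUDE.finditer(files[path]):
        header = match.group(1).decode()
        if header in files:
            scan_dependencies(header, files, seen)
    return seen

# Hypothetical project contents, held in memory for the example.
files = {
    "main.cpp": b'#include "app.h"\n#include "log.h"\nint main() {}\n',
    "app.h": b'#include "config.h"\n',
    "log.h": b'#include "config.h"\n',
    "config.h": b"#define LEVEL 3\n",
    "unused.h": b"// never included\n",
}

deps = scan_dependencies("main.cpp", files)
print(sorted(deps))  # ['app.h', 'config.h', 'log.h', 'main.cpp']
```

Even in this toy form, the scan has to run once per compiled task and only understands one language’s include syntax, which is the per-tool cost described above. A unit test or a custom script has no equivalent mechanism to lean on.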

Incredibuild’s patented approach to caching

Incredibuild’s unique approach to build cache is based on the foundation of the platform’s robust, field-proven parallel distribution technology – low-level system hooks that are injected into processes, similar to how anti-virus software works. This allows Incredibuild to automatically map out every file read, or other input accessed by a process, making it seamless and generic to determine task dependencies and lifting the burden off both the user and the tool.

Here’s how it works:

A process is executed (let’s use a C++ compilation process as an example).
Incredibuild injects system hooks into the process, allowing it to monitor all the inputs that are being used by the compilation process (files, folders, registry flags, environment variables, etc.).
While the compilation process is running, Incredibuild automatically determines all the inputs the process required and everything the process created (the outputs).
All inputs are hashed, making a correlation between the inputs and the outputs they generated. Both the hash and the outputs are then placed in the cache, the hash serving as the index ID of these specific task outputs.
The power of the cache is then amplified by sharing it between multiple users across the network, so they can reuse outputs created anywhere in the network instead of rebuilding them.
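The steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Incredibuild’s implementation: the TrackedFiles wrapper stands in for the injected system hooks, files live in an in-memory dictionary, and the lookup (checking the recorded inputs of earlier runs of the same command against the current file contents) is one possible strategy chosen for this example, since the article does not describe how lookups are performed.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class TrackedFiles:
    """Stand-in for system hooks: records every read and write a task performs."""
    def __init__(self, files):
        self.files = files   # path -> bytes, a toy in-memory file system
        self.reads = {}      # path -> digest of the content the task saw
        self.writes = {}     # path -> bytes the task produced

    def read(self, path):
        data = self.files[path]
        self.reads[path] = digest(data)
        return data

    def write(self, path, data):
        self.writes[path] = data
        self.files[path] = data

class BuildCache:
    def __init__(self):
        self.entries = {}    # command -> list of (recorded reads, outputs)
        self.executions = 0  # how many times a task really ran

    def run(self, command, task, files):
        # Reuse an entry only if every input it recorded still has the same content.
        for reads, outputs in self.entries.get(command, []):
            if all(p in files and digest(files[p]) == d for p, d in reads.items()):
                files.update(outputs)
                return "hit"
        tracked = TrackedFiles(files)
        task(tracked)
        self.executions += 1
        record = (tracked.reads, dict(tracked.writes))
        self.entries.setdefault(command, []).append(record)
        return "miss"

def compile_main(fs):
    """Toy 'compiler': its output depends on main.cpp and the header it reads."""
    source = fs.read("main.cpp")
    header = fs.read("util.h")
    fs.write("main.obj", b"OBJ:" + digest(source + header).encode())

cache = BuildCache()  # shared by both users in this sketch
alice = {"main.cpp": b"int main(){}", "util.h": b"int f();", "README": b"v1"}
bob = dict(alice, README=b"v2")  # differs only in a file the compiler never reads

first = cache.run("cl /c main.cpp", compile_main, alice)
second = cache.run("cl /c main.cpp", compile_main, bob)
bob["util.h"] = b"long f();"
third = cache.run("cl /c main.cpp", compile_main, bob)
print(first, second, third)  # miss hit miss
```

Because only the files the task actually read are recorded, the change to README does not invalidate the entry, while the change to util.h does. Nobody had to declare either dependency up front.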

Using system hooks to seamlessly track process dependencies for caching purposes is an entirely new approach to caching (the U.S. Patent Office recognized this approach as technologically innovative enough to deserve a patent).

How this looks in Visual Studio

Let’s use an example from Visual Studio C++ to show how Incredibuild works. Caching is applied at the ‘solution level’, the ‘project level’ and the ‘unit (file) level’. This means that if nothing has changed in a project, the entire project’s artifact can be re-used without rebuilding any of its units (in a large C++ application there are typically hundreds of units within many projects within a solution). Having more granular unit-level caching (in C++ it means caching each compiled unit’s “obj” file) means that even if the final artifact’s dependencies have changed due to a change in a single unit, only the unit that has changed will need to be re-executed, while the outputs of the remaining units can be retrieved from the cache, resulting in a much faster build with less compute power.
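The effect of combining project-level and unit-level caching can be illustrated with a small Python model. It is a sketch only: the class name LayeredCache, the fake “compile” and “link” steps and the file names are assumptions for the example, and a real C++ unit also depends on its headers and compiler flags, which are left out here.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class LayeredCache:
    def __init__(self):
        self.objects = {}   # unit level: key of one source file -> object bytes
        self.projects = {}  # project level: key of all sources -> linked artifact
        self.compiled = []  # names of units that were actually compiled

    def compile_unit(self, name, source):
        key = digest(name.encode() + b"\0" + source)
        if key not in self.objects:
            self.compiled.append(name)  # cache miss: do the (fake) compile
            self.objects[key] = b"obj(" + source + b")"
        return self.objects[key]

    def build_project(self, sources):
        joined = b"\0".join(n.encode() + b"\0" + sources[n] for n in sorted(sources))
        project_key = digest(joined)
        if project_key in self.projects:
            return self.projects[project_key]  # nothing changed: reuse whole artifact
        objs = [self.compile_unit(n, sources[n]) for n in sorted(sources)]
        artifact = b"|".join(objs)             # fake link step
        self.projects[project_key] = artifact
        return artifact

cache = LayeredCache()
project = {"a.cpp": b"int a;", "b.cpp": b"int b;", "c.cpp": b"int c;"}

cache.build_project(project)    # cold cache: every unit is compiled
cold = list(cache.compiled)
cache.compiled.clear()

cache.build_project(project)    # nothing changed: project-level hit
unchanged = list(cache.compiled)

project["b.cpp"] = b"int b = 1;"
cache.build_project(project)    # one unit changed: only that unit is recompiled
after_edit = list(cache.compiled)

print(cold, unchanged, after_edit)
# ['a.cpp', 'b.cpp', 'c.cpp'] [] ['b.cpp']
```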

A vision for the future

This unique approach to generic, “behind-the-scenes” caching unlocks additional capabilities for the future of the dev ecosystem. The field is highly fragmented with many tools being used as part of large build execution – various compilers, build systems, software language tools, test frameworks and more. Incredibuild’s generic approach to caching will allow us to offer a single holistic caching solution for all the tools being used as part of a build that can benefit from caching services. This also lets our customers accelerate:

Additional software languages and compilers
Additional compute use cases like various testing frameworks, code analysis, and more
Shorter build workloads of languages that compile faster, like C#
Sequential, non-parallel workloads
Users’ custom steps that are part of the CI process like custom build steps, build scripts, homegrown tools, and more

Having a holistic solution to caching that can benefit many different use cases means that every feature being developed around the caching services will serve all the caching use cases. This approach opens a lot of use cases and justifies the large investment we envision for such a holistic, robust cache.

You can learn more about Incredibuild’s Build Cache here.

Dori Exterman

An expert software developer and product strategist, Dori Exterman has 20 years of experience in the software development industry. As CTO of Incredibuild, he directs the company's product strategy and is responsible for product vision, implementation, and technical partnerships. Before joining Incredibuild, Dori held a variety of technical and product development roles at software companies, with a focus on architecture, performance, advanced technologies, DevOps, release management and C++. He is an expert and frequent speaker on technological advancement in development tools.


