Software Release Management Process Tips| Incredibuild

Tips from a VP R&D who also manages the software release management process

Knowledge comes from experience, as most of you would likely agree. Today, I’m a CTO, but I put in the legwork, starting my career as a software engineer before moving to manage both development/QA teams and the release management pipeline. Throughout my career, I’ve worked on hundreds of projects, some with poor software release processes—even no process at all—and others with well-defined release pipelines. In this blog post, I’d like to share with you some of the practices that may be harming your software release process. Don’t worry, I’m also going to let you in on how to get the release management process right. But first, let’s talk about software delivery.

Many applications today are web-based, meaning they’re either cloud compliant or cloud-native. But not all applications are appropriate for the cloud, and so a large portion of enterprise solutions are hybrid or on-premise deployments.

On another front, open-source packages are playing a major role in web development as well as in standalone applications. Automation has also become a must for many companies. And mobile is another important market, with many frameworks supporting mobile development.

Generally speaking, it’s a great time to be in the software industry. But what effect has this evolution had on software release management process? In terms of tooling, there has certainly been a huge positive impact:

Many services today offer continuous integration and continuous delivery as a service, which eliminates the need to deal with underlying infrastructure and configuration and requires that only the pipeline steps be maintained.
More open-source packages for CI/CD are available now than ever before. Jenkins, for example, can be used on-premises if there is no SaaS solution to fit your needs. Build and testing tools such as Cmake and the Google C++ Testing Framework can be used in the C++ ecosystem as well as C++ API for Selenium.
General IT automation solutions like Ansible or infrastructure as code packages such as Terraform are also available.
Services for supporting the release management process itself enable you to handle the entire process from a web application.

Sorry to be a party pooper, but despite this impressive progress, when it comes to the release management process efficiency, little has changed. For decades, we’ve been dealing with the same issues:

Lack of communication and governance
Failure to implement software engineering best practices
Undisciplined teams.

These problems continue to pollute the software delivery process today, making it one of the greatest industry bottlenecks.

Room for Improvement…

So what are we doing wrong? Where can we improve? Well, it’s time to take a hard look in the mirror and reflect on your software delivery practices, which have likely been less than optimal. So let’s discuss some of the common mistakes to avoid in the delivery process.

We’ll start with the tech side and then discuss management issues separately, since each requires a different approach. The good news is, when it comes to the tech, these issues can generally be addressed immediately by working closely with your DevOps team, and improvements can be seen in the short to medium term. Management issues, on the other hand, where people and culture are involved, are often a lot harder to tackle and require a little more patience.

Tech Issue #1: An Unreliable CI/CD System

CI/CD

When your CI is reliable, this means it’s working efficiently. If your CI/CD system isn’t up at least 99.5% of the time (and hopefully 100%), this affects your market/company’s economy. Trust me, you don’t want to have to delay a release because the CI system is down. The cost of downtime is not something to take lightly. While a lot of us don’t realize this, CI/CD downtime is as bad as production downtime and should be treated the same. You should definitely view your CI/CD solution as another production environment (CI/CD as a product): Set SLOs (service level objectives) for it; plan regular maintenance; and, of course, test it!

Try to use small pipelines and make sure that your servers/nodes are well utilized—it’s pretty shameful if your tasks are pending in a pipeline and you have two idle servers.

These can all help you use the underlying infrastructure more efficiently. I was once unable to deliver the entire sprint’s work because our Jenkins master instance was malfunctioning, and exactly on the day of our release! But this situation could have easily been avoided with regular maintenance.

Tech Issue #2: Slow Builds

The old cliché “time is money” is so true! I only really began to understand this when working on a project with a team of six engineers who were sitting around, literally doing nothing while builds and deployments were happening. I don’t blame them; they were right! There’s not much to do when your CI/CD is going on and that “building” notification’s up. True, incremental builds (the whole purpose of CI), short local builds; and more are there to ease the symptoms; but they don’t solve the problem. Even with all of that, we’re still witnessing a large number of companies running only nightly builds.

I was once in a situation in which every build for a pull request was taking almost 40 minutes. If we had 20 pull requests on average for a sprint, that was almost 13 hours where we couldn’t work due to the slow speed of our CI/CD. And this was happening every two weeks. But that was a mild case. I know release managers that spend hours on a single build. Needless to say, the slower your builds are the lower your team’s productivity will be.

While there are techniques to speed up build times, they’re often time-consuming and require readjusting the code. In addition, the performance improvement is not always as dramatic as you may have hoped for. The easiest, most cost-efficient way to handle this issue is to utilize distributed processing solutions like Incredibuild – and trust me, I’m not just saying that because I work there. By distributing tasks across machines in your network or public cloud, Incredibuild helps to speed up the execution of these tasks.

Tech Issue #3: Lack of Automation

The year 2020 is here, but projects are still being deployed manually. Manual deployments are error-prone, make the project dependent on specific team members, and are time-consuming. Because of this, setting up an automated CI/CD pipeline should be your team’s top priority.

Manual deployments also affect delivery and your team’s capacity. Automation applies to acceptance and capacity tests as well. It’s no fun for your QA team to have to do manual regression every sprint. In general, you should automate as much as possible. With automation, I was able to move from one to four deployments a month. We were deploying a feature or bug fixes practically every week—all of them within regular working hours.

Tech Issue #4: Discrepancies Between Environments

Welcome to dependency hell. Why am I so optimistic, you ask? Dependencies by nature work on your local machine, but not on your CI/CD servers nor on your coworker’s machine. If you added a dependency and didn’t update the CI, then Huston, we have a problem. This issue that is often discussed is the bane of every release manager’s existence. Dependency management is a complex task, especially when it comes to legacy applications. Stick to specific dependency versions, and if possible, use containerized solutions. From my experience, I can tell you that every single project in which I was too lax with the dependencies version ended up failing at some point during the build time due to a dependency change.

Tech Issue #5: Broken Builds Due to Failing Tests

Covering complex parts of the application with unit and integration testing requires a lot of effort. But teams often prefer to skip flaky tests since they’re “random” or because they’re rushed and feel they don’t have the time or the prerequisites for it. Also, agile methodology encourages us to run partial testing (“test what changed”) in order to save time. But skipping too many of these tests is just bad practice. In my experience, on more than one occasion, these random failing tests hit in production. And they’re usually a symptom of a bigger problem.

This is a very simple but strict rule: No test can be skipped during the deployment phase; if they don’t pass, then there’s no deployment. And while you’re bound to piss off some of your R&D and product guys, in the long term, they’ll appreciate it. Even better though, you can avoid having to deal with their short-term frustration altogether by finding ways to speed up test development (Check out Incredibuild’s acceleration solution for testing). This would allow you to include all the tests you need. The goal here is to completely eliminate test-related broken builds and to use automation to minimize the chance of them being deployed in production.

Management Issue #1: No Metrics

You surely have metrics for production, but do you have metrics in place for your CI/CD workflow? How well do you know your delivery capacity? You should ask yourself these questions:

How much time is being spent running tests?
How many minutes per day is your CI working?
How many green/red builds do you have per day?
How do you know if your CI system is malfunctioning?
How much time does it take to get a build into production?
How much time on average does each step in the pipeline take?
How many concurrent builds is your CI/CD supporting?

If you can’t answer these questions, you probably don’t know exactly how many changes you can deliver. It’s important to add metrics to the CI/CD, just as you do with your production environments. Here are a few metrics that have helped me over the years:

Error rates for machines/containers for the infrastructure
Number of green/red builds per day/developer
Max builds per day
Pipeline duration.

These may seem like aggressive metrics, but you’re not trying to point fingers at anyone for not delivering; you just want to “own” your metrics and know what to expect. When numbers change (and they do, trust me), you’ll immediately know that something went wrong with your CI/CD pipelines.

Apart from the CI/CD, for performance analytics, or what you should really call “productivity analytics,” I use the Incredibuild Enterprise dashboard. It lays out my entire builds and machines utilization map with important metrics, such as the number of builds executed, average build duration, build status, processing time (per machine), and machine usage.

Figure 1: Incredibuild’s Enterprise dashboard

Figure 2: Incredibuild’s Enterprise dashboard

In addition, instead of missing all sorts of stuff using the regular output, I get my release managers to utilize build visualization tools, such as the Incredibuild build monitor tool. It enables them to monitor a build in real-time and retrospectively to see how it performed and how things transpired. Each color represents the status of the relevant build file, allowing you to identify which were built without any issues and those that require attention (e.g., tasks with long durations, errors, warnings, bottlenecks, dependencies, and gaps).

Figure 3: The Build monitor tool

The build monitor tool also allows you to monitor CPU Usage, tasks ready to be executed, memory usage, I/O, and much more.

Management Issue #2: Lack of Governance

Even with nifty metrics and the best tooling in place, if you don’t have clear, defined rules and procedures for the delivery, there’s no guarantee you’ll succeed (Let’s face it, success is never guaranteed, but there are ways to improve your chances). I would prefer to work on a project with a malfunctioning CI/CD system than one without release management process governance.

Governance is all about communication and accountability, and communication implies documentation. It’s extremely important to define the point of contact for a feature release and who should be notified when something’s being released. In fact, in my experience, a project should at the very least have:

A documented release plan, including a release checklist template, a rollback procedure, and risk mitigation plan
Documentation about your CI/CD infrastructure and how to deploy code (a detailed explanation of the pipelines)
A shared release calendar
Communication channels for notifying everyone involved that a new feature will be deployed and when.

I recall a project I was involved in for a big telecommunication company in which deployments were happening overnight (from midnight to six in the morning). The policy was to take turns performing deployments (deployment window). One night, one of the teams failed to report their progress, and nobody from any of the other teams knew whom to call for confirmation. Because of this, no other team was able to deploy. This could have easily been avoided with centralized documentation outlining who was responsible for each team’s deployment.

Final Notes

The software release management process is mainly recognized as a discipline within the spectrum of the software release process. Every project is unique, and the process for releasing code to production thus differs from project to project and from company to company. But there are still best practices that we as industry players (even leaders) should follow.

In this blog post, I shared some of the best practices I’ve learned from my many years’ experience in the field. While they’re all equally important in theory, in practice, some will be more relevant than others depending on the nature of the specific project. But for any project, working to achieve a “quality-first” culture within your team and enforcing this should be a top priority.

Dori Exterman

An expert software developer and product strategist, Dori Exterman has 20 years of experience in the software development industry. As CTO of Incredibuild, he directs the company's product strategy and is responsible for product vision, implementation, and technical partnerships. Before joining Incredibuild, Dori held a variety of technical and product development roles at software companies, with a focus on architecture, performance, advanced technologies, DevOps, release management and C++. He is an expert and frequent speaker on technological advancement in development tools.

Cookie	Duration	Description
ARRAffinity	session	ARRAffinity cookie is set by Azure app service, and allows the service to choose the right instance established by a user to deliver subsequent requests made by that user.
ARRAffinitySameSite	session	This cookie is set by Windows Azure cloud, and is used for load balancing to make sure the visitor page requests are routed to the same server in any browsing session.
cf_use_ob	past	Cloudflare sets this cookie to improve page load times and to disallow any security restrictions based on the visitor's IP address.
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
__cf_bm	30 minutes	This cookie, set by Cloudflare, is used to support Cloudflare Bot Management.
bcookie	2 years	LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser ID.
bscookie	2 years	LinkedIn sets this cookie to store performed actions on the website.
lang	session	LinkedIn sets this cookie to remember a user's language setting.
lidc	1 day	LinkedIn sets the lidc cookie to facilitate data center selection.
UserMatchHistory	1 month	LinkedIn sets this cookie for LinkedIn Ads ID syncing.

Cookie	Duration	Description
_gat	1 minute	This cookie is installed by Google Universal Analytics to restrain request rate and thus limit the collection of data on high traffic sites.
_uetsid	1 day	Bing Ads sets this cookie to engage with a user that has previously visited the website.
_uetvid	1 year 24 days	Bing Ads sets this cookie to engage with a user that has previously visited the website.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_gat_UA-8508435-1	1 minute	A variation of the _gat cookie set by Google Analytics and Google Tag Manager to allow website owners to track visitor behaviour and measure site performance. The pattern element in the name contains the unique identity number of the account or website it relates to.
_gcl_au	3 months	Provided by Google Tag Manager to experiment advertisement efficiency of websites using their services.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
_hjAbsoluteSessionInProgress	30 minutes	Hotjar sets this cookie to detect the first pageview session of a user. This is a True/False flag set by the cookie.
_hjFirstSeen	30 minutes	Hotjar sets this cookie to identify a new user’s first session. It stores a true/false value, indicating whether it was the first time Hotjar saw this user.
_hjIncludedInPageviewSample	2 minutes	Hotjar sets this cookie to know whether a user is included in the data sampling defined by the site's pageview limit.
_hjIncludedInSessionSample	2 minutes	Hotjar sets this cookie to know whether a user is included in the data sampling defined by the site's daily session limit.
_hjTLDTest	session	To determine the most generic cookie path that has to be used instead of the page hostname, Hotjar sets the _hjTLDTest cookie to store different URL substring alternatives until it fails.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
MR	7 days	This cookie, set by Bing, is used to collect user information for analytics purposes.
utm_campaign	2 months	Google Ad Services sets this cookie to store session campaign value if present.
utm_content	2 months	This cookie is used for storing the session content value if present.
utm_source	2 months	This cookie is used to record from where the visitor came to the website orginally. This information is used by the website operator to know the efficiency of their marketing.
utm_term	2 months	This cookie is used to record from where the visitor came to the website orginally. This information is used by the website operator to know the efficiency of their marketing.
vuid	2 years	Vimeo installs this cookie to collect tracking information by setting a unique ID to embed videos to the website.

Cookie	Duration	Description
_fbp	3 months	This cookie is set by Facebook to display advertisements when either on Facebook or on a digital platform powered by Facebook advertising, after visiting the website.
_mkto_trk	2 years	This cookie, provided by Marketo, has information (such as a unique user ID) that is used to track the user's site usage. The cookies set by Marketo are readable only by Marketo.
fr	3 months	Facebook sets this cookie to show relevant advertisements to users by tracking user behaviour across the web, on sites that have Facebook pixel or Facebook social plugin.
IDE	1 year 24 days	Google DoubleClick IDE cookies are used to store information about how the user uses the website to present them with relevant ads and according to the user profile.
MUID	1 year 24 days	Bing sets this cookie to recognize unique web browsers visiting Microsoft sites. This cookie is used for advertising, site analytics, and other operations.
personalization_id	2 years	Twitter sets this cookie to integrate and share features for social media and also store information about how the user uses the website, for tracking and targeting.
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
utm_medium	2 months	This cookie is used to record from where the visitor came to the website orginally. This information is used by the website operator to know the efficiency of their marketing.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

How to Improve Your Software Release Management Process

Room for Improvement…

Tech Issue #1: An Unreliable CI/CD System

Tech Issue #2: Slow Builds

Tech Issue #3: Lack of Automation

Tech Issue #4: Discrepancies Between Environments

Tech Issue #5: Broken Builds Due to Failing Tests

Management Issue #1: No Metrics

Management Issue #2: Lack of Governance

Final Notes

Dori Exterman

Table of Contents

Shorten your builds

Related Posts

11 minutes Measuring the Value of Development Acceleration

11 minutes Platform Engineering vs DevOps: A Comprehensive Comparison

11 minutes 8 Reasons Why You Need Build Observability

Cookie	Duration	Description
_hjSession_2537450	30 minutes	No description
_hjSessionUser_2537450	1 year	No description
AnalyticsSyncHistory	1 month	No description
BIGipServersn-mch-v2-80	session	No description
BIGipServersn02web-nginx-app_https	session	No description
ib_last_referrer	2 months	No description
incap_ses_1319_2167377	session	No description
li_gc	2 years	No description
muc_ads	2 years	No description
nlbi_2167377	session	No description
original_req_url	past	No description
referrer66_00f	1 month	No description
visid_incap_2167377	1 year	No description
visitorId	1 year	No description