The DevOps mindset has taken hold of many aspects of the software delivery process. So much so that it is worth gaining additional insight on the day-to-day operations that can show success markers and areas for improvement. Rather than focus on daily reports that do little more than show “normalcy,” today’s DevOps metrics focus on measurable data that is (or should be) collected throughout your company’s DevOps implementation.
The metrics, and the tools that collect them, range from simple progress indicators like the number of deployments to combined views of data that support justifying additional automation. Looking at DevOps through this expanded lens is the goal of those interested in measuring DevOps performance. We will take a deeper look into the metrics and tools that help provide that data.
What does a successful DevOps implementation look like?
It is important to determine what benchmarks and other factors are involved in demonstrating the success of a DevOps implementation. Beyond the measurable metrics we will be discussing, successful DevOps depends on factors specific to the products and services a company is providing.
Overall, the main goal of a DevOps implementation is to automate software delivery so that less human intervention is required. Along the way, quality is improved through various means. Pre-commit git hooks check code against a specific set of policies to prevent noncompliant changes from ever reaching the codebase. Automated QA methods allow for extensive API and UI testing before code is merged in.
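To make the pre-commit idea concrete, here is a minimal sketch of what such a hook could look like in Python. The policies (a file-size cap, no merge-conflict markers) and all names are hypothetical assumptions for illustration, not any specific tool's API:

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook sketch: reject commits whose staged
files violate simple, illustrative policies."""
import subprocess
import sys

# Example policies, purely illustrative: a size cap and no conflict markers.
MAX_FILE_BYTES = 1_000_000
FORBIDDEN_MARKERS = (b"<<<<<<<", b">>>>>>>")

def staged_files():
    """List files staged for the current commit (added/copied/modified)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def check_content(path, data):
    """Return policy violations found in one file's raw bytes."""
    problems = []
    if len(data) > MAX_FILE_BYTES:
        problems.append(f"{path}: exceeds {MAX_FILE_BYTES} bytes")
    if any(marker in data for marker in FORBIDDEN_MARKERS):
        problems.append(f"{path}: contains merge-conflict markers")
    return problems

def main():
    problems = []
    for path in staged_files():
        try:
            with open(path, "rb") as fh:
                problems += check_content(path, fh.read())
        except OSError:
            continue  # skip entries that no longer exist on disk
    for problem in problems:
        print(f"policy violation: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

Installed as `.git/hooks/pre-commit` (with `main()` wired to the script's exit status), git runs the hook before each commit and a non-zero exit blocks the commit.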
So, does that mean if you’re using steps like these in your pipelines that you have a successful DevOps implementation? Hardly. These are just small parts of the larger picture that DevOps is meant to help with, especially with the additional automation needs for today’s software delivery. Measuring success via a set of DevOps metrics helps look past the small victories to show the overall benefit of the mindset. How else can we determine whether the investment in additional DevOps automation is worthwhile and how it can be continuously improved?
Constant assessment and improvement are key
Once you have the initial DevOps processes implemented, their performance needs to be brought to the attention of a wider audience. Work done in silos has never proven to benefit a software team and it’s critical to bring DevOps processes to a point where they can be viewed and continually improved.
What is the best way to go about this in a service-type setting such as DevOps? Some factors may be more measurable than others. If your group handles tickets like a service desk, you may have things like SLAs to keep track of. Along those lines, the number of support incidents generated after a change in automation may also be important.
Furthering analysis of these in addition to other DevOps metrics is where a more appropriate picture can be obtained, showing more than just vanity numbers. While it may be interesting to know how many successful builds were completed, that number would mean more when compared to other metrics that dig further into the downstream effects of those builds, including releases tied to sprints or similar software delivery schedules.
Benefits of data-driven DevOps metrics
It all comes down to seeing how your DevOps processes truly perform against your implementation goals and industry standards. This goes beyond the vanity metrics and deep into the core of what we hope DevOps can do: provide a method to deliver software and services through automation in a way that is repeatable, consistent, and error-free.
From initial inception to delivery, and eventually to support incident metrics related to the release – this is all important information to show the benefit of today’s DevOps. While there isn’t necessarily one product or service that provides a quick culmination of these DevOps metrics, there are methods that the industry is moving towards that will help guide you in what may work best.
It may also be possible to collect too much or the wrong type of data. Going back to our example of vanity metrics like “number of successful builds,” it may be tempting for teams to start looking at small pieces of data as their own performance indicators. A particular group may see that adding more lines of code is an indication of success, while the actual value may not shed any light on success or failure at the time of delivery.
By keeping data in mind at the start of an automation effort, some of the more common data elements that may provide deep insights can be identified early on. Determine what is available for collection and take note of how that data can be processed and used as a Key Performance Indicator (KPI). KPIs are measurable values that can show progress towards a key business objective. Common metrics like “lead time” can be derived from the tasking and CI/CD information for a particular objective.
These DevOps metrics should adhere to industry-recognized properties and standards that represent useful data. You may have been exposed to the acronym SMART: Specific, Measurable, Achievable, Relevant, and Time-bound. The following are good examples of properties that adhere to the SMART methodology in the context of the software development lifecycle:
- Metrics should be measurable in that they have an actual value that can be averaged (numbers) or counted (e.g., pass/fail). Relations between specific measures can be expressed as percentages (e.g., the number of defects opened out of the total number of tests implemented).
- Relevancy is important and often dependent on the business or service. That’s naturally obvious, because there’s no point in measuring and reacting to things that are not related to actual business results (however, understanding what is related to actual business results is not that obvious).
- Resilience against falsified or inflated results by team members. Measures should be resilient in how they are recorded. For example, “number of defects opened” is a measure that can easily be gamed by opening false defects and then closing them (e.g., by a testing engineer who wants to present high activity levels). We will discuss ways to deal with problems like that, but for this example we can decide to ignore defects that were closed with status “not an issue” or “not reproducible”.
- Identify actions that will help improve the overall process by including workflows for automation and implementation of additional policies. This relates to relevancy: you wouldn’t want to put effort into measures for which you cannot control any improvement, as you cannot identify the actions that affect them.
- Being traceable to the source is a critical property of DevOps metrics. Rather than just giving an indication of a failure, a path to find where the failure takes place or what data element is responsible is a “must have”, otherwise, again, improvement is not possible.
DevOps Research and Assessment (DORA) helped identify key metrics as important data points that are considered targeted and easy to implement in most situations. There are four such measurements that are often used as KPIs to help capture overall delivery performance:
- Deployment Frequency
One of the first tenets of DevOps automation is to deploy often. Deployment frequency is a straightforward way to show the value of your automation and of the various integrations spanning everything from the code repository to the deployment target. On its own, though, it tells you little about other aspects of those deployments, for example, how many of them resulted in failures that affect other key metrics.
The information for this metric is often derived from the software used to deploy. In most cases, the major build and deploy products on the market have built-in reporting available for such data. Using just this data point may show your DevOps has progressed to a higher level of performance. However, as with most metrics, it needs to be looked at alongside other information to ensure that deployment frequency isn’t being increased due to a desire to show progress when there isn’t one.
Azure DevOps, for example, ties more information to the release metric. This is beneficial for deriving other valuable data points: work items can be tied to the software development process so that a history of each change is included alongside the simple deployment frequency information. Other source control tools offer similar features.
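As a sketch of how this metric can be derived once deployment events are exported, the snippet below counts deployments per calendar day and computes an average frequency over the observed span. The log format and timestamps are assumptions for illustration:

```python
"""Sketch: derive deployment frequency from a deployment event log.
The log format and timestamps are illustrative assumptions."""
from collections import Counter
from datetime import datetime

# Hypothetical export: one ISO timestamp per successful deployment.
DEPLOY_LOG = [
    "2023-05-01T10:12:00", "2023-05-01T16:40:00",
    "2023-05-02T09:05:00", "2023-05-08T11:30:00",
]

def deployments_per_day(log):
    """Count deployments grouped by calendar day."""
    days = [datetime.fromisoformat(ts).date() for ts in log]
    return Counter(days)

def average_daily_frequency(log):
    """Mean deployments per day over the observed span (inclusive)."""
    per_day = deployments_per_day(log)
    span_days = (max(per_day) - min(per_day)).days + 1
    return len(log) / span_days
```

With the sample log above, four deployments over an eight-day span yield an average frequency of 0.5 deployments per day.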
- Lead Time
Imagine a timer starting from the moment a developer commits their changes to the moment where it is running in a production environment. This span of time is often called “lead time.” It can be a quick indicator of performance that shows how well your DevOps automation is going. Long lead times could mean there are more areas that could be improved as far as efficiency and automation opportunities.
Measuring this DevOps metric involves tying together information from multiple systems to create a complete view. If a team’s work is tracked in Atlassian JIRA but they release with Jenkins, data may have to be compiled from both products in order to show the true lead time. This hybrid use of DevOps tools is nothing new to most of us. The metric is a good indicator of how swiftly the team can respond to feedback and feature requests. While it may be hard to set a desired target for this measure, since it depends on the project, the tools used, and the complexity of each development effort, the goal we should strive for is improvement over time – for instance, deployments taking minutes rather than hours is a win.
- Mean Time to Recover
Failures happen. The key is to lessen the effect of those failures. One key metric is meant to show how long it typically takes to recover from a failure. Mean Time to Recover is the amount of time needed to recover from a failure. Collecting this data can show how a team is able to respond to a defect or an outage requiring a code release, reconfiguration or other troubleshooting actions. With multiple environments to consider, the information gathered may be one that deserves multiple points to be measured. For example, does the value consider time taken to recover in Development vs. Production?
By using data from service tickets for both internal and external customers, a calculation of this metric can be constructed. The ticket and associated SLA metrics are measured against other metrics such as “Lead Time” to surface the “Time to Recover.” Using statistical calculations like means allows for a metric with a targetable aspect, one that aims to reduce time to recovery and provide better support. The question is how quickly, from the moment a major problem arises, the team is able to recognize the root cause, understand it, solve it, and deploy the solution.
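A minimal sketch of that calculation from ticket data, assuming a hypothetical ticket schema, might look like this. Note how tickets closed as false alarms are excluded, in line with the resilience property discussed earlier:

```python
"""Sketch: mean time to recover (MTTR) from incident tickets.
The ticket schema and records are illustrative assumptions."""
from datetime import datetime

incidents = [
    {"opened": "2023-07-01T08:00:00", "resolved": "2023-07-01T09:30:00",
     "status": "fixed"},
    {"opened": "2023-07-03T14:00:00", "resolved": "2023-07-03T14:20:00",
     "status": "not an issue"},  # excluded: not a genuine failure
    {"opened": "2023-07-05T22:00:00", "resolved": "2023-07-06T00:30:00",
     "status": "fixed"},
]

def mttr_minutes(tickets):
    """Average open-to-resolve duration over genuine failures only."""
    durations = []
    for t in tickets:
        if t["status"] != "fixed":
            continue  # ignore false alarms, per the resilience property
        opened = datetime.fromisoformat(t["opened"])
        resolved = datetime.fromisoformat(t["resolved"])
        durations.append((resolved - opened).total_seconds() / 60)
    return sum(durations) / len(durations)
```

In practice you might compute this per environment (Development vs. Production, as noted above) by adding an environment field to the tickets and filtering on it.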
- Change Failure Rate
This measure captures the rate of commit failures (using here the term “commit” in a very generic sense: you may implement that for code commits to the main branch, to deployments from development to integration/system testing and so on).
This metric is centered both on the quality of the code being deployed toward production and on the quality of your DevOps gates. At first you may believe a failure rate of zero would be most desirable, but this is not realistic; in most real-world scenarios it would point not to high development quality but to low-quality DevOps gates. A high change failure rate can show that your automation is working as designed while revealing a problem in development, or it may mean you are missing earlier gates.
It should be emphasized that the gates should be propagated to R&D so that developers check themselves before committing. Bottom line: a high change failure rate means things are not good, even if DevOps is doing its best. That is why this is a good measure! Using release management software, this metric can be found quite easily. And if we can trace the changes back to their commits, we can analyze specific cases, pinpoint the root cause of issues, and see whether the problems come from specific areas in development, whether a certain prior gate is missing, and so on.
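The computation itself is a simple ratio once each deployment (or commit, in the generic sense used above) is flagged as having caused an incident or not. The record format here is a hypothetical assumption:

```python
"""Sketch: change failure rate from release records.
The record format is an illustrative assumption."""

deployments = [
    {"id": "rel-41", "caused_incident": False},
    {"id": "rel-42", "caused_incident": True},
    {"id": "rel-43", "caused_incident": False},
    {"id": "rel-44", "caused_incident": False},
]

def change_failure_rate(records):
    """Fraction of deployments that led to a production incident."""
    failed = sum(1 for r in records if r["caused_incident"])
    return failed / len(records)
```

The hard part is not the arithmetic but the tracing: reliably linking each incident back to the deployment (and commit) that caused it.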
What are some other possible DevOps metrics?
While the “North Star” metrics we discussed above are good for many situations, other teams need to rely on different markers to indicate success or failure, or to show progression. Keep in mind that although some of these are more “vanity” in nature, that may be exactly the metric needed for a given application.
- Deployment Time – How critical this is varies from team to team. Sometimes fast deployments are necessary due to the order of operations; other times, this metric could indicate that more resources are needed for build agents. This measure is a subset of “Lead Time,” focusing only on the deployment step, without the CI pipeline work of building and testing.
- Failed Deployments – Not the same as failed releases, this metric is about how many deployments simply errored out when they were attempted. This is usually an indication of an unstable release environment, but could also be configuration errors in the way builds and releases are being executed, maybe missing a gate before deployment or having a loose part in the process. This may be especially true in areas where developers are able to control the build process using YAML or other similar definitions.
- Code Commits – A simple metric that can be easily retrieved from your source control tool. What is considered good, average, or low, is more project-specific and should be interpreted on a case-by-case basis. A high number of commits could show good practices for keeping the feature branches up to date. A low number could mean that less work is being done than expected, or it may be possible developers are working on local branches that prevent them from committing as often. Neither is beneficial!
- Injected Activities – Injected or unplanned activities can have a lasting effect on a software release process. Besides lowering the amount of time dedicated to specific work items, people often like to display this metric to justify additional resources. An example of a measurable activity might be the number of support tickets destined for DevOps engineers vs. the project work they have while working directly with teams.
- Application Usage Insights – How people use the software is absolutely a factor for those who release frontend applications. Today’s logging and usage tracking allows for data that shows how a feature or bug fix may affect how consumers respond. While not for every industry, it is a good indicator of how a change directly affects the main product. One such example is to look at the number of errors caused by people using the application after a release. If this value exceeds a certain threshold, that can be an indicator that the software release broke something.
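The threshold check described for Application Usage Insights can be sketched as a simple ratio test. The tolerance value and function name are hypothetical assumptions for illustration:

```python
"""Sketch: flag a release when post-release application errors exceed
the pre-release baseline. Names and numbers are illustrative assumptions."""

def release_regressed(errors_before, errors_after, tolerance=1.5):
    """True when post-release errors exceed the baseline by more than
    the allowed ratio (here, 1.5x the pre-release count)."""
    if errors_before == 0:
        return errors_after > 0  # any error after a clean baseline
    return errors_after / errors_before > tolerance
```

For example, jumping from 120 errors per hour before a release to 300 afterwards would trip this check, while a rise to 150 would stay within the tolerance.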
As you can see, many of these DevOps metrics intertwine with one another. How they affect each other becomes apparent once many of the main measurements are in place. This means that if your Code Commit metrics show low numbers but you have many Injected Activities, you could surmise that the low amount of code being committed is due to other activities getting in the way.
Let’s test another scenario. Looking at our “North Star” metrics, Lead Time and Application Usage Insights can absolutely show trends related to one another. How? Let’s say you have longer than desired lead times for feature requests that consumers have made. The usage insights may show less and less visits to the application, or even deeper metrics related to subscriptions. These and other DevOps metrics may show losses due to long release cycles.
How are DevOps metrics used today and how should you use DevOps metrics?
It is clear that measuring is important. Without measuring, it is hard to improve in a methodical way. When it comes to improvement processes, intuition is not enough in the long run, especially as your organization and codebase grow beyond the single-team level. However, you should be cautious about implementing and acting on measures alone, without understanding their actual meaning; this may encourage wrong and biased behaviors, such as unnecessarily frequent deployments to make things look like they are progressing well even when they are not. This is true for almost any measurement and is the reason for looking at combined measures, which help reveal the real picture rather than an obstructed view of it.
Interestingly, some aspects of using metrics to measure DevOps success have drifted to the sidelines. Information on the subject is available, but there isn’t much discourse in the DevOps community. Whether that is due to a need to prioritize other initiatives, or a lack of success in getting everyone on the same page, is hard to say. What most of us do find important is to continuously improve our automation processes. This is not to say that DevOps metrics are not in use – they are! – but perhaps not to the extent you would expect, especially compared to more specific development measures. “Code Coverage,” for example, is heavily used even though it looks at a narrow (if important) testing artifact, while the much more interesting big picture is sometimes overlooked.
It is worth noting that different organizations and products would look at different measures and strive for different actual value for each measure, which is natural. However, any organization that strives for improvement should set goals based on its own measurements.
Looking farther into the future, additional metrics will become available from AI systems, and there will likely be more focus on automation, Infrastructure as Code, and the sheer amount of data becoming available. Creative and useful ways to examine the metrics in that data will become increasingly important for showing results.