Any time something has to be assessed or compared, metrics are needed. They are quantifiable measures that are used to judge progress in every industry, including software development, where dev leaders rely on software metrics to track performance and production.
In our blog post on How to Measure and Improve Developer Productivity, we discussed the role of code metrics in measuring and improving developer productivity. One of the challenges is selecting the right metrics for the right job at the right time, and this article will help you do exactly that.
What Are Software Metrics and Why Are They Important?
The job of any manager is to create value for their company, and in software, dev leaders are in charge of building and improving products, managing time, and meeting their budget. These responsibilities all rely on the ability to assess the current state of a team or project, measure progress, and estimate the time and resources required to reach the next milestone.
Dev leaders rely on software metrics for quality assurance and to help strengthen collaboration between team members. Effective software metrics go hand-in-hand with proactive management, helping dev leaders to meet release dates and keep expenses under control. Software metrics assist in the identification of problems in current and pre-release builds, and they can be used to track and prioritize issues accordingly. Importantly, the right software metrics pave the way for more accurate forecasting and help to consider the impact of decisions made during the development lifecycle.
How Are Software Metrics Lacking?
Software metrics are tricky because measuring software quality is multifaceted and subjective, and what is important can change from project to project. The priorities of one team or organization will differ from those of another, which means there is no one-size-fits-all software metric.
How To Choose the Right Metrics at the Right Time
Myriad software metrics exist and choosing the right ones depends on exactly what it is that you need to track. A common problem is that metrics are not always linked to the goals of the project. If memory optimization is important, for example, then reducing the number of lines of code (LOC) might be pertinent. However, if the goal is to minimize errors then a heavier weight should be applied to testing metrics such as the number of bugs reported.
The stage of a project will also dictate which software metrics should be given more consideration. Early in a project, the number of Git commits will yield valuable information about how quickly modules are being added. Later, dev leaders will be more interested in the mean time between failures (MTBF), or the application crash rate (ACR).
Evaluating and Tracking Metrics
One should remember that a valuable metric is more than just a number. A single-point measurement at any given time is of little use without the trend behind it; metrics need to present the bigger picture. From day to day, there may be little variation in a specific metric, yet over time, the trend will highlight gradual success or failure. If your metrics aren't highlighting trends, they need to be rethought.
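To illustrate, a simple moving average can surface the trend hiding in day-to-day noise. This is a minimal sketch using hypothetical daily bug counts:

```python
from statistics import mean

def moving_average(values, window=7):
    """Smooth a noisy daily metric so the underlying trend is visible."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]

# Hypothetical daily open-bug counts: noisy day to day, trending downward.
daily_bugs = [12, 14, 11, 13, 12, 10, 11, 9, 10, 8, 9, 7, 8, 6]
trend = moving_average(daily_bugs, window=7)
print(trend[0] > trend[-1])  # → True: the smoothed series reveals the decline
```

Any single day's figure is ambiguous, but the smoothed series makes the direction of travel obvious.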
Another important point is that metrics have to suggest and promote changes. Metrics that do not lead to significant changes are a waste of time because, without change, developers will continue to make the same mistakes. Again, metrics need to be more than just a number; they need to further your goals. If a metric has not led to changes in the codebase or the process, it is best replaced with something more meaningful.
Types of Software Metrics
Software metrics fall into several different categories, each of which may be more or less useful to a project depending on the team, environment, and development phase. Different metrics are applicable with varying levels of granularity, where one is relevant to individual developers but others consider the performance of the team. Naturally, some apply to both, although individuals and teams should be rated separately.
Code metrics

Code metrics are measures that indicate specifics about a codebase. These might be size-oriented metrics such as the total number of lines of code or the ratio of logic to comments, or content-related, like an indicator of code complexity. In general, code metrics are only somewhat helpful, especially when used in isolation, as they do not fully consider the context and thus cannot tell the full story.
Size-oriented metrics

A size-oriented metric is a type of code metric, generally used to compare same-language projects in terms of thousands of lines of code (KLOC). KLOC metrics are not intended to measure the size of a project. Rather, they are statistical measures that indicate the relative number of errors, the relative number of defects, and the relative cost per 1,000 lines of code, using this common baseline regardless of project size. Due to the inherent differences between programming languages, size-oriented metrics are not useful for comparing projects that are developed using different languages.
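As a sketch of how this baseline works, per-KLOC normalization lets projects of different sizes be compared on equal footing. The project sizes and defect counts below are hypothetical:

```python
def per_kloc(count, total_loc):
    """Normalize a raw count (errors, defects, cost) per 1,000 lines of code."""
    return count / (total_loc / 1000)

# Two hypothetical same-language projects of different sizes:
small = per_kloc(30, 12_000)    # 30 defects in 12 KLOC -> 2.5 defects/KLOC
large = per_kloc(150, 90_000)   # 150 defects in 90 KLOC -> ~1.67 defects/KLOC
# Despite reporting more defects overall, the larger project is cleaner per KLOC.
```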
Coding methodologies vary between companies and projects. Although the goals are often common, the approach, and more specifically the journey, differs for each. When it comes to KPIs, different methodologies have different priorities. Two popular methodologies are Agile and Waterfall.
Agile process metrics
These metrics are specific to a development team following the Agile process methodology. They measure how effective teams are at releasing shippable software. For example, Sprint Burndown is used to track the completion of work during a sprint and shows how much work remains. Velocity is another Agile metric and it describes how much work a team can complete during a sprint, measured in either hours or story points.
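Both metrics reduce to simple arithmetic over sprint data. This is a rough sketch; the story-point figures are hypothetical:

```python
from statistics import mean

def velocity(completed_points_per_sprint):
    """Average story points a team completes per sprint."""
    return mean(completed_points_per_sprint)

def sprint_burndown(total_points, completed_per_day):
    """Remaining work at the end of each day of a sprint."""
    remaining, burndown = total_points, []
    for done in completed_per_day:
        remaining -= done
        burndown.append(remaining)
    return burndown

print(velocity([21, 25, 23]))                  # → 23 points per sprint
print(sprint_burndown(40, [8, 5, 10, 7, 10]))  # → [32, 27, 17, 10, 0]
```

A burndown that fails to reach zero, or a velocity that swings wildly between sprints, is a prompt to revisit how work is being estimated.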
Waterfall methodology metrics
The Waterfall methodology, which is more fixed than Agile, is often used when a project needs a high degree of reliability or when its requirements are unambiguous. With a fixed timeline that does not include frequent feedback, the metrics are somewhat different. For example, the number of bugs discovered in the implementation phase is important because it sometimes necessitates returning to the design phase. If this happens too often, it might be a sign that the pre-implementation review is being done too hastily.
Productivity metrics

Measures of productivity can be used to determine how much work has been done on a project, or by a team. This correlates highly with efficiency and the speed at which work can be completed, essentially highlighting where a team excels and where they need to improve. When selecting productivity measures, be sure to choose ones that are not only relevant but also represent the actual goals of the project.
Productivity is indeed something that should be maximized in any project. As we discuss in our blog post on Productivity Tools for C++ Developers, there are a variety of tools to assist with boosting productivity for both individuals and teams.
Security metrics

A security metric is a measure that shows how susceptible or resistant software is to security incidents. Software vulnerabilities can be very costly and lead to problems with governmental compliance, so this type of metric is indispensable for certain products. Examples of security metrics are the average time required to resolve a vulnerability and the number of vulnerabilities identified by automatic static code scanning. Remember that certain security metrics, such as those that are compliance-related, will be more relevant to executive management than the number of vulnerabilities detected by a code analyzer; it is important to cater to all of the relevant stakeholders.
Operational metrics

Operational metrics help to assess how well the software is running in a production environment, including a product's annual uptime or ratio of uptime versus downtime. These are important because they speak to quality assurance, product reliability, and whether enough resources are dedicated to maintenance and support. These metrics are not as useful for development teams working on new products.
Product metrics

A product metric indicates how well a product is doing in the market. These measures are not solely in the domain of software but still apply. They allow you to track things such as how well your product meets the company's objectives. Examples of these are user adoption and customer retention. These metrics aim to answer questions for management and planners, as opposed to developers.
Quality assurance metrics

Quality assurance is a catch-all term that includes failure metrics, as well as other maintenance-related measures. This includes details like the average time between failures and how long it normally takes to fix them. This can provide insight into the amount of uptime versus downtime, and how much of the downtime can be attributed to maintenance.
Test metrics

There are a variety of test metrics that developers and product testers use before any release moves to a production environment. These measures help to provide information about how well tested a system is, which is related to QA. These metrics, however, are not intended for management, which is how they differ from more general QA metrics. A QA metric is used by management to judge the quality of a release, and a test metric is intended solely to assist developers at the pre-release stage.
Specific Metrics – A Closer Look
Now that you have an overview, let’s have a look at some specific software metrics that you might choose, depending on your project and processes.
Lead time

This reflects the length of time it takes to develop a new feature or module, from definition to delivery. It tends to show how responsive the development group is to requests from stakeholders. Even if a team is unwilling to provide an approximate time to deliver, one can be estimated by looking at previous products with a similar feature set.
Cycle time

This metric refers to the time between a change request and its shippable or production release. This includes the time to open an issue, time to find and review the problem, time to approve the work, the time required to complete the changes, and finally the time to deploy. This is an important metric because it weighs time to value against efficiency.
Deployment frequency

The deployment frequency refers to the number of releases per day. This metric indicates the level of value that is being delivered to the customers. It is important to consider because a development pipeline can be efficient, with a low cycle time, while a low deployment frequency still means that not enough value is being delivered.
Velocity

A team's velocity provides insight into how much work a team completes during an Agile sprint or a release iteration. While it is useful for gauging progress within a team over time, it should not be used to compare teams because the nature and complexity of each team's deliverables may not be fairly comparable.
Open/close rates

The open/close rate compares the number of production issues reported with the number resolved within a set period. As this rate improves over time, it shows that the team is becoming more efficient at fixing problems.
Efficiency / Productivity
Efficiency generally refers to how much of a developer’s code is in production, measured in terms of percentage rather than lines of code. A high efficiency correlates with providing value for a longer time, whereas low efficiency might indicate many false starts on an innovative feature that is difficult to implement. The opposite of efficiency is code churn, which indicates the level of non-productive coding.
Number of active days

The number of active days is related to a programmer's productivity. An active day represents a coding day worked by a single developer on a single project. This tracks only programming and does not include administrative work. In fact, administrative tasks such as meetings take away from coding time, which is what this metric actually measures. Essentially, tracking the number of active days puts a spotlight on the cost of interruptions.
Impact

This metric is a subjective measure that indicates the degree of change to a project after code has been added, deleted, or modified. The idea is that changes with a heftier impact are more difficult to implement, suggesting a larger undertaking or perhaps a greater cognitive effort. For example, the addition of a novel and complex feature will have a greater impact than changing the text in a set of output statements, even if there are many more lines of modified code.
Code churn

Code churn is a Git-based metric that provides insights into individuals and teams alike. It represents how much of a developer's work is modified or deleted over a short period and is normally presented as the number of lines of code that have changed over the specified time. High code churn can mean that a developer is unsure of what to do, has trouble with the implementation, or even that they do not have anything else to work on. From a management or team perspective, it could indicate that the module or feature in question was not properly defined or was prematurely added.
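One way to derive churn from Git is to total the added and deleted lines reported by `git log --numstat`. The sketch below parses a hard-coded sample of that output rather than invoking Git, so the file names and line counts are purely illustrative:

```python
def churn_from_numstat(numstat_output):
    """Sum lines added and deleted from `git log --numstat`-style output."""
    added = deleted = 0
    for line in numstat_output.strip().splitlines():
        parts = line.split("\t")
        # Each numstat line is "added<TAB>deleted<TAB>path"; binary files show "-".
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added += int(parts[0])
            deleted += int(parts[1])
    return added + deleted  # total changed lines over the period

# Sample of what `git log --since="2 weeks ago" --numstat --format=` might emit:
sample = "120\t45\tsrc/parser.cpp\n10\t200\tsrc/old_api.cpp\n"
print(churn_from_numstat(sample))  # → 375 changed lines
```

In practice you would feed this the real command output, scoped to an author or a path, and track the total per week to spot unusual spikes.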
Mean time between failures (MTBF)
This QA metric represents the average time between failures, defining the reliability of the system. Failures are bound to occur but it is best if they are few and far between. Ideally, when a failure does occur, the time it takes to recover is relatively short, but regardless, this metric can assist when it comes to scheduling preventative maintenance.
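MTBF itself is straightforward to compute: total operational time divided by the number of failures. The figures below are hypothetical:

```python
def mtbf(total_operational_hours, failure_count):
    """Mean time between failures: operating time divided by failure count."""
    return total_operational_hours / failure_count

# Hypothetical service: 4,000 hours of operation with 5 recorded failures.
print(mtbf(4000, 5))  # → 800.0 hours between failures, on average
```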
Mean time to recover/repair (MTTR)
Even highly reliable systems fail and when they do, customers want to minimize any downtime that occurs as a result. For this, the average time to recover from a failure must be kept as low as possible. Of course, the severity of failures will differ, as will the individuals making the necessary changes, all adding noise to the metric. However, over time, the MTTR will act as a reliable estimate when predicting how long the client will have to wait before operations return to normal.
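A minimal sketch of MTTR, averaging hypothetical per-incident downtimes:

```python
from statistics import mean

def mttr(repair_durations_hours):
    """Mean time to recover: average downtime per incident."""
    return mean(repair_durations_hours)

# Incident downtimes vary in severity, adding noise; the mean smooths it out.
print(mttr([0.5, 2.0, 1.0, 4.5]))  # → 2.0 hours
```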
Application crash rate (ACR)
An application's crash rate is similar to the MTBF but refers to the ratio of how often it fails to how often it is used. MTBF is different because it is a measure of time.
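As a sketch, the crash rate is a simple ratio of crashes to sessions; both counts below are hypothetical:

```python
def crash_rate(crash_count, session_count):
    """Application crash rate: failures relative to how often the app is used."""
    return crash_count / session_count

# 12 crashes across 10,000 sessions -> a 0.12% crash rate.
print(f"{crash_rate(12, 10_000):.2%}")  # → 0.12%
```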
Endpoint incidents

An endpoint incident is a security-related issue, indicating how many devices have been affected by malware over a specified period. This could be the result of a vulnerability in the software.
Errors per KLOC / Defects per KLOC

These size-oriented metrics express the relative number of errors or defects found per thousand lines of code, providing a common baseline for comparing same-language projects of different sizes.
Cost per KLOC
This measure describes the average cost for one thousand lines of code. It can be used to describe different phases of the project. For example, the cost per KLOC during development will be different from the cost per KLOC during post-release maintenance.
Effort per FP / Defects per FP / Cost per FP
These are function-oriented metrics that depend on one first calculating the Function Point (FP). An FP is a measure that represents business functionality available to the user in a software application and is defined according to the requirements.
As a basic unit of measure, FPs have correlative measures that include the effort per function point (EFP), the number of defects per function point (DFP), and the cost per function point (CFP). A lower EFP translates to better productivity, whereas a lower DFP is representative of a higher-quality product. CFP indicates cost efficiency, and a decreasing CFP means development and maintenance are becoming more cost-effective.
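These correlative measures are simple ratios over a release's function-point total. The release figures below are hypothetical:

```python
def effort_per_fp(person_hours, function_points):
    return person_hours / function_points   # lower -> better productivity

def defects_per_fp(defect_count, function_points):
    return defect_count / function_points   # lower -> higher-quality product

def cost_per_fp(total_cost, function_points):
    return total_cost / function_points     # lower -> more cost-effective

# Hypothetical release: 120 FPs, 960 person-hours, 36 defects, $60,000 cost.
fp = 120
print(effort_per_fp(960, fp))    # → 8.0 hours per FP
print(defects_per_fp(36, fp))    # → 0.3 defects per FP
print(cost_per_fp(60_000, fp))   # → $500.0 per FP
```

Tracking these ratios release over release shows whether productivity, quality, and cost efficiency are moving in the right direction.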
Defect removal efficiency
The defect removal efficiency (DRE) is used to express how many defects were found by end-users, as compared to how many were found during pre-release development and testing. It is calculated by dividing the number of errors found pre-delivery by the total number of errors found both before and after the software goes into production. The more errors that are found by end-users, the lower the number. A perfect score is 1.0, which would indicate that no problems were identified by end-users in production.
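The DRE calculation described above can be sketched as follows; the defect counts are hypothetical:

```python
def defect_removal_efficiency(pre_release_defects, post_release_defects):
    """DRE = defects found before delivery / total defects found overall."""
    total = pre_release_defects + post_release_defects
    return pre_release_defects / total

# 90 defects caught in testing, 10 reported by end-users in production:
print(defect_removal_efficiency(90, 10))  # → 0.9; a perfect score would be 1.0
```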
The Implications of Selecting an Improper Measure
Choosing a poor metric can have consequences that go beyond simply wasting time, especially when considered in isolation. It is important to understand what you are measuring, and the following examples illustrate some potential problems that can occur.
Lines of Code (LOC)
Consider a situation where LOC is the only or primary metric. On the face of it, one might assume that writing more lines of code is better. However, as an experienced manager, you know that this is not always the case. In reality, when LOC is a driving factor then developers tend to write long code that is less elegant, more cumbersome, and perhaps less efficient. Unless it is combined with an efficiency-rewarding metric, LOC is more of a burden than a bonus.
Code coverage

Another metric that is problematic in isolation is code coverage in testing. Code coverage alone, without other quality metrics such as the number of defects found per test, can produce a misleading result. This will happen if the tests are naïve and do not reliably identify bugs. The code coverage may be very high, yielding an impressive metric, but without knowing the results of the testing, it is difficult to evaluate the tests themselves.
Are There Measures That Are More Relevant to a Specific Project Than Others?
Without a doubt, some metrics are more relevant to specific projects than others. Measuring impact, for example, is not relevant for a project in its initial stages. On the other hand, code metrics such as LOC are more valuable during early development than when handling feature requests in a more mature product. Selecting the right measures for your project means matching metrics to the right goals at the right time, and ensuring that they complement each other so that the resulting statistics are valid and ultimately lead to positive changes to the codebase, team, or processes.
Software metrics are the quantitative measures applied to the software development lifecycle that can be used to assess the current state of a project. Over time, trends will appear and dev leaders can show progress, calculate the impact of project decisions, and make reliable estimates about timelines. Metrics are invaluable and indeed critical for advancing any project.
At the same time, having too many metrics, or ones that do not contribute to the goals of the project, is counterproductive. No metric is an exact science. Choose wisely! Then track and evaluate your metrics to ensure that each one still serves a purpose. If a metric doesn't lead to changes, it must be revised or replaced.