Structural quality and technical debt are major concerns in software development. A recent CAST report on application software health (CRASH) showed that businesses are exposed to technical debt that will cost millions of dollars to fix. The CAST Application Intelligence Platform (AIP) was used to evaluate the structural quality of 365 million lines of code in 745 IT applications from 160 companies across 10 industries.
“Our findings, although conservative, revealed an average technical debt of $3.61 per line of code,” said Dr. Bill Curtis, CAST’s chief scientist and senior vice president of the CAST Research Labs and director of the Consortium for IT Software Quality. “A significant number of applications examined in the study – nearly 15% – had over a million lines of code, which means even the smallest of those contains over $3.6 million in technical debt.”
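As a rough check on the arithmetic in that quote, the sketch below multiplies the reported average of $3.61 per line by an application's size. The function and constant names are illustrative and do not come from the report, and applying a single average rate to any one application gives only a ballpark figure.

```python
# Average technical debt per line of code, as quoted from the CRASH report.
DEBT_PER_LOC_USD = 3.61


def estimate_technical_debt(lines_of_code: int, rate: float = DEBT_PER_LOC_USD) -> float:
    """Return a ballpark technical-debt estimate in US dollars.

    This is a simple size-times-rate multiplication, not the report's
    underlying technical debt equation (which the interview does not spell out).
    """
    if lines_of_code < 0:
        raise ValueError("lines_of_code must be non-negative")
    return lines_of_code * rate


if __name__ == "__main__":
    # An application of exactly one million lines, the threshold named in the quote.
    print(f"${estimate_technical_debt(1_000_000):,.0f}")
```

At one million lines the estimate comes to about $3.61 million, which matches the "over $3.6 million" figure for the smallest of the million-line applications.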
I had the opportunity to talk to Dr. Curtis about the report and ask him more about the findings.
I recently did a Q&A with Capers Jones and Olivier Bonsignour about their book, The Economics of Software Quality. The CAST Application Intelligence Platform generated all the structural quality data in the book, so there are definitely some common themes. However, I noticed some differences in the findings. Can you explain why the results may differ?
Dr. Bill Curtis: Their scoring was based on many factors other than structural quality, such as the ability to detect missed or faulty requirements early, the ability to detect functional defects quickly, etc. Structural quality is often spoken of as technical debt in the Agile community, and it is a very big concern for them, since structural issues are often hard to detect with the functional testing that dominates Agile and iterative methods because of their short cycle times. Consequently, our results can be different because we only look at structural quality.
The table of quality attributes in The Economics of Software Quality rates "lines of code quality measures" as -5 and "cost-per-defect quality measures" as -7, but it seems as though these are being used in some of the data described in the report. Are metrics based on "lines of code" still relevant?
Curtis: There is a wide range of opinions on Function Points, and there is no perfect size measure. Since Function Points are usually counted manually by trained experts from the source code, they are extremely expensive to count and many organizations have moved away from them. In addition, two certified experts can differ by as much as 10% in their final counts (sometimes more) since the definition of Function Points has some ambiguity in its specification. For the CRASH report, we need an exact, inexpensive and repeatable size measure, which we get since AIP has a standard definition of how to count lines of code.
We have recently automated a CAST version of Function Point counting in AIP, but many of the applications in our sample had already been analyzed before it was available. In the future, we anticipate being able to use an exact, automated and repeatable measure of Function Points in our CRASH report. Some people use 'Backfired Function Points,' but these are just estimates of Function Points based on lines of code and weighted for different languages.
CAST is working with the Consortium for IT Software Quality (CISQ) to establish a standard international definition for automated Function Points, which will eliminate the cost and reliability problems associated with manual Function Point counting. CISQ has submitted its specification for Automated Function Points to an international standards group and it is currently working its way through the standardization process. We believe that when approved, this standard will enable a dramatic increase in the use of Function Points globally. Until then, lines of code are a legitimate measure of size, provided you can enforce a consistent algorithm for counting them. We are able to do this with CAST AIP and avoid some of the traditional problems with line-of-code measures in the CRASH report.
The definition of defects in 'cost per defect' measures is almost always based on failed test cases. CAST does not use test case failures in its calculations because the AIP is looking for structural problems, and test cases are usually checking for functional correctness based on the functional requirements. We are looking at a completely different area of quality than is assessed in the traditional 'cost per defect' analysis. The only measure we report that is similar to 'cost per defect' is technical debt. While we caveat our formula, we have provided data in an area where there was none and have enabled an entirely new level of discourse on a critical emerging topic. We will continue refining our methods for calculating technical debt as we are able to learn from field experience.
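To make the 'Backfired Function Points' idea from the answer above concrete, here is a minimal sketch of the technique as Curtis describes it: a Function Point estimate derived from lines of code, weighted by language. The ratios in the table are placeholder values chosen for illustration only; they are not taken from the CRASH report or from the book, and published backfiring tables differ from one source to another.

```python
from typing import Mapping

# Hypothetical lines-of-code-per-Function-Point ratios, for illustration only.
# Real backfiring tables vary by source and by language version.
LOC_PER_FUNCTION_POINT = {
    "cobol": 107.0,
    "java": 53.0,
    "c": 128.0,
}


def backfire_function_points(loc_by_language: Mapping[str, int]) -> float:
    """Estimate Function Points from per-language line counts.

    Each language's line count is divided by its assumed LOC-per-FP ratio,
    and the per-language estimates are summed.
    """
    total = 0.0
    for language, loc in loc_by_language.items():
        key = language.lower()
        if key not in LOC_PER_FUNCTION_POINT:
            raise KeyError(f"no backfiring ratio defined for {language!r}")
        total += loc / LOC_PER_FUNCTION_POINT[key]
    return total


if __name__ == "__main__":
    estimate = backfire_function_points({"COBOL": 10_700, "Java": 5_300})
    print(f"Estimated size: {estimate:.1f} Function Points")
```

Because the result depends entirely on the chosen ratios, this kind of estimate inherits all the weaknesses of line counting, which is the reason Curtis distinguishes it from a true Function Point count.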
Was some kind of static analysis tool used to assess risks throughout the code?
Curtis: The analyses of these 745 applications were performed by CAST's Application Intelligence Platform (AIP), an industry-leading application-level static analysis technology. AIP can parse the source code of more than 28 computer languages and rebuild the metadata into a representation of the full application, from GUI to database, in order to detect violations of good architectural and coding practice that may cross different layers and languages in the application.
Were the reports then given to the 160 companies so they could decide what to fix?
Curtis: Analysis output listing violations and quality metrics is provided with every analysis performed by AIP. The results are available in AIP's dashboards, which are designed for use by executives and application managers, and in a portal through which developers can trace the location of violations in the code. In cases where customers have contracted with CAST or a CAST partner to perform the analysis, reports are provided to the customer.
If it will cost $3.61 per line of code to address the technical debt, might it be less costly to create replacement applications that don't have the technical debt?
Curtis: $3.61 per line of code is a conservative figure for fixing the technical debt. We made conservative assumptions in our technical debt equation in order to provide a low-end number for the debt. The damage and cost of a single security breach in an application containing confidential customer financial data has been reported in the media to be at least as great as, if not greater than, the total technical debt calculated for our largest application (approximately $4 million). Replacing a significant legacy application and getting it to perform at the same level of quality, performance and functionality is an extremely expensive decision. That is why there is still so much COBOL in operation in the financial services and insurance sector. However, failure to manage the amount of technical debt can force an organization into a very expensive replacement exercise that can expose the business to considerable turmoil until the replacement stabilizes.
What are the best ways organizations can prevent technical debt?
Curtis: First, they can remediate the most severe violations before they release the code into operation, thus avoiding the most dangerous forms of technical debt. Second, they can set targets in their quality measures and manage their development and maintenance to those numbers. Third, they can also establish and monitor quality targets in contracts with their outsourcers, as we are seeing many organizations begin to do. Fourth, they can include the most frequent medium and high severity violations in their up-front design reviews. Fifth, they can provide feedback to development teams about the types of violations they are making so the teams can eliminate their causes in future development work, and we have seen evidence of team learning at our customers. Sixth, they can incorporate information about frequent violations into their training to ensure developers have the knowledge and skills required to avoid them. Finally, they can build their own repository of structural quality data to detect trends in their applications and their development activities in order to make informed executive decisions about how to manage technical debt. This is why we call it 'application intelligence.'
Is there a way they can assess their own applications?
Curtis: Many companies have acquired licenses to use the CAST AIP in-house and have built Application Intelligence Centers to support its use. They also often use it in a quality gate to control the quality of applications received from their outsourcers and suppliers.
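The quality gate Curtis mentions can be pictured as a simple pass/fail check on analysis results before code is accepted from a supplier. The sketch below is a generic illustration under stated assumptions: the class, function, and threshold values are all hypothetical, and it does not use or represent the CAST AIP interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    """Hypothetical acceptance limits; real targets would be set per contract."""
    max_high_severity: int = 0          # no high-severity violations allowed
    max_medium_per_kloc: float = 5.0    # medium-severity violations per 1,000 lines


def passes_quality_gate(
    high_severity: int,
    medium_severity: int,
    lines_of_code: int,
    thresholds: GateThresholds = GateThresholds(),
) -> bool:
    """Return True if the delivered code meets every threshold."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    medium_density = medium_severity / (lines_of_code / 1000)
    return (
        high_severity <= thresholds.max_high_severity
        and medium_density <= thresholds.max_medium_per_kloc
    )


if __name__ == "__main__":
    # A 10,000-line delivery with 40 medium-severity violations (4 per KLOC).
    print(passes_quality_gate(high_severity=0, medium_severity=40, lines_of_code=10_000))
```

Expressing medium-severity limits as a density rather than a raw count is one possible design choice; it lets the same targets apply to deliveries of different sizes.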