Using SLOC to estimate software costs, schedules

Poor cost and schedule estimates ruin projects more than technical, political or development team problems. But if you can determine the source lines of code (SLOC) in an application, you can better gauge the amount of time and effort needed to complete the project.

Here's a look at three ways to measure SLOC and some of the risks and realities of SLOC.

SLOC realities and risks
"In extensive research of over 20,000 software development projects spanning 18 years, we found that more projects were doomed from poor cost and schedule estimates than they ever were from technical, political, or development team problems."

This is because the software project estimate, which is based on the size estimate, establishes the funding constraints and the project plan. Those in turn limit the management, engineering, quality assurance, and reporting processes that can be used, the tools that can be applied, and the ability to engage in rework and risk resolution. Unrealistic estimates, whether caused by analyst bias, customer or management pressure, or any other factor, increase the risks to the project. Spending time on software sizing, and preparing the most viable range of size estimates as a foundation for the cost estimate, can most likely do more to reduce project risk than attention to nearly any other factor, including program requirements, architecture, quality factors, and other issues that affect the integrity of the process and the timely delivery of an acceptable product within projected budgets. The figure below illustrates that size drives estimates.

Using SLOC
SLOC can be used to measure either an existing program or a program yet to be built. Using SLOC to measure the size of an existing program is reasonably straightforward and involves little risk when appropriate counting rules are applied to a sound technical baseline. This straightforward process becomes more complicated if it is used to answer questions such as, "How big would the application have been if we had used a different language?" or "What savings in code would result if we modified the architecture?" Careful engineering analysis in concert with some rules provided in this chapter can help make these trade-offs possible.

Program size is the most obvious and perhaps the most fundamental measure of the amount of intellectual work needed for software development. Physical lines of code are among the easiest measurements to make; measurement simply involves counting the end-of-line markers. For example, the number of SLOC in Microsoft Windows can be found by searching the Internet. However, without the definition of SLOC that was used, that information is nearly meaningless (unless the goal is merely to "wow" someone with big numbers). Thus, published information on software measures that depend on SLOC is difficult to interpret and to compare with programs yet to be developed. Complicating the matter, many questions, such as those related to costs, schedules, progress, reuse, and productivity, may remain unanswered. One Software Engineering Institute (SEI) report says this about the measurement of source code size:

Historically, the primary problem with measures of source code size has not been in coming up with numbers — anyone can do that. Rather, it has been in identifying and communicating the attributes that describe exactly what those numbers represent.

Robert E. Park, formerly of the Software Engineering Institute, developed a framework (the SEI Checklist) for defining SLOC values to enable people to carefully explain and define the SLOC measure used in a project. Of the numerous types of SLOC measures, three stand out as most used: physical carriage returns, physical SLOC, and logical SLOC. They are defined below.

Physical Carriage Returns -- Counts every physical line ending, including those of comment lines and blank lines. Counting physical carriage returns has little value in estimation. Counting comment lines does not correlate significantly with effort: comment lines do not involve any additional intellectual work beyond what is encompassed in the code, and they exist simply for documentation purposes. The number of blank lines can vary by a factor of 2 or 3 based only on coding standards or personal preference about the number of blanks before and after each comment and the standard for the code preamble. It should be noted that ratios of comments to source lines may have value from a quality viewpoint. Additionally, from a size perspective, a ratio can be used to approximate a logical line count from the count of physical carriage returns. In making that conversion, we have seen the ratio range from 20 to 70 percent (e.g., physical carriage returns × 20 percent = approximate logical SLOC). You can develop these ratios for your own software by doing manual counts on several samples of code.
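The ratio-based conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample counts are hypothetical, and the calibration simply pools the manual counts across samples, which is one reasonable way to derive a ratio but not a method prescribed by the text.

```python
def count_physical_lines(text: str) -> int:
    """Count physical lines, including comment and blank lines."""
    return len(text.splitlines())


def calibrate_ratio(samples):
    """Derive a logical-to-physical ratio from manually counted samples.

    `samples` is a list of (physical_line_count, manual_logical_sloc) pairs.
    The sample values used below are invented for illustration.
    """
    total_physical = sum(physical for physical, _ in samples)
    total_logical = sum(logical for _, logical in samples)
    if total_physical == 0:
        raise ValueError("samples contain no physical lines")
    return total_logical / total_physical


def approximate_logical_sloc(physical_lines: int, ratio: float) -> int:
    """Apply a calibrated ratio to a physical line count."""
    return round(physical_lines * ratio)


if __name__ == "__main__":
    # Hypothetical manual counts from three code samples.
    samples = [(200, 80), (500, 190), (300, 130)]
    ratio = calibrate_ratio(samples)
    print(f"calibrated ratio: {ratio:.0%}")
    print("estimate for 12,000 physical lines:",
          approximate_logical_sloc(12_000, ratio))
```

A ratio calibrated this way applies only to code written under similar coding standards, since blank-line and comment conventions move the physical count without changing the logical one.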

Physical SLOC -- Expresses the physical length of code, which is every single line of source code as seen by human eyes, excluding comments and blanks but counting each physical line ending of all other lines. A physical SLOC is most commonly defined as a count of nonblank, non-comment lines in a program's source code. The SEI technical report on software size measurement states that, "Counts of physical lines describe size in terms of the physical length of the code as it appears when printed for people to read."

This is among the easiest measurements to make because it involves counting the end-of-line markers rather than programming language-specific syntax. You need to specify only how you will recognize the statement types that you will not count, e.g., comments and blank lines. It is relatively easy to build automated counters for physical source line measures; however, measurement results can be more subject to variations in programming style than results using other measures. Later in this chapter, Table 5.3 details differences in line counting methods.
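As a rough illustration of how simple an automated physical SLOC counter can be, and of how sensitive it is to style, consider the Python sketch below. It assumes a C-like language in which comments occupy whole lines starting with "//"; block comments and comments trailing code are deliberately not handled, and the function name and sample snippets are hypothetical.

```python
def count_physical_sloc(source: str, comment_prefix: str = "//") -> int:
    """Count nonblank, non-comment physical lines.

    Simplifying assumption: a comment is a whole line that starts with
    `comment_prefix`. Block comments are not recognized.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(comment_prefix):
            continue  # whole-line comment
        count += 1
    return count


# The same function written in two layout styles.
COMPACT = """\
// add two numbers
int add(int a, int b) { return a + b; }
"""

EXPANDED = """\
// add two numbers

int add(int a, int b)
{
    return a + b;
}
"""

if __name__ == "__main__":
    print(count_physical_sloc(COMPACT))   # 1
    print(count_physical_sloc(EXPANDED))  # 4
```

The two snippets are functionally identical, yet the physical count differs by a factor of four, which is the style sensitivity the paragraph above warns about.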

Logical SLOC -- Measures the number of logical programming "statements," each of which may or may not span more than one physical line. The ideal count examines the language syntax and counts the statement types (e.g., data declarations, math statements) explicitly. Many languages can be counted more quickly by counting terminators (e.g., C-like languages can be approximated by the number of line-terminating semicolons and closing curly braces). The rules for counting logical statements should specify how to recognize and count embedded statements for each source language to be used. Logical SLOC is less sensitive to formatting and style conventions than physical SLOC. According to the SEI technical report on software size measurement:

The count of logical statements is an attempt to characterize size in terms of number of software instructions, irrespective of the physical format in which they appear.

Logical source lines are the lines that carry programming instructions and data declarations, that is, the implementation of the software design represented by the actual instructions that convert to executable code and data. Therefore, counting logical lines is the best measure of software size. Much of the historical data that has been used to construct cost models for project estimating is based on logical measures of source code size.
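The terminator-counting shortcut mentioned above for C-like languages can be sketched as follows. This is a crude approximation under stated assumptions, not a real logical SLOC counter: it counts every semicolon and closing curly brace outside "//" comments (rather than only semicolons at the end of a physical line, so the result does not depend on layout), and it ignores block comments, string literals, and the semicolons inside for-loop headers. The function name and snippets are hypothetical.

```python
def count_terminators(source: str) -> int:
    """Approximate logical SLOC by counting ';' and '}' characters.

    Assumptions: only '//' line comments are stripped; block comments,
    string literals, and for-loop headers are not treated specially.
    """
    total = 0
    for line in source.splitlines():
        code = line.split("//", 1)[0]  # drop a trailing line comment
        total += code.count(";") + code.count("}")
    return total


# The same function written in two layout styles.
COMPACT = """\
// add two numbers
int add(int a, int b) { return a + b; }
"""

EXPANDED = """\
// add two numbers

int add(int a, int b)
{
    return a + b;
}
"""

if __name__ == "__main__":
    print(count_terminators(COMPACT))   # 2
    print(count_terminators(EXPANDED))  # 2
```

Both layouts yield the same count, which illustrates why a logical measure is less sensitive to formatting than a physical one. A production counter would parse the language syntax, as the ideal count described above requires.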

This article was originally printed in the book Software Sizing, Estimation, and Risk Management. It is being reprinted with permission of Auerbach Publications, Taylor & Francis Group.

About the author: Dan Galorath is president of Galorath Inc.
