The Economic Impacts of Inadequate Infrastructure for Software Testing

Executive Summary

Software has become an intrinsic part of business over the last decade. Virtually every business in the U.S. in every sector depends on it to aid in the development, production, marketing, and support of its products and services. Advances in computers and related technology have provided the building blocks on which new industries have evolved. Innovations in the fields of robotic manufacturing, nanotechnologies, and human genetics research all have been enabled by low-cost computational and control capabilities supplied by computers and software.
In 2000, total sales of software reached approximately $180 billion. Rapid growth has created a significant and high-paid workforce, with 697,000 employed as software engineers and an additional 585,000 as computer programmers. Reducing the cost of software development and improving software quality are important objectives of the U.S. software industry. However, the complexity of the underlying software needed to support the computerized U.S. economy is increasing at an alarming rate. The size of software products is no longer measured in thousands of lines of code, but in millions.
This increasing complexity, along with a decreasing average market life expectancy for many software products, has heightened concerns over software quality. Software nonperformance and failure are expensive. The media is full of reports of the catastrophic impact of software failure. For example, a software failure interrupted the New York Mercantile Exchange and telephone service to several East Coast cities in February 1998 (Washington Technology, 1998). Headlines frequently read, “If Microsoft made cars instead of computer programs, product-liability suits might now have driven them out of business.” Estimates of the economic costs of faulty software in the U.S. range in the tens of billions of dollars per year and have been estimated to represent just under 1 percent of the nation’s gross domestic product (GDP).

“In analyzing repair histories of 13 kinds of products gathered by Consumer Reports, PC World found that roughly 22 percent [of PCs] break down every year, compared to 9 percent of VCRs, 7 percent of big-screen TVs, 7 percent of clothes dryers and 8 percent of refrigerators” (Barron, 2000).
In actuality many factors contribute to the quality issues facing the software industry. These include marketing strategies, limited liability by software vendors, and decreasing returns to testing and debugging. At the core of these issues is the difficulty in defining and measuring software quality. Common attributes include functionality, reliability, usability, efficiency, maintainability, and portability. But these quality metrics are largely subjective and do not support rigorous quantification that could be used to design testing methods for software developers or support information dissemination to consumers.
Information problems are further complicated by the fact that even with substantial testing, software developers do not truly know how their products will perform until they encounter real scenarios.

The objective of this study is to investigate the economic impact of an inadequate infrastructure for software testing in the U.S. The National Institute of Standards and Technology (NIST) undertook this study as part of joint planning with industry to help identify and assess technical needs that would improve the industry’s software testing capabilities.
The findings from this study are intended to identify the infrastructure needs that NIST can supply to industry through its research programs.

To inform the study, RTI conducted surveys with both software developers and industry users of software. The data collected were used to develop quantitative estimates of the economic impact of inadequate software testing methods and tools. Two industry groups were selected for detailed analysis: (1) automotive and aerospace equipment manufacturers and (2) financial services providers and related electronic communications equipment manufacturers.
The findings from these two industry groups were then used as the basis for estimating the total economic impact for U.S. manufacturing and services sectors.

Based on the software developer and user surveys, the national annual costs of an inadequate infrastructure for software testing are estimated to range from $22.2 to $59.5 billion.[1] Over half of these costs are borne by software users in the form of error avoidance and mitigation activities. The remaining costs are borne by software developers and reflect the additional testing resources that are consumed due to inadequate testing tools and methods.
ES.1 ISSUES OF SOFTWARE QUALITY

Quality is defined as the bundle of attributes present in a commodity and, where appropriate, the level of the attribute for which the consumer (software users) holds a positive value. Defining the attributes of software quality and determining the metrics to assess the relative value of each attribute are not formalized processes. Compounding the problem is that numerous metrics exist to test each quality attribute. Because users place different values on each attribute depending on the product’s use, it is important that quality attributes be observable to consumers.
However, with software there exist not only asymmetric information problems (where a developer has more information about quality than the consumer), but also instances where the developer truly does not know the quality of their own product. It is not unusual for software to become technically obsolete before its performance attributes have been fully demonstrated under real-world operating conditions.

As software has evolved over time, so has the definition of software quality attributes. McCall, Richards, and Walters (1977) first attempted to assess quality attributes for software.
Their software quality model characterizes attributes in terms of three categories: product operation, product revision, and product transition. In 1991, the International Organization for Standardization (ISO) adopted ISO 9126 as the standard for software quality (ISO, 1991). It is structured around six main attributes, listed below (subcharacteristics are listed in parentheses):

- functionality (suitability, accurateness, interoperability, compliance, security)
- reliability (maturity, fault tolerance, recoverability)
- usability (understandability, learnability, operability)
- efficiency (time behavior, resource behavior)
- maintainability (analyzability, changeability, stability, testability)
- portability (adaptability, installability, conformance, replaceability)

Although a general set of standards has been agreed on, the appropriate metrics to test how well software meets those standards are still poorly defined. Publications by IEEE (1988, 1996) have presented numerous potential metrics that can be used to test each attribute. These metrics include

- fault density,
- requirements compliance,
- test coverage, and
- mean time to failure.

The problem is that no one metric is able to unambiguously measure a particular quality attribute. Different metrics may give different rank orderings of the same attribute, making comparisons across products difficult and uncertain.

[1] Note that the impact estimates do not reflect “costs” associated with mission-critical software where failure can lead to extremely high costs such as loss of life or catastrophic failure. Quantifying these costs was beyond the scope of the study.

ES.2 SOFTWARE TESTING INADEQUACIES

Software testing is the action of carrying out one or more tests, where a test is a technical operation that determines one or more characteristics of a given software element or system, according to a specified procedure. The means of software testing is the hardware and/or software and the procedures for its use, including the executable test suite used to carry out the testing (NIST, 1997).

Historically, software development focused on writing code and testing specific lines of that code.
Very little effort was spent on determining its fit within a larger system. Testing was seen as a necessary evil to prove to the final consumer that the product worked. As shown in Table ES-1, Andersson and Bergstrand (1995) estimate that 80 percent of the effort put into early software development was devoted to coding and unit testing. This percentage has changed over time. Starting in the 1970s, software developers began to increase their efforts on requirements analysis and preliminary design, spending 20 percent of their effort in these phases. More recently, software developers started to invest more time and resources in integrating the different pieces of software and testing the software as a unit rather than as independent entities. The amount of effort spent on determining the developmental requirements of a particular software solution has increased in importance. Forty percent of the software developer effort is now spent in the requirements analysis phase.

Table ES-1. Allocation of Effort

Phase                                                           1960s-1970s   1980s   1990s
Requirements Analysis / Preliminary Design / Detailed Design        10%        20%     40%
Coding and Unit Testing                                             80%        60%     30%
Integration and Test / System Test                                  10%        20%     30%

Source: Andersson, M., and J. Bergstrand. 1995. “Formalizing Use Cases with Message Sequence Charts.” Unpublished Master’s thesis. Lund Institute of Technology, Lund, Sweden.
Software testing infrastructure improvements include enhanced integration and interoperability testing tools, automated generation of test code, methods for determining sufficient quality for release, and performance metrics and measurement procedures.

Testing activities are conducted throughout all the development phases shown in Table ES-1. Formal testing conducted by independent test groups accounts for about 20 percent of labor costs. However, estimates of total labor resources spent testing by all parties range from 30 to 90 percent (Beizer, 1990). The worldwide market for software testing tools was $931 million in 1999 and is projected to grow to more than $2.6 billion by 2004 (Shea, 2000).
However, such testing tools are still fairly primitive. The lack of quality metrics leads most companies to simply count the number of defects that emerge when testing occurs. Few organizations engage in other advanced testing techniques, such as forecasting field reliability based on test data and calculating defect density to benchmark the quality of their product against others.

Numerous issues affect the software testing infrastructure and may lead to inadequacies. For example, competitive market pressures may encourage the use of a less than optimal amount of time, resources, and training for the testing function (Rivers and Vouk, 1998), and, with current software testing tools, developers have to determine whether applications and systems will interoperate. In addition, the need for certified standardized test technology is increasing. The development of these tools and the accompanying testing suites often lags behind the development of new software applications (ITToolbox, 1999).

Standardized testing tools, suites, scripts, reference data, reference implementations, and metrics that have undergone a rigorous certification process would have a large impact on the inadequacies listed above. For example, the availability of standardized test data, metrics, and automated test suites for performance testing would make benchmarking tests less costly to perform.
Standardized automated testing scripts along with standard metrics would also provide a more consistent method for determining when to stop testing.

In some instances, developing conformance testing code can be more time consuming and expensive than developing the software product being tested. Addressing the high testing costs is currently the focus of several research initiatives in industry and academia. Many of these initiatives are based on modeling finite state machines, combinatorial logic, or other formal languages such as Z (Cohen et al., 1996; Tai and Carver, 1995; NIST, 1997; Apfelbaum and Doyle, 1997).

ES.3 SOFTWARE TESTING COUNTERFACTUAL SCENARIOS
To estimate the costs attributed to an inadequate infrastructure for software testing, a precise definition of the counterfactual world is needed. Clearly defining what is meant by an “inadequate” infrastructure is essential for eliciting consistent information from industry respondents. In the counterfactual scenarios the intended design functionality of the software products released by developers is kept constant. In other words, the fundamental product design and intended product characteristics will not change. However, the realized level of functionality may be affected as the number of bugs (also referred to as defects or errors) present in released versions of the software decreases in the counterfactual scenarios.

An improved software testing infrastructure would allow developers to find and correct more errors sooner with less cost. The driving technical factors that do change in the counterfactual scenarios are when bugs are discovered in the software development process and the cost of fixing them. An improved infrastructure for software testing has the potential to affect software developers and users by

- removing more bugs before the software product is released,
- detecting bugs earlier in the software development process, and
- locating the source of bugs faster and with more precision.
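The mechanism described above can be made concrete with a minimal sketch. This is not the study's model. It holds the number of bugs fixed and changes only the stage at which each bug is detected. The stage names, the relative repair costs, and the detection shares are all hypothetical placeholders chosen for illustration; none of them come from the survey data.

```python
# Hypothetical illustration: every number below is an assumed placeholder,
# not a figure from the study.

# Assumed relative cost to fix one bug, by the stage in which it is detected.
COST_TO_FIX = {
    "requirements": 1.0,
    "coding": 5.0,
    "integration": 10.0,
    "beta": 15.0,
    "post_release": 30.0,
}


def total_repair_cost(bugs_introduced, detection_shares):
    """Return the total repair cost for a fixed number of bugs.

    detection_shares maps each stage to the fraction of bugs found there;
    the fractions must sum to 1 so that the bug count stays constant.
    """
    if abs(sum(detection_shares.values()) - 1.0) > 1e-9:
        raise ValueError("detection shares must sum to 1")
    return sum(
        bugs_introduced * share * COST_TO_FIX[stage]
        for stage, share in detection_shares.items()
    )


# Baseline: many bugs slip through to late stages.
baseline = {"requirements": 0.05, "coding": 0.25, "integration": 0.30,
            "beta": 0.20, "post_release": 0.20}

# Counterfactual: the same bugs, but detection shifts toward earlier stages.
improved = {"requirements": 0.10, "coding": 0.45, "integration": 0.30,
            "beta": 0.10, "post_release": 0.05}

bugs = 1000  # held constant across both scenarios
baseline_cost = total_repair_cost(bugs, baseline)
improved_cost = total_repair_cost(bugs, improved)
savings = baseline_cost - improved_cost
print(f"baseline={baseline_cost:.0f} improved={improved_cost:.0f} savings={savings:.0f}")
```

Because the bug count is identical in both runs, any difference in cost comes only from when bugs are found and what they cost to fix at that point, which is the comparison the counterfactual scenarios are built to isolate.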
Note that a key assumption is that the number of bugs introduced into software code is constant regardless of the types of tools available for software testing; bugs are errors entered by the software designer/programmer and the initial number of errors depends on the skill and techniques employed by the programmer. Because it may not be feasible or cost effective to remove all software errors prior to product release, the economic impact estimates were developed relative to two counterfactual scenarios. The first scenario investigates the cost reductions if all bugs and errors could be found in the same development stage in which they are introduced.
This is referred to as the cost of an inadequate software testing infrastructure. The second scenario investigates the cost reductions associated with finding an increased percentage (but not 100 percent) of bugs and errors closer to the development stages where they are introduced. The second scenario is referred to as cost reduction from “feasible” infrastructure improvements. For the “feasible” infrastructure improvements scenario, developers were asked to estimate the potential cost savings associated with enhanced testing tools and users were asked to estimate cost savings if the software they purchase had 50 percent fewer bugs and errors.

ES.4 ECONOMIC IMPACT OF AN INADEQUATE SOFTWARE TESTING INFRASTRUCTURE: AUTOMOTIVE AND AEROSPACE INDUSTRIES

We conducted a case study with software developers and users in the transportation equipment manufacturing sector to estimate the economic impact of an inadequate infrastructure for software testing. The case study focused on the use of computer-aided design/computer-aided manufacturing/computer-aided engineering (CAD/CAM/CAE) and product data management (PDM) software. Interviews were conducted with 10 software developers (vendors) and 179 users of these products.

Developers of CAD/CAM/CAE and PDM software indicated that in the current environment, software testing is still more of an art than a science, and testing methods and resources are selected based on the expert judgment of senior staff.
Respondents agreed that finding the errors early in the development process greatly lowered the average cost of bugs and errors. Most also indicated that the lack of historic tracking data and inadequate tools and testing methods, such as standard protocols approved by management, available test cases, and conformance specification, limited their ability to obtain sufficient testing resources (from management) and to leverage these resources effectively. Users of CAD/CAM/CAE and PDM software indicated that they spend significant resources responding to software errors (mitigation costs) and lowering the probability and potential impact of software errors (avoidance costs).
Approximately 60 percent of the automotive and aerospace manufacturers surveyed indicated that they had experienced significant software errors in the previous year. Respondents who experienced errors reported an average of 40 major and 70 minor software bugs per year in their CAD/CAM/CAE or PDM software systems.

Table ES-2 presents the economic impact estimates for the development and use of CAD/CAM/CAE and PDM software in the U.S. automotive and aerospace industries. The total cost impact on these manufacturing sectors from an inadequate software testing infrastructure is estimated to be $1.8 billion and the potential cost reduction from feasible infrastructure improvements is $0.6 billion.
Users of CAD/CAM/CAE and PDM software account for approximately three-fourths of the total impact, with the automotive industry representing about 65 percent and the aerospace industry representing 10 percent. Developers account for the remaining one-fourth of the costs.

Table ES-2. Cost Impacts on U.S. Software Developers and Users in the Transportation Manufacturing Sector Due to an Inadequate Testing Infrastructure ($ millions)

                              The Cost of Inadequate Software    Potential Cost Reduction from Feasible
                              Testing Infrastructure             Infrastructure Improvements
Software Developers
  CAD/CAM/CAE and PDM         $373.1                             $157.7
Software Users
  Automotive                  $1,229.7                           $377.0
  Aerospace                   $237.4                             $54.5
Total                         $1,840.2                           $589.2

ES.5 ECONOMIC IMPACT OF AN INADEQUATE SOFTWARE TESTING INFRASTRUCTURE: FINANCIAL SERVICES SECTOR

We conducted a second case study with four software developers and 98 software users in the financial services sector to estimate the economic impact of an inadequate infrastructure for software testing. The case study focused on the development and use of Financial Electronic Data Interchange (FEDI) and clearinghouse software, as well as the software embedded in routers and switches that support electronic data exchange.

Financial service software developers said that better testing tools and methods used during software development could reduce installation expenditures by 30 percent. All developers of financial services software agreed that an improved system for testing was needed.
They said that an improved system would be able to track a bug back to the point where it was introduced and then determine how that bug influenced the rest of the production process. Their ideal testing infrastructure would consist of close to real time testing where testers could remedy problems that emerge right away rather than waiting until a product is fully assembled. The major benefits developers cited from an improved infrastructure were direct cost reduction in the development process and a decrease in post-purchase customer support. An additional benefit that respondents thought would emerge from an improved testing infrastructure is increased confidence in the quality of the product they produce and ship.
The major selling characteristic of the products they create is the certainty that the product will accomplish a particular task. Because of the real-time nature of their products, the reputation loss from a failure can be great.

Approximately two-thirds of the users of financial services software (respondents were primarily banks and credit unions) surveyed indicated that they had experienced major software errors in the previous year. The respondents that did have major errors reported an average of 40 major and 49 minor software bugs per year in their FEDI or clearinghouse software systems.
Approximately 16 percent of those bugs were attributed to router and switch problems, and 48 percent were attributed to transaction software problems. The source of the remaining 36 percent of errors was unknown. Typical problems encountered due to bugs were

- increased person-hours used to correct posting errors,
- temporary shutdown leading to lost transactions, and
- delay of transaction processing.

Table ES-3 presents the empirical findings. The total cost impact on the financial services sector from an inadequate software testing infrastructure is estimated to be $3.3 billion. Potential cost reduction from feasible infrastructure improvements is $1.5 billion. Software developers account for about 75 percent of the economic impacts. Users represent the remaining 25 percent of costs, with banks accounting for the majority of user costs.

Table ES-3. Cost Impacts on U.S. Software Developers and Users in the Financial Services Sector Due to an Inadequate Testing Infrastructure ($ millions)

                                    The Cost of Inadequate Software    Potential Cost Reduction from Feasible
                                    Testing Infrastructure             Infrastructure Improvements
Software Developers
  Router and switch                 $1,897.9                           $975.0
  FEDI and clearinghouse            $438.8                             $225.4
Software Users
  Banks and savings institutions    $789.3                             $244.0
  Credit unions                     $216.5                             $68.1
Total Financial Services Sector     $3,342.5                           $1,512.6

ES.6 NATIONAL IMPACT ESTIMATES
The two case studies generated estimates of the costs of an inadequate software testing infrastructure for software developers and users in the transportation equipment manufacturing and financial services sectors. The per-employee impacts for these sectors were extrapolated to other manufacturing and service industries to develop an approximate estimate of the economic impacts of an inadequate infrastructure for software testing for the total U.S. economy. As Table ES-4 shows, the national annual cost of an inadequate infrastructure for software testing is estimated to be $59.5 billion. The potential cost reduction from feasible infrastructure improvements is $22.2 billion. These figures represent about 0.6 and 0.2 percent of the U.S.’s $10 trillion GDP, respectively.
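The GDP shares quoted above follow from simple division. The short sketch below reproduces that arithmetic using only the three figures stated in the text; the function and variable names are illustrative choices, not anything defined by the study.

```python
# Figures quoted in the text, expressed in billions of dollars.
COST_INADEQUATE = 59.5      # national annual cost of inadequate infrastructure
FEASIBLE_REDUCTION = 22.2   # potential reduction from feasible improvements
GDP = 10_000.0              # $10 trillion expressed in billions


def share_of_gdp(amount_billions, gdp_billions=GDP):
    """Return an amount as a percentage of GDP."""
    return 100.0 * amount_billions / gdp_billions


cost_share = share_of_gdp(COST_INADEQUATE)
reduction_share = share_of_gdp(FEASIBLE_REDUCTION)
print(f"{cost_share:.1f}% and {reduction_share:.1f}% of GDP")
# prints "0.6% and 0.2% of GDP"
```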
Software developers accounted for about 40 percent of total impacts, and software users accounted for about 60 percent.

Table ES-4. Costs of Inadequate Software Testing Infrastructure on the National Economy

                       The Cost of Inadequate Software    Potential Cost Reduction from Feasible
                       Testing Infrastructure             Infrastructure Improvements
                       (billions)                         (billions)
Software developers    $21.2                              $10.6
Software users         $38.3                              $11.7
Total                  $59.5                              $22.2

1 Introduction to Software Quality and Testing

Software is an intrinsic part of business in the late 20th century. Virtually every business in the U.S. in every sector depends on it to aid in the development, production, marketing, and support of its products and services. This software may be written either by developers who offer the shrink-wrapped product for sale or developed by organizations for custom use.

Integral to the development of software is the process of detecting, locating, and correcting bugs.

In a typical commercial development organization, the cost of providing [the assurance that the program will perform satisfactorily in terms of its functional and nonfunctional specifications within the expected deployment environments] via appropriate debugging, testing, and verification activities can easily range from 50 to 75 percent of the total development cost. (Hailpern and Santhanam, 2002)

Beizer (1990) reports that half the labor expended to develop a working program is typically spent on testing activities.

In spite of these efforts some bugs will remain in the final product to be discovered by users. They may either develop “workarounds” to deal with the bug or return it to the developer for correction.

Software’s failure to perform is also expensive. The media is full of reports of the catastrophic impact of software failure. For example, a software failure interrupted the New York Mercantile Exchange and telephone service to several East Coast cities in February 1998 (Washington Technology, 1998).
More common types of software nonperformance include the failure to

- conform to specifications or standards,
- interoperate with other software and hardware, and
- meet minimum levels of performance as measured by specific metrics.

“[A] study of personal-computer failure rates by the Gartner Group discover[ed] that there was a failure rate of 25 percent for notebook computers used in large American corporations” (Barron, 2000).

Reducing the cost of software development and improving software quality are important objectives of the commercial U.S. software industry and of in-house developers. Improved testing and measurement can reduce the costs of developing software of a given quality and even improve performance. However, the lack of a commonly accepted measurement science for information technology hampers efforts to test software and evaluate the tests’ results.
Software testing tools are available that incorporate proprietary testing algorithms and metrics that can be used to measure the performance and conformance of software. However, the value of these tools and the metrics they produce depends on the extent to which standard measurements are developed by consensus and accepted throughout the software development and user community (NIST, 1997). Thus, development of standard testing tools and metrics for software testing could go a long way toward addressing some of the testing problems that plague the software industry.

“Gary Chapman, director of the 21st Century Project at the University of Texas, noted that ‘repeated experiences with software glitches tend to narrow one’s use of computers to familiar and routine. Studies have shown that most users rely on less than 10 percent of the features of common programs as Microsoft Word or Netscape Communicator’” (Barron, 2000).

Improved tools for software testing could increase the net value (value minus cost) of software in a number of ways:

- reduce the cost of software development and testing;
- reduce the time required to develop new software products; and
- improve the performance, interoperability, and conformance of software.

However, to understand the extent to which improvements in software testing metrology could provide these benefits, we must first understand and quantify the costs imposed on industry by the lack of an adequate software testing infrastructure.
The objective of this study is to develop detailed information about the costs associated with an inadequate software testing infrastructure for selected software products and industrial sectors. This section describes the commonly used software quality attributes and currently available metrics for measuring software quality. It also provides an overview of software testing procedures and describes the impact of inadequate software testing.

1.1 SOFTWARE QUALITY ATTRIBUTES

Software consumers choose which software product to purchase by maximizing a profit function that contains several parameters subject to a budget constraint. One of the parameters in that profit function is quality.
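One way to write this choice problem down, offered only as a sketch: the symbols below are introduced here for illustration and do not appear in the study. Let $J$ be the set of candidate software products, $q_j$ the quality of product $j$, $z_j$ its other relevant parameters, $p_j$ its price, and $B$ the buyer's budget.

```latex
\max_{j \in J} \; \pi\bigl(q_j, z_j\bigr)
\quad \text{subject to} \quad p_j \le B
```

Under this reading, the buyer can only compare products sensibly if $q_j$ is observable, which is why the definition and measurement of quality matter for the rest of the section.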
Quality is defined as the bundle of attributes present in a commodity and, where appropriate, the level of the attribute for which the consumer holds a positive value. Defining the attributes of software quality and determining the metrics to assess the relative value of each attribute are not formalized processes. Not only is there a lack of commonly agreed upon definitions of software quality, but different users also place different values on each attribute depending on the product’s use. Compounding the problem is that numerous metrics exist to test each quality attribute. The different outcome scores for each metric may not give the same rank orderings of products, increasing the difficulty of interproduct comparisons.
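The rank-ordering problem can be illustrated with a small sketch. It scores three products on two commonly used reliability-related metrics, fault density (defects per thousand lines of code) and mean time to failure (operating hours per observed failure), and shows that the two metrics order the products differently. The product names and all measurements are made up for illustration, and the two formulas are the simplest textbook forms rather than definitions taken from the study.

```python
# Hypothetical measurements for three products; all values are invented
# for illustration and are not taken from the study.
products = {
    "A": {"defects": 30, "kloc": 100.0, "hours_run": 2000.0, "failures": 10},
    "B": {"defects": 12, "kloc": 20.0, "hours_run": 2000.0, "failures": 4},
    "C": {"defects": 45, "kloc": 300.0, "hours_run": 2000.0, "failures": 25},
}


def fault_density(p):
    """Defects per thousand lines of code (lower is better)."""
    return p["defects"] / p["kloc"]


def mean_time_to_failure(p):
    """Average operating hours per observed failure (higher is better)."""
    return p["hours_run"] / p["failures"]


# Rank products from best to worst under each metric.
rank_by_density = sorted(products, key=lambda name: fault_density(products[name]))
rank_by_mttf = sorted(
    products,
    key=lambda name: mean_time_to_failure(products[name]),
    reverse=True,
)

print("by fault density:", rank_by_density)  # ['C', 'A', 'B']
print("by MTTF:", rank_by_mttf)              # ['B', 'A', 'C']
```

In this made-up data, product C looks best by fault density and worst by mean time to failure, so a buyer's conclusion depends entirely on which metric is reported.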
McCall, Richards, and Walters (1977) first attempted to assess quality attributes for software. Their software quality model focused on 11 specific attributes. Table 1-1 lists those characteristics and briefly describes them. McCall, Richards, and Walters’s characteristics can be divided into three categories: product operation, product revision, and product transition.

- Product operation captures how effective the software is at accomplishing a specific set of tasks. The tasks range from the ease of inputting data to the ease and reliability of the output data. Product operation consists of correctness, reliability, integrity, usability, and efficiency attributes.
- Product revision measures how easy it is to update, change, or maintain performance of the software product. This category is especially important to this analysis because it is concerned with software testing and the cost of fixing any bugs that emerge from the testing process. Maintainability, flexibility, and testability are three subcharacteristics that fit into this category.
- Product transition focuses on software migration. The three main factors that make up this category are the software’s ability to interact with other pieces of software, the frequency with which the software can be used in other applications, and the ease of using the software on other platforms. Its three subcharacteristics are interoperability, reusability, and portability.

Table 1-1. McCall, Richards, and Walters’s Software Quality Attributes

Attribute: Description

Product Operation
  Correctness: How well the software performs its required function and meets customers’ needs
  Reliability: How well the software can be expected to perform its function with required precision
  Integrity: How well accidental and intentional attacks on the software can be withstood
  Usability: How easy it is to learn, operate, prepare input of, and interpret output of the software
  Efficiency: Amount of computing resources required by the software to perform its function
Product Revision
  Maintainability: How easy it is to locate and fix an error in the software
  Flexibility: How easy it is to change the software
  Testability: How easy it is to tell if the software performs its intended function
Product Transition
  Interoperability: How easy it is to integrate one system into another
  Reusability: How easy it is to use the software or its parts in other applications
  Portability: How easy it is to move the software from one platform to another

Source: McCall, J., P. Richards, and G. Walters. 1977. Factors in Software Quality, NTIS AD-A049-014, 015, 055. November.

Following McCall, Richards, and Walters’s work, Boehm (1978) introduced several additional quality attributes. While the two models have some different individual attributes, the three categories (product operation, product revision, and product transition) are the same.
As software changed and improved and the demands on software increased, a new set of software quality attributes was needed. In 1991, the International Organization for Standardization (ISO) adopted ISO 9126 as the standard for software quality (ISO, 1991). The ISO 9126 standard moves from three main attributes to six and from 11 subcharacteristics to 21. These attributes are presented in Table 1-2. The ISO standard is based on functionality, reliability, usability, efficiency, maintainability, and portability. The paradigms share several similarities; for example, maintainability in ISO maps fairly closely to product revision in the McCall paradigm, and product transition maps fairly closely to portability. There are also significant differences between the McCall and ISO paradigms. The attributes of product operation under McCall’s paradigm are specialized in the ISO model and constitute four major categories rather than just one. The ISO standard is now widely accepted. Other organizations that set industry standards (e.g., IEEE) have started to adjust their standards to comply with the ISO standards.

Table 1-2. ISO Software Quality Attributes
Attribute: Functionality
  Suitability: Attributes of software that bear on the presence and appropriateness of a set of functions for specified tasks
  Accurateness: Attributes of software that bear on the provision of right or agreed upon results or effects
  Interoperability: Attributes of software that bear on its ability to interact with specified systems
  Compliance: Attributes of software that make the software adhere to application-related standards or conventions or regulations in laws and similar prescriptions
  Security: Attributes of software that bear on its ability to prevent unauthorized access, whether accidental or deliberate, to programs or data
Attribute: Reliability
  Maturity: Attributes of software that bear on the frequency of failure by faults in the software
  Fault tolerance: Attributes of software that bear on its ability to maintain a specified level of performance in case of software faults or of infringement of its specified interface
  Recoverability: Attributes of software that bear on the capability to re-establish its level of performance and recover the data directly affected in case of a failure and on the time and effort needed for it
Attribute: Usability
  Understandability: Attributes of software that bear on the users’ effort for recognizing the logical concept and its applicability
  Learnability: Attributes of software that bear on the users’ effort for learning its application
  Operability: Attributes of software that bear on the users’ effort for operation and operation control
Attribute: Efficiency
  Time behavior: Attributes of software that bear on response and processing times and on throughput rates in performing its function
  Resource behavior: Attributes of software that bear on the amount of resources used and the duration of such use in performing its function
Attribute: Maintainability
  Analyzability: Attributes of software that bear on the effort needed for diagnosis of deficiencies or causes of failures or for identification of parts to be modified
  Changeability: Attributes of software that bear on the effort needed for modification, fault removal, or environmental change
  Stability: Attributes of software that bear on the risk of unexpected effect of modifications
  Testability: Attributes of software that bear on the effort needed for validating the modified software
Attribute: Portability
  Adaptability: Attributes of software that bear on the opportunity for its adaptation to different specified environments without applying other actions or means than those provided for this purpose for the software considered
  Installability: Attributes of software that bear on the effort needed to install the software in a specified environment
  Conformance: Attributes of software that make the software adhere to standards or conventions relating to portability
  Replaceability: Attributes of software that bear on opportunity and effort using it in the place of specified other software in the environment of that software
Source: ISO Standard 9126, 1991.

1.2 SOFTWARE QUALITY METRICS
Although a general set of standards has been agreed upon, the appropriate metrics to test how well software meets those standards are still poorly defined. Publications by IEEE (1988, 1996) have presented numerous potential metrics that can be used to test each attribute. Table 1-3 contains a list of potential metrics. The problem is that no one metric is able to unambiguously measure a particular attribute. Different metrics may give different rank orderings of the same attribute, making comparisons across products difficult and uncertain.

Table 1-3. List of Metrics Available
Fault density
Defect density
Cumulative failure profile
Fault-days number
Functional or modular test coverage
Cause and effect graphing
Requirements traceability
Defect indices
Error distribution(s)
Software maturity index
Person-hours per major defect detected
Number of conflicting requirements
Number of entries and exits per module
Software science measures
Graph-theoretic complexity for architecture
Cyclomatic complexity
Minimal unit test case determination
Run reliability
Design structure
Mean time to discover the next K-faults
Software purity level
Estimated number of faults remaining (by seeding)
Requirements compliance
Test coverage
Data or information flow complexity
Reliability growth function
Residual fault count
Failure analysis elapsed time
Testing sufficiency
Mean time to failure
Failure rate
Software documentation and source listing
Rely-required software reliability
Software release readiness
Completeness
Test accuracy
System performance reliability
Independent process reliability
Combined hardware and software (system) availability

The lack of quality metrics leads most companies to simply count the number of defects that emerge when testing occurs.
Few organizations engage in more advanced testing techniques, such as forecasting field reliability based on test data and calculating defect density to benchmark the quality of their product against others. This subsection describes the qualities of a good metric, the difficulty of measuring certain attributes, and criteria for selecting among metrics.

1.2.1 What Makes a Good Metric
Several common characteristics emerge when devising metrics to measure product quality. Although we apply them to software development, these characteristics are not exclusive to software; rather, they are characteristics that all good metrics should have:
• Simple and computable: Learning the metric and applying the metric are straightforward and easy tasks.
• Persuasive: The metrics appear to be measuring the correct attribute. In other words, they display face validity.
• Consistent and objective: The results are reproducible.
• Consistent in units or dimensions: Units should be interpretable and obvious.
• Programming language independent: The metrics should not be based on specific tasks and should be based on the type of product being tested.
• Gives feedback: Results from the metrics give useful information back to the person performing the test (Pressman, 1992).

1.2.2 What Can be Measured
Regardless of the metric’s quality, certain software attributes are more amenable to being measured than other attributes.
Not surprisingly, the attributes that are easiest to measure are also the least important in eliminating the uncertainty the consumer faces over software quality. Pressman (1992) describes the attributes that can be measured reliably and consistently across various types of software programs:
• effort, time, and capital spent in each stage of the project;
• number of functionalities implemented;
• number and type of errors remediated;
• number and type of errors not remediated;
• meeting scheduled deliverables; and
• specific benchmarks.
Interoperability, reliability, and maintainability are difficult to measure, but they are important when assessing the overall quality of the software product. The inability to provide reliable, consistent, and objective metrics for some of the most important attributes that a consumer values is a noticeable failure of software metrics.

1.2.3 Choosing Among Metrics
Determining which metric to choose from the family of available metrics is a difficult process. No single measure exists that a developer can use or a user can apply to perfectly capture the concept of quality. For example, a test of the “cyclomatic” complexity of a piece of software reveals a significant amount of information about some aspects of the software’s quality, but it does not reveal every aspect. In addition, there is the potential for measurement error when the metric is applied to a piece of software. For example, mean time to failure metrics are not measures of certainty; rather, they are measures that create a distribution of outcomes. Determining which metric to use is further complicated because different users have different preferences for software attributes. Some users care about the complexity of the software; others may not. The uncertainty over which metric to use has created a need to test the validity of each metric. Essentially, a second, observable, comprehensive, and comparable set of metrics is needed to test and compare across all of the software quality metrics.
This approach helps to reduce the uncertainty consumers face by giving them better information about how each software product meets the quality standards they value. To decide on the appropriate metric, several potential tests of the validity of each metric are available (IEEE, 1998). For a metric to be considered reliable, it needs to have a strong association with the underlying quality construct that it is trying to measure. IEEE standard 1061-1998 provides five validity measures that software developers can apply to decide which metrics are most effective at capturing the latent quality measure:
1. Linear correlation coefficients: Tests how well the variation in the metrics explains the variations in the underlying quality factors. This validity test can be used to determine whether the metric should be used when measuring or observing a particular quality factor is difficult.
2. Rank correlation coefficients: Provides a second test for determining whether a particular metric can be used as a proxy for a quality factor. The advantage of using a rank order correlation is that it is able to track changes during the development of a software product and see if those changes affect software quality. Additionally, rank correlations can be used to test for consistency across products or processes.
3. Prediction error: Is used to determine the degree of accuracy that a metric has when it is assessing the quality of a particular piece of software.
4. Discriminative power: Tests to see how well a particular metric is able to separate low quality software components from high quality software components.
5. Reliability: If a metric is able to meet each of the four previous validity measures in a predetermined percentage of tests, then the metric is considered reliable.
¹ Cyclomatic complexity is also referred to as program complexity or McCabe’s complexity and is intended to be a metric independent of language and language format (McCabe and Watson, 1994).

1.3 SOFTWARE TESTING
Software testing is the process of applying metrics to determine product quality.
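Before a metric is applied in this way, its validity can be checked along the lines of Section 1.2.3. The following is a minimal sketch of the first two validity measures (linear and rank correlation), not a procedure taken from IEEE 1061. It assumes a hypothetical data set in which cyclomatic complexity per module is the candidate metric and field defect counts stand in for the quality factor; the function names and all numbers are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Rank (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: one entry per module (illustrative values only).
complexity = [3, 7, 12, 18, 25, 40]      # candidate metric
field_defects = [0, 1, 2, 4, 7, 30]      # stand-in for the quality factor

print(f"linear correlation: {pearson(complexity, field_defects):.2f}")
print(f"rank correlation:   {spearman(complexity, field_defects):.2f}")
```

In this made-up sample the two orderings agree exactly, so the rank correlation is 1, while the linear coefficient is lower because the relationship is not a straight line. This is the kind of disagreement between validity measures that makes choosing among metrics difficult.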
Software testing is the dynamic execution of software and the comparison of the results of that execution against a set of pre-determined criteria. “Execution” is the process of running the software on a computer with or without any form of instrumentation or test control software being present. “Predetermined criteria” means that the software’s capabilities are known prior to its execution. What the software actually does can then be compared against the anticipated results to judge whether the software behaved correctly. The means of software testing is the hardware and/or software and the procedures for its use, including the executable test suite used to carry out the testing (NIST, 1997).
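The definition above can be illustrated with a minimal sketch, assuming a hypothetical unit under test (`is_leap_year`) and a small table of predetermined criteria. The sketch executes the software, then compares what it actually does against the anticipated results; the names and cases are ours and are not drawn from NIST (1997).

```python
def is_leap_year(year):
    """Hypothetical unit under test: Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Predetermined criteria: inputs paired with results known before execution.
PREDETERMINED_CRITERIA = [
    (1900, False),
    (2000, True),
    (2023, False),
    (2024, True),
]


def run_suite(func, criteria):
    """Execute func on each input and compare actual with anticipated results."""
    results = []
    for value, expected in criteria:
        actual = func(value)
        results.append((value, expected, actual, actual == expected))
    return results


for value, expected, actual, passed in run_suite(is_leap_year, PREDETERMINED_CRITERIA):
    status = "pass" if passed else "FAIL"
    print(f"{value}: expected {expected}, got {actual} -> {status}")
```

Here the list of criteria plays the role of the executable test suite, and `run_suite` is a stand-in for the procedures that carry out the testing.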
Section 2 of this report examines in detail the various forms of software testing, the common types of software testing being conducted, and the available tools for software testing activities. In many respects, software testing is an infrastructure technology or “infratechnology.” Infratechnologies are technical tools, including scientific and engineering data, measurement and test methods, and practices and techniques that are widely used in industry (Tassey, 1997). Software testing infratechnologies provide the tools needed to measure conformance, performance, and interoperability during software development. These tools aid in testing the relative performance of different software configurations and mitigate the expense of reengineering software after it is developed and released. Software testing infratechnologies also provide critical information to the software user regarding the quality of the software. By increasing quality, purchase decision costs for software are reduced.

1.4 THE IMPACT OF INADEQUATE TESTING
Currently, there is a lack of readily available performance metrics, procedures, and tools to support software testing. If these infratechnologies were available, the costs of performance certification programs would decline and the quality of software would increase.
This would lead not only to better testing for existing products but also to the testing of products that are not currently tested. The impact on the software industry due to lack of robust, standardized test technology can be grouped into four general categories:
• increased failures due to poor quality,
• increased software development costs,
• increased time to market due to inefficient testing, and
• increased market transaction costs.

1.4.1 Failures due to Poor Quality
The most troublesome effect of a lack of standardized test technology is the increased incidence of avoidable product defects that emerge after the product has been shipped.
As illustrated in Table 1-4, in the aerospace industry over a billion dollars has been lost in the last several years that might be attributed to problematic software. These costs do not include the recent losses related to the ill-fated Mars Mission. Large failures tend to be very visible. They often result in loss of reputation and loss of future business for the company. Recently, legal action has increased when failures are attributable to insufficient testing.

Table 1-4. Recent Aerospace Losses due to Software Failures
Airbus A320 (1993): loss of life, 3
Ariane 5 Galileo Poseidon Flight 965 (1996): aggregate cost, $640 million; loss of life, 160; loss of data, yes
Lewis Pathfinder USAF Step (1997): aggregate cost, $116.8 million; loss of data, yes
Zenit 2 Delta 3 Near (1998): aggregate cost, $255 million; loss of data, yes
DS-1 Orion 3 Galileo Titan 4B (1999): aggregate cost, $1.6 billion; loss of data, yes
Note: These losses do not include those accrued due to recent problems with the Mars Mission.
Source: NASA IV&V Center, Fairmont, West Virginia. 2000.

Software defects are typically classified by type, location introduced, when found, severity level, frequency, and associated cost. The individual defects can then be aggregated by cause according to the following approach:
• Lack of conformance to standards, where a problem occurs because the software functions and/or data representation, translation, or interpretation do not conform to the procedural process or format specified by a standard.
• Lack of interoperability with other products, where a problem is the result of a software product’s inability to exchange and share information (interoperate) with another product.
• Poor performance, where the application works but not as well as expected.

1.4.2 Increased Software Development Costs
Historically, the process of identifying and correcting defects during the software development process represents over half of development costs. Depending on the accounting methods used, testing activities account for 30 to 90 percent of labor expended to produce a working program (Beizer, 1990). Early detection of defects can greatly reduce costs.
Defects can be classified by where they were found or introduced along the stages of the software development life cycle, namely, requirements, design, coding, unit testing, integration testing, system testing, installation/acceptance testing, and operation and maintenance phases. Table 1-5 illustrates that the longer a defect stays in the program, the more costly it becomes to fix it.

1.4.3 Increased Time to Market
The lack of standardized test technology also increases the time that it takes to bring a product to market. Increased time often results in lost opportunities. For instance, a late product could potentially represent a total loss of any chance to gain any revenue from that product. Lost opportunities can be just as damaging as post-release product failures. However, they are notoriously hard to measure.
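The repair-cost escalation that Table 1-5 describes can be worked through with a short sketch using the Boehm (1976) multipliers reported there. Two points are assumptions on our part: the 5Y entry is read as system testing and the 15Y entry as operation and maintenance, and the defect counts of the example project are hypothetical.

```python
# Relative repair costs in units of Y, from the Boehm (1976) column of Table 1-5.
# Assumption: 5Y is read as the system testing entry and 15Y as the
# operation and maintenance entry.
BOEHM_RELATIVE_COST = {
    "requirements": 0.2,
    "design": 0.5,
    "coding": 1.2,
    "system testing": 5.0,
    "operation and maintenance": 15.0,
}


def escalation(found_in, baseline="requirements"):
    """How many times more a repair costs at `found_in` than at `baseline`."""
    return BOEHM_RELATIVE_COST[found_in] / BOEHM_RELATIVE_COST[baseline]


def total_repair_cost(defects_by_stage):
    """Total repair cost, in units of Y, for defect counts keyed by stage found."""
    return sum(
        BOEHM_RELATIVE_COST[stage] * count
        for stage, count in defects_by_stage.items()
    )


# Hypothetical project: the same 100 defects, caught early versus late.
caught_early = {"requirements": 60, "design": 30, "coding": 10}
caught_late = {"coding": 10, "system testing": 30, "operation and maintenance": 60}

print(f"escalation, requirements to operation: {escalation('operation and maintenance'):.0f}x")
print(f"cost if caught early: {total_repair_cost(caught_early):.0f}Y")
print(f"cost if caught late:  {total_repair_cost(caught_late):.0f}Y")
```

Under these assumptions, a defect that survives until operation costs about 75 times as much to repair as one caught during requirements, and the late-detection profile costs roughly 27 times the early one for the same number of defects.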
Table 1-5. Relative Costs to Repair Defects when Found at Different Stages of the Life-Cycle
Life Cycle Stage | Baziuk (1995) Study Costs to Repair when Found | Boehm (1976) Study Costs to Repair when Found(a)
Requirements | 1X(b) | 0.2Y
Design | | 0.5Y
Coding | | 1.2Y
Unit Testing | |
Integration Testing | |
System Testing | 90X | 5Y
Installation Testing | 90X-440X |
Acceptance Testing | 440X |
Operation and Maintenance | 470X-880X(c) | 15Y
(a) Assuming cost of repair during requirements is approximately equivalent to cost of repair during analysis in the Boehm (1976) study.
(b) Assuming cost to repair during requirements is approximately equivalent to cost of an HW line card return in Baziuk (1995) study.
(c) Possibly as high as 2,900X if an engineering change order is required.

If standardized testing procedures were readily available, testers would expend less time developing custom test technology. Standardized test technology would accelerate development by decreasing the need to
• develop specific test software for each implementation,
• develop specific test data for each implementation, and
• use the “trial and error” approach to figuring out how to use nonstandard automated testing tools.

1.4.4 Increased Market Transaction Costs
Because of the lack of standardized test technology, purchasers of software incur difficulties in comparing and evaluating systems.
This information problem is so common that manufacturers have warned purchasers to be cautious when using performance numbers (supplied by the manufacturer) for comparison and evaluation purposes. Standardized test technology would alleviate some of the uncertainty and risk associated with evaluating software choices for purchase by providing consistent approaches and metrics for comparison.

2 Software Testing Methods and Tools
Software testing is the action of carrying out one or more tests, where a test is a technical operation that determines one or more characteristics of a given software element or system, according to a specified procedure. The means of software testing is the hardware and/or software and the procedures for its use, including the executable test suite used to carry out the testing (NIST, 1997). This section examines the various forms of software testing, the types of software testing, and the available tools for software testing. It also provides a technical description of the procedures involved with software testing. The section begins with a brief history of software development and an overview of the development process.

2.1 HISTORICAL APPROACH TO SOFTWARE DEVELOPMENT
The watershed event in the development of the software industry can be traced to 1969, when the U.S. Justice Department forced IBM to “unbundle” its software from the related hardware and required that the firm sell or lease its software products. Prior to that time, nearly all operating system and applications software had been developed by hardware manufacturers, dominated by IBM, or by programmers in the using organizations. Software developers in the 1950s and 1960s worked independently or in small teams to tackle specific tasks, resulting in customized one-of-a-kind products. Since this landmark government action, a software development market has emerged, and software developers and engineers have moved through several development paradigms (Egan, 1999).
During the 1970s, improvements in computing capabilities caused firms to expand their use of automated information-processing tasks, and the importance of programming to firms’ activities increased substantially. Simple tools to aid software development, such as programming languages and debugging tools, were introduced to increase the software programmer’s productivity. The introduction of the personal computer and its widespread adoption after 1980 accelerated the demand for software and programming, rapidly outpacing these productivity improvements. Semiconductor power, roughly doubling every 18 months, has dramatically outpaced the rate of improvement in software, creating a “software bottleneck.” Although software is easily mass-produced, allowing for economies of scale, the entrenched customized approach to software development was so strong that economies of scale were never realized. The historic approach to the software development process, which focused on system specification and construction, is often based on the waterfall model (Andersson and Bergstrand, 1995). Figure 2-1 shows how this process separates software development into several distinct phases with minimal feedback loops. First, the requirements and problem are analyzed; then systems are designed to address the problem.
Testing occurs in two stages: the program itself is tested and then how that program works with other programs is tested. Finally, normal system operation and maintenance take place. Feedback loops only exist between the current stage and its antecedent and the following stage. This model can be used in a component-based world for describing the separate activities needed in software development. For example, the requirements and design phase can include identifying available reusable software. Feedback loops throughout the entire development process increase the ability to reuse components. Reuse is the key attribute in component-based software development (CBSD).
When building a component-based program, developers need to examine the available products and how they will be integrated into not only the system they are developing, but also all other potential systems. Feedback loops exist throughout the process and each step is no longer an isolated event.

Figure 2-1. Waterfall Model (stages, in order: Requirements Analysis and Definition; System and Software Design; Implementation and Unit Testing; Integration and System Testing; Operation and Maintenance). Adapted from Andersson and Bergstrand (1995).

Table 2-1 illustrates where software developers have placed their efforts through time. In the 1960s and 1970s, software development focused on writing code and testing specific lines of that code. Very little effort was spent on determining its fit within a larger system.
Testing was seen as a necessary evil to prove to the final consumer that the product worked. Andersson and Bergstrand estimate that 80 percent of the effort put into early software development was devoted to coding and unit testing. This percentage has changed over time. Starting in the 1970s, software developers began to increase their efforts on requirements analysis and preliminary design, spending 20 percent of their effort in these phases. Additionally, software developers started to invest more time and resources in integrating the different pieces of software and testing the software as a system rather than as independent entities (units).
The amount of effort spent on determining the developmental requirements of a particular software solution has increased in importance. Forty percent of the software developer effort is now spent in the requirements analysis phase. Developers have also increased the time spent in the design phase to 30 percent, which reflects its importance. Design phases in a CBSD world are extremely important because these phases determine the component’s reuse possibilities.

Table 2-1. Allocation of Effort
Phases, from earliest to latest: Requirements Analysis; Preliminary Design; Detailed Design; Coding and Unit Testing; Integration and Test; System Test
1960s – 1970s: 10%, 80%, 10%
1980s: 20%, 60%, 20%
1990s: 40%, 30%, 30%
(Each percentage covers one or more adjacent phases, in phase order.)
Source: Andersson, M., and J. Bergstrand. 1995. “Formalizing Use Cases with Message Sequence Charts.” Unpublished Master’s thesis. Lund Institute of Technology, Lund, Sweden.

2.2 SOFTWARE TESTING INFRASTRUCTURE
Figure 2-2 illustrates the hierarchical structure of software testing infratechnologies. The structure consists of three levels:
• software test stages,
• software testing tools, and
• standardized software testing technologies.
Software testing is commonly described in terms of a series of testing stages. Within each testing stage, testing tools are used to conduct the analysis.
Standardized testing technologies such as standard reference data, reference implementations, test procedures, and test cases (both manual and automated) provide the scientific foundation for commercial testing tools. This hierarchical structure of commercial software-testing infratechnologies illustrates the foundational role that standardized software testing technologies play. In the following subsections, we discuss software testing stages and tools. 2. 2. 1 Software Testing Stages Aggregated software testing activities are commonly referred to as software testing phases or stages (Jones, 1997). A software testing stage is a process for ensuring that some aspect of a software product, system, or unit functions properly.
The number of software testing stages employed varies greatly across companies and applications. The number of stages can range from as low as 1 to as high as 16 (Jones, 1997).

Figure 2-2. Commercial Software Testing Infrastructure Hierarchy
Stages: General (Subroutine, New Function, Regression, Integration, System); Specialized (Stress, Error, Recovery, Security, Performance, Platform, Viral); User-Involved (Usability, Field, Lab, Acceptance)
Tools: Test Design; Test Execution and Evaluation; Accompanying and Support Tools
Standardized Software Testing Technologies: Procedure Tests, Automated Scripts, Reference Data, Reference Value, Reference Implementation, Test Suites, Manual Scripts

For large software applications, firms typically use a 12-stage process that can be aggregated into three categories:
• General testing stages include subroutine testing, unit testing, new function testing, regression testing, integration, and system testing.
• Specialized testing stages consist of stress or capacity testing, performance testing, platform testing, and viral protection testing.
• User-involved testing stages incorporate usability testing and field testing.
After the software is put into operational use, a maintenance phase begins where enhancements and repairs are made to the software. During this phase, some or all of the stages of software testing will be repeated. Many of these stages are common and well understood by the commercial software industry, but not all
companies use the same vocabulary to describe them. Therefore, as we define each software stage below, we identify other names by which that stage is known.

General Testing Stages
General testing stages are basic to software testing and occur for all software (Jones, 1997). The following stages are considered general software testing stages:¹
• subroutine/unit testing
• new function testing
• regression testing
• integration testing
• system testing

Specialized Testing Stages
Specialized software testing stages occur less frequently than general software testing stages and are most common for software with well-specified criteria. The following stages are considered specialized software testing stages:
• stress, capacity, or load testing
• error-handling/survivability testing
• recovery testing
• security testing
• platform testing stage
• viral protection testing stage

User-Involved Testing Stages
For many software projects, the users and their information technology consultants are active participants at various stages along the software development process, including several stages of testing. Users generally participate in the following stages:
• usability testing
• field or beta testing
• lab or alpha testing
• acceptance testing
¹ All bulleted terms listed in this section are defined in Appendix A.

2.2.2 Commercial Software Testing Tools
A software testing tool is a vehicle for facilitating the performance of a testing stage. The combination of testing types and testing tools enables the testing stage to be performed (Perry, 1995). Testing, like program development, generates large amounts of information, necessitates numerous computer executions, and requires coordination and communication between workers (Perry, 1995). Testing tools can ease the burden of test production, test execution, test generation, information handling, and communication. Thus, the proper testing tool increases the effectiveness and efficiency of the testing process (Perry, 1995).
This section categorizes software testing tools under the following headings:
• test design and development tools;
• execution and evaluation tools; and
• accompanying and support tools (which includes tools for planning, reviews, inspections, and test support) (Kit, 1995).
Many of the tools that have similar functions are known by different names.

Test Design and Development Tools
Test design is the process of detailing the overall test approach specified in the test plan for software features or combinations of features and identifying and prioritizing the associated test cases. Test development is the process of translating the test design into specific test cases.
Tools used for test design and development are referred to as test data/case generator tools. As this name implies, test data/case generator tools are software systems that can be used to automatically generate test data/cases for test purposes. Frequently, these generators only require parameters of the data element values to generate large amounts of test transactions. Test cases can be generated based on a user-defined format, such as automatically generating all permutations of a specific, user-specified input transaction. The following are considered test data/case generator tools:
• data dictionary tools
• executable specification tools
• exhaustive path-based tools
• volume testing tools
• requirements-based test design

Test Execution and Evaluation Tools
Test execution and evaluation is the process of executing test cases and evaluating the results. This includes selecting test cases for execution, setting up the environment, running the selected tests, recording the execution activities, analyzing potential product failures, and measuring the effectiveness of the effort. Execution tools primarily are concerned with easing the burden of running tests. Execution tools typically include the following:
• capture/playback tools
• test harnesses and drivers tools
• memory testing tools
• instrumentation tools
• snapshot monitoring tools
• system log reporting tools
• coverage analysis tools
• mapping tools
Simulation tools are also used in test execution.
Simulation tools take the place of software or hardware that interacts with the software to be tested. Sometimes they are the only practical method available for certain tests, such as when software interfaces with uncontrollable or unavailable hardware devices. These include the following tools:
• disaster testing tools
• modeling tools
• symbolic execution tools
• system exercisers
Accompanying and Support Tools
In addition to the traditional testing tools discussed above, accompanying and support tools are frequently used as part of the overall testing effort. In the strict sense, these support tools are not considered testing tools because no code is usually being executed as part of their use.
However, these tools are included in this discussion because many organizations use them as part of their quality assurance process, which is often intertwined with the testing process. Accompanying tools include tools for reviews, walkthroughs, and inspections of requirements, functional design, internal design, and code. In addition, there are other support tools such as project management tools, database management software, spreadsheet software, and word processors. The latter tools, although important, are very general in nature and are implemented through a variety of approaches.
We describe some of the more common testing support tools:
• code comprehension tools
• flowchart tools
• syntax and semantic analysis tools
• problem management tools
2.3 SOFTWARE TESTING TYPES
Software testing activities can also be classified into three types:
• Conformance testing activities assess the conformance of a software product to a set of industry-wide standards or customer specifications.
• Interoperability testing activities assess the ability of a software product to interoperate with other software.
• Performance testing activities assess the performance of a software product with respect to specified metrics, whose target values are typically determined internally by the software developer.
In the following subsections, we define the roles played by each of the three types of software testing in the software development process.
2.3.1 Conformance Testing
Conformance testing activities assess whether a software product meets the requirements of a particular specification or standard. These standards are in most cases set forth and agreed upon by a respected consortium or forum of companies within a specific sector, such as the Institute of Electrical and Electronics Engineers, Inc. (IEEE) or the American National Standards Institute (ANSI). They reflect a commonly accepted “reference system,” whose standards recommendations are sufficiently defined and tested by certifiable test methods. They are used to evaluate whether the software product implements each of the specific requirements of the standard or specification.
For router software development:
• Conformance testing verifies that the routers can accurately interpret header information and route data given the standard ATM specification.
• Interoperability testing verifies that routers from different vendors operate properly in an integrated system.
• Performance testing measures routers’ efficiency and tests if they can handle the required capacity loading under real or simulated scenarios.
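The router example can be made concrete with a small Python sketch that runs one check of each type against a toy header codec. The field layout assumed here (GFC 4 bits, VPI 8 bits, VCI 16 bits, PT 3 bits, CLP 1 bit) follows the ATM UNI cell header, with the error-control octet omitted for brevity. The two "vendor" functions, the sample values, and the test names are hypothetical, and the sketch is an illustration rather than a real conformance suite.

```python
import time

def pack_header(gfc, vpi, vci, pt, clp):
    """Hypothetical 'vendor A' encoder: pack the first four header octets."""
    word = (gfc << 28) | (vpi << 20) | (vci << 4) | (pt << 1) | clp
    return word.to_bytes(4, "big")

def parse_header(octets):
    """Hypothetical 'vendor B' decoder: extract the fields octet by octet."""
    b0, b1, b2, b3 = octets[:4]
    return {
        "gfc": b0 >> 4,
        "vpi": ((b0 & 0x0F) << 4) | (b1 >> 4),
        "vci": ((b1 & 0x0F) << 12) | (b2 << 4) | (b3 >> 4),
        "pt": (b3 >> 1) & 0x07,
        "clp": b3 & 0x01,
    }

def conformance_test():
    """Check the decoder against a fixed vector derived from the assumed layout."""
    known_header = bytes([0x00, 0x10, 0x00, 0x50])  # VPI = 1, VCI = 5
    expected = {"gfc": 0, "vpi": 1, "vci": 5, "pt": 0, "clp": 0}
    return parse_header(known_header) == expected

def interoperability_test():
    """Check that headers produced by one vendor are understood by the other."""
    samples = [(0, 1, 5, 0, 0), (0, 255, 65535, 7, 1), (15, 0, 32, 2, 1)]
    for fields in samples:
        decoded = parse_header(pack_header(*fields))
        if tuple(decoded.values()) != fields:
            return False
    return True

def performance_test(count=100_000):
    """Measure decoding throughput in headers per second."""
    header = pack_header(0, 1, 5, 0, 0)
    start = time.perf_counter()
    for _ in range(count):
        parse_header(header)
    return count / (time.perf_counter() - start)

print("conformance:", conformance_test())
print("interoperability:", interoperability_test())
print(f"performance: {performance_test():,.0f} headers/s")
```

The conformance check compares one implementation against the specification, the interoperability check compares two implementations against each other, and the performance check yields a number that would be compared with a target chosen by the developer.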
One of the major benefits of conformance testing is that it facilitates interoperability between various software products by confirming that each software product meets an agreed-upon standard or specification. Because of its broad usefulness, conformance testing is used in most if not all of the software testing stages and by both software developers and software users. Conformance testing methodologies have been developed for operating system interfaces, computer graphics, document interchange formats, computer networks, and programming language processors. Conformance testing methodologies typically use the same concepts but not always the same nomenclature (NIST, 1997).
Since the specifications in software standards are complex and often ambiguous, most testing methodologies use test case scenarios (e.g., abstract test suites, test assertions, test cases), which themselves must be tested. Standardization is an important component of conformance testing. It usually includes developing the functional description and language specification, creating the testing methodology, and “testing” the test case scenarios. Executable test codes, the code that tests the scenarios, have been developed by numerous organizations, resulting in multiple conformance testing products on the market. However, many rigorous testing methodology documents have the capability to measure quality across products.
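One possible way to “test” a set of test case scenarios is sketched below in Python: the suite is run against an implementation assumed to conform and against a deliberately nonconforming variant, and a healthy suite should pass the first and flag the second. The mini-standard, the function names, and the assertion identifiers are invented for illustration, and this is not necessarily the procedure used in the methodologies cited above.

```python
# Hypothetical mini-standard: a conforming `normalize` function must strip
# surrounding whitespace and convert its input to lower case.

def reference_normalize(text):
    """Implementation assumed to conform to the mini-standard."""
    return text.strip().lower()

def faulty_normalize(text):
    """Deliberately nonconforming variant: it forgets to strip whitespace."""
    return text.lower()

# Each test assertion pairs an identifier with an input and the required output.
TEST_ASSERTIONS = [
    ("A1-lowercase", "ABC", "abc"),
    ("A2-strip", "  abc  ", "abc"),
    ("A3-combined", " AbC ", "abc"),
]

def run_suite(implementation):
    """Return the identifiers of the assertions that the implementation fails."""
    return [
        assertion_id
        for assertion_id, given, required in TEST_ASSERTIONS
        if implementation(given) != required
    ]

print(run_suite(reference_normalize))  # []
print(run_suite(faulty_normalize))     # ['A2-strip', 'A3-combined']
```

If the suite reported failures for the conforming implementation, or none for the faulty one, the defect would lie in the test scenarios rather than in the product under test.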
Sometimes an executable test code and the particular hardware/software platform it runs on are accepted as a reference implementation for conformance testing. Alternatively, a widely successful commercial software product becomes both the de facto standard and the reference implementation against which other commercial products are measured (NIST, 1997).
2.3.2 Interoperability Testing
Interoperability testing activities, sometimes referred to as intersystems testing, assess whether a software product will exchange and share information (interoperate) with other products. Interoperability testing activities are used to determine whether the proper pieces of information are correctly passed between