What is a defect?

Before answering this question, we recommend reading the related white paper, "The HY-SPECS TM Development Process", to provide context and perspective.

A defect is an imperfection in some component of a development artifact that causes the component to at least partially fail in achieving its purpose.
Example 1: The software limits the usable portion of a buffer
to N-1 bytes rather than N bytes as indicated in the design spec.
Example 2: The design spec does not address one of the
functional requirements listed in the requirements spec.
Defects can be classified under two types of imperfection: inconsistency and incompleteness. Inconsistency exists when two different components contradict each other. In Example 1 above, the design spec component (a sentence, paragraph, etc.) stated N bytes, while the implementation spec component (a constant, function, etc.) stated N-1 bytes, a contradiction. Incompleteness exists when a component that should exist does not, i.e., something is missing. In Example 2 above, the design spec component that should have expanded upon the requirement was missing.
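For concreteness, the Python sketch below recasts Example 1 as code. The FixedBuffer class, its size of 4, and the faulty bounds check are our own hypothetical illustration, not taken from any actual product; they simply show how an implementation can contradict its design spec by one byte.

    class FixedBuffer:
        def __init__(self, n):
            self.n = n               # N, the limit stated in the design spec
            self.data = bytearray()

        def append(self, byte):
            # Defect: the correct check is len(self.data) >= self.n; the extra
            # "+ 1" below limits the usable portion to N-1 bytes.
            if len(self.data) + 1 >= self.n:
                raise OverflowError("buffer full")
            self.data.append(byte)

    buf = FixedBuffer(4)             # design spec: 4 usable bytes
    try:
        for b in b"abcd":
            buf.append(b)
    except OverflowError:
        pass
    print("bytes accepted:", len(buf.data))   # prints 3, i.e. N-1, contradicting the spec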
The implicit purpose of every development artifact and its components is to specify and further decompose the artifact and components of the previous, higher level in the development process. When a development artifact component fails to achieve this purpose, it is defective.
In cases of inconsistency, the defect usually resides in the lower-level, derived artifact. In Example 1 above, the implementation spec (the software) is defective. This is not always the case, though. Perhaps N-1 bytes is correct, or perhaps the correct units are words or buffer elements instead of bytes, in which case both artifacts are defective. Additional information from an independent source should be gathered to resolve issues of inconsistency.
In cases of incompleteness, the defect almost always resides in the artifact that is missing the component. In Example 2 above, the design spec is defective. Again, this is not always the case; perhaps the requirements spec has not yet been updated to reflect the removal of one of the functional requirements. As in cases of inconsistency, additional information should be gathered to resolve the issue.
Generally speaking, incompleteness is more difficult for a person looking for defects to detect than inconsistency, and once found, it takes more work to correct.
What is testing?

Testing is the act of verification by means of experimental use. A development artifact is used within a controlled environment (i.e., controlled input and state), and its effect on the environment (i.e., its output and state) is compared against expected results, in order to confirm that the artifact is as desired. Traditionally the artifact has been executable software, but documents can also be tested based on how readers react; examples include design document inspections and beta testing of user manuals. The rest of this white paper will focus primarily on executable software.
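As a minimal sketch of this definition, the Python fragment below exercises a hypothetical artifact (square_clip, an invented function, with invented expected values) using controlled inputs and compares its outputs against expected results.

    def square_clip(x, limit=100):
        # Artifact under test: square x, clipping the result at 'limit'.
        return min(x * x, limit)

    # Controlled inputs paired with expected results.
    cases = [
        (0, 0),
        (3, 9),
        (10, 100),
        (11, 100),   # the squared value exceeds the limit, so it is clipped
    ]

    for given, expected in cases:
        actual = square_clip(given)
        verdict = "pass" if actual == expected else "FAIL"
        print(f"input={given:3d}  expected={expected:3d}  actual={actual:3d}  {verdict}")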
Dijkstra, the 1972 ACM Turing Award winner and a strong proponent of formal verification, once said something along the lines of: testing can never prove the absence of defects, only their presence. Yet the vast majority of software developers intuitively know that "enough" testing will prove that the software is correct, and that testing is almost always more efficient than formal verification. Why is this so? Primarily for the following two reasons.
First, the executable software's operational profiles (i.e., the types of actual use) are usually few in number and narrow in scope. This greatly reduces the scenarios for testing, from those that might exist to those that will exist. Second, and more formally, even though there may be an infinite number of software states and inputs, these can be partitioned into a finite number of equivalence classes, based on the particular details of the software, such that for each equivalence class, correct operation for any one element directly implies correct operation for every element of the class. Selective testing, based on operational profiles and equivalence classes, is both highly effective and highly efficient in finding defects.
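The Python sketch below illustrates equivalence-class partitioning on a hypothetical example, a leap-year predicate chosen purely for illustration. Its unbounded input domain collapses into four classes determined by the structure of the rule itself, and one representative per class stands in for every member of that class.

    def is_leap(year):
        # Hypothetical artifact under test: the Gregorian leap-year rule.
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    # The infinite input domain partitions into four equivalence classes,
    # based on the structure of the rule; one representative suffices per class.
    representatives = {
        "not divisible by 4":           (2023, False),
        "divisible by 4, not by 100":   (2024, True),
        "divisible by 100, not by 400": (1900, False),
        "divisible by 400":             (2000, True),
    }

    for label, (year, expected) in representatives.items():
        assert is_leap(year) == expected, label
    print("one test per equivalence class: all passed")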
Orthogonal defect classification (ODC), whose goals include process improvement and risk analysis, is a scheme for assigning defects to particular points in a multi-dimensional discrete Cartesian defect space. Each axis of the defect space is orthogonal to all the others, i.e., it captures a distinct defect attribute, independent of the others. Along each axis is a finite spanning set of discrete values for the attribute, chosen and named to capture all possible values in a meaningful way. Unfortunately, most companies that try to use orthogonal defect classification choose axes that are not orthogonal, and attribute value sets that do not span the range of possible values. Moreover, they usually choose so many axes (6 to 15) that it becomes difficult for a person to visualize the defect space.
One orthogonal defect classification scheme that achieves all three goals of orthogonality, spanning, and easy visualization utilizes the framework already put into place by the HY-SPECS TM development process. A 5-dimensional defect space is defined; see Appendix I for an illustrative picture. The first axis is the development phase of the defective artifact component, whose spanning set is { concept, requirements, design, implementation }. The second axis is the type of imperfection, whose spanning set is { inconsistency, incompleteness }. The remaining three axes are often tailored to the particular project or customer, but have their basic spanning sets as indicated. The third axis is the product feature, whose spanning set is { functional, performance }. The fourth axis is the original problem discoverer, whose spanning set is { developer, tester, user }. The fifth axis is seriousness, whose spanning set is { crucial, important, trivial }. Spanning sets may be defined with more detail; for example, the product feature attribute may be expanded to an exhaustive list of all the functional and performance-related features of the particular product.
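As a sketch of how a defect might be recorded as a point in this space, the Python fragment below defines one possible record type. The class and enum names are ours, chosen for illustration, and the sample classification of Example 1 uses assumed values for the discoverer and seriousness axes.

    from dataclasses import dataclass
    from enum import Enum

    # Basic spanning sets for the five axes, as given in the text.
    Phase        = Enum("Phase", "CONCEPT REQUIREMENTS DESIGN IMPLEMENTATION")
    Imperfection = Enum("Imperfection", "INCONSISTENCY INCOMPLETENESS")
    Feature      = Enum("Feature", "FUNCTIONAL PERFORMANCE")
    Discoverer   = Enum("Discoverer", "DEVELOPER TESTER USER")
    Seriousness  = Enum("Seriousness", "CRUCIAL IMPORTANT TRIVIAL")

    @dataclass
    class DefectRecord:
        phase: Phase
        imperfection: Imperfection
        feature: Feature
        discoverer: Discoverer
        seriousness: Seriousness
        summary: str

    # Example 1 from earlier in this paper, classified as one point in the space.
    example_1 = DefectRecord(
        phase=Phase.IMPLEMENTATION,
        imperfection=Imperfection.INCONSISTENCY,
        feature=Feature.FUNCTIONAL,
        discoverer=Discoverer.TESTER,       # assumed; the example does not say
        seriousness=Seriousness.IMPORTANT,  # assumed
        summary="usable buffer is N-1 bytes; design spec says N",
    )
    print(example_1)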
In industry the following types of testing are commonly mentioned (listed in no particular order): white box, black box, environmental, configuration, compatibility, performance, volume, stress, usability, regression, object-oriented, on-line, off-line, alpha, beta, GUI, database, client/server, real-time, package, unit, integration, system, acceptance, functional, engineering, and QA testing. Claims such as "test plans with assurance of full coverage" are common. One would think that with so much testing going on, software would be defect-free by the time it shipped. Yeah, right. The sad truth is that much of this testing is misdirected, performed poorly, and generally ineffective in meeting the goals of verifying development artifact correctness (i.e., confirming that requirements are met, that defects are minimized, etc.).
Before going into an orthogonal spanning taxonomy of testing, the notions of "system" and "scale" need to be discussed. Everything is a system unto itself. Every system is composed of smaller (sub) systems; every system is part of a larger (super) system, ad infinitum (practically) in both directions (open universe and quantum physics). Thus any idea of system scale must be relative to an arbitrarily chosen reference, typically the "system" that is the intended product of the project, as described in the project's concept spec. A small signed integer, which counts how many identified abstraction levels exist between the system under test and the project-level system, is taken as the scale, with negative values for sub systems and positive values for super systems. For example, the customer's encompassing system would have a scale of +1, while the most detailed design element might have a scale of -5.
A 4-dimensional testing space is defined; see Appendix II for an illustrative picture. The first axis is the system scale, whose spanning set is the set of integers which cover the different levels of system abstraction as described above. The second axis is progression, whose spanning set is { enhancement, regression }. The third axis is visibility, whose spanning set is { private, public }. The fourth axis (pseudo axis) is the degree of coverage, whose spanning set is { light, moderate, heavy, full }. Full coverage can usually be obtained only if the scale of the system under test is small enough to permit the finding of equivalence classes for its state and inputs; in industry this is almost never done.
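As a sketch of how a testing activity might be recorded as a point in this space, the Python fragment below defines one possible record type. The names and the two sample entries are ours, chosen for illustration; the coverage value for the acceptance test is assumed, since the customer specifies the actual degree.

    from dataclasses import dataclass
    from enum import Enum

    # Basic spanning sets for three of the four axes; scale is a signed integer.
    Progression = Enum("Progression", "ENHANCEMENT REGRESSION")
    Visibility  = Enum("Visibility", "PRIVATE PUBLIC")
    Coverage    = Enum("Coverage", "LIGHT MODERATE HEAVY FULL")

    @dataclass
    class TestActivity:
        scale: int                 # 0 is the project-level system from the concept spec
        progression: Progression
        visibility: Visibility
        coverage: Coverage
        description: str

    # The customer's final acceptance test: a public regression test of the
    # concept-level system, at a degree of coverage the customer specifies.
    acceptance = TestActivity(
        scale=0,
        progression=Progression.REGRESSION,
        visibility=Visibility.PUBLIC,
        coverage=Coverage.MODERATE,    # assumed for illustration
        description="customer's final acceptance test",
    )

    # A full-coverage harness for one detailed design element, far below project level.
    unit = TestActivity(
        scale=-5,
        progression=Progression.ENHANCEMENT,
        visibility=Visibility.PRIVATE,
        coverage=Coverage.FULL,
        description="adjunct harness for a low-level design element",
    )

    for activity in (acceptance, unit):
        print(activity)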
Again, spanning sets may be defined with more detail if doing so seems appropriate for the project. The testing taxonomy above, applied to any development artifact component, is sufficient to capture the full range of test types seen in industry, and it does so in a more structured, logical manner. For example, the customer's final acceptance test is really a public regression test of the concept-level system, at a degree of coverage specified by the customer.
Testing is very tightly integrated into the development process. At the detailed level of implementation, most of the low level source code is developed in parallel with adjunct heavy and full coverage test harnesses for that code. At progressively higher levels of abstraction, a logarithmically decreasing number of test harnesses of lighter coverage are put into place. The main reason for this is efficiency, due to combinatorics and accessibility of state information.
For example, suppose there are 5 components that all interact with 5 other components, giving 25 pairwise combinations. Now suppose that instead of attempting to test all 25 combinations, you test each component individually, which is only 10 tests. By definition, testing a component includes testing its interface(s). Add a few more tests of the combined super system for "good measure", i.e., redundancy, and the testing is just as effective for roughly half the cost. In truth, the few extra tests are not just for "good measure"; they also exist to catch any defects missed by the non-full-coverage tests on the individual components. Roughly speaking, as the number of component combinations increases multiplicatively, the testing cost increases only additively -- a substantial savings in larger systems.
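The arithmetic generalizes, as the short Python calculation below shows; the group sizes and the count of extra super-system tests are made-up numbers, used only to illustrate multiplicative versus additive growth in test count.

    # Test-count growth: exercising every pairwise combination of two groups of
    # n components versus testing each of the 2n components individually plus a
    # few redundant super-system tests. The sizes below are made up.

    EXTRA_SUPER_SYSTEM_TESTS = 3   # the "good measure" tests mentioned above

    print(f"{'n per group':>12} {'pairwise (n*n)':>15} {'individual (2n+extra)':>22}")
    for n in (5, 10, 20, 40):
        pairwise = n * n
        individual = 2 * n + EXTRA_SUPER_SYSTEM_TESTS
        print(f"{n:>12} {pairwise:>15} {individual:>22}")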
Another reason for this fine grained approach is that the state information of the software is much more accessible, making it much easier to debug problems. If a defect exists way down deep in the bowels of a large system, it will be much more difficult to access the state information needed to debug the problem. With current software technology, the smaller the system under test or under investigation, the easier it is to access the needed state information.
Appendix I -- ASCII Pictogram of 5-D Orthogonal Spanning Defect Space
(axes: phase, imperfection, feature, discoverer, seriousness)
Appendix II -- ASCII Pictogram of 4-D Orthogonal Spanning Testing Space
(axes: scale, progression, visibility; degree of coverage indicated by dot/sphere size)
Copyright © 1999 UnixYes, Inc., All Rights Reserved.
HY-SPECS, the checkmark logo, and We don't do windows!
are trademarks of UnixYes, Inc.
All other trademarks are held by their respective owners.