Art of Computer Software Debugging
Part 2: Model-Based Defect Analysis
The goal of defect analysis is to identify the root cause of defective behavior. Defective behavior is the result of a design, implementation or deployment flaw in a software system. Model-based analysis identifies that flaw by working with a model of the relevant aspects of the system. The approach involves the following sequence of activities: modeling the system, identifying candidate flaw(s), testing the candidate flaw(s), and refining and repeating the analysis.
Each of these activities requires different kinds of skills, experience and tools. In the rest of this article, we will discuss each of these in detail. Finally, we will go through a real-life example.
Model the System
The purpose of modeling the system is to identify a candidate flaw that could produce the defective behavior. For this discussion, a model of the system is a simplified description that excludes details that are either irrelevant to the defective behavior or not well understood at the start of defect analysis.
Over the years, we have noticed that defect analysis and problem analysis take more time than any other single activity in maintenance projects.
Modeling a defective system often requires a good basic understanding of how similar systems are designed and built. In other words, the engineer performing defect analysis should have expertise in the meta-model of the system.
For example, when analyzing a defect in a compiler, it is important to know that a compiler toolchain consists of a parser, a code generator, an optimizer, an assembler and a linker. In addition, knowledge of the different kinds of parsers, code generation techniques and optimizations is essential. These skills usually have to be acquired before defect analysis begins, either through focused learning or through prior experience working on similar issues.
Even if you have little or no prior experience, you can still complete this step quickly by reading the system documentation and building a high-level mental model of the system that is consistent with your current knowledge.
Identify Candidate Flaw(s)
Once a model of the system is created, you should ask the question:
Which particular flaw, when introduced into the model, could cause the observed defective behavior in the system?
Obviously, there is at least one answer to this question (otherwise the system would function correctly!). The ability to identify flaws generally grows with experience: after analyzing hundreds of defects, your mind builds up a knowledge base of the flaws that cause defective behavior.
Additional information can often be collected by examining the logs and trace messages produced by the software. (Well-designed systems can optionally generate good logs and trace messages that help with defect analysis.)
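As an illustration of optional tracing, here is a minimal C sketch, assuming a design where tracing is off by default and is switched on at run time. The names `g_trace_enabled`, `trace_init`, `trace_write`, `TRACE` and the environment variable `APP_TRACE` are all hypothetical, chosen for this example only.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Tracing is off by default, so normal runs only pay for a call and a
   flag check. (Hypothetical design, not taken from any real system.) */
int g_trace_enabled = 0;

/* Turn tracing on if the hypothetical APP_TRACE environment variable is
   set to a non-empty value. Call once at program start-up. */
void trace_init(void)
{
    const char *value = getenv("APP_TRACE");
    g_trace_enabled = (value != NULL && value[0] != '\0');
}

/* Write one trace line, prefixed with its source location, to 'out'.
   Returns the number of characters written, 0 when tracing is off,
   or -1 on an output error. */
int trace_write(FILE *out, const char *file, int line, const char *fmt, ...)
{
    assert(out != NULL && file != NULL && fmt != NULL);
    if (!g_trace_enabled)
        return 0;

    int prefix = fprintf(out, "[trace] %s:%d: ", file, line);
    va_list args;
    va_start(args, fmt);
    int body = vfprintf(out, fmt, args);
    va_end(args);
    int newline = fputc('\n', out);
    if (prefix < 0 || body < 0 || newline == EOF)
        return -1;
    return prefix + body + 1;
}

/* Convenience macro that records where the trace call was made. */
#define TRACE(...) trace_write(stderr, __FILE__, __LINE__, __VA_ARGS__)
```

A call such as `TRACE("entering pass %s", pass_name)` costs almost nothing in normal runs, but when the program is started with `APP_TRACE=1` it leaves a record of where execution went, which is exactly the kind of information that helps narrow down a candidate flaw.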
Test Candidate Flaw(s)
Once you have identified a candidate flaw, test whether your hypothesis is correct using your preferred method, for example by running the program under a debugger or by analyzing the code.
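A hypothesis is easiest to test when it is phrased as a check with a yes/no answer. For instance, the first hypothesis in the real-life example at the end of this article (a data section that is not aligned on a 16-byte boundary) could be checked with a minimal C sketch like the following. The array `benchmark_data` and the functions `is_aligned` and `data_misalignment_confirmed` are hypothetical; the sketch checks one object as a proxy for the whole data section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Return 1 if address p is a multiple of 'alignment', which must be a
   power of two; return 0 otherwise. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical benchmark input. Because it is initialized, toolchains
   typically place it in the data section rather than in .bss. */
double benchmark_data[1024] = { 1.0 };

/* Candidate flaw: "the benchmark data is not 16-byte aligned".
   Returns 1 if the hypothesis is confirmed, 0 if it is ruled out. */
int data_misalignment_confirmed(void)
{
    int aligned = is_aligned(benchmark_data, 16);
    printf("benchmark_data at %p is %s16-byte aligned\n",
           (void *)benchmark_data, aligned ? "" : "NOT ");
    return !aligned;
}
```

The same check can be made by hand in a debugger by printing the address of the object; coding it up simply makes the test repeatable across builds.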
Refine and Repeat Analysis
It is often not possible to complete defect analysis in one iteration. Successive iterations are generally necessary.
Each iteration may refine the candidate flaw, the model of the system, or both.
If you have spent a lot of time in the debugger without making progress, it is a sign that you need to refine the model of the system first. Similarly, if you have spent a lot of time analyzing code without making progress, it is time to use the debugger to test your candidate flaw.
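The iterate-until-confirmed structure of the analysis can be sketched in C as a list of candidate flaws, each paired with a test. This is only an illustration under stated assumptions: `struct candidate_flaw`, `analyze` and the three test functions are hypothetical, and the test functions simply replay the outcomes reported in the real-life example below rather than inspecting a real program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A candidate flaw: a description plus a test that returns 1 when the
   flaw is confirmed and 0 when it is ruled out. */
struct candidate_flaw {
    const char *description;
    int (*test)(void);
};

/* Test the candidates in order. Returns the index of the first confirmed
   flaw, or -1 when every candidate was ruled out, which suggests that the
   model itself needs refining before new candidates are proposed. */
int analyze(const struct candidate_flaw *candidates, size_t count)
{
    assert(candidates != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        int confirmed = candidates[i].test();
        printf("iteration %zu: %s: %s\n", i + 1,
               candidates[i].description,
               confirmed ? "confirmed" : "ruled out");
        if (confirmed)
            return (int)i;
    }
    return -1;
}

/* Hypothetical stand-ins that replay the outcomes reported in the
   example; in a real analysis each would inspect the program, for
   instance through a debugger or the generated assembly. */
static int data_section_misaligned(void) { return 0; }
static int stack_frames_misaligned(void) { return 0; }
static int startup_stack_misaligned(void) { return 1; }

static const struct candidate_flaw example_candidates[] = {
    { "data section not aligned on a 16-byte boundary", data_section_misaligned },
    { "stack not aligned on a 16-byte boundary", stack_frames_misaligned },
    { "C start-up code aligns the stack to only 4 bytes", startup_stack_misaligned },
};
```

In practice the list is not known up front: the result of each test usually suggests the next candidate, so the list grows as the analysis proceeds. A result of -1 corresponds to the situation described above, where it is time to step back and refine the model.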
Defective Behavior:
Context: I was maintaining an optimizer and working on some benchmarks.
Defect: I noticed that a program ran faster (and therefore produced better benchmark results) when it was renamed!
Defect Analysis:
Iteration 1:
Model (the relevant model for the observed behavior):
- Better-aligned programs run faster.
- The name of the program is passed to the program as an argument (argv).
- The name of the program affects program alignment.
Candidate flaw:
- The processor uses a 16-byte cache line. Maybe the data section is not aligned on a 16-byte boundary, reducing performance?
Test candidate flaw:
- The data section turned out to be aligned correctly.
Iteration 2:
Refine candidate flaw:
- Is it possible that the stack is not aligned on a 16-byte boundary?
Test candidate flaw:
- The assembly source file shows that the stack is aligned on a 16-byte boundary.
Iteration 3:
Refine candidate flaw:
- Using the debugger, I found that the C start-up code that calls the main function was placing the arguments on the stack and aligning the stack only to a 4-byte boundary. Because the program name is one of those arguments, renaming the program shifted the stack, and for some names the stack happened to land on a 16-byte boundary, improving the performance of the benchmark program.
CONCLUSION: The root cause of the problem is that the C start-up code does not align the stack to a 16-byte boundary. The FORTRAN main program was called from the C start-up code, so its initial stack was not aligned on a 16-byte boundary either.
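The experiment in iteration 3 can be approximated with a small probe. This minimal C sketch, with hypothetical functions `stack_offset_mod16` and `report_stack_alignment`, uses the address of a local variable to see where the stack sits relative to a 16-byte boundary. The absolute value depends on how the compiler lays out the frame; what matters is whether it changes when only the program name changes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the offset (0..15) of a local variable's address from a 16-byte
   boundary. Taking the address forces the variable onto the stack, so the
   value tracks the stack position at this call. */
unsigned stack_offset_mod16(void)
{
    volatile char probe = 0;
    return (unsigned)((uintptr_t)&probe & 15u);
}

/* Print the program name and its length next to the stack offset, so
   that runs under different names can be compared. */
unsigned report_stack_alignment(const char *progname)
{
    assert(progname != NULL);
    unsigned offset = stack_offset_mod16();
    printf("name \"%s\" (%zu chars): stack offset mod 16 = %u\n",
           progname, strlen(progname), offset);
    return offset;
}
```

Calling `report_stack_alignment(argv[0])` at the top of `main` and then running the same binary under names of different lengths (for example through copies or symbolic links) reproduces the experiment. On a platform whose ABI requires the stack to be 16-byte aligned at every function call, such as x86-64 System V, conforming start-up code should keep the printed offset constant; an offset that varies with the name length points to start-up code like the one in this example.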