The Art of Computer Software Debugging

Introduction
A large percentage of software engineers maintain programs. They routinely deal with poorly written programs that have become bloated and twisted out of shape over years of maintenance, with no documentation or, even worse, misleading documentation. While most engineers do a good job of maintaining software, there is always a minority who excel at maintaining software developed by others! There appears to be a silent process at work when such an engineer looks at a defect description and starts to identify what is wrong with the system. A clear understanding of this process should make it easier to break debugging down into a sequence of well-defined steps.

What is Debugging?
Debugging (in the context of software engineering) refers to the process of identifying the cause of a system's defective behavior and addressing that problem. In simpler terms: fixing a bug. A significant share of the time spent on maintenance is often spent on debugging activities.

Life Cycle of a Debugging Task
Let us assume that a defect has been identified in a software system. Here, roughly, are the steps involved in debugging:
(a) Defect Identification/Confirmation
(b) Defect Analysis
Assuming that the software engineer concludes that the defect is genuine, the focus shifts to understanding the root cause of the problem. This is often the most challenging step in any debugging task, particularly when the software being debugged is complex. Many engineers start by launching a debugging tool, generally a debugger, and try to understand the root cause of the problem by following the execution of the program step by step. This approach may eventually succeed. However, in many situations it takes too much time, and in some cases it is not feasible at all, for example because of the complex nature of the program(s).
(c) Defect Resolution
Once the root cause of a problem is identified, the defect can be resolved by making an appropriate change to the system that fixes the root cause. [We will revisit this step in the future.]

Defect Analysis Revisited
Let us now define a few terms more formally:
Expected Behavior: The ideal behavior of the system when it encounters a set of inputs.
Actual Behavior: The actual behavior of the system when it encounters a set of inputs.
Defective Behavior: A system shows defective behavior when its actual behavior deviates from its expected behavior.
Root Cause of the Problem: The earliest instant at which a defective system deviates from the expected behavior. It may take some time before such a deviation becomes observable to the user of the system.
Root Cause Analysis: The process of identifying the root cause of a problem.
It now becomes clear that defect analysis essentially consists of root cause analysis.

Introduction
The goal of defect analysis is to identify the root cause of defective behavior. Defective behavior is the result of a design, implementation or deployment flaw in a software system. Model-based analysis uses a model of the relevant aspects of the system to identify the flaw. This approach involves the following sequence of activities:
(a) Model the system
(b) Identify candidate flaw(s)
(c) Test candidate flaw(s)
(d) Refine and repeat analysis
Each of these activities requires different kinds of skills, experience and tools. In the rest of this article, we will discuss each of them in detail. Finally, we will go through a real-life example.

Model the System
The goal of creating a model of the system is to identify a candidate flaw that can result in the defective behavior. For the purpose of this discussion, a model of the system is one that excludes those details that are either irrelevant to the defective behavior or not well understood at the start of defect analysis. Over the years, we have noticed that defect analysis and problem analysis take more time than any other single activity in maintenance projects.
Modeling a defective system often requires a good basic understanding of how similar systems are designed and built. In other words, the engineer working on defect analysis should have expertise in the meta-model of the system. For example, while trying to analyze a defect in a compiler, it is important to know that a compiler consists of a parser, code generator, optimizer, assembler and linker. In addition, knowledge of different kinds of parsers, code generation technologies and optimizations is essential. Often, these skills must be acquired before defect analysis, either through focused learning or through prior experience working on similar issues. Even if you have little or no prior experience, you can still complete this step quickly by going through the system documentation and building a very high-level mental model of the system, consistent with your current knowledge.

Identify Candidate Flaw(s)
Once a model of the system is created, you should ask the question:
Which particular flaw, when introduced into the model, can cause the observed defective behavior in the system?
Obviously, there are one or more answers to this question (or the system would function correctly!).
The ability to identify candidate flaws generally grows with experience. After analyzing hundreds of defects, you (your mind) will have built a knowledge base of flaws that cause defective behavior. Additional information can often be collected by examining logs and trace messages from the software. [Well-designed systems can optionally generate good logs and trace messages that help in defect analysis.]

Test Candidate Flaw(s)
Once you identify a candidate flaw, you should go ahead and test whether your hypothesis is correct using your favourite method:
Refine and Repeat Analysis
It is often not possible to complete defect analysis in one iteration; successive iterations are generally necessary. Here are some ways of identifying/refining the candidate flaw:
If you have spent too much time using the debugger and are unable to make progress with defect analysis, it is clear that you have to refine the model of the system before you can make progress. Similarly, if you have spent a lot of time analyzing code without making progress, it is time to use the debugger to test your candidate flaw.

Example
Defective Behavior:
Context: I was maintaining an optimizer and working on some benchmarks.
Defect: I noticed that a program ran faster (and hence had better benchmark results) when it was renamed!
Defect Analysis:
Iteration-1:
Model: (Relevant model for the observed behavior)
Iteration-2:
Refine Candidate Flaw
Iteration-3:
Refine Candidate Flaw
CONCLUSION:
The root cause of the problem is that the C startup code does not align the stack to a 16-byte boundary. The FORTRAN main was called from the C startup code, and as a result its initial stack was not aligned on a 16-byte boundary either.
[Author: Gopi Bulusu, Sankhya Technologies]
For more information, email info@sankhya.com