Software Defects and Non-Software Defects

Some years ago I had the pleasure of working with somebody who didn't know anything about software engineering but was smart and ambitious. He had majored in civil engineering in college, and his beginner's (fresh, unspoiled) view of some basic software engineering concepts taught me a lot about the nature of the discipline.

One question in particular gave me pause. He couldn't understand why developers didn't produce bug-free software. In civil engineering, he argued, the results are by and large defect-free, very much unlike software: cars don't spontaneously explode and buildings don't collapse, but software crashes all the time.

On one level, it is easy to dismiss his point. Physical products are defective too, sometimes obviously, sometimes subtly, sometimes debatably. The defects just look very different.

But come to think of it, there is a difference. The defect rate is certainly higher in software than in the physical world (true for hardware, and definitely true for large physical structures). There are lots of tiny bugs, and some that are incredibly frustrating (for example, bugs that crash your computer and lose the paper you've been working on). We tolerate this higher defect rate because the bugs affect us less than, say, a defective bridge would [1]. Moreover, software isn't bound by the laws of physics, chemistry, and materials science. It's bound by information and complexity theory, which are far more lenient masters. Those factors let us build faster and more cheaply than we otherwise could, which keeps the software evolving, keeps new products coming, and keeps innovation going. You can't build a good bridge in a weekend, but you can build a good website.

The flip side of this freedom is that as we build on top of old software, which is built on top of even older software, the complexity of our solutions increases. A typical product consists of a "stack" that may be 20 or 30 layers of abstraction deep. That's a lot that can go wrong. To make things worse, software interacts with a rich, nonlinear environment. There are relatively few variables to consider as "inputs" to a bridge (for instance, the weight of the objects that cross it), while there are hundreds, if not thousands, of "inputs" to an operating system.

But there are also factors we can mitigate. Maybe we tolerate buggy software to a fault, giving engineering teams more latitude to cut corners than would be optimal for a frustration-free user experience. Moreover, software engineering is still rather immature: we've been building bridges for thousands of years, but writing software for only seventy or so. As we standardize our practices, we will get better at managing the complexity of inputs and environments. Our code will become shorter, smarter, and more expressive. Software engineering will continue to borrow from other fields (as it has done, for example, with the lean manufacturing model) [2]. As new paradigms, frameworks, and best practices emerge, we should expect software to become less crappy.

While it's easy to think of software engineering as just another process that generates defects, it's helpful to look at it from a broader point of view. Let's not get complacent about software quality just because software is more complex. Let's let other disciplines show us where we are deficient, and let's address those deficiencies.

[1] This is usually true, but not always. There have been some very expensive software mistakes in the past. 

[2] Conversely, precisely because of its complexity, software engineering has had to work out a bag of tricks that I think other engineering disciplines should adopt. Unit testing and continuous integration come to mind.