How to Diagnose Problems

The ability to diagnose problems is one of those incredibly useful skills to have in life. From experience and observation, I can say that very few people possess it. While it’s a bit of an art, there are some simple principles that people just don’t seem to apply.

You may think that diagnosing problems is an ability that’s really only useful in a narrow domain, such as programming or fixing cars. But we come across problems all the time, even when we don’t realize it: a clogged drain in the bathroom, a TiVo that won’t record your favorite show, a diet that doesn’t seem to make you lose any weight. So here are the basic principles. Nothing in here will be a surprise to anyone; the bulk of the work is in internalizing these principles and being able to apply them when solving problems.

Define the problem.
What is the problem exactly? Can you concisely yet precisely formulate what the problem is? If you have to use words such as “broken”, this is usually an indication that you haven’t really defined the problem well. How does the problem manifest itself? It’s very useful not to mix diagnosis with the formulation of the problem; at this point your diagnosis may not be correct and you may be incorrectly leading yourself down a wrong path from the very beginning. A useful trick is to try to simply phrase the problem as something that constrains you, i.e. something that prevents you from doing something.

Replicate the problem.
Can you show me the problem manifesting itself? Most problems in the world are easily replicable; for example, if the TV doesn’t turn on, it doesn’t turn on. But some problems are trickier than that: maybe your brakes make a noise only when the engine has warmed up enough, so if you just turn the car on, you won’t hear the noise. Being able to show the problem occurring is very useful, as it allows you to test your theories easily. Note that at this stage you don’t need to figure out all the circumstances under which the problem occurs: you just need one.

Contain the problem.
Try to replicate the problem with as simple a setup as possible. This will eliminate a lot of the factors that would otherwise have to be included in the diagnosis. People tend to err in one of two directions. Some assume that most of the steps they did are necessary for the problem to occur, although a problem is very rarely a coincidence of a large number of factors. Others assume that most of the steps they did are irrelevant, although surprisingly often the problem is due to something that one implicitly ruled out early in the diagnosis.
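For readers who think in code, here is a minimal sketch of containing a problem by dropping one step at a time and keeping only the steps without which the problem disappears. Everything in it is hypothetical: the function name minimize_setup, the reproduces callback, and the brake example are illustrations, not a prescribed method, and the sketch assumes the problem replicates reliably every time you try.

```python
from typing import Callable, List


def minimize_setup(steps: List[str],
                   reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop steps that are not needed to reproduce the problem."""
    if not reproduces(steps):
        raise ValueError("the full setup does not reproduce the problem")
    needed = list(steps)
    i = 0
    while i < len(needed):
        candidate = needed[:i] + needed[i + 1:]
        if reproduces(candidate):
            needed = candidate  # step i was irrelevant; leave it out
        else:
            i += 1              # step i is required; keep it and move on
    return needed


# Hypothetical problem: the brakes squeal only once the engine is warm.
def brakes_squeal(steps: List[str]) -> bool:
    return "warm up engine" in steps and "press brake pedal" in steps


steps = ["turn on radio", "warm up engine", "open window", "press brake pedal"]
minimal = minimize_setup(steps, brakes_squeal)
print(minimal)
```

Each step gets tested exactly once, so the cost is one replication attempt per step. That is cheap when replication is cheap, and a poor fit when each attempt takes an afternoon.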

An Aside: on Theories
Note that while you will certainly run across theories about the problem as you try to replicate it and contain it (and sometimes even define it), you should be careful not to jump to conclusions too quickly. I think a lot of smart people are also lazy, and this combination manifests itself as impatience when diagnosing problems. There’s nothing wrong with having theories, and in fact, experienced diagnosers will be able to come up with a theory with very little information. But what I see a lot is people coming up against a problem they can’t solve and struggling with it because they are either stuck with theories that don’t work and can’t find better ones, or went down the wrong path early on in the diagnosis. When I was a teaching assistant in a computer science class back in college, I would help students find bugs in their software. They would be taken aback at the simple questions I asked them. “Of course that’s not the problem,” they would say. I’d tell them, “If everything is an ‘of course that’s not the problem’ yet you can’t fix the bug, at least one of the ‘of course that’s not the problem’s is probably the problem.”

So hold off on theories for as long as you can. If you’ve run across a hard problem (which is really the domain we’re interested in here), your first priority is to solve it at all; solving it quickly is only a second-order concern.

Locate the problem.
Chances are that by now you have located the problem to a large extent. If you haven’t, it’s the natural next step: determine where the problem lies. There is only a subtle difference between locating a problem and testing theories, so you will invariably be doing the latter as well. Start with the (small) setup needed to replicate the problem. Vary your setup to draw conclusions about where the problem is (or, more likely, where it is not). This will substantially narrow down the set of theories that you will have to try out. It’s a good idea to do easy things first (to gather as much information as possible about the location of the problem), but of course there’s a bit of an art in trading off the cost of an experiment against the expected amount of useful information it will give you.

For example, I once had a clogged drain in my sink. Here is what I did; compare this with the steps above:

  • The problem was this: the sink would accumulate water quickly so I couldn’t have the tap running too long or I’d flood my bathroom.
  • I could easily replicate the problem by turning on cold water to its maximum and waiting for one minute. The sink would fill up with water.
  • I noted that I didn’t need to use cold water; hot water would do as well. I also didn’t need to turn it up to the max: at some point (maybe halfway) water would start accumulating.
  • Locating the problem is one of my favorite parts of diagnosis. First, I poured water down the “safety” drain located on the upper part of the sink (the one that prevents the water from spilling out of the sink). I could not replicate the problem. This means the problem is somewhere between the hole in the sink and the part where the drain meets the safety drain. This is great because I no longer needed to unscrew the “U”-shaped part in the drain (it was below where the safety drain flows). If I had started with a theory (“there’s probably a bunch of disgusting stuff blocking the ‘U’-shaped part”), I would have wasted some time.

Note that normally we just do this all implicitly, in our heads, but then again, a clogged drain is not a hard problem. It’s a good idea to be a little pedantic once to get a feel for what diagnosing problems well means.
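The locating step generalizes to anything shaped like a pipeline: if you can feed input in partway down and see whether things still work, you can halve the suspect region with each probe. The sketch below is an idealization under stated assumptions: exactly one stage is at fault, and a probe can be made at any stage (a real sink only offers a couple of entry points). The names locate_fault and works_from and the list of drain stages are hypothetical.

```python
from typing import Callable, Sequence


def locate_fault(stages: Sequence[str],
                 works_from: Callable[[int], bool]) -> str:
    """Return the faulty stage, assuming exactly one stage is faulty.

    works_from(i) reports whether everything behaves correctly when the
    input enters at stage i, bypassing stages 0 .. i-1.
    """
    lo, hi = 0, len(stages) - 1      # the fault lies within stages[lo..hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if works_from(mid):
            hi = mid - 1             # stages from mid onward are fine
        else:
            lo = mid                 # the fault is at mid or further down
    return stages[lo]


# Hypothetical drain, listed from top to bottom.
drain = ["sink hole", "stopper", "pipe above the overflow junction",
         "U-shaped part", "wall pipe"]
clogged_index = 1                    # pretend the stopper is clogged


def water_drains_from(i: int) -> bool:
    # Water entering below the clog drains fine.
    return i > clogged_index


culprit = locate_fault(drain, water_drains_from)
print(culprit)
```

Pouring water down the safety drain is a single probe of this kind: it enters the pipe below the upper stages, and the fact that it drained fine ruled out everything from the junction down.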

Finally, form theories and test them out.
If you’ve gotten to this point, you are likely dealing with a hard problem. Good! This is where the most art comes in. The hard part about forming theories is that you have to find theories that help you prune your search tree as much as possible, in as cost-effective a way as possible.

The “search tree” thing is very important. I cannot stress it enough. Each theory you test eliminates a class of problems. So if you imagine the set of all possible problems, each theory you test in the sequence will eliminate a subset of them. It’s much better to eliminate most of the problems first (this gives you fewer problems left, so fewer theories to test). It’s a little bit like the game of 20 Questions: it’s better to first ask a generic question that eliminates half the possibilities than a specific question that, with high probability, eliminates only a very small set. Of course you don’t know the probabilities, or even how much each theory can eliminate, so this is where the art comes in. You have to trade off three things:

  • How expensive is it to test your theory?
  • In the best case, how many problems can the test eliminate?
  • How likely is the best case?
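One crude way to make this three-way trade-off concrete is to give each candidate test a score: the chance of the best case, times the fraction of remaining problems the best case would eliminate, divided by the cost. This is only a sketch. The formula is one of many reasonable choices, the Experiment class is hypothetical, and the numbers for the two sink tests are invented guesses, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    cost_minutes: float      # how expensive the test is to run
    best_case_pruned: float  # fraction of problems eliminated in the best case
    p_best_case: float       # guessed probability that the best case happens

    def score(self) -> float:
        # Expected fraction pruned per minute, counting only the best case.
        return self.p_best_case * self.best_case_pruned / self.cost_minutes


# Invented numbers, for illustration only.
experiments = [
    Experiment("unscrew the U-shaped part", 20.0, 0.9, 0.3),
    Experiment("pour water down the safety drain", 1.0, 0.5, 0.5),
]

ranked = sorted(experiments, key=Experiment.score, reverse=True)
for e in ranked:
    print(f"{e.name}: {e.score():.4f}")
```

With these guesses the one-minute test ranks first even though its best case eliminates less, because it is twenty times cheaper. The point is not the exact numbers but that writing down even rough guesses forces you to compare tests on all three axes at once.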

Again, if you’re stuck with a problem, you will likely not care about being the most efficient so really you should strive for your theories to simply eliminate as many potential problems as possible (ideally in a balanced way, just like the questions a good “20 Questions” player would ask).

The nice thing about a problem tree that you thus construct is that if it’s balanced, you will not need to test many theories: 20 yes/no questions are enough to single out one thing from among a million, since 2^20 is a little over a million. However, one thing I see people do over and over again is forget where they were on the tree and either redo many tests, thus getting no useful information, or perform tests that are irrelevant given where in the problem tree they are (for example, if I tried to see if the “U”-shaped part is clogged after I figured out that it’s not the problem).
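The arithmetic behind the 20 Questions comparison is easy to check. The sketch below assumes the idealized case where every test splits the remaining candidates exactly in half, and contrasts it with testing one specific theory at a time; both function names are made up for the illustration.

```python
def balanced_tests_needed(candidates: int) -> int:
    """Worst-case yes/no tests when each test halves the candidates."""
    # Smallest k with 2**k >= candidates, computed with integers only.
    return (candidates - 1).bit_length()


def one_at_a_time_tests_needed(candidates: int) -> int:
    """Worst-case tests when each test rules out a single candidate."""
    # After ruling out all but one, the last candidate needs no test.
    return candidates - 1


million = 1_000_000
print(balanced_tests_needed(million))
print(one_at_a_time_tests_needed(million))
```

Real theories never split the possibilities this cleanly, but the gap between the two numbers is why it pays to keep the tree roughly balanced and to remember which branches you have already ruled out.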