The consulting industry and entropy

I like entropy because it’s such a simple concept: easy to define (a measure of disorder) and universal (it looks the same throughout the universe). Its definition doesn’t depend on other, potentially loaded words (such as “happiness” or “language” or “intelligence” or even “evolution”), so it’s a good term to use as the basis of a framework for thinking about life.

Entropy in the thermodynamic sense has a close equivalent in the field of information theory. In fact, the equivalent is so close that it’s also called entropy. This is a testament to the universality of the concept; I truly believe that I could define everything I can think of in terms of entropy. Perhaps one day I’ll publish a dictionary that does precisely that.

In the meantime, my friend, wishing to see me suffer, challenged me to define a seemingly unrelated thing in terms of entropy. Here goes.

I’m intrigued by the consulting industry, specifically management consulting. At first I had a somewhat cynical view of it, in part because I saw many of my friends prepare for interviews that seemed to ask nothing but Fermi problems. (Ironically, most of my friends didn’t know they were Fermi problems… here’s a cheat sheet if you’re preparing for a consulting interview.) I interacted with consultants at work (and saw the best of them in the movie Office Space) and found it very difficult to understand how they could be adding any value. After all, they don’t bring any skills to the table. Most importantly, there was an unsettling stench of slickness to many of the consultants I’d interacted with, almost as if every conversation was a game to be won or lost (or, again taking a cynical view, as if it were through slick conversation that the consultants created the impression of adding value).

Then I thought about it some more, in the abstract, starting from what I imagined to be the history of consulting. I had this impression of the consulting industry as being pioneered by a few incredibly smart people (university professors, perhaps), whom I call “founders”. They worked through the theory of efficient management, information flow and social interactions and arrived at a theoretical framework that they published in a seminal work in the mid-1950s. Following them were ambitious and innovative entrepreneurs who decided to put the theory into practice. They worked out the kinks that usually prevent an elegant theory from being applied to the real world (this is probably also why we hear about groundbreaking research in battery technology, yet nothing seems to reach the mass market). I call these people “visionaries”. They came up with principles and set up the first consulting companies. Over time, as always happens, the principles were lost and the companies lost sight of their mission and their roots; it all became a matter of making money (incidentally, this happens in a lot of industries, which is why sufficiently sophisticated companies all look the same, regardless of what they do). Surely, then, consultants add value in theory, even though what we see today hides it well. Let’s find it (and let’s use entropy to explain it!)

Entropy in the information-theoretic sense measures how much information some data contains. A string of zeros such as 0000000000 contains no information, while a random string of zeros and ones such as 001101001011101 contains plenty. In other words, entropy tells you how predictable the data is: the more predictable, the lower the entropy. We can also talk about entropy rate, a measure of the “density” of information. For example, English text has an entropy rate of about 1 bit per letter, which means that if you were to represent English in the most efficient way possible (without losing any information), Shakespeare’s Romeo and Juliet, at roughly 169 thousand characters, would take up about 169 thousand bits. By comparison, the most popular representation of English text on a computer today is ASCII (or UTF-8, which is identical to ASCII for plain English text), which stores each character in an 8-bit byte; in that encoding the same play takes up about 1.3 million bits. The number of bits depends on the encoding, while the amount of information does not, which is why entropy is defined with respect to the most efficient possible encoding.
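To make the “string of zeros” example concrete, here is a minimal Python sketch that measures the Shannon entropy of a string from its symbol frequencies, H = −Σ p(x) log₂ p(x), in bits per symbol. It rests on a simplifying assumption: it only counts how often each symbol occurs and ignores their order, so a perfectly predictable alternating string like 0101010101 would still score 1 bit per symbol. The function name `symbol_entropy` is hypothetical.

```python
import math
from collections import Counter


def symbol_entropy(s: str) -> float:
    """Order-0 Shannon entropy of `s`, in bits per symbol.

    Only symbol frequencies are considered; the order of the symbols is ignored.
    """
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    # Sum p * log2(1/p) over the distinct symbols; a symbol with p == 1 contributes 0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    for text in ["0000000000", "001101001011101"]:
        print(f"{text}: {symbol_entropy(text):.3f} bits per symbol")
```

Running it prints 0.000 bits per symbol for the string of zeros and about 0.997 for the random-looking string (7 zeros, 8 ones), close to the 1-bit maximum for a two-symbol alphabet.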

Of course, entropy rate depends very much on the domain of the information. While the entropy rate of English is about 1 bit per character, the entropy rate of, say, soap opera scripts is much lower. That’s because what is said in soap operas is much more predictable: fewer sophisticated words are used.

While it’s hard to find the theoretically correct entropy rate of real data, a good first-order approximation is to compress it with a good lossless compression algorithm (for example, make a zip of the data) and look at the compression ratio. The result overestimates the true entropy rate, since no lossless compressor can, on average, do better than the entropy. For example, Romeo and Juliet takes up 165kB (the aforementioned 1.3 million bits). Once compressed, the text takes up 63kB. The compression ratio of 63/165 ≈ 38% means that the entropy rate of Romeo and Juliet is about 38% × 8 bits per character (what ASCII uses) ≈ 3 bits per character. That’s more than regular English, but then again, Shakespeare is a little less predictable than your average Joe (plus, the compressed file carries some overhead of its own, and zip is far from an ideal compressor).
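The compression trick can be sketched in a few lines of Python with the standard-library `zlib` module, which implements DEFLATE, the algorithm used by zip files. This is a rough sketch under stated assumptions: it treats the compressed size as the estimate of information content, and that size includes zlib’s small header and checksum, which only matters for short inputs. The helper names `bits_per_byte` and `bits_per_char_from_sizes` are hypothetical.

```python
import os
import zlib


def bits_per_byte(data: bytes) -> float:
    """Estimate the entropy rate of `data` (bits per byte) from its zlib-compressed size."""
    if not data:
        return 0.0
    compressed = zlib.compress(data, 9)  # level 9 = best compression
    return 8 * len(compressed) / len(data)


def bits_per_char_from_sizes(original_kb: float, compressed_kb: float,
                             bits_per_char: int = 8) -> float:
    """Same estimate from file sizes alone: compression ratio times bits per character."""
    return (compressed_kb / original_kb) * bits_per_char


if __name__ == "__main__":
    # The sizes quoted above: 165kB of 8-bit text compressing to 63kB.
    print(f"Romeo and Juliet: {bits_per_char_from_sizes(165, 63):.2f} bits per character")

    # Highly predictable input compresses to almost nothing...
    repetitive = b"to be or not to be " * 1000
    print(f"repetitive text: {bits_per_byte(repetitive):.3f} bits per byte")

    # ...while random bytes do not compress at all (the estimate lands slightly
    # above 8 because of zlib's framing overhead).
    random_bytes = os.urandom(100_000)
    print(f"random bytes: {bits_per_byte(random_bytes):.3f} bits per byte")
```

With the sizes quoted above, the first line prints 3.05 bits per character. To try it on the play itself you would pass the raw bytes, e.g. `bits_per_byte(pathlib.Path("romeo_and_juliet.txt").read_bytes())`, where the filename is a placeholder; the exact figure depends on the edition and the compressor, so don’t expect to reproduce 3 bits per character precisely.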

Now the punch line: consultants are often brought in to synthesize information and come up with recommendations. While the recommendations are usually common sense, the synthesis is hard. They have to interview lots of people, look through reams of documents, and stare at various charts and graphs. This is a lot of information. At the end of the engagement they produce a PowerPoint document that summarizes the important findings. It’s usually an incredibly dense PowerPoint: carefully chosen words, terse bullet lists, not even full sentences (articles and prepositions are the first to go). If they’ve done the synthesis right, they’ve managed to compress the information into the PowerPoint almost losslessly (at least with respect to what they were asked to make recommendations on). Hence, to assess the value of their work, simply find the entropy rate of their deliverables. High entropy means high value.

You don’t even have to bother reading the PowerPoint document!