Taxonomies

One of the most valuable abilities a person can boast, in my view, is the ability to classify.  It's very closely related to the ability to think in terms of layers of abstraction, since categories are just abstractions on top of the objects being classified.

Most people are really bad at any kind of categorization.  They either simply don't do it (just look at people's computer desktops) or come up with very poor categorizations, and as a result find it difficult to locate items in a large set or to synthesize properties of sets efficiently.  Those are the two operations that a good classification makes trivial, and both are fairly commonly needed.  There is also a lot of money to be made on good categorization systems, for example in systems that let customers search for products to purchase.

A friend of mine pointed out that taxonomies are dangerous.  I agree with him: creating a classification system for its own sake is not only wasteful, but also risks encouraging inaccurate generalizations.  But a good classification, grounded in the goals it is meant to serve, is invaluable.

Some principles that should guide a good taxonomy are:

  • Unique representation: everything should have a single, deterministic place in the hierarchy
  • Meaningful dimensions: ideally you should be able to express each dimension (or category) in as few words as possible.  Arbitrary divisions don't make it easy to find things and make for a weak hierarchy, even if they allow you to bifurcate your set of objects right down the middle
  • Reasonably sized dimensions: in a perfect classification, each added yes/no property halves the number of items that remain.  This will, of course, never be exactly true, but there are good ways to split the set into roughly equal-sized subsets.  This balances the categorization, so it won't take a large number of dimensions to describe an object.  For a perfect classification, you only need 12 bits of information to classify four thousand objects, which with a good category system may mean three dimensions that each take one of sixteen values (16 × 16 × 16 = 4,096)
  • Separable dimensions: ideally each dimension should be fully independent of all the others, so it shouldn't matter whether you apply a condition first or last.  Unfortunately, most of the time the later dimensions vary depending on the values of the earlier ones.  For a good example, visit Amazon.com and see how the filters change based on which category of items you select.  If the dimensions are separable, you can find things more efficiently by picking the most relevant dimension first
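
To make the arithmetic behind the last two principles concrete, here is a minimal Python sketch.  Everything in it is made up for illustration: the dimension names (color, size, material), the helper names bits_needed and narrow, and the toy catalog itself, which assumes a perfectly balanced and perfectly separable taxonomy that real data will never quite match.

```python
import math
from itertools import product

def bits_needed(n_items: int) -> int:
    """Fewest yes/no properties that can tell n_items objects apart."""
    return math.ceil(math.log2(n_items))

# Hypothetical dimensions, each taking one of sixteen values.
DIMENSIONS = ("color", "size", "material")
VALUES_PER_DIMENSION = 16

# A toy catalog with exactly one item for every combination of values,
# i.e. a perfectly balanced taxonomy.
items = [
    dict(zip(DIMENSIONS, combo))
    for combo in product(range(VALUES_PER_DIMENSION), repeat=len(DIMENSIONS))
]

def narrow(candidates, dimension, value):
    """Keep only the candidates whose given dimension has the given value."""
    return [item for item in candidates if item[dimension] == value]

# Separable dimensions: the order in which filters are applied does not matter.
first = narrow(narrow(items, "color", 3), "size", 7)
second = narrow(narrow(items, "size", 7), "color", 3)

print(bits_needed(4000))          # 12
print(len(items))                 # 4096
print(first == second, len(first))  # True 16
```

Each filter cuts the candidate set by a factor of sixteen (4,096, then 256, then 16, then 1), which is the same as spending four of the twelve bits per dimension.  In a real catalog the splits are uneven and the available filters depend on earlier choices, so treat this as an upper bound on how well a taxonomy can do rather than a description of any actual system.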