Clay Shirky
What we think we know about categorization is wrong, because we're holding onto old, outmoded techniques for categorization.

Q: What is ontology? A: It depends on what the meaning of "is" is. It is the study of what exists in a domain and how those elements relate.

The parable of the Travel Agent.
Travel agents exist to distribute the interface between a handful of airlines and a large number of consumers. The web replaces this, so the travel agents claim they add value. What's surprising is that the internet plays tried to use the same argument: they tried to recapitulate the old order rather than undermine it. It took some time for people to realise the problem had changed.

Classification schemes. The periodic table: the best classification scheme ever. Almost perfect. But there is a context slip: a whole column was labelled "gases", and that is only true within certain temperature ranges.

Libraries are the commonest classification system, and they have huge fundamental mistakes. E.g. the Dewey scheme's religion category is almost all Christian. The Library of Congress treats Asia and Switzerland as equivalent in size. What the categories really measure is the number of books about a topic. The scheme optimises linear shelf space, not reality. Unfortunately librarians are now using the same approach in the digital domain, where shelf space is irrelevant. As with the travel agents, they are recapitulating what went before instead of undermining it.

Yahoo grew into a hierarchy of categories. So they hired a professional ontologist, who built a huge tree. They said "we understand this better than you". They felt they couldn't organise the world without the shelf, so they added the shelf back in. And so we get a tree structure. But the world isn't tree structured. So add a few cross links. So let's have a hierarchy with lots and lots of links. But the ontologists said "get outta here" and limited them to a maximum of 3.

In reality, there are lots of links and no tree. And Google took over because there is no filing system. There are only links. Google took the DMOZ (Open Directory Project) data for its own directory, but nobody used it, so they downgraded it.

When does ontological organisation work well? Small corpus, formal categories, stable entities, clear edges, coordinated users, expert users, expert cataloguers, authoritative source. Note: ontologists often claim the users don't understand the categories, and see this as the users' problem.

Turn it around and you have where it works badly. And that is a perfect description of the web. Huge scale, uncoordinated users, no authority.

Voodoo categorisation: act on the model and it changes the world. Classify an SUV as a small truck and it becomes popular. Signal loss: ontologists claim that synonyms are a failure, to be collapsed into a single term. But apparent synonyms actually refer to different things.

Predicting the future is hard. A. This is a book about Dresden. B. This book is about Dresden, and goes into the category "East Germany". Oops. Countries are radically different to cities. One is an idea, the other is physical. But we can't change it because we don't have the staff to move the books. Absolutely key: categorization requires predicting the future.

"My God, it's full of links". Adventures in scale pt.1 Don't merge categories, merge the GUIDs.

Great minds don't think alike. Adventures in scale, pt. 2: the number of tags each person has applied follows a power law distribution. Long tail: the classic sign of unconstrained population behaviour. Look at the number of entries per tag for one person, and it's another power law: 10% of the tags have 90% of the entries. Now look at 2 URLs and study the tags used against them. Many URLs show very clear convergence. Some URLs have classic power law curves with less consensus. This gives us a measure of the certainty of the popular tags.
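
The two measurements in these notes can be sketched roughly as follows. This is a minimal illustration, not anything shown in the talk: the function names `top_share` and `consensus`, the choice of "share held by the top tag" as the certainty measure, and all the counts are assumptions invented for the example.

```python
from collections import Counter


def top_share(counts, fraction=0.1):
    """Share of all entries held by the most-used `fraction` of tags."""
    ordered = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


def consensus(counts):
    """Share of a URL's entries that use its single most popular tag."""
    return max(counts.values()) / sum(counts.values())


# Hypothetical tag counts for one person: ten tags, one of them dominant.
personal = Counter({"linux": 90, "photos": 2})
personal.update({f"misc{i}": 1 for i in range(8)})

# Hypothetical tag counts for two URLs (invented numbers).
clear = Counter({"python": 80, "programming": 10, "code": 6, "dev": 4})
fuzzy = Counter({"design": 30, "web": 25, "css": 25, "art": 20})

print(top_share(personal))  # 0.9
print(consensus(clear))     # 0.8
print(consensus(fuzzy))     # 0.3
```

A high `consensus` value suggests the popular tag can be trusted for that URL; a low one suggests the crowd has not converged. Other measures (entropy, for instance) would do the same job.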

Organic Categorization
- Market logic: individual motivation but group value.
- Merged from URLs (links), not categories
- Merges create overlap, not sync
- Merges are probabilistic not binary
- User and time are core attributes
- Signal loss comes from expression not compression
- One-off categories are ignored, rather than deflected. Filtering comes after the publishing. (Very deep idea here.)
- The semantics are in the users, not in the system. Does the world make sense, or do we make sense of the world? Objective vs subjective. Recognises that there are alternate views.
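
The points about merging on URLs, overlap rather than sync, and probabilistic rather than binary merges can be sketched as below. This is only one possible reading: the (user, url, tag) entry format, the function names `urls_by_tag` and `overlap`, and the data are all hypothetical, and the overlap score is a simple conditional fraction chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (user, url, tag) entries; invented data for illustration.
ENTRIES = [
    ("alice", "u1", "film"), ("bob", "u1", "cinema"),
    ("alice", "u2", "film"), ("carol", "u2", "cinema"),
    ("bob", "u3", "film"),
    ("carol", "u4", "film"), ("dave", "u4", "movies"),
]


def urls_by_tag(entries):
    """Map each tag to the set of URLs it has been applied to."""
    index = defaultdict(set)
    for _user, url, tag in entries:
        index[tag].add(url)
    return index


def overlap(index, tag_a, tag_b):
    """Fraction of tag_a's URLs that also carry tag_b (not symmetric)."""
    a = index.get(tag_a, set())
    b = index.get(tag_b, set())
    if not a:
        return 0.0
    return len(a & b) / len(a)


index = urls_by_tag(ENTRIES)
print(overlap(index, "cinema", "film"))  # 1.0
print(overlap(index, "film", "cinema"))  # 0.5
```

Nothing here declares "cinema" and "film" to be the same category. The tags stay as the users wrote them, and the relationship between them is a number derived from shared URLs, which differs depending on the direction you ask in.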

(Note: if you don't understand Unix, you are doomed to re-invent it. There is only World, Group and User.)

