Web geeks have long fantasized about a Web taxonomy: a classification scheme that would encompass the entire Web, covering not just pages but also page content such as images and blog posts. Yahoo!'s hierarchical directory is an impressive attempt at a kind of Web taxonomy, but it's a weighty construction that doesn't make finding things on the Web all that much easier or faster. Even worse, it's a classification scheme that has been imposed from on high by Yahoo!'s information mavens. But top-down is the old way of doing things on the Web. We now live in the age of the long tail, the collective influence and power of the small sites and users that make up the vast majority of the Web. The big corporations might make up the "head" of the Web, but the hundreds of millions of personal sites, blogs, BitTorrent peers, and Flickr photo albums, not to mention the hundreds of millions of users roaming the Web, make up the beast's massively long tail.
What does this long tail model have to do with a Web taxonomy? It tells us that hundreds of millions of people can probably classify what they see and interact with on the Web more efficiently, more comprehensively, and more usefully than a small group of Yahoo! managers. In other words, we won't get a true Web taxonomy until the process switches from top down to ground up.
And that's just what we're starting to see happen all over the Web. At sites such as Flickr, del.icio.us, and Furl, ordinary users are creating their own taxonomic schemes. Only this isn't taxonomy; it's folksonomy, an ad hoc classification scheme that Web surfers invent as they surf as a way of categorizing the data they find online. It's also called (take a deep breath) folk categorization, communal categorization, ethnoclassification, distributed classification, social classification, faceted hierarchy, and mob indexing.
Folksonomists apply descriptive keywords — or tags — to the objects they come across. (Which explains a few other synonyms for folksonomy: folk tagging, open tagging, social tagging, and free tagging.) Social software — software that enables users to share information and collaborate online — makes these tags available to other users, who can then take advantage of all this tagging to search for the information they need. At the del.icio.us site, for example, users bookmark interesting pages and assign tags to each site, and those tags can then be searched. This is called social bookmarking and it has caught the attention of some big players, not least of which are the taxonomists at Yahoo!, which not long ago launched My Web 2.0, a social bookmarking service.
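To make the bookmark-and-tag idea concrete, here is a minimal sketch of how a social bookmarking service might index tags so they can be searched. It is not how del.icio.us actually works; the `BookmarkStore` class, its method names, the user names, and the example URLs are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict


class BookmarkStore:
    """Hypothetical in-memory index mapping each tag to the pages tagged with it."""

    def __init__(self):
        # tag -> set of (user, url) pairs; an assumption, not a real service's schema
        self._by_tag = defaultdict(set)

    def bookmark(self, user, url, tags):
        """Record that `user` bookmarked `url` and applied each tag in `tags`."""
        for tag in tags:
            self._by_tag[tag.lower()].add((user, url))

    def search(self, tag):
        """Return the sorted, de-duplicated URLs that any user filed under `tag`."""
        return sorted({url for _, url in self._by_tag.get(tag.lower(), set())})


store = BookmarkStore()
store.bookmark("alice", "http://example.com/dogs", ["dogs", "retriever"])
store.bookmark("bob", "http://example.org/fetch", ["Retriever"])
results = store.search("retriever")
```

Because the index is keyed by tag rather than by user, one person's tagging effort immediately benefits everyone else who searches for that tag, which is the "social" part of social bookmarking.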
Sites such as Flickr (for photos) and Technorati (for blogs) maintain tag clouds: lists of the tags used on the site, along with some kind of visual indication of each tag's relative popularity. For example, the most popular tags are often shown with the largest font. (At the Guardian newspaper, they call their tag cloud a folksonomic zeitgeist.) Some sites even keep track of the tags that each user has applied in the past, the idea being that the user might be inclined to reuse those tags in the future. Del.icio.us calls each user's tag cloud a tagroll (a play on blogroll, a blogger's list of links to other blogs that he or she reads).
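The "biggest font for the most popular tag" idea is easy to sketch. The function below scales font sizes linearly between a minimum and a maximum point size; the linear rule, the 10-to-32-point range, the `tag_cloud` name, and the sample tags are all assumptions for illustration, not a description of how any of these sites actually renders its cloud.

```python
from collections import Counter


def tag_cloud(tag_counts, min_pt=10, max_pt=32):
    """Map each tag to a font size in points, scaled linearly by its count."""
    if not tag_counts:
        return {}
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = hi - lo
    sizes = {}
    for tag, count in tag_counts.items():
        # If every tag is equally popular, fall back to the smallest size.
        scale = (count - lo) / span if span else 0.0
        sizes[tag] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


# Hypothetical tags applied across a site
applied = ["dog", "retriever", "dog", "beach", "dog", "retriever"]
cloud = tag_cloud(Counter(applied))
```

Here the most-used tag ("dog") gets the largest size and the least-used ("beach") the smallest. A real site with a few hugely popular tags might prefer a logarithmic scale so the rare tags stay legible.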
But how can non-professional taggers hope to create a taxonomy that's as sophisticated as one that true Web mavens would make? The answer lies in something called the architecture of participation: services get better as the number of users increases. The canonical example is BitTorrent, where each user acts as both client (peer) and server (seed). Files are downloaded by taking small chunks from any peers who have the file and who are online. The more peers that are online, the faster the download. For folksonomy, the more folks there are applying tags, the more effective the result. The writer Bruce Sterling calls the folksonomically enhanced Web "common wisdom squared."
Folksonomies are not perfect, to be sure. Non-standard tags are a problem (one Flickr user might tag a photo of a certain kind of retriever as "flat-coated," another as "flatcoated," and a third as "flatcoat"), and one- or two-word tags lack a certain amount of precision.
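A site can paper over some of this inconsistency by normalizing tags before counting them. The sketch below simply lowercases each tag and strips everything except letters and digits; the `normalize_tag` helper is hypothetical, and this is one possible cleanup rule rather than anything Flickr is known to do.

```python
import re
from collections import Counter


def normalize_tag(tag):
    """Lowercase a tag and drop every character that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


# The three spellings from the retriever example
raw_tags = ["flat-coated", "flatcoated", "flatcoat"]
merged = Counter(normalize_tag(t) for t in raw_tags)
```

This folds "flat-coated" and "flatcoated" into a single tag, but "flatcoat" still stands apart. Catching that one would take something smarter, such as stemming or a hand-built synonym list, which is a reminder that simple cleanup only goes so far.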
On the other hand, folksonomy isn't meant to be a Google-killer. It is, instead, a kind of experiment in collective intelligence, that hallmark of what some people are calling Web 2.0 (a prolific language factory that will be the topic of a future column). One person can be pretty smart, but 10,000 or 100,000 people are almost always going to be smarter. The New Yorker's finance writer James Surowiecki calls it the wisdom of crowds, and we're starting to see some pretty big crowds in the folksonomy space: Technorati and del.icio.us each have tens of thousands of users, while Flickr boasts over 400,000. That's a lot of folks.
This post appeared originally as my Technically Speaking column in the February 2006 issue of IEEE Spectrum.