
Paul McFedries' Tech Tonic

Making the world a better place, one computer book at a time

Lingua Techna

Technology, language, and technical writing (plus some interesting stuff, too)

February 2008 - Posts

  • The Web, Take Two

    Please God, just one more bubble!
    —Silicon Valley bumper sticker

    The English language is a veritable assembly line of new words and phrases. Inventive wordsmiths in all fields are constantly forging new additions to the lexicon by blending words, attaching morphemic tidbits to existing words, and creating neologisms out of thin air. Some of these new words strike a chord in popular culture and go through what I call the cachet-to-cliché syndrome. In other words, the word is suddenly on the lips of cocktail party participants and water-cooler conversationalists everywhere, and on the fingertips of countless columnists and editorialists. As soon as the word takes root, however, the backlash begins. Rants of the if-I-hear-the-word-x-one-more-time-I'll-scream variety start to appear, Lake Superior State University includes the word in its annual list of phrases that should be stricken from the language, and so on.

    If there's a technology buzzphrase that looks like it might go through this linguistic rags-to-riches story right now, it's probably Web 2.0. Coined by Dale Dougherty of O'Reilly Media in 2004, this lexico-meme is everywhere: Google returns tens of millions of hits; Factiva (a database of news articles from thousands of media sources) lists over 1,500 citations; the blog search engine Technorati returns nearly 100,000 posts; and O'Reilly hosts an annual Web 2.0 Conference.

    So what the heck is it? That's a good question, but it's unfortunately a devilishly difficult one to answer. Web 2.0 is one of those terms that resists definition, either because the concept is too amorphous, too pie-in-the-sky to have any real meaning, or because the underlying phenomenon is so huge and important that it will burst the shackles of any attempt to pin it down. Here's my provisional (and somewhat stuffy, I admit) definition: A second phase in the evolution of the World Wide Web where developers use new technologies to create websites that look and act like desktop programs and encourage collaboration and communication between users.

    Whatever Web 2.0 is, the one thing you can say for sure is that it's trailing a boatload of new words and phrases in its wake. We looked at some of these neologisms back in the February column: tagging, folksonomy, long tail, architecture of participation, and collective intelligence. In other words, one of the hallmarks of Web 2.0 is user-created and user-maintained content, what some are calling peer production (and others, apparently with straight faces, are calling the user content ecosystem). Wikis (collaborative websites that allow users to add, edit, and delete the site's content) are pure Web 2.0, with the famous (or, on some days, infamous) Wikipedia encyclopedia being the canonical example. Allowing users that much control over site content is an experiment in radical trust.

    The 2.0-ness of a site also depends strongly on how closely the site mimics a desktop application; that is, on whether the site offers a rich user experience. The rallying cry here is the web as platform, or, as Microsoft's Ray Ozzie has said, a platform of platforms, since every Web 2.0 site is a kind of mini-platform on its own. You can see this in action in web services such as Gmail (gmail.google.com) for e-mail, Flickr (www.flickr.com) for photo sharing, and Writely (www.writely.com) for word processing. Most Web 2.0 sites use Ajax (asynchronous JavaScript and XML), which may now be the most famous collection of programming technologies on the planet.

    Web 2.0 sites are database-driven (what some are now calling infoware) and often supply application programming interfaces (APIs) that enable users to create entirely new services that combine data from two different sources. These are called web application hybrids or, more popularly, mashups. (You may know this term from its older meaning: a musical piece created by combining two songs, particularly the music of one song and the vocals of the other.) The data from such sites is said to be play-enabling and to have hackability or user remixability. The first (and possibly still the best) example is HousingMaps.com, created by graphic artist and programmer Paul Rademacher, which uses the Google Maps API to map apartment and house rental data from craigslist.org.

    Of course, it's also possible that all of these Web 2.0 buzzwords are just a bunch of hype as people who missed out on the dot-com bubble of the late 1990s try to breathe life into a new expansion that they can cash in on (building to flip, in the vernacular). This side of Web 2.0 is captured perfectly in the definition proposed by Greg Knaus in the Devil's Dictionary 2.0:

    Web 2.0, proper noun: The name given to the social and technical sophistication and maturity that mark the—Oh, screw it. Money! Money money money! Money! The money's back! Ha ha! Money!

    I'll let you decide.

    This post appeared originally as my Technically Speaking column in the June 2006 issue of IEEE Spectrum.

  • Gone Phishin'

    For the past few months I've been beta-testing Internet Explorer 7. It comes with a number of new features but, as a language watcher, I was most interested in the Phishing Filter. Hunh? Microsoft, as corporate and mainstream as a tech company can get, is using the jargon term phishing in its flagship web browser? At first I figured this must be some sort of code name, but no, it's the actual name of the feature.

    This small ripple in the linguistic pool is a reflection not of a newfound coolness on Microsoft's part, but of the phishing phenomenon itself, particularly how pervasive it has become and how the theory and seriousness of this vulnerability are easily grasped by most folks.

    Phishing refers to creating a replica of an existing web page to fool a user into submitting personal, financial, or password data. The term comes from the sad fact that Internet scammers are using increasingly sophisticated lures as they go about "fishing" for users' sensitive data. Hackers have an endearing tendency to change the letter "f" to "ph," so "fishing" becomes "phishing." (The f-to-ph transformation is not new among hackers. It first appeared in the late 1960s among telephone system hackers, who called themselves phone phreaks. There are still plenty of these phreaks around today, but often their targets are more modern. A good example is VoIPhreaking, which involves hacking the Voice over IP telephony system.)

    The most common ploy used by phishers is to copy the web page code from a major site (such as AOL or eBay) and use that code to set up a replica page that appears to be legitimate. (This is why phishing is also called brand spoofing.) A fake email is distributed with a link to this page, which solicits the user's credit card data or password. (If it's the latter, then the page is called a password trap.) When the user submits the form, the data goes to the scammer and the user ends up on an actual page from the company's site, so he or she doesn't suspect a thing.

    The easiest way to detect a phishy page is to look at the page address. A legitimate page will have the correct domain (such as aol.com or ebay.com), while a spoofed page will have only something similar (such as aol.whatever.com or blah.com/ebay). However, some phishers employ domain spoofing tricks such as replacing the lowercase letter "l" with the number 1, or the uppercase letter "O" with the number 0. This is also called homograph spoofing or a lookalike attack. A similar ploy is IDN spoofing, which replaces letters with identical-looking characters from other alphabets, such as a Cyrillic "а" in place of a Latin "a". (IDN is short for Internationalized Domain Names, which are domain names that can include characters from languages other than English.)
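    For the technically curious, here's a minimal Python sketch of how a filter might catch the digit-for-letter swaps described above, plus a crude flag for the non-ASCII characters that IDN spoofing relies on. The trusted-domain list and the function names are hypothetical, made up for illustration; this is a toy, not how Internet Explorer's Phishing Filter actually works.

```python
# Hypothetical list of domains the user actually trusts (an assumption for this sketch).
TRUSTED_DOMAINS = {"aol.com", "ebay.com"}

# The digit-for-letter swaps described above: "1" standing in for "l", "0" for "o".
LOOKALIKES = str.maketrans({"1": "l", "0": "o"})


def is_lookalike(domain: str) -> bool:
    """Return True if domain imitates a trusted domain without matching it exactly."""
    domain = domain.lower()
    if domain in TRUSTED_DOMAINS:
        return False  # exact match: the genuine site
    # Undo the digit swaps; if a trusted name appears, someone is impersonating it.
    return domain.translate(LOOKALIKES) in TRUSTED_DOMAINS


def has_non_ascii(domain: str) -> bool:
    """Flag characters outside ASCII, a possible sign of IDN spoofing."""
    return not domain.isascii()
```

    With these made-up settings, "ao1.com" is flagged as a fake aol.com while ebay.com passes. The naive translation would also mangle a genuine trusted domain that contains digits, which is one more reason this is only a sketch.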

    Another good way to detect a phishing email is to examine the address of the link that you're supposed to click. Again, this address will point to an obviously non-legitimate site. Or will it? Recent phishing attempts have used a technique called DNS cache poisoning, a Domain Name System exploit in which false records are planted in a DNS server's cache so that the "poisoned" server sends surfers who request a legitimate site to the scammer's site instead.

    As people become more aware of phishing, they're less likely to fall for obvious ploys such as requests for passwords and credit card data. So the world's dot con artists are revising their schemes to compensate. The latest tool in their nefarious arsenal is spear-phishing, which refers to phishing that is targeted at a specific person. This targeting usually consists of sending an email message that has a subject line, body text, and return address that make it appear as though it was sent by someone known to the recipient. For example, you might get an email that appears to come from the head of your IT department requesting that you visit a particular site to update your password.

    Another reason people are less likely to fall for a phishing scam is that big corporations are doing a better job of warning their customers and teaching them how to spot fraudulent requests. Scammers are hip to this, so they're trying a new tactic: targeting smaller companies that might not do as good a job warning their customers. These smaller-scale attacks are called puddle phishing. Phishers are also breaking out of the fake-email-and-website paradigm and turning to fraudulent phone calls that attempt to con people out of sensitive data such as their credit card's 3-digit security number. This is called phone phishing.

    So Microsoft is right to include anti-phishing technology in Internet Explorer 7, because clearly we need all the help we can get. Perhaps they'll really get into the spirit of things and hack their name too. Microsopht, perhaps?

    This post appeared originally as my Technically Speaking column in the April 2006 issue of IEEE Spectrum.

  • Folk Wisdom

    Web geeks have long fantasized about a Web taxonomy: a classification scheme that would encompass the entire Web; not just pages, but also page content such as images and blog posts. Yahoo!'s hierarchical directory is an impressive attempt at a kind of Web taxonomy, but it's a weighty construction that doesn't make finding things on the Web all that much easier or faster. Even worse, it's a classification scheme that has been imposed from on high by Yahoo!'s information mavens. But top-down is the old way of doing things on the Web. We now live in the age of the long tail, the collective influence and power of the small sites and users that make up the vast majority of the Web. The big corporations might make up the "head" of the Web, but the hundreds of millions of personal sites, blogs, BitTorrent peers, and Flickr photo albums, not to mention the hundreds of millions of users roaming the Web, make up the beast's massively long tail.

    What does this long tail model have to do with a Web taxonomy? It tells us that hundreds of millions of people can probably classify what they see and interact with on the Web more efficiently, more comprehensively, and more usefully than a small group of Yahoo! managers. In other words, we won't get a true Web taxonomy until the process switches from top down to ground up.

    And that's just what we're starting to see happen all over the Web. At sites such as Flickr, del.icio.us, and Furl, ordinary users are creating their own taxonomic schemes. Only this isn't taxonomy: it's folksonomy, an ad hoc classification scheme that Web surfers invent as they surf as a way of categorizing the data they find online. It's also called (take a deep breath) folk categorization, communal categorization, ethnoclassification, distributed classification, social classification, faceted hierarchy, and mob indexing.

    Folksonomists apply descriptive keywords, or tags, to the objects they come across. (Which explains a few other synonyms for folksonomy: folk tagging, open tagging, social tagging, and free tagging.) Social software (software that enables users to share information and collaborate online) makes these tags available to other users, who can then take advantage of all this tagging to search for the information they need. At the del.icio.us site, for example, users bookmark interesting pages and assign tags to each page, and those tags can then be searched. This is called social bookmarking and it has caught the attention of some big players, not least of which are the taxonomists at Yahoo!, which not long ago launched My Web 2.0, a social bookmarking service.

    Sites such as Flickr (for photos) and Technorati (for blogs) maintain tag clouds, a list of the tags used on the site, along with some kind of visual indication of each tag's relative popularity. (At the Guardian newspaper, they call their tag cloud a folksonomic zeitgeist.) For example, the most popular tags are often shown with the largest font. Some sites even keep track of the tags that each user has applied in the past, the idea being that the user might be inclined to reuse those tags in the future. Del.icio.us calls each user's tag cloud a tagroll (a play on blogroll, a blogger's list of links to other blogs that he or she reads).
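    If you're wondering how a tag cloud's font sizes might be chosen, here's one simple possibility sketched in Python: scale each tag's size linearly between a minimum and a maximum according to how often the tag is used. The function name and the point sizes are arbitrary assumptions, not what Flickr or Technorati actually do, and some sites may prefer other scales (logarithmic, say) so that one wildly popular tag doesn't dwarf everything else.

```python
from collections import Counter


def tag_cloud(tags, min_size=10, max_size=32):
    """Map each tag to a font size (in points) proportional to how often it was applied."""
    counts = Counter(tags)
    if not counts:
        return {}  # no tags, no cloud
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when every tag has the same count
    # Linear interpolation: the least-used tag gets min_size, the most-used gets max_size.
    return {
        tag: round(min_size + (count - lo) * (max_size - min_size) / span)
        for tag, count in counts.items()
    }
```

    For example, tagging three photos "dog" and one each "cat" and "sky" gives "dog" 32 points and the other two 10 points apiece.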

    But how can non-professional taggers hope to create a taxonomy that's as sophisticated as one that true Web mavens would make? The answer lies in something called the architecture of participation: services get better as the number of users increases. The canonical example is BitTorrent, where each user (peer) acts as both client and server, uploading pieces of a file to other peers even while downloading it. Files are downloaded by taking small chunks from any peers who have the file and who are online. The more peers that are online, the faster the download. For folksonomy, the more folks there are applying tags, the more effective the result. The writer Bruce Sterling calls the folksonomically enhanced Web "common wisdom squared."

    Folksonomies are not perfect, to be sure. Non-standard tags are a problem (one Flickr user might tag a photo of a certain kind of retriever as "flat-coated," another as "flatcoated," and a third as "flatcoat"), and 1- or 2-word tags lack a certain amount of precision.

    On the other hand, folksonomy isn't meant to be a Google-killer. It is, instead, a kind of experiment in collective intelligence, that hallmark of what some people are calling Web 2.0 (a prolific language factory that will be the topic of a future column). One person can be pretty smart, but 10,000 or 100,000 people are almost always going to be smarter. The New Yorker's finance writer James Surowiecki calls it the wisdom of crowds, and we're starting to see some pretty big crowds in the folksonomy space: Technorati and del.icio.us each have tens of thousands of users, while Flickr boasts over 400,000. That's a lot of folks.

    This post appeared originally as my Technically Speaking column in the February 2006 issue of IEEE Spectrum.

  • Watchwords

    Privacy is already gone.
    —Larry Ellison, CEO, Oracle Corp.

    In his 2005 novel, The Traveler, John Twelve Hawks describes a near-future world in which almost everything we do is traceable and almost everywhere we go is trackable. Using high-tech tools such as Echelon (the global spy network that monitors electronic communication), the Global Positioning System, radio frequency ID tags, centralized databases, the monitoring of credit card charges, and surveillance cameras with facial recognition software, the "Vast Machine" can watch anyone, except those few who elect to live "off the Grid." It's a paranoid book, to say the least, but, as they say, being paranoid doesn't mean someone isn't out to get you. In the real world, surveillance is growing at an alarming rate, and so, too, are the words and phrases we need to use to keep up.

    For example, did you know that you cast a data shadow? This is the trackable data that a person creates by using technologies such as credit cards, cell phones, and the Internet. It's also sometimes called the paperless trail, the electronic equivalent of a "paper trail." A similar idea is the pseudonymous profile, a collection of data associated with, usually, the IP address of a user's computer. The profile describes the user's online activities, interests, and habits, so a Web site can use it to personalize pages or, more often, to target advertising at that IP address. Some even envision a world of anticipatory surveillance, where the data collected enables the site (or whatever) to anticipate a person's actions or needs. (The opposite is preemptive surveillance, which tracks behavior in order to prevent someone from doing something he or she shouldn't.) The pseudonymous profile is a low-end variation on a digital silhouette, a profile generated by a software program that monitors a user's surfing habits. The data for this profile comes from users who agree to install the software in exchange for a cheap computer or cut-rate Internet access. Since the user agrees up front to be monitored, this kind of program is called opt-in surveillance or voluntary surveillance. The opposite would be a data spill, an accidental transmission or display of private online data to a third party. It's the online analogue to an oil spill, the leakage of petroleum from an oil tanker or other vessel. Whatever the source, we are therefore increasingly susceptible to dataveillance, the ability to monitor a person's activities by studying his or her data shadow. A synonym that isn't as popular, but rolls off the tongue a little better, is consumer espionage.

    We like to think that all this surveillance is part of some dastardly plot cooked up by those twin pillars of the modern Big Brother: Big Government and Big Business. Unfortunately, surveillance is all too common among us little folk, too. A common example is the nanny-cam, a special video camera (small enough to be concealed inside stereo equipment or a teddy bear) used for spying on babysitters. A similar idea is the kiddie cam, a camcorder that displays a live feed so that parents can monitor either their children or their children's babysitter from a remote location. Kiddie cams are also known as kinder-cams or cradle cams.

    Even creepier are the lengths some husbands and wives are going to in order to detect Internet infidelity, an online romance or affair conducted by their spouse. Web sites such as ChatCheaters.com and InfidelityCheck.org offer not just advice on dealing with an Internet cheater, but also sophisticated electronic tools. For example, you can purchase a keylogger, a program or device that records a computer's keystrokes. (A subset of the genre is the chat logger or IM logger, a utility specifically designed to record chat conversations held in instant messaging environments such as AOL, MSN, and Yahoo!.) Think your lesser half is cheating on the home computer while you're at work? No problem. Just install remote monitoring software, which tracks everything that happens on a computer and sends the results to a remote location (such as your work e-mail account).

    Are we becoming what Queen's University professor David Lyon has called the surveillance society? Is there hope for privacy? Larry Ellison might not think so, but an increasing number of people are fighting back by using a technique called sousveillance (or sometimes inverse surveillance). University of Toronto professor Steve Mann calls it "watchful vigilance from underneath." (The "sous" in sousveillance is French for "under"; the "sur" in surveillance is French for "over.") It's a kind of countersurveillance where people take pictures of surveillance cameras or record people in positions of power or authority and post those recordings on the Web. Think of it as the watched watching the watchers, and that can only be a good thing. If you think so too, be sure to celebrate World Sousveillance Day on December 24.

    This post appeared originally as my Technically Speaking column in the December 2005 issue of IEEE Spectrum.

  • Call Me, Ishmael

    When e-mail started to take off in the 1990s, more than one pundit prognosticated the death of the phone call, as well as the early demises of writing and social interaction. These last two are in fact thriving thanks to the Internet and, with the proliferation of cellular service, phones are now entrenched as a ubiquitous part of the cultural landscape. (It's becoming unusual to see someone walking down the street without a cell phone glued to his or her ear.) As I've argued numerous times before in this space, the importance of a cultural phenomenon is directly related to the number of new words and phrases that surround it, and telephony terms are multiplying with a rabbit-like intensity.

    For starters, consider cell phone types. It really wasn't all that long ago that cell phones did one thing and one thing only: handle voice calls. Now cell phones are being crammed with all kinds of non-voice features: a phone that also plays MP3s is called a music phone; a phone that has a built-in digital camera is a camera phone; a phone that includes PDA-like features (mobile operating system, organizer, e-mail, local storage, and so on) is called a smartphone. The latest phones come with not only MP3 players and cameras, but also built-in Wi-Fi or Bluetooth, text messaging, memory card slots, and more. These everything-but-the-coffee-room-sink phones are called hybrid phones, all-in-one phones, or, my favorite, Swiss army phones.

    Content rules, as always, and many of the newest phones can sync up with a PC to get digital music, ringtones, images, and other content. You could call this "downloading," since it involves data being sent to the phone. However, many people prefer to reserve the term downloading for obtaining data from a remote source. For content sent to a phone from a PC, the up-and-coming neologism is sideloading. Beyond that, isn't it true that cell phones don't really do a good job with most non-voice content? Yes, a lack of storage space and poor picture quality are common complaints, as is the dreaded click-and-wait experience that comes with a Web-enabled cell phone's mobile browser. (That is, with a regular connection you don't usually notice the time it takes for a Web page to download after you click its link, but that wait is interminable (and expensive) on a cellular connection.) We're starting to see hybrid phones with multi-gigabyte hard drives (to hold more MP3s) and multi-megapixel cameras (to take better pictures). Still, lots of people like their all-in-one devices because they avoid the islands of content problem that results from having to use separate devices for different kinds of content. (They prefer, one supposes, the "continent of content" that's available with an all-in-one phone.)

    Cellular isn't the only telephony game in town, of course. POTS (plain old telephone service) continues to evolve, as does the language surrounding it. For example, one of the perils of life in a typical cube farm (the collection of cubicles in an office) is the lack of privacy, particularly when talking on the phone. To help, companies are coming up with innovative ways to enhance voice privacy. For example, Sonare (owned by Herman Miller, the company that invented the cubicle) makes a vocal privacy device called Babble that hides your conversation among multiple samples of your voice played over small speakers. Playing a sound to reduce or eliminate the ability of others to hear something is called soundmasking.

    Few people enjoy dealing with call centers and their annoying "Press 1 to do this; Press 2 to do that" systems that so often lead us astray. To help speed the navigation of call center hierarchies, many companies are turning to voice recognition systems that use call steering algorithms to route calls based on natural language input. These systems usually rely on keyword spotting, where certain words or phrases dictate where the caller is sent. In the future, companies hope to install emotion detectors that can sense the caller's current emotional state (the defaults probably being frustration and anger).
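    Here's a bare-bones Python sketch of keyword spotting for call steering, assuming the caller's speech has already been turned into text by a speech recognizer (the genuinely hard part, which this skips entirely). The department names, keyword lists, and the steer_call function are all hypothetical, invented for illustration.

```python
import re

# Hypothetical routing table: each department and the spoken keywords that send a caller there.
ROUTES = {
    "billing": ("bill", "charge", "payment", "refund"),
    "tech_support": ("broken", "error", "internet", "password"),
    "sales": ("buy", "upgrade", "new plan"),
}


def steer_call(utterance: str, default: str = "operator") -> str:
    """Route an already-transcribed utterance to the first department whose keyword it contains."""
    text = utterance.lower()
    for department, keywords in ROUTES.items():
        for keyword in keywords:
            # Whole-word match, so "bill" doesn't fire on "billion".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return department
    return default  # no keyword spotted: hand the caller to a human
```

    With this made-up table, "I was charged twice on my bill" goes to billing (thanks to "bill"; the whole-word match means "charged" doesn't count as "charge"), while "Hello?" falls through to a human operator. Production systems are presumably far more sophisticated than a hand-built table like this.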

    Have you ever been on a bad date and wished someone would call you with some urgent task that required your immediate attention? Wish no more: Cellular providers Cingular Wireless and Virgin Mobile USA offer rescue call services that ring your cell phone at a preset time and supply you with a "script" to make it appear that you've received an emergency call. (Cingular's service is called, memorably, Escape-A-Date.) More usefully, many people are now promoting ICE (in case of emergency) numbers. The idea is that you program an emergency contact number into your cell under the name "ICE." That way, police or paramedics would just have to look up the ICE entry in your phone to contact that person. Now that is a true rescue call.

    This post appeared originally as my Technically Speaking column in the October 2005 issue of IEEE Spectrum.

Copyright © 2008 Logophilia Limited and Paul McFedries