Tuesday, August 24, 2004
It Used to Suck to be a Search Engine
The New York Times yesterday published an op-ed piece entitled "More Is Not Necessarily Better," which raises some concerns regarding "the influence of [Web] search companies in determining what users worldwide can see and do online." Read it; it's interesting.
If you're a regular reader of this blog, however, it should come as no surprise that I don't entirely agree with the authors. I'm sure the piece is intended to get people thinking about the degree to which we depend on search engines without realizing it, but it takes an approach that seems rather alarmist to me. Let's start at the beginning:
## BEGIN NYT QUOTE
Imagine if one company controlled the card catalog of every library in the world. The influence it would have over what people see, read and discuss would be enormous. Now consider online search engines.
## END NYT QUOTE
While this is a slick metaphor and a killer lead for an op-ed piece, the implied comparison is inaccurate and deceptive. One can understand why they took this approach, though, as after some adjustments to bring it closer into line with reality, that lead just wouldn't read as well:
Imagine if about half a dozen major companies, plus a bunch of smaller ones, all had competing versions of the card catalog of every library in the world, but one of those versions had become a lot more popular than any of the others.
Loses a little impact that way. "Doesn't pop," as one of my journalism professors was wont to say.
Let's think back to the old days, say 1998 or so. It pretty much sucked to be a search engine. You didn't get to lock your customers in with proprietary data formats, your service existed to send them to other people's sites -- not to keep them at your own site, viewing ads and generating revenue for you -- and worst of all, you didn't actually own or even control the data that made your business possible.
Any programmer with a good idea, decent skills, and a lot of hard drives could put together an offering out of their basement that could compete with you. So began the great migration from "search engine" to "portal" (or in some particularly marketing-influenced cases, "vortal"). Search became almost secondary as sites added email services, their own content, discussion groups...anything to get people to come to their site rather than someone else's, because everybody knew that "search" by itself was something that anybody could do, and not enough to really attract users.
1998, of course, was also the year that the Google beta popped up on the Web. One input box, two buttons, and nothing else. Why would anyone possibly go there, when Yahoo offered search plus a whole bunch of other stuff? Because little by little people started discovering that Google -- run by a couple of guys on a couple of machines -- was better at searching the Web than Yahoo. People found the stuff that they were looking for more quickly and easily.
But back to the opinion. With a nice rhetorical shift (simile now, not metaphor), the authors bring up their concerns about Google's methodology:
## BEGIN NYT QUOTE
Google's use of links to find content essentially turns the Web into the world's biggest popularity contest - and just as in high school, this can have negative consequences. Google's great innovation in online searching, and the main reason it is so successful, is that its technology analyzes links among Web pages, not just the content within them. Behind Google's complex ranking system is a simple idea: each link to a page should be considered a vote, and the pages with the most votes should be ranked first. This elegant approach uses the distributed intelligence of Web users to determine which content is most relevant.
## END NYT QUOTE
The authors' concern is that "popular sites become ever more popular, while obscure sites recede even further into the ether." The concern has some validity, but its impact is lessened because search ranking isn't actually much like a high school popularity contest.
The fact that the Alien Abductions Incorporated site holds the top Google spot for searches on "alien abductions" means that a link from AAI to a site about aliens would probably bounce that site up in Google's rankings. A link from AAI to a site about gardening? Not so much effect on the target site -- AAI's "reputation" within one sphere of information doesn't carry over to other spheres. Unlike high school, there isn't one group of "cool kids," but rather a virtually infinite number of topical cliques, each of which has reputation and influence within its own sphere.
Perhaps more important, though: the reason that this approach is widely used right now is that it works better than anything else that we've come up with. Rather than having an infinite number of monkeys that sit around trying to classify and rank all of the content on the Internet, PageRank and its associated programmatic variations and imitators allow the content to do most of the work itself.
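For the curious, here's roughly what "letting the content do the work" looks like in code. This is a minimal sketch of the textbook, simplified form of link-based ranking, not Google's actual system: the function name `pagerank`, the damping value of 0.85 (a commonly cited default), the iteration count, and the four page names are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by treating each link as a weighted vote.

    `links` maps a page name to the list of pages it links to.
    Simplified textbook sketch; all names here are hypothetical.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally "popular".
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A tiny hypothetical web: three alien-themed pages and one gardening page.
web = {
    "aliens-hub": ["ufo-sightings", "abduction-stories"],
    "ufo-sightings": ["aliens-hub"],
    "abduction-stories": ["aliens-hub"],
    "gardening-tips": ["aliens-hub"],
}

ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Note that a vote from a highly ranked page counts for more than a vote from an obscure one, which is where the "popularity contest" worry comes from. This plain version has a single global score per page and knows nothing about topics, so it does not capture the per-sphere reputation described above.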
But let us not forget that what Google did to Yahoo, Yahoo did to Alta Vista before them. That fundamental suck factor of search engines -- the stuff that you're searching is accessible to everybody and their mother -- is still out there, and the cost of entry is still pretty low. All it takes is a startup with a platinum card to buy the hardware and a few really smart people who can deal with eating ramen three times a day for a couple of years.
That's not to say that it's inevitable, but it is easily possible. Unlike, say, Microsoft's users, Google's users pay nothing to switch to someone else...they just point their browser to a new URL and Google's ranking in the real world drops just a little bit.