|| seamonkeyrodeo ||
| k a r a o k e | m i n d | c o n t r o l |
Tuesday, August 31, 2004
Page of Spammy Goodness
I've got a draft of my "it's pretty much my own fault but I still blame Small Business Server" weekend fun post going, but haven't had the time to finish it yet. Will get to that this week, but found something even more important that I had to note:

After receiving a few odd emails I started checking server logs and found that this page, which was created as a dramatic illustration for this post, is getting a surprising amount of traffic. Then I discovered that according to google I own the phrase "spammy goodness!"

I'm so proud. Another head to put up on the wall with bayesian avocado and alien abductions...
Wednesday, August 25, 2004
Breaking News: Yahoo Entering "Search" Market
Catching up on yesterday's reading I came across an Infoworld story on customer satisfaction levels for Web portals, search engines, and news and information sites in the US.

It's a nice followup to yesterday's post, as the final paragraph reads:

[CEO of ForeSee Results Larry] Freed acknowledged that the boundaries between search engines, portals and news and information are getting increasingly blurred, and that, accordingly, competition is also crossing frontiers. For example, Google has been adding features and services that put it in competition with portals, while Yahoo and MSN are likewise expanding into search engine territory, he said.

Wow. Yahoo is expanding into search engine territory, huh? Who could have predicted that something like that might happen?
Tuesday, August 24, 2004
It Used to Suck to be a Search Engine
The New York Times yesterday published an op-ed piece entitled More Is Not Necessarily Better, which raises some concerns regarding "the influence of [Web] search companies in determining what users worldwide can see and do online." Read it, it's interesting.

If you're a regular reader of this blog, however, it should come as no surprise that I don't entirely agree with the authors. I'm sure it's intended to get people thinking about the degree to which we depend on search engines without realizing it, but it takes an approach that seems rather alarmist to me. Let's start at the beginning:

Imagine if one company controlled the card catalog of every library in the world. The influence it would have over what people see, read and discuss would be enormous. Now consider online search engines.

While this is a slick metaphor and a killer lead for an op-ed piece, the implied comparison is inaccurate and deceptive. One can understand why they took this approach, though, as after some adjustments to bring it closer into line with reality, that lead just wouldn't read as well:

Imagine if about half a dozen major companies, plus a bunch of smaller ones, all had competing versions of the card catalog of every library in the world, but one of those versions had become a lot more popular than any of the others.

Loses a little impact that way. "Doesn't pop," as one of my journalism professors was wont to say.

Let's think back to the old days, say 1998 or so. It pretty much sucked to be a search engine. You didn't get to lock your customers in with proprietary data formats, your service existed to send them to other people's sites -- not to keep them at your own site, viewing ads and generating revenue for you -- and worst of all, you didn't actually own or even control the data that made your business possible.

Any programmer with a good idea, decent skills, and a lot of hard drives could put together an offering out of their basement that could compete with you. So began the great migration from "search engine" to "portal" (or in some particularly marketing-influenced cases, "vortal"). Search became almost secondary as sites added email services, their own content, discussion groups...anything to get people to come to their site rather than someone else's, because everybody knew that "search" by itself was something that anybody could do, and not enough to really attract users.

1998, of course, was also the year that the Google beta popped up on the Web. One input box, two buttons, and nothing else. Why would anyone possibly go there, when Yahoo offered search plus a whole bunch of other stuff? Because little by little people started discovering that Google -- run by a couple of guys on a couple of machines -- was better at searching the Web than Yahoo. People found the stuff that they were looking for more quickly and easily.

But back to the opinion. With a nice rhetorical shift (simile now, not metaphor), the authors bring up their concerns about Google's methodology:

Google's use of links to find content essentially turns the Web into the world's biggest popularity contest - and just as in high school, this can have negative consequences. Google's great innovation in online searching, and the main reason it is so successful, is that its technology analyzes links among Web pages, not just the content within them. Behind Google's complex ranking system is a simple idea: each link to a page should be considered a vote, and the pages with the most votes should be ranked first. This elegant approach uses the distributed intelligence of Web users to determine which content is most relevant.

The authors' concern is that "popular sites become ever more popular, while obscure sites recede even further into the ether." While there's some degree of validity to this concern, the impact of this is somewhat lessened because search ranking isn't actually much like a high school popularity contest.

The fact that the Alien Abductions Incorporated site holds the top Google spot for searches on "alien abductions" means that a link from AAI to a site about aliens would probably bounce that site up in Google's rankings. A link from AAI to a site about gardening? Not so much effect on the target site -- AAI's "reputation" within one sphere of information doesn't carry over to other spheres. Unlike high school, there isn't one group of "cool kids," but rather a virtually infinite number of topical cliques, each of which has reputation and influence within its own sphere.

Perhaps more important, though: the reason that this approach is widely used right now is that it works better than anything else that we've come up with. Rather than having an infinite number of monkeys that sit around trying to classify and rank all of the content on the Internet, PageRank and its associated programmatic variations and imitators allow the content to do most of the work itself.
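To make the "each link is a vote" idea a bit more concrete, here's a minimal sketch of the textbook PageRank iteration. This is not Google's actual ranking code, and it does nothing about the topical "spheres" described above; the page names, the damping value of 0.85, and the iteration count are all assumptions picked for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing each page's score along its links.

    `links` maps a page name to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page Web: two sites link to "aai", nobody links to "gardening".
links = {
    "aliens-fan": ["aai"],
    "ufo-blog": ["aai"],
    "aai": ["ufo-blog"],
    "gardening": [],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Nobody sits down and classifies anything here: the scores fall out of the link structure alone, which is the "content does most of the work itself" part.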

But let us not forget that what Google did to Yahoo, Yahoo did to Alta Vista before them. That fundamental suck factor of search engines -- the stuff that you're searching is accessible to everybody and their mother -- is still out there, and the cost of entry is still pretty low. All it takes is a startup with a platinum card to buy the hardware and a few really smart people who can deal with eating ramen three times a day for a couple of years.

That's not to say that it's inevitable, but it is easily possible. Unlike Microsoft's customers, say, Google's users pay nothing to switch to someone else...they just point their browser to a new URL and Google's ranking in the real world drops just a little bit.
Friday, August 20, 2004
Feed Splicing, Shell Scripts, and the Internet
Being an Assortment of Vaguely Related Thoughts

The handful of you who subscribe to the feed will notice that yesterday I started taking advantage of FeedBurner's feed splicing capability. I set up an account with del.icio.us, dragged a couple of bookmarks into Firefox's nav bar, and checked a box in my FeedBurner preferences...total time investment about two minutes.

The result is that now when I hit something interesting on the Web I just click my "post to del.icio.us" link and type in my notes about the page, and that link is automagically added to my Web-based del.icio.us bookmarks; even better, though, is that FeedBurner then grabs that content every night and splices it into the seamonkeyrodeo RSS feed that you know and love. A cool idea, easy to set up and use, and exactly the sort of thing that companies like FeedBurner should be coming up with -- my RSS feed just became more useful because I use FeedBurner.
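For the curious, the splicing step itself boils down to something like the sketch below: take items from two sources and interleave them by date. This is a guess at the general idea, not FeedBurner's implementation; the item fields (`title`, `published`, `source`) and the sample entries are made up for illustration.

```python
from datetime import datetime


def splice_feeds(*feeds):
    """Merge several lists of feed items into one list, newest first."""
    merged = [item for feed in feeds for item in feed]
    return sorted(merged, key=lambda item: item["published"], reverse=True)


# Hypothetical items standing in for blog posts and del.icio.us bookmarks.
blog_items = [
    {"title": "Feed Splicing, Shell Scripts, and the Internet",
     "published": datetime(2004, 8, 20, 9, 0), "source": "blog"},
    {"title": "Lemons and Lemonade. Pulpy Goodness.",
     "published": datetime(2004, 8, 17, 18, 0), "source": "blog"},
]
bookmark_items = [
    {"title": "An interesting page I bookmarked",
     "published": datetime(2004, 8, 19, 22, 30), "source": "del.icio.us"},
]

spliced = splice_feeds(blog_items, bookmark_items)
for item in spliced:
    print(item["published"].date(), item["source"], item["title"])
```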

Now on to the tangents...

I particularly like this because it fits in well with the way that I do things. One of the dumb little reasons that I stay with Linux boxes on the desktop at home and work is that I can keep a command line shell open all the time. Whether I'm at home or at work, I can just type "note" into that shell and my machine dumps whatever else I type into a file, adds a date/time stamp, and (once I've finished the note) shoots that file over to another server where it is added to all the other notes that I've made over the last few years -- all accessible to me via a Web page.
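A rough sketch of the "note" idea, written in Python rather than as a shell script: stamp the text with the date and time and append it to a file. The function name, the file name, and the timestamp format are all assumptions, and the step that ships the file off to another server is left out entirely.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def save_note(text, notes_file, now=None):
    """Append one note to notes_file, prefixed with a date/time stamp."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    entry = f"[{stamp}] {text.strip()}\n"
    with Path(notes_file).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


# Demo in a throwaway directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as scratch:
    notes_path = Path(scratch) / "notes.txt"
    save_note("check the standby database lag", notes_path,
              now=datetime(2004, 8, 20, 9, 15, 0))
    save_note("look at FeedBurner feed splicing", notes_path,
              now=datetime(2004, 8, 20, 9, 20, 0))
    saved_lines = notes_path.read_text(encoding="utf-8").splitlines()

print(saved_lines)
```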

If you're reading this on the Web, you may have noticed the "Notes" section in the sidebar...same idea: I note down random thoughts and a script just converts them to a .js file that gets pulled into this blog.
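The notes-to-.js step can be sketched in a few lines, assuming the notes are already available as a list of strings. The variable name `sidebarNotes` is hypothetical, and this only builds the text of the file; how the page then displays it is a separate matter.

```python
import json


def notes_to_js(notes, variable_name="sidebarNotes"):
    """Render a list of note strings as the body of a JavaScript file."""
    # json.dumps produces a literal that is also valid JavaScript,
    # including the escaping of quotes inside the notes.
    return f"var {variable_name} = {json.dumps(notes, indent=2)};\n"


notes = ["first random thought", 'a note with "quotes" in it']
js_source = notes_to_js(notes)
print(js_source)
```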

This is all the sort of stuff that reminds me of why I think the Internet is cool...I can easily fling information around, slice and dice it, and present it or not to the outside world. There are ongoing discussions in various places about how weak Web browsers are as content creation tools; I suppose that's because many people are concerned about the technological barrier to content creation on the Web being too high...in some ideal worlds, I believe, all content is on the Web and everybody both creates and consumes with a Web browser. Probably not a Microsoft Web browser.

What's particularly great about stuff like FeedBurner's feed splicing is that it's addressing this issue of content creation in a very different way; by taking advantage of the basic interconnectedness of the Web, using that interconnectedness to make it easy to combine information that already exists somewhere, things like the FeedBurner/del.icio.us combination get past the idea that everyone must be able to easily create HTML documents in order to "contribute" to the Web. I'm not against Wikis, nor the slick little blogger interface that I'm using to make this post, but I am really glad to see people looking at...well, "networked content," perhaps...in a different way.
Thursday, August 19, 2004
If you don't know what this number refers to, you've probably been living in a small, dark box for the last few months. Happy Google-day, everyone.
Bugmenot.com down...but why?
Anyone who happens to use Bugmenot (or its excellent Firefox/Mozilla plugin) to skip over the multitude of registrations required to access content on many Web sites will notice that the site's not there anymore. Usenet suggests that it's been down since Tuesday, but I haven't come across any reliable information on why.

There's a machine sitting there that looks like somebody could be restoring from bare metal and backups following an ugly hardware mishap [yes, that's why you don't put your coffee cup down on top of the server], but it could also be due to the appearance of "legal pressure" of the sort that the operators have been concerned about.

Anyone got more information?

Update: According to a post on Mozillazine, bugmenot's hosting provider pulled the plug, and they're looking for a new home. I don't expect that it'll take too long for them to find one. Bets on how long before the site is back up and running?
Tuesday, August 17, 2004
Lemons and Lemonade. Pulpy Goodness.
Three days ago I was in the Garden of the Gods, just outside Colorado Springs, climbing around on incredible rock formations and enjoying the sun.

Today? Fate takes its cut. Four days before we planned on a scheduled "real world learning experience" changeover from our production database to our standby, we had to make the change to our standby. Like, now. And since we were rebuilding a nice, clean standby for the scheduled change, we had recently blown away the standby that was keeping up very nicely, about three minutes behind production. And the new standby was still 24 hours behind when production went all pear-shaped. And that was fucking excellent.

Not the sort of day that one would choose to have, were one given a choice about such things, but I've said it before and I'll say it again: I work with some really good people. We've long since been up and running, and it's starting to look like our tests hold true: the standby machine is doing an excellent job...possibly better than the much more expensive production box. Perhaps that's something that we'll discuss with the production machine's vendor in the near future.

Anyway, the point is: we were given lemons, so we made lemonade. Actually, nobody gave us any sugar or water, so I think that what we did was pound the crap out of some lemons and then laugh maniacally as the juice splattered everywhere. Though maybe that was just me. Whatever.
Wednesday, August 11, 2004
How Not to Sell Stuff to Me
I'm currently in Return Path's Colorado office, finishing up three days of talking, planning, and starting to get to know the people I hadn't met before. Starting tomorrow I've got three days of hiking, climbing, and hanging around Colorado in a non-work capacity, which makes for a really good week altogether.

Because I'm in Colorado, and because the creaky old PowerBook that I grabbed as I was running out my door has a mysteriously screwed up copy of VNC on it, I'm stuck accessing my company's Exchange server via the Web interface, which is kind of like...well, it's kind of like something really irritating. No way to sort messages, no easy way to choose how many messages to view in a window, slow...all the stuff that people always complain about. Net, accessing email is a bit of a chore right now, so I'm not pleased with having unnecessary messages in my inbox.

Into this mood comes a helpful email sent by my company's sales rep at a major computer manufacturer. As usual, it's a notification of the wonderful special offers that they have available now, mailed out to everyone in that rep's address book. These messages irritate me at the best of times: if I wanted ads from them I'd sign up for a newsletter. If the rep notices that I've bought a half dozen of their 2650 servers in the last couple of years and wants to make an effort to sell me some of the new 2850s that they're pushing now, that's fair enough. I expect that a sales rep is going to check in with me periodically and try to convince me to buy things I don't yet know I need...that's their job. I don't, however, expect to get a homemade, multi-colored, big font abomination of an ad for every random little thing that the company has a discount on this week. Not good. Doesn't make me buy things.

Now what made this message even better was that the salesperson accidentally used CC rather than BCC to spam their clients, so I now have a handy dandy list of contacts with purchasing authority at other small to mid-size businesses. This was simple operator error, which I can completely understand, but in this case it just illustrates why this approach to email as a sales tool is a really bad idea. There are lots and lots of tools out there to handle mass email communication that are designed to minimize this sort of error. Many of these tools also allow you to track when you last contacted these people, group and mail to them based on buying habits or other information that you choose...these tools are designed to allow you to use email intelligently, as part of a sales process.
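One way such tools sidestep the CC/BCC mistake is by never putting more than one customer on a message in the first place. Here's a minimal sketch of that, using Python's standard `email` library to build one message per recipient; the addresses and subject are placeholders, and actually sending the messages (plus the tracking and grouping mentioned above) is not shown.

```python
from email.message import EmailMessage


def build_individual_messages(sender, recipients, subject, body):
    """Build one message per recipient so no address list is ever shared."""
    messages = []
    for recipient in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient  # exactly one address, no Cc or Bcc needed
        msg["Subject"] = subject
        msg.set_content(body)
        messages.append(msg)
    return messages


# Placeholder addresses for illustration only.
customers = ["alice@example.com", "bob@example.org", "carol@example.net"]
outgoing = build_individual_messages(
    "rep@example.com", customers, "New servers available", "Details inside."
)
for msg in outgoing:
    print(msg["To"])
```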

There's a whole other post in the various (mature and otherwise) responses that were then sent via the "reply all" button, but I'm too busy deleting them all (very sloooowly) to bother right now. Use your imagination. See you all in a few days.
Friday, August 06, 2004
The Spam I'm Not Seeing
So as we all know, AOL has acquired Mailblocks, the challenge/response spam filtering company that incidentally holds a bunch of apparently questionable patents covering the use of challenge/response for email filtering. Yeah, well, whatever. So in early 2005 AOL will offer another ca. 2003 spam filtering tool to any subscribers they might still have. Again, let me say: whatever.

It's interesting, though, how effectively challenge/response has managed to maintain a sort of minor "up and coming technology" status in the face of a gigantic collective yawn on the part of its potential user base. Nobody bothers to use c/r. Seriously. I say this not only because I can't remember when I last got an email challenge, but also because of the compelling evidence of the spam that I'm not seeing.

Whatever else they may be, spammers are reactive. Spam changes -- and changes quickly -- in response to each and every attempt to stop it. Admins start blocking the machines that send spam? Spammers figure out how to effectively distribute their spam sending chores across many machines. People start filtering based upon words frequently used in spam? Words like "v1@gr@" and "|S|L|U|T|S|" are coined. People create smarter methods of identification, like collaborative message fingerprinting and bayesian analysis? Enter the randomizers: random words, paragraphs from books, and miscellaneous other text appear in spam.
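Here's a small sketch of one round of that arms race: a plain word filter, the obfuscation that slips past it, and a filter that undoes the obfuscation first. The blocked-word list and the character substitutions are assumptions for illustration, not anybody's real filter rules.

```python
BLOCKED_WORDS = {"viagra", "sluts"}

# Hypothetical map for undoing common character swaps; "|" is simply dropped.
SUBSTITUTIONS = str.maketrans(
    {"1": "i", "@": "a", "0": "o", "3": "e", "$": "s", "|": None}
)


def naive_filter(message):
    """Flag a message if any word matches the blocked list exactly."""
    return any(word.strip(".,!?") in BLOCKED_WORDS
               for word in message.lower().split())


def normalizing_filter(message):
    """Undo simple character substitutions before checking the blocked list."""
    return any(word.translate(SUBSTITUTIONS).strip(".,!?") in BLOCKED_WORDS
               for word in message.lower().split())


for text in ["buy viagra now", "buy v1@gr@ now", "|S|L|U|T|S|", "see you at lunch"]:
    print(repr(text), naive_filter(text), normalizing_filter(text))
```

The second filter only wins until the next trick shows up, which is the point: each fix just sets up the spammers' next move.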

So what does that have to do with challenge/response? The spam that I haven't yet seen is a message something like this:

To: somebody@example.com
From: verification@legitimate-sounding-domain.com
Subject: Please Authenticate your Message
You recently sent a message to a Legitimate Sounding Spam Stopping Tool user. Your message has been quarantined, and will not be delivered until you click the link below to verify that this message is not spam or automatically generated bulk email. You will only have to do this once, after which any messages you send to this user will be automatically delivered to their inbox.


Thank you,
The Legitimate Sounding Spam Stopping Tool Team

If you click the link above, you'll see the potential problem here. If I'm a really clever spammer, I'll just copy the text used by some legitimate challenge/response system for my fake messages. If I'm an extra-special clever spammer, I'll tie this into a nice little worm of some sort, which would allow me to use the infected machine's address book to send out "verification" messages that include an email address that the recipient is likely to recognize. Cool, huh?

The day after c/r systems become popular (assuming that ever actually happens), I fully expect to see these messages...and because I haven't seen them, I just don't think that c/r is commonly used yet. It's interesting, actually, because just a couple of biggish spammers doing something like this could make c/r completely worthless. Every time you clicked on a challenge link it'd be a crapshoot, which takes challenge/response from "minor annoyance" all the way up to "way more trouble than it's worth."

Maybe I'm giving spammers too much credit. Maybe challenge/response is being widely used, and spammers are just too dim to have figured out this approach. Hmmm...maybe I should have filed for a patent on this before writing this post -- this could be a gold mine! Got to go, there's work to be done...
