seamonkeyrodeo: No, seriously, comparing two things is "non-obvious"...

Whitney McNamara's Sea Monkey Rodeo. This blog moved over here in August 2006. Come by if you're interested in newer posts.

contact email
seamonkey (located at) absono.us

creative commons

Licensed under a Creative Commons License

standard disclaimer
All opinions expressed here are, of course, entirely my own personal views, and do not reflect the views of my employer or its associated companies.

|| seamonkeyrodeo ||
| k a r a o k e | m i n d | c o n t r o l |

Tuesday, September 14, 2004

No, seriously, comparing two things is "non-obvious"...
And so the wonderful world of intellectual property law rolls along. A company by the name of Commtouch today announced that it has acquired a patent covering a method of identifying and eliminating spam.

While -- as always -- you should take a look at the actual source documents yourselves, let's take a look at a quick snippet from the patent, shall we?

The bulk e-mail is detected by monitoring live e-mail flow streams, typically at a central server location in the Internet system, but also capable of installation at separate subscriber sites. Detection is effected by reading the e-mail message, eliminating the personalization and addressing portions and processing the remaining text to establish a signature identification code. Bulk mailings are detected when there are at least two e-mail messages identified containing the same non-address contents being sent to different e-mail addresses.

Sounds a lot like they're hashing the body of an email message, doesn't it? Now it's possible that Brightmail, at least, may be able to slap this down with some prior art examples anyway, but there's one other thing that seems odd on a quick scan of the patent: it's appropriately specific about the techniques involved -- so specific, in fact, that this patent may actually be useless. The patent seems to cover only comparing the "signatures" of two message bodies to find an exact match, which is a technique that spammers have already defeated by adding randomized content to each outgoing spam message.
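To make the "hash the body and compare" reading concrete, here is a rough Python sketch of that kind of exact-match scheme. It is only an illustration under my own assumptions, not the patent's actual method: the names body_signature and find_bulk are hypothetical, the choice of SHA-256 is mine, and the "personalization" stripping is reduced to dropping a leading greeting line.

```python
import hashlib
import re


def body_signature(body: str) -> str:
    """Return a hex digest for a message body (hypothetical helper).

    Assumption: "personalization" is approximated by removing a leading
    greeting line such as "Dear Alice,". A real filter would do far more.
    """
    text = re.sub(r"^\s*(dear|hi|hello)\b[^\n]*\n", "", body, flags=re.IGNORECASE)
    # Collapse whitespace and case so trivial formatting changes don't matter.
    text = " ".join(text.lower().split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_bulk(messages):
    """Given (recipient, body) pairs, return the set of signatures that
    were seen going to at least two different recipients."""
    recipients_by_signature = {}
    for recipient, body in messages:
        sig = body_signature(body)
        recipients_by_signature.setdefault(sig, set()).add(recipient)
    return {sig for sig, rcpts in recipients_by_signature.items() if len(rcpts) >= 2}


# Identical bodies apart from the greeting: flagged as bulk.
plain = [
    ("a@example.com", "Dear Alice,\nBuy cheap pills now!"),
    ("b@example.com", "Dear Bob,\nBuy cheap pills now!"),
]

# Same pitch with a random token appended to each copy: the digests differ,
# so an exact-match comparison finds nothing.
salted = [
    ("a@example.com", "Dear Alice,\nBuy cheap pills now! xq7zk"),
    ("b@example.com", "Dear Bob,\nBuy cheap pills now! p0m3w"),
]
```

Running find_bulk(plain) flags one signature, while find_bulk(salted) comes back empty, which is the whole problem with exact matching in a nutshell.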

I suppose that I can understand why Commtouch bought the patent, though...I imagine that the conversation went something like this:

Lawyer: Hey, boss? We may have a problem -- apparently some guy has a patent on comparing two email messages to see whether they're the same.

Boss: Yeah, right. Pull the other one, why don't you? It's got bells on it.

Lawyer: No, seriously. You know how the U.S. Patent Office is. I actually patented my ass the other day, just for fun. It was approved. I'm thinking about patenting respiration next week.

Boss: It's been a long week and I don't need to deal with this crap. We just got $3.9 million thrown at us, let's buy the damn patent. There are too many companies in the anti-spam software space anyway, maybe we can pick up some extra revenue by suing people.

As I said, I hope and trust that prior art could invalidate this patent, but I'm still irritated by craptacular IP claims like this. Yes, the "private inventor" who came up with this gets some credit for realizing early on that existing tools could be used to identify spam. Good for him. Was it non-obvious? Was it a non-intuitive leap to go from checking for keywords in messages to just comparing the whole message (again, using tools that were already well known)? This is left as a question for the reader.

If the private inventor had been marketing his own product all the way along I might feel a little better about this, but as it is this smacks of patent farming. While that's certainly a direction that interests a lot of people these days, people a lot smarter and better informed than I am have pointed out the potential for long term harm in this "patent them all and let God sort them out" approach to technological advancement.

The only real question that I'm left with is whether "craptacular" was really the right word to apply here...I was also considering "ass-tastic." Feel free to let me know if you have an opinion on this important matter.
- posted by whitneymcn @ 4:52 PM

Comments:

Replace "matches exactly" with "matches except for the random content", and it's a potentially useful scheme again.
Dynamically building a filter that can distinguish the randomly added content from the rest of the message may not be trivial, but should get easier as the number of examples compared increases.
5dm.

# posted by

Anonymous : 2:01 PM

Yep, absolutely -- and that's one of the approaches that companies like Brightmail are already using. Rather than looking for exact matches on an entire message body, they look for matches within subsections of different messages, similarity of texts...all that good stuff.

What's interesting here, though, is that it seems entirely possible that none of the techniques listed above would infringe on the patent in question. At this late date, this patent may now cover a technique that nobody wants to employ, anyway.

# posted by

whitneymcn : 7:50 AM

There doesn't appear to be much detail around about Brightmail's
BrightSig2 algorithm:

"Brightmail says its fingerprinting technology, called BrightSigs,
can remove the random noise inserted by spam tools. "We generate
an intelligent signature so that as long as a certain percentage
of the signature matches, we can say it's spam," says Salem".

The key phrase there may be: "certain percentage".

Techniques do exist for document clustering (e.g. shingling) that can
be useful for creating signatures that can help distinguish similar
documents even with random noise, but is BrightSig2 actually using an
intelligent algorithm such as that?

It may be academic at this point if spammers are simply inserting
"non-random" noise, but as they become better at it, it's only the
message itself that will still have to retain some invariance, and
even more intelligent document clustering approaches may be needed.
5dm.

# posted by

Anonymous : 2:32 PM
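
For anyone curious what the shingling idea from the last comment looks like in practice, here is a minimal Python sketch. To be clear, this is not BrightSig2 or anything Brightmail has described; it is a generic word-shingle comparison using Jaccard similarity, and the function names, the shingle size of 3, and the 0.5 threshold are all assumptions picked for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_similar(x, y, k=3, threshold=0.5):
    """Hypothetical decision rule: similar if enough shingles are shared.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return jaccard(shingles(x, k), shingles(y, k)) >= threshold


base = "buy cheap pills now from our trusted online pharmacy today"
spam_a = base + " xq7zk"   # same pitch, different random token
spam_b = base + " p0m3w"
ham = "meeting moved to thursday at noon in room four"
```

Here looks_similar(spam_a, spam_b) is true even though the two messages would hash differently as whole bodies, while looks_similar(spam_a, ham) is false. Random noise sprinkled through the middle of a message would break more shingles than a token tacked on the end, so this only shows the "certain percentage" idea, not a robust filter.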