One thing many of you who've discussed the issue with me know: I LOVE my Bayesian spam filter! It's built right into my mail client (Mozilla). All I have to do is mark incoming spam and it learns as it goes, and I've had no problems with false positives. I've had to tweak the settings lately because of changes in spam behavior, mostly because spammers are now padding their messages with garbage words. By default (and in line with Paul Graham's paper on the subject), unknown words are rated at .4, because it was thought that spammers would use common words; for that very reason, they started using garbage words instead. I changed the setting to .6 and it's much better now. So all in all I've been just marvellously happy with it.
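For anyone curious why that one setting matters so much, here's a rough sketch of the arithmetic. This is NOT Mozilla's actual code. It's just the combining formula from Paul Graham's paper as I understand it, and the 0.9 cutoff, the 15-token limit, the `learned` table, and the sample words are all assumptions made up for illustration.

```python
from math import prod

def combine(probs):
    """Combine per-word spam probabilities, Graham style."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

def spam_score(tokens, token_probs, unknown_prob, max_tokens=15):
    """Score a message; words the filter has never seen get unknown_prob."""
    unique = list(dict.fromkeys(tokens))  # drop repeats, keep order
    probs = [token_probs.get(t, unknown_prob) for t in unique]
    # Keep only the most "interesting" words: those farthest from neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return combine(probs[:max_tokens])

# Hypothetical training result: one word the filter knows is very spammy.
learned = {"v1agr@": 0.99}

# One spammy word padded with six garbage words the filter has never seen.
message = ["v1agr@", "xqzt", "blorf", "wibble", "zzyx", "qwpf", "mnbv"]

THRESHOLD = 0.9  # assumed cutoff: anything scoring above this is junk

score_default = spam_score(message, learned, unknown_prob=0.4)
score_tweaked = spam_score(message, learned, unknown_prob=0.6)

print(f"unknown = .4: {score_default:.3f} junk={score_default > THRESHOLD}")
print(f"unknown = .6: {score_tweaked:.3f} junk={score_tweaked > THRESHOLD}")
```

With unknown words at .4, each garbage word nudges the score toward "good," and six of them are enough to drag this message just under the cutoff. At .6 the same padding counts against the sender.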
But I've noticed something disturbing lately: I'm opening up more spams because it's getting harder for me to tell the difference between the spams that make it through the filter and my legitimate email.
Example: I received an email with "Randi" in the subject line. The rest of it was kind of generic, but thinking it might be from someone on here I opened it...and got treated to an ad for "v1agr@." Just opening a spam can be a problem, especially if you have HTML mail turned on (I don't) and they stick in one of those tracking graphics. This has been happening more and more lately as I see familiar names or concepts I'm discussing showing up in the From or Subject lines.
But it isn't the spammers somehow finding personal information about me and targeting me. It's simply, I'm convinced, an unintended consequence of the Bayesian filter. I determined this by going through the Junk folder and seeing what was rejected. There are names and words all over the place. It makes sense that the ones that happen to match words and names I get a lot are going to have a greater tendency to be marked as good by the filter. But that just makes it more likely that I'm going to open the spam.
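Here's a toy version of that selection effect. Again, this is only a sketch under made-up assumptions (the probabilities, the 0.9 cutoff, and the list of names are all hypothetical), using the same Graham-style combining formula rather than whatever Mozilla really does.

```python
from math import prod

def combine(probs):
    """Combine per-word spam probabilities, Graham style."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

# Hypothetical learned values: "randi" shows up constantly in my good mail,
# so the filter has learned to treat it as a strong sign of legitimate mail.
token_probs = {"v1agr@": 0.99, "randi": 0.01}
UNKNOWN = 0.6     # setting for words the filter has never seen
THRESHOLD = 0.9   # assumed cutoff: anything scoring above this is junk

def is_junk(tokens):
    score = combine([token_probs.get(t, UNKNOWN) for t in tokens])
    return score > THRESHOLD

# The spammer isn't targeting anyone: each spam just carries some random name.
names = ["randi", "zoltan", "mabel", "quincy"]
spams = [[name, "v1agr@"] for name in names]

delivered = [s for s in spams if not is_junk(s)]
print(delivered)  # only the spam whose random name matched my own mail
```

The three spams with unfamiliar names land in the Junk folder. The one that happens to say "randi" scores about even and gets delivered, which is exactly the one I'm most likely to open.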
Now, I'm not about to throw out my Bayesian filter. It's still, from what I've seen, the best method out there. It gets rid of about 98% of my incoming spams with no false positives. The benefits more than outweigh this side effect. But as long as spam is even remotely effective, this scourge is not going to go away anytime soon. And it disturbs me to think that the Bayesian filters might be contributing to the long-term staying power of spam.