Re: [math-fun] Spam-blocking, Bayesian methods

18 Sep 2003

Mozilla's junk-mail filter uses Bayesian methods, based on Paul Graham's
article "A Plan for Spam":
http://www.paulgraham.com/spam.html

Mozilla is at 
http://www.mozilla.org/

The Bayesian filter in Mozilla was implemented by
Patrick C. Beard and Seth Spitzer.

You can get a spam analyzer for Mozilla via 
http://bayesjunktool.mozdev.org/

The Netscape write-up is at
http://devedge.netscape.com/viewsource/2003/junkmail-filtering/

Ars Technica published a nice comparison of implementations at
http://www.arstechnica.com/ask-ars/2003/anti-spam/

I have a Mathematica notebook for analyzing spam with Bayesian methods, based
on my Mozilla filters.  As an example, here are some of the top "bad" words
for indicating spam:

{remove, 0, 281}, {missing, 0, 263}, {styledata, 0, 247},
{border-bottom, 0, 218}, {75pt, 1, 433}, {border-top-style, 0, 215},
{border-right-style, 0, 215}, {blank, 0, 209}, {italic, 0, 199},
{fontslant-, 0, 198}, {dstruct, 0, 162}, {ff0000, 0, 144},
{nitelifeinfo, 0, 141}, {spacer, 0, 137}, {nums, 0, 131}
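
For anyone who wants to see how counts like these become a spam score, here
is a minimal Python sketch of the scoring rule described in Paul Graham's
article.  It is not the Mathematica notebook, and it is not Mozilla's code.
It assumes each triple reads {token, count in good mail, count in spam},
which the list does not state explicitly.  The corpus sizes NGOOD and NBAD
are hypothetical placeholders, since the real message counts are not given,
and the function names are made up for illustration.

```python
from math import prod


def token_spam_probability(good, bad, ngood, nbad):
    """Per-token spam probability in the style of Graham's article.

    good, bad   -- occurrences of the token in good mail and in spam
    ngood, nbad -- number of good and spam messages in the corpus
    Returns None for tokens seen too rarely to be trusted.
    """
    g = 2 * good  # good counts are doubled to bias against false positives
    b = bad
    if g + b < 5:
        return None
    good_ratio = min(1.0, g / ngood)
    bad_ratio = min(1.0, b / nbad)
    p = bad_ratio / (good_ratio + bad_ratio)
    # Clamp so that no single token is treated as absolute proof.
    return max(0.01, min(0.99, p))


def combined_spam_probability(probs):
    """Combine per-token probabilities, treating tokens as independent."""
    p_spam = prod(probs)
    p_good = prod(1.0 - p for p in probs)
    return p_spam / (p_spam + p_good)


# Hypothetical corpus sizes (assumption: the real ones are not in the post).
NGOOD = 1000
NBAD = 1000

# A few triples from the list, read as (token, good count, spam count).
triples = [("remove", 0, 281), ("75pt", 1, 433), ("spacer", 0, 137)]

probs = [token_spam_probability(g, b, NGOOD, NBAD) for _, g, b in triples]
score = combined_spam_probability(probs)
print(probs, score)
```

With counts this lopsided every token hits the 0.99 clamp, so a message
containing even a few of them scores very close to 1.  Graham's article
combines only the fifteen tokens whose probabilities lie farthest from 0.5;
the sketch skips that selection step for brevity.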

--Ed Pegg Jr, www.mathpuzzle.com

--- Guy Haworth <guy_haworth@hotmail.com> wrote:
...
I would appreciate a pointer to a readily accessible article or three on
how Bayesian methods can be used to block Spam.
Pointers to any products implementing these methods also welcomed.
Guy