[math-fun] Spam-blocking, Bayesian methods
I would appreciate a pointer to a readily accessible article or three on how Bayesian methods can be used to block Spam.
Pointers to any products implementing these methods also welcomed.
Guy
I recommend (and use) POPFile, an open-source filter available from http://popfile.sourceforge.net/
There are some papers about the methods on their web site.
Mozilla uses Bayesian methods. They are based on Paul Graham's article.
http://www.paulgraham.com/spam.html

Mozilla is at http://www.mozilla.org/
The Mozilla Bayesian implementation was by Patrick C. Beard and Seth Spitzer.

You can get a spam analyzer for Mozilla via http://bayesjunktool.mozdev.org/

The Netscape write-up is at http://devedge.netscape.com/viewsource/2003/junkmail-filtering/

Ars-Technica did a nice comparison of implementations at http://www.arstechnica.com/ask-ars/2003/anti-spam/

I have a Mathematica notebook for analyzing spam Bayesianly, based on my Mozilla filters. As an example, here are some top "bad" words for indicating spam.

{remove, 0, 281}, {missing, 0, 263}, {styledata, 0, 247},
{border-bottom, 0, 218}, {75pt, 1, 433}, {border-top-style, 0, 215},
{border-right-style, 0, 215}, {blank, 0, 209}, {italic, 0, 199},
{fontslant-, 0, 198}, {dstruct, 0, 162}, {ff0000, 0, 144},
{nitelifeinfo, 0, 141}, {spacer, 0, 137}, {nums, 0, 131}

--Ed Pegg Jr, www.mathpuzzle.com

--- Guy Haworth <guy_haworth@hotmail.com> wrote:
I would appreciate a pointer to a readily accessible article or three on how Bayesian methods can be used to block Spam.
Pointers to any products implementing these methods also welcomed.
Guy
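For readers who want to see how word counts like the ones in Ed Pegg's list could become a spam score, here is a minimal Python sketch of the per-token rule and the combining rule from Paul Graham's article. Several things here are assumptions rather than anything stated in the thread: that each triple means {token, count in good mail, count in spam}, that the corpus holds 1000 good and 1000 spam messages (made-up figures), and the function names, which are hypothetical. It is not the Mozilla or POPFile code.

```python
def token_spam_probability(good, bad, ngood, nbad):
    """Per-token spam probability, following the rule in Graham's article.

    good, bad   -- occurrences of the token in good mail and in spam
    ngood, nbad -- number of good and spam messages in the corpus
                   (assumed values; the thread does not give them)
    Returns None when the token has been seen too rarely to be trusted.
    """
    g = 2 * good  # good counts are doubled to bias against false positives
    b = bad
    if g + b < 5:
        return None
    good_ratio = min(1.0, g / ngood)
    bad_ratio = min(1.0, b / nbad)
    p = bad_ratio / (good_ratio + bad_ratio)
    # Clamp so that no single token is treated as absolute proof.
    return max(0.01, min(0.99, p))


def combined_probability(probs):
    """Combine per-token probabilities with the naive Bayes product rule."""
    prod_p = 1.0
    prod_q = 1.0
    for p in probs:
        prod_p *= p
        prod_q *= 1.0 - p
    return prod_p / (prod_p + prod_q)


# Two triples from the list above, read as (token, good count, spam count).
tokens = [("remove", 0, 281), ("75pt", 1, 433)]
NGOOD, NBAD = 1000, 1000  # hypothetical corpus sizes

probs = [token_spam_probability(g, b, NGOOD, NBAD) for _, g, b in tokens]
score = combined_probability([p for p in probs if p is not None])
print(probs, score)
```

Under these assumed corpus sizes both tokens hit the 0.99 ceiling, so a message containing them would score very close to 1. Graham's article combines only the most "interesting" tokens of a message (those furthest from 0.5); that selection step is left out here to keep the sketch short.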
participants (3): Dave Dyer, Ed Pegg Jr, Guy Haworth