19 Sep 2003, 12:54 p.m.
REVOLU<!--Am-->TIONARY wpj<...html stuff...>a osnkni tolzbo mxwzys a i
Of course, this kind of subterfuge is a spammer's dodge to avoid being caught by simple spam filters. The problem with trying to analyze the text and reduce it to the "real message" is that the real message is innocuous except to an intelligent reader. For example, I've received messages containing this kind of gibberish whose intelligible message is "gentleman's johnson enhancer". The beauty of the Bayesian approach is that the gibberish and HTML noise are themselves a filterable criterion, and I don't have to decide which parts of the message are safe to filter on.
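To make that last point concrete, here is a minimal Python sketch of a naive Bayes filter that keeps the HTML noise as tokens instead of stripping it. It is an illustration under assumptions, not any particular filter's implementation: the `tokenize` function, the `NaiveBayes` class, and every training message except the first spam sample (taken from the gibberish quoted above, with the elided HTML left out) are made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: HTML comments and tags are kept as tokens
# rather than being cleaned away before classification.
TOKEN_RE = re.compile(r"<!--.*?-->|<[^>]+>|[A-Za-z']+")

def tokenize(text):
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.startswith("<!--"):
            tokens.append("<!--comment-->")  # all comments collapse to one token
        elif tok.startswith("<"):
            tokens.append("<tag>")           # all tags collapse to one token
        else:
            tokens.append(tok.lower())
    return tokens

class NaiveBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """Log of P(spam | text) / P(ham | text); positive means spam-like."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        # Smoothed prior from the number of training messages per class.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in tokenize(text):
            # Add-one smoothing; the extra +1 leaves room for unseen tokens.
            p_spam = (self.counts["spam"][tok] + 1) / (totals["spam"] + len(vocab) + 1)
            p_ham = (self.counts["ham"][tok] + 1) / (totals["ham"] + len(vocab) + 1)
            score += math.log(p_spam / p_ham)
        return score

nb = NaiveBayes()
# First spam sample is from the post; the rest of the corpus is invented.
nb.train("REVOLU<!--Am-->TIONARY wpj a osnkni tolzbo mxwzys a i", "spam")
nb.train("AMAZ<!--xq-->ING off<!--zz-->er qpl vbnrt", "spam")
nb.train("Lunch on Friday? I can book a table for noon.", "ham")
nb.train("The patch for the parser is attached, please review.", "ham")

# Neither message below shares an ordinary word with the spam training set.
spam_score = nb.spam_log_odds("GUARAN<!--k-->TEED res<!--p-->ults zzkq wvx")
ham_score = nb.spam_log_odds("Can you review the parser patch on Friday?")
print("obfuscated message:", spam_score)
print("ordinary message:", ham_score)
```

In this toy corpus the obfuscated test message scores as spam almost entirely because of the `<!--comment-->` token, which the filter learned from the training spam without being told that comments inside words are suspicious. The random filler words are unseen and contribute very little either way. A real filter would need far more training mail, and the tokenizer here is only one of many reasonable choices.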