Re: [math-fun] ISO a perfect pangram
Veit Elser <ve10@cornell.edu> wrote:
1. Extract sequences of letters, including spaces (as word separators), from actual text.
I tried that decades ago. It picks up far too much junk: proper names, abbreviations, acronyms, jargon, computer codes, ham radio codes, misspelled words, foreign words, misspelled foreign words, etc. If I were to try it again today, now that lots of people send HTML email, it would probably tell me that "msonormal" was the most common English word.

I've tried searching for word lists online that list each word by its frequency. In a perverse equivalent of Gödel's theorem, it appears that every such list is either incomplete or contains trash.

For instance http://norvig.com/ngrams/count_1w.txt starts promisingly:

    the  23135851162
    of   13151942776
    and  12997637966
    to   12136980858
    a     9081174698
    in    8469404971
    for   5933321709
    is    4705743816

but if, for instance, I search it for the anagrams of "post" I get:

    post  392956436
    stop   77749471
    spot   26750929
    tops   11771127
    pots    3854743
    opts     662207
    tsop     205591
    tpos      43379
    ostp      41988
    ptos      38390
    otps      23858
    ptso      21839

Any idea where I can find a clean and complete list? Thanks.
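[For concreteness, here is a minimal Python sketch of the kind of lookup Keith describes; it is not code from the thread. It assumes a local copy of count_1w.txt with one whitespace-separated "word count" pair per line, and prints every entry that is an anagram of a target word.]

    # Hypothetical sketch, not Keith's code: scan Norvig's count_1w.txt
    # (one whitespace-separated "word count" pair per line) for anagrams
    # of a target word.
    def anagrams_in_list(path, target):
        key = sorted(target)
        with open(path) as f:
            for line in f:
                word, count = line.split()
                if sorted(word) == key:
                    yield word, int(count)

    for word, count in anagrams_in_list("count_1w.txt", "post"):
        print(word, count)
    # The output reproduces Keith's point: alongside post/stop/spot/tops/pots/opts,
    # the raw web n-gram counts also contain junk entries like tsop, tpos and ostp.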
On Aug 10, 2017, at 10:53 PM, Keith F. Lynch <kfl@KeithLynch.net> wrote:
I tried that decades ago. It picks up far too much junk: Proper names, abbreviations, acronyms, jargon, computer codes, ham radio codes, misspelled words, foreign words, misspelled foreign words, etc. If I were to try it again today, now that lots of people send HTML email, it would probably tell me that "msonormal" was the most common English word.
Perhaps I wasn’t clear, but I was only proposing that method for searching “double-pangrams”, where each letter appears exactly twice. I’m still optimistic that it could work. I tried it using Alice in Wonderland as the source, to avoid “junk”. I don’t do this kind of coding/computing, but gave it a shot in Mathematica, since this book is included in their datasets.

There are 52k characters and 7k unique blocks of size 4 (omitting punctuation marks). There are 31k transitions. The key number, I believe, is the ratio: 4.47 continuations, on average, per block. We might want to look for an author (Nabokov?) whose writing has a higher ratio.

Anyway, searching all continuations starting from a randomly selected set of just 10 of the 793 starting blocks, I got as far as 12 blocks, where most blocks have just one blank. Most of these are weakly comprehensible sentences made of true words:

“all mad but why the king’s crowd be off quick n…” (apostrophe added by hand)

-Veit
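[For readers who want to reproduce the counts, the following is a rough Python re-creation of the statistics Veit reports; he used Mathematica, and none of this code is his. It assumes one particular reading of his setup: the text is lower-cased, punctuation is dropped, a "block" is any 4-character window of the cleaned text, and a "transition" is a distinct pair consisting of a block and the block that starts 4 characters later, i.e. the 4 letters you could append next. The definition of the 793 "starting blocks" is not specified in the message and is not reproduced here.]

    # Hypothetical sketch of the block/transition statistics (assumptions above).
    import re
    from collections import defaultdict

    def block_stats(text, k=4):
        # Drop apostrophes, turn every other non-letter into a space, collapse runs.
        clean = text.lower().replace("'", "").replace("\u2019", "")
        clean = re.sub(r"[^a-z]+", " ", clean).strip()

        blocks = {clean[i:i + k] for i in range(len(clean) - k + 1)}
        successors = defaultdict(set)      # block -> distinct blocks 4 chars later
        for i in range(len(clean) - 2 * k + 1):
            successors[clean[i:i + k]].add(clean[i + k:i + 2 * k])

        n_transitions = sum(len(s) for s in successors.values())
        return len(blocks), n_transitions, n_transitions / len(blocks)

    # Usage, with a plain-text copy of Alice in Wonderland (e.g. from Project
    # Gutenberg) saved locally as alice.txt:
    # with open("alice.txt") as f:
    #     print(block_stats(f.read()))
    # Veit reports roughly 7k blocks, 31k transitions, and a ratio of 4.47,
    # i.e. the average branching factor of the block-appending search.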
On Fri, Aug 11, 2017 at 9:33 AM, Veit Elser <ve10@cornell.edu> wrote:
"all mad but why the king’s crowd be off quick n…” (apostrophe added by hand)
Note that although you have only 36 letters of the 52 you are hoping for, you have already used up two each of a, e, i, o, u, and one y. This is no coincidence. The density of vowels in English text is much higher than in the alphabet (roughly 38% of the letters in running English are a, e, i, o, or u, versus 5 of the alphabet's 26 letters, about 19%), so the key will always be finding enough words with sufficiently low vowel density that also use uncommon letters. I would bet that if you generated 100 more 36-letter fragments, you would find that either 99 or 100 of them had used up aaeeiioouu. So you're not nearly as close to finding a solution as it might appear.

Andy
-- Andy.Latto@pobox.com
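[Andy's tally is easy to check mechanically. Here is a short Python snippet, again not from the thread, that counts the letters of the quoted 36-letter fragment:]

    from collections import Counter

    fragment = "all mad but why the kings crowd be off quick n"  # apostrophe dropped
    counts = Counter(c for c in fragment if c.isalpha())

    print(sum(counts.values()))              # 36 letters used so far, out of 52
    print({v: counts[v] for v in "aeiouy"})  # {'a': 2, 'e': 2, 'i': 2, 'o': 2, 'u': 2, 'y': 1}
    # Every vowel is already at its quota of two, so the remaining 16 letters of a
    # double pangram would have to be spelled without a, e, i, o or u.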
participants (3):
- Andy Latto
- Keith F. Lynch
- Veit Elser