Re: [math-fun] ISO a perfect pangram

10 Aug 2017

There are some terrific plain word lists online. For example, at
puzzlers.org go to "Solving tools of the Enigma" and then
"grep dictionary search".
...
From there you can download the list (or lists; there are a lot
of choices) and toss out every item in the otherwise-promising
frequency data that is not found in the list.
Kind of a pain, but doable.
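
The filtering step described above could be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the word list is assumed to hold one word per line, and the frequency data is assumed to be "word count" pairs separated by whitespace. Small in-memory samples stand in for the downloaded files.

```python
def load_wordlist(lines):
    """Build a set of lowercase words from an iterable of lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def filter_counts(count_lines, wordlist):
    """Yield (word, count) for each 'word count' line whose word is in wordlist."""
    for line in count_lines:
        parts = line.split()
        # Skip anything that is not exactly a word followed by an integer count.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        word, count = parts
        if word.lower() in wordlist:
            yield word, int(count)


# Hypothetical stand-ins for the downloaded word list and the frequency file.
sample_words = ["post", "stop", "spot", "opts"]
sample_counts = ["post 392956436", "stop 77749471", "tsop 205591", "opts 662207"]

clean = list(filter_counts(sample_counts, load_wordlist(sample_words)))
print(clean)
```

With real files, the same two functions can be given open file objects instead of the sample lists, since both just iterate over lines.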

—Dan
...
From: "Keith F. Lynch" <kfl@KeithLynch.net>
...
Veit Elser <ve10@cornell.edu> wrote:
...
1.  Extract sequences of letters, including spaces (as word
separators), from actual text.
I tried that decades ago.  It picks up far too much junk:  Proper
names, abbreviations, acronyms, jargon, computer codes, ham radio
codes, misspelled words, foreign words, misspelled foreign words, etc.
If I were to try it again today, now that lots of people send HTML
email, it would probably tell me that "msonormal" was the most common
English word.
I've tried searching for word lists online that list each word by its
frequency.  In a perverse equivalent of Gödel's theorem, it appears
that every such list is either incomplete or contains trash.
For instance, http://norvig.com/ngrams/count_1w.txt starts promisingly:
the 23135851162
of  13151942776
and 12997637966
to  12136980858
a    9081174698
in   8469404971
for  5933321709
is   4705743816
but if, for instance, I search it for the anagrams of "post" I get:
post  392956436
stop   77749471
spot   26750929
tops   11771127
pots    3854743
opts     662207
tsop     205591
tpos      43379
ostp      41988
ptos      38390
otps      23858
ptso      21839
Any idea where I can find a clean and complete list?  Thanks.
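
For reference, an anagram lookup like the one above can be reproduced with a short script. This is a minimal sketch, assuming the frequency file holds one "word count" pair per line; the function name is hypothetical, and a few lines copied from the listings above stand in for the full file.

```python
def anagrams_in_counts(target, count_lines):
    """Return (word, count) pairs whose letters are a rearrangement of target,
    sorted from most to least frequent."""
    key = sorted(target.lower())
    hits = []
    for line in count_lines:
        parts = line.split()
        # Keep only well-formed 'word count' lines.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        word, count = parts
        if sorted(word.lower()) == key:
            hits.append((word, int(count)))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


# A few lines taken from the listings above, in place of the full file.
sample = ["the 23135851162", "tsop 205591", "post 392956436", "stop 77749471"]
result = anagrams_in_counts("post", sample)
print(result)
```

Note that this reports every letter rearrangement present in the data, junk included, which is exactly the problem being described.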

Dan Asimov