The quick brown fox jumps over the lazy dog. Although not quite what you want, I can still recall typing this sentence (using thumbs only for the space bar!!) for hours during typing class while in junior high school in the early 1960's.
A closely allied problem from the 1960's: what sequence of characters, when printed on an IBM 1403 chain printer, will break the chain? Is there an English sentence which could do this? (IBM went to a lot of trouble to make sure that no such sentence existed; I wonder if they ever published their research on this topic.)
At 09:46 PM 8/2/2017, Keith F. Lynch wrote:
Bill Gosper's recent anagrams reminded me of my search for a perfect pangram, i.e. a sentence in English containing every letter once and only once. For instance, "Cwm fjord bank glyphs vext quiz" and "Mr. Jock, TV quiz PhD, bags few lynx," neither of which is really satisfactory.
My approach was to find a complete list of words, discard any words containing duplicate letters (e.g. "containing," as it contains three "n"s (and also two "i"s)), and turn each remaining word into a 32-bit integer which indicates which letters are present ("of" becomes 2^5 + 2^14 as it contains the 6th and 15th letters). Anagrams are merged (e.g. the numbers for "opts," "post," "pots," "spot," "stop," and "tops" are the same). A depth-first tree search is then done during which the numbers are ORed together. If ANDing a candidate word with the running total gets a non-zero result, the two share a letter and the branch is immediately abandoned. If 2^26-1 is reached, every letter has been used exactly once and the result is logged, so that I can manually try to arrange the words (or anagrams of them) into a coherent sentence.
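A minimal Python sketch of the plan described above may make the bit manipulation concrete. The function names (word_mask, build_masks, search) are hypothetical, and two details are assumptions rather than part of the original description: letter "a" is taken as bit 0, and the search visits masks in sorted order so that each set of words is reported only once.

```python
from collections import defaultdict

FULL = (1 << 26) - 1  # all 26 letter bits set, i.e. 2^26 - 1


def word_mask(word):
    """Return the letter bitmask of a word, or None if it repeats a
    letter or contains anything other than a-z."""
    mask = 0
    for ch in word.lower():
        if not "a" <= ch <= "z":
            return None
        bit = 1 << (ord(ch) - ord("a"))  # assumption: "a" is bit 0
        if mask & bit:  # letter already seen: discard the word
            return None
        mask |= bit
    return mask


def build_masks(words):
    """Group words by mask, so that anagrams share a single entry."""
    groups = defaultdict(list)
    for w in words:
        m = word_mask(w)
        if m is not None:
            groups[m].append(w)
    return dict(groups)


def search(groups):
    """Depth-first search for sets of disjoint masks that OR to FULL.

    Each result is a list of anagram groups, one group per chosen mask.
    """
    masks = sorted(groups)
    results = []

    def dfs(start, used, chosen):
        if used == FULL:
            results.append([groups[m] for m in chosen])
            return
        for i in range(start, len(masks)):
            m = masks[i]
            if used & m:  # shares a letter: abandon this branch
                continue
            chosen.append(m)
            dfs(i + 1, used | m, chosen)
            chosen.pop()

    dfs(0, 0, [])
    return results
```

Calling search(build_masks(word_list)) returns every letter-disjoint set of anagram groups that covers the alphabet. On a large word list this plain search would still be slow; it is meant only to illustrate the masking and pruning steps.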
I tried this about 30 years ago, on a 286. Needless to say, I didn't get anywhere. It probably would have run for eons.
I'm considering trying again, on the same general plan, but only with the most common ten thousand words. Better yet, the most common thousand words containing "a", the most common thousand words containing "b", ..., and the most common thousand words containing "z," with the many duplicates merged, for a total of perhaps ten thousand. I might also include the most common thousand words *not* containing "a", the most common thousand words not containing "b", ..., and the most common thousand words not containing "z." Again, duplicates would be merged. And words that are anagrams would move up in the rankings as the usage of each anagram would be summed as if they were all the same word.
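The selection scheme in the paragraph above could be sketched roughly as follows. This assumes a word-frequency table is available as a dict mapping word to usage count (no particular corpus is implied); merge_anagrams and select_keys are hypothetical names, and each anagram class is represented by its letters in sorted order.

```python
import string
from collections import Counter


def merge_anagrams(freq):
    """Sum usage counts over anagrams, keyed by the sorted letters.

    Words with repeated letters or non a-z characters are dropped.
    """
    merged = Counter()
    for word, count in freq.items():
        w = word.lower()
        if w.isascii() and w.isalpha() and len(set(w)) == len(w):
            merged["".join(sorted(w))] += count
    return merged


def select_keys(merged, per_letter=1000):
    """Union of the most common keys containing each letter and the
    most common keys not containing each letter."""
    # Rank once by descending summed usage (ties broken alphabetically).
    ranked = sorted(merged, key=lambda k: (-merged[k], k))
    selected = set()
    for letter in string.ascii_lowercase:
        with_letter = [k for k in ranked if letter in k][:per_letter]
        without_letter = [k for k in ranked if letter not in k][:per_letter]
        selected.update(with_letter)
        selected.update(without_letter)
    return selected
```

Because the result is a set, the many duplicates across the 52 per-letter lists are merged automatically.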
With memory as cheap as it is, I might also include every *pair* of words in the list as a word, even though that would increase memory usage from 40 kilobytes to 400 megabytes.
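The memory figures follow from storing one 32-bit (4-byte) integer per entry: 10,000 words take 40,000 bytes, and 10,000 x 10,000 ordered pairs take 400,000,000 bytes. The sketch below checks that arithmetic and also shows one possible reduction, which is an assumption and not part of the plan as stated: keeping only unordered pairs whose words share no letter. The names table_bytes and disjoint_pair_masks are hypothetical.

```python
from itertools import combinations

BYTES_PER_MASK = 4  # one 32-bit integer per table entry


def table_bytes(n_entries):
    """Memory needed for a table of n_entries 32-bit masks."""
    return n_entries * BYTES_PER_MASK


def disjoint_pair_masks(masks):
    """Masks of all unordered pairs that share no letter.

    Assumption: pairs with a letter in common can never appear in a
    perfect pangram, so they are left out of the table.
    """
    return {a | b for a, b in combinations(masks, 2) if a & b == 0}
```

How much the disjoint-pair filter saves depends on the word list, so no estimate is given here.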
Has anyone tried this already? Does anyone have a better algorithm? Thanks.