[math-fun] Similar permutations problem
Here's a real life problem. I have a lot of strings (~20,000) which are permutations of 14 letter "words" starting with ABCDEFGHIJKLMN I suspect a lot of these words are similar, meaning they have a large number of letters in the same position. Checking for identical words is trivial - just sort and compare adjacents. Is there an approach better than the brute force 20,000! pairwise comparisons to find similar permutations? (For those who like a less abstract descriptions, the 20,000 are solutions to a set of puzzles, and I suspect a lot of the solutions are similar even though the puzzles are distinct.)
I can think of 2 1/2 ways to attach a number to how similar two 14-letter strings are. Specifically to define a metric that actually measures how different they are: a) The number of places where they differ ("similar, meaning they have a large number of letters in the same position"). IF the ordering of the alphabet (let's say A...Z) is relevant: b) The least number of transpositions of 2 adjacent letters that transform one string into the other. IF rather it's the *circular ordering* that matters, then: b.5) The least number of transpositions of two circularly adjacent letters — meaning A <-> Z as well as the 25 others — that transform one string into the other. --Dan
On May 7, 2015, at 10:41 AM, Dave Dyer <ddyer@real-me.net> wrote:
Here's a real life problem. I have a lot of strings (~20,000) which are permutations of 14 letter "words" starting with ABCDEFGHIJKLMN
I suspect a lot of these words are similar, meaning they have a large number of letters in the same position. Checking for identical words is trivial - just sort and compare adjacents. Is there an approach better than the brute force 20,000! pairwise comparisons to find similar permutations?
(For those who like a less abstract descriptions, the 20,000 are solutions to a set of puzzles, and I suspect a lot of the solutions are similar even though the puzzles are distinct.)
This may be applicable to Dan D's edit distance question: Proof that a 40-year-old algorithm is the best possible will come as a relief to computer scientists. http://newsoffice.mit.edu/2015/algorithm-genome-best-possible-0610 Wagner-Fisher Algorithm http://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm On Thu, May 7, 2015 at 1:41 PM, Dave Dyer <ddyer@real-me.net> wrote:
Here's a real life problem. I have a lot of strings (~20,000) which are permutations of 14 letter "words" starting with ABCDEFGHIJKLMN
I suspect a lot of these words are similar, meaning they have a large number of letters in the same position. Checking for identical words is trivial - just sort and compare adjacents. Is there an approach better than the brute force 20,000! pairwise comparisons to find similar permutations?
(For those who like a less abstract descriptions, the 20,000 are solutions to a set of puzzles, and I suspect a lot of the solutions are similar even though the puzzles are distinct.)
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
participants (3)
-
Dan Asimov -
Dave Dyer -
Jeff Caldwell