This reminds me of a very well-known distance measure: The one used by the game Mastermind (or Bulls and Cows, or MIT folks might remember it from the old computer program Moo). Here, distance is given by a pair of numbers: The first is the number of symbols that match and are in the same position, and the second is the number of remaining symbols that match but are not in the same position. There are of course many ways in which these two numbers could be combined to produce a single number. Tom Dan Asimov writes:
Besides comparing identical places in two strings, one might also care how close they are to being rearrangements of each other. E.g., one might want to consider XYXYXY and YXYXYX much closer than XYXYXY and ZZZZZZ, even though both pairs differ in every place.
Assuming finite strings from a finite alphabet in which all pairs of letters are equally "distant", one might define lexical distance as the fewest "moves" to go from one string to another, where a move is either 1) the transposition T of two adjacent string elements, or 2) the substitution S of one letter by another in the same place. In general the two types of moves would be weighted differently, depending on the application.
D = p*(#T) + (1-p)*(#S)
For an alphabet where letters have various distances, this could be changed slightly to reflect that.
--Dan
Marc wrote:
<< A very rough kind of ³lexical distance² between strings (possibly infinite) is just the number of symbols by which they differ.
_____________________________________________________________________ "It don't mean a thing if it ain't got that certain je ne sais quoi." --Peter Schickele
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun