Re: [math-fun] Lexical distance?
Besides comparing identical places in two strings, one might also care how close they are to being rearrangements of each other. E.g., one might want to consider XYXYXY and YXYXYX much closer than XYXYXY and ZZZZZZ, even though both pairs differ in every place.
Assuming finite strings from a finite alphabet in which all pairs of letters are equally "distant", one might define lexical distance as the fewest "moves" to go from one string to another, where a move is either 1) the transposition T of two adjacent string elements, or 2) the substitution S of one letter by another in the same place. In general the two types of moves would be weighted differently, depending on the application.
D = p*(#T) + (1-p)*(#S)
For an alphabet where letters have various distances, this could be changed slightly to reflect that.
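Dan's distance can be computed directly for short strings by treating each string as a node and running Dijkstra's algorithm over the two move types. This brute-force sketch (the function name and cost convention are mine, not anything standard) explores all strings of the given length, so it is exponential and only a way to experiment with small cases:

```python
import heapq

def lexical_distance(a, b, alphabet, p=0.5):
    """Minimum total cost to turn a into b, where an adjacent
    transposition T costs p and a single-letter substitution S costs
    1 - p.  Brute-force Dijkstra over the string state space, so only
    practical for short strings and small alphabets."""
    if len(a) != len(b):
        raise ValueError("T and S both preserve length")
    dist = {a: 0.0}
    heap = [(0.0, a)]
    while heap:
        d, s = heapq.heappop(heap)
        if s == b:
            return d
        if d > dist[s]:
            continue  # stale heap entry
        moves = []
        # 1) adjacent transpositions, cost p
        for i in range(len(s) - 1):
            if s[i] != s[i + 1]:
                moves.append((p, s[:i] + s[i + 1] + s[i] + s[i + 2:]))
        # 2) substitutions, cost 1 - p
        for i, c0 in enumerate(s):
            for c in alphabet:
                if c != c0:
                    moves.append((1 - p, s[:i] + c + s[i + 1:]))
        for cost, t in moves:
            if d + cost < dist.get(t, float("inf")):
                dist[t] = d + cost
                heapq.heappush(heap, (d + cost, t))
    return float("inf")
```

With p = 0.5 this gives 1.5 for XYXYXY vs. YXYXYX (three adjacent transpositions) but 3.0 for XYXYXY vs. ZZZZZZ (six substitutions), matching the intuition above that the first pair is much closer.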
--Dan
Marc wrote:
<< A very rough kind of "lexical distance" between strings (possibly infinite) is just the number of symbols by which they differ.
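For equal-length strings, the quoted measure is the Hamming distance; a minimal sketch:

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

Note that it rates both pairs above equally far apart: hamming("XYXYXY", "YXYXYX") and hamming("XYXYXY", "ZZZZZZ") are both 6, which is exactly the defect Dan's weighted-move distance addresses.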
_____________________________________________________________________ "It don't mean a thing if it ain't got that certain je ne sais quoi." --Peter Schickele
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
These are cool string metrics (thanks, I'll remember them!) but they don't scale to infinite strings for the application I envision.

OK, here's a sketch of something a lot like what I'm after: Usually when you sum a series, for pi, say, the leftmost digits tend to settle down first. Consequently, the approximations move closer in distance to the limit, and we say it converges in the familiar delta-epsilon way. Now instead of summing a series, imagine a process where the digits are all fizzing around, maybe even chaotically, but as time goes by a greater percentage (whatever that means) match the digits of the limit value. For example, imagine each bit gets randomly flipped from its correct value with decreasing frequency.

That feels a lot like what I mean intuitively by convergence, but it's all a bit sketchy. So I'm interested in learning of relevant prior art, best practices, or simply better ways to talk about this.
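One way to make the sketch concrete is a toy simulation (the names and the 1/(t + 2) flip schedule are my own assumptions, nothing standard): flip each bit of the limit value independently with a probability that shrinks over time, so that no prefix ever settles permanently, yet the fraction of correct bits tends to 1.

```python
import random

def fizzing_states(limit_bits, steps, seed=0):
    """Yield successive noisy views of limit_bits.  At step t each bit
    is flipped independently with probability 1/(t + 2), so flips never
    stop entirely but become ever rarer."""
    rng = random.Random(seed)
    for t in range(steps):
        p_flip = 1.0 / (t + 2)
        yield [b ^ (rng.random() < p_flip) for b in limit_bits]

def match_fraction(state, limit_bits):
    """Fraction of positions agreeing with the limit -- Marc's
    'greater percentage match' notion of closeness."""
    hits = sum(s == b for s, b in zip(state, limit_bits))
    return hits / len(limit_bits)
```

Running this on, say, 10,000 bits shows the match fraction climbing from about 1/2 toward 1, even though the leftmost bits keep flickering forever, so the sequence never converges in the usual per-position (delta-epsilon) sense.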
="Tom Karzes" <karzes@sonic.net>
This reminds me of a very well-known distance measure: The one used by the game Mastermind (or Bulls and Cows, or MIT folks might remember it from the old computer program Moo). Here, distance is given by a pair of numbers: The first is the number of symbols that match and are in the same position, and the second is the number of remaining symbols that match but are not in the same position. There are of course many ways in which these two numbers could be combined to produce a single number.
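Tom's pair of numbers can be computed with a multiset intersection over the non-matching positions; a short sketch (the function name is mine):

```python
from collections import Counter

def mastermind_distance(guess, secret):
    """Return (exact, misplaced): exact counts symbols matching in the
    same position; misplaced counts the remaining symbols present in
    both strings but in different positions (multiset overlap)."""
    exact = sum(g == s for g, s in zip(guess, secret))
    # multiset overlap of the leftovers at non-matching positions
    rest_g = Counter(g for g, s in zip(guess, secret) if g != s)
    rest_s = Counter(s for g, s in zip(guess, secret) if g != s)
    misplaced = sum((rest_g & rest_s).values())
    return exact, misplaced
```

On Dan's example it behaves as desired: XYXYXY vs. YXYXYX scores (0, 6) — every symbol is present, just displaced — while XYXYXY vs. ZZZZZZ scores (0, 0).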
Tom
Dan Asimov writes: [message quoted in full above]
participants (3)
- Dan Asimov
- Marc LeBrun
- Tom Karzes