[math-fun] Lexical distance?
A very rough kind of ³lexical distance² between strings (possibly infinite) is just the number of symbols by which they differ. Is there an Official Name for this?
You might look at "edit distance": http://en.wikipedia.org/wiki/Edit_distance At 12:24 PM 8/26/2010, Marc LeBrun wrote:
A very rough kind of ³lexical distance² between strings (possibly infinite) is just the number of symbols by which they differ.
Is there an Official Name for this?
http://en.wikipedia.org/wiki/Levenshtein_distance discusses some of the different string distances. If your model is just symbol substitution, then Hamming distance is the official name. -Thomas C On Thu, Aug 26, 2010 at 5:38 PM, Henry Baker <hbaker1@pipeline.com> wrote:
You might look at "edit distance":
http://en.wikipedia.org/wiki/Edit_distance
At 12:24 PM 8/26/2010, Marc LeBrun wrote:
A very rough kind of ³lexical distance² between strings (possibly infinite) is just the number of symbols by which they differ.
Is there an Official Name for this?
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
For arbitrarily long strings or texts, I use my own algorithm to measure their "distance". I choose a window size N (as in a "dissociated press" algorithm, see http://en.wikipedia.org/wiki/Dissociated_press) and count the number of occurrences of all N-character substrings in both source texts. The amount of agreement between the two texts is based on these counts. Balanced binary trees are used to keep track of all the counts, so the algorithm is O(n log n) given an input dataset size of n characters. I normalize the result to fall in the range (0.0...1.0). There are more details at this page: http://mrob.com/pub/math/sloandora/index.html which describes my search engine and exploration tool for the Sloane Integer Sequences database. Using this substring-counting metric (which I call a "character-level concordance metric") the exploration tool does a very good job of predicting which database entry the user should see given his ratings on previously-viewed entries. - Robert On Thu, Aug 26, 2010 at 15:24, Marc LeBrun <mlb@well.com> wrote:
A very rough kind of ³lexical distance² between strings (possibly infinite) is just the number of symbols by which they differ.
Is there an Official Name for this?
-- Robert Munafo -- mrob.com
Quoting Marc LeBrun <mlb@well.com>:
A very rough kind of ³lexical distance² between strings (possibly infinite) is just the number of symbols by which they differ.
Is there an Official Name for this?
This has already been pretty well answered, but it might be worth noting that this metric is the basis of the topology of symbolic dynamics, as defined by Hedlund, or discussed in the book by Marcus and Lind. - hvm ------------------------------------------------- www.correo.unam.mx UNAMonos Comunicándonos
="mcintosh@servidor.unam.mx"
This has already been pretty well answered,
Yes--and many thanks to everyone here for the many good leads to follow up. Having names for googling sure helps a lot!
this metric is the basis of the topology of symbolic dynamics
Interesting. I'm trying to grok processes where infinite strings can be said to converge element-wise, but the strings themselves are only roughly inter-related--having Hamming distance, but no meaningful lexical order.
participants (5)
-
Henry Baker -
Marc LeBrun -
mcintosh@servidor.unam.mx -
Robert Munafo -
Thomas Colthurst