This idea of fudging the median by introducing some kind of averaging bothers me in a more fundamental way. For the median of a distribution over a set X to be defined, it suffices that X be an ordered set. There is no need for X to possess an addition, although of course the probabilities (or counts) must be addable so that you can tell when the distribution has been split in half.

For example, if I define the order mouse < cat < bear, and I have a mouse, a cat, and a bear, then my median pet is the cat. But if I adopt a second bear, what does the median become? Perhaps the pair (cat, bear); that's much more sensible than (cat + bear)/2.

Even for distributions over the real line, medians and quantiles are often preferred to averages and standard deviations, since the former always exist while the latter need not. Example: the Cauchy distribution, with density 1/(pi (1 + x^2)); its mean and standard deviation are undefined, yet its median is 0.

Suppose one has a distribution p(x), and one wishes to describe it by a single number, a central value c of x. Under the criterion of minimizing the mean square error, int p(x) (x - c)^2 dx (assuming the integral exists), we know that c is the mean. But here is something perhaps less familiar: if instead we wish to minimize the mean absolute error, int p(x) |x - c| dx (again assuming the integral exists), then c is the median. (Differentiating with respect to c, the condition for a minimum becomes P(x < c) = P(x > c), which is exactly what defines the median.)

Gene
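As a quick numerical illustration of the two minimization claims and of the Cauchy example, here is a small sketch in Python (assuming NumPy; the brute-force grid search over candidate centers is purely for demonstration and is not part of the argument above):

import numpy as np

rng = np.random.default_rng(0)

# Empirical check: over a grid of candidate centers c, the mean squared
# error is minimized near the sample mean, and the mean absolute error
# is minimized near the sample median.  A skewed sample makes the two
# minimizers visibly different.
x = rng.exponential(scale=1.0, size=2000)
grid = np.linspace(x.min(), x.max(), 2001)          # candidate centers c

mse = np.mean((x[:, None] - grid[None, :]) ** 2, axis=0)
mae = np.mean(np.abs(x[:, None] - grid[None, :]), axis=0)

print("argmin of MSE:", grid[mse.argmin()], "  sample mean:  ", x.mean())
print("argmin of MAE:", grid[mae.argmin()], "  sample median:", np.median(x))

# The Cauchy example: the running mean keeps wandering (the mean does
# not exist), while the running median settles down near 0.
cauchy = rng.standard_cauchy(size=1_000_000)
for n in (10**3, 10**4, 10**5, 10**6):
    print(f"n = {n:>7}:  running mean = {cauchy[:n].mean():+9.3f}, "
          f"running median = {np.median(cauchy[:n]):+7.3f}")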