[math-fun] Trimmed means and Multi-dimensional medians
While we're at it, the "trimmed mean" is another way to combine the robustness of the median with the acuity of the mean. The k-trimmed mean throws out the k most extreme data points (if I remember correctly, and probably k should be even) and then takes the mean of the rest. The pure mean is then equivalent to the 0-trimmed mean, and the pure median is equivalent to the (n-1)-trimmed mean (where n is the number of data points in the set).

Here's an interesting question: suppose we have data X_1, ..., X_n drawn from a Gaussian distribution with unknown mean mu and known variance 1. We wish to estimate mu with a guess muhat. Virtually everyone uses the sample mean of the dataset as an estimate of mu, but note that mu is also the *median* of the distribution. Under what circumstances would we be justified in preferring the sample median of the data to estimate mu? Since the sample average is a sufficient statistic, the answer might be never, but I'm not sure. Might it be the case that the sample median is preferable if we are using L1 loss, i.e., seeking to minimize E_mu |mu - muhat| ?

Here is another question about the median: is there a median that makes sense in two or more dimensions? Suppose (X,Y) ~ f(x,y), where f(x,y) is the continuous joint pdf of the random variables X and Y. Is there a reasonable quantity to call the median?

-Joshua

On 9/29/05, Mike Speciner <speciner@ll.mit.edu> wrote:
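[Editorial sketch: the two notions above can be made concrete in a few lines of Python. `trimmed_mean` assumes the usual symmetric reading of the poster's definition (k even, k/2 dropped from each end); `geometric_median` is one standard answer to the multi-dimensional question — the point minimizing the sum of Euclidean distances, computed here with Weiszfeld's iteration. Both function names are my own, not from the thread.]

```python
import math

def trimmed_mean(xs, k):
    """k-trimmed mean: drop the k/2 smallest and k/2 largest values
    (assuming symmetric trimming with k even), then average the rest."""
    if k % 2 != 0 or not 0 <= k < len(xs):
        raise ValueError("k must be even and less than len(xs)")
    s = sorted(xs)
    kept = s[k // 2 : len(s) - k // 2]
    return sum(kept) / len(kept)

def geometric_median(points, iters=100, tol=1e-9):
    """Weiszfeld's algorithm for the geometric (spatial) median in 2-D:
    the point minimizing the sum of Euclidean distances to the data."""
    x = sum(p[0] for p in points) / len(points)   # start at the centroid
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        wx = wy = wsum = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < tol:           # estimate landed on a data point; stop
                return (x, y)
            wx += px / d
            wy += py / d
            wsum += 1.0 / d
        x, y = wx / wsum, wy / wsum
    return (x, y)

print(trimmed_mean([1, 2, 3, 4, 100], 0))  # 22.0 -- the plain mean
print(trimmed_mean([1, 2, 3, 4, 100], 2))  # 3.0  -- outlier 100 discarded
print(trimmed_mean([1, 2, 3, 4, 100], 4))  # 3.0  -- only the median left
```

Note how the 0-trimmed mean is dragged to 22.0 by the outlier, while k = 2 already recovers 3.0; and with n = 5, the (n-1)-trimmed mean keeps only the middle point, matching the poster's claim for odd n.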
So, if y(x) is the histogram, the median is the m such that
integral(x<m) y(x) dx = integral(x>m) y(x) dx
while the mean is the m such that
integral(x<m) |x-m|*y(x) dx = integral(x>m) |x-m|*y(x) dx
This suggests a whole family of averages (using various functions of (x-m) for the weighting), though what use they might have escapes me.
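[Editorial sketch: Mike's two balance conditions are equivalent to saying the median minimizes sum |x - m| and the mean minimizes sum (x - m)^2, with other powers of |x - m| filling out the "whole family". A crude grid search over toy data (all names here are mine) makes this visible.]

```python
def m_star(xs, p, grid):
    """Grid-search the m minimizing sum |x - m|^p over the data."""
    return min(grid, key=lambda m: sum(abs(x - m) ** p for x in xs))

data = [1, 2, 3, 10]
grid = [i / 100 for i in range(1101)]   # 0.00 .. 11.00 in steps of 0.01

m2 = m_star(data, 2, grid)   # p = 2: the mean, 4.0
m1 = m_star(data, 1, grid)   # p = 1: a median (any m in [2, 3] ties)
print(m2, m1)
```

For p = 1 the objective is flat on the whole interval [2, 3], which is exactly the familiar ambiguity of the median for an even number of points; the grid search just returns the first tied value.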
--ms
David Gale wrote:
Jim, what is the Propp median if there are m zeros and m fives (and zero everything else)? Dan, if you're going to bring in averages at all then why not go all the way and use THE average? But maybe the CDC was using some sort of hybrid like the one you suggest.
D
At 09:14 PM 9/28/2005, you wrote:
The picture was supposed to show a rectangle of width 1 and height 2 whose bottom is centered at x=1, and to the right of it, a rectangle of width 1 and height 1 whose bottom is centered at x=2.
The base of the first rectangle goes from x=1/2 to x=3/2, and the base of the second rectangle goes from x=3/2 to x=5/2.
The total area under the histogram is (1)(2)+(1)(1) = 3.
The area to the left of the line x=5/4 is (5/4-1/2)(2) = 3/2, which is half of the total area. So x=5/4 is the "median".
Jim
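[Editorial sketch: Jim's "equal areas" median generalizes to any piecewise-constant histogram. The function below (my own name and representation, rectangles as (left, right, height) triples, sorted and non-overlapping) walks the cumulative area until it reaches half the total, then interpolates linearly inside the rectangle where that happens.]

```python
def histogram_median(rects):
    """Median of a piecewise-constant density given as (left, right, height)
    rectangles: the x where the cumulative area reaches half the total."""
    total = sum((r - l) * h for l, r, h in rects)
    target = total / 2
    acc = 0.0
    for l, r, h in rects:
        area = (r - l) * h
        if acc + area >= target:
            return l + (target - acc) / h   # linear within this rectangle
        acc += area
    return rects[-1][1]

# Jim's example: width-1 rectangles of heights 2 and 1
print(histogram_median([(0.5, 1.5, 2.0), (1.5, 2.5, 1.0)]))  # 1.25
```

The total area is 3, half is 3/2, and the first rectangle (height 2) accumulates 3/2 of area at x = 1/2 + (3/2)/2 = 5/4, reproducing Jim's answer.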
_______________________________________________
math-fun mailing list
math-fun@mailman.xmission.com
http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
--- joshua sweetkind-singer <sweetkindsinger@gmail.com> wrote:
Here's an interesting question: suppose we have data X_1, ..., X_n drawn from a Gaussian distribution with unknown mean mu and known variance 1. We wish to estimate mu with a guess muhat. Virtually everyone uses the sample mean of the dataset as an estimate of mu, but note that mu is also the *median* of the distribution. Under what circumstances would we be justified in preferring the sample median of the data to estimate mu? Since the sample average is a sufficient statistic, the answer might be never, but I'm not sure. Might it be the case that the sample median is preferable if we are using L1 loss, i.e., seeking to minimize E_mu |mu - muhat| ?
I would solve this problem using the Bayesian method. The posterior distribution for mu is then a Gaussian with mean equal to the sample mean and variance 1/n. This is all you can know about mu on the basis of the given information. For this particular estimation problem, where we are given that the underlying distribution is a Gaussian with unit variance, I would have no need for the sample median.

Now then, if you must pick a number muhat, and make some decision on that basis, and there is a cost c(mutrue, muhat) for being wrong, then you can calculate the muhat that minimizes the expected cost, using the p(mu) derived above.

Gene
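[Editorial sketch: Gene's claim — posterior N(sample mean, 1/n) under a flat prior — can be checked numerically by evaluating the unnormalized posterior exp(-0.5 * sum (x_i - mu)^2) on a fine grid. Toy data and variable names below are mine.]

```python
import math

xs = [0.3, -1.2, 0.5, 2.0]           # toy sample; sample mean = 0.4, n = 4
n = len(xs)

# Unnormalized posterior on a grid under the flat prior f(mu) = const
grid = [-5 + i / 100 for i in range(1001)]
w = [math.exp(-0.5 * sum((x - mu) ** 2 for x in xs)) for mu in grid]
Z = sum(w)
post_mean = sum(mu * wi for mu, wi in zip(grid, w)) / Z
post_var = sum((mu - post_mean) ** 2 * wi for mu, wi in zip(grid, w)) / Z
print(post_mean, post_var)   # ~0.4 (the sample mean) and ~0.25 (= 1/n)
```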
Yes, nice solution. A couple of points. First, the posterior distribution you state is specifically the one that results when you use the improper prior f(mu) = constant. Second, following through on your last paragraph, if the cost function is specifically c(mu, muhat) = |mu - muhat|, then the answer is the median of the posterior distribution — which, the posterior being Gaussian and hence symmetric, is the sample *mean*! So this drives home the point that, under this Bayesian framework, the sample median is not needed.

What if we take a frequentist point of view, though? Then things get more difficult. Is there a uniformly minimum-cost unbiased estimator for the median? If so, is it the sample mean, the sample median, or something else?

-Joshua
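[Editorial sketch: a quick Monte Carlo comparison of the two estimators under L1 loss, at toy parameters of my choosing (n = 11, mu = 0). It is suggestive only, not a proof, but it agrees with the asymptotic fact that the sample median's sampling standard deviation is about sqrt(pi/2) ≈ 1.25 times the sample mean's for Gaussian data, so the mean wins under L1 loss as well.]

```python
import random
import statistics

random.seed(1)
n, trials, mu = 11, 20000, 0.0
mean_loss = med_loss = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, 1.0) for _ in range(n)]
    mean_loss += abs(statistics.fmean(xs) - mu)   # L1 loss of sample mean
    med_loss += abs(statistics.median(xs) - mu)   # L1 loss of sample median
mean_loss /= trials
med_loss /= trials
print(mean_loss, med_loss)   # the sample mean should incur smaller L1 loss
```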
participants (2)
- Eugene Salamin
- joshua sweetkind-singer