Re: [math-fun] Trimmed means and Multi-dimensional medians

29 Sep 2005

Two or three points about this:

First, the standard median depends only on the topology of the line.
In that spirit, it might be best to leave the answer as either a
point or an interval. There are many contexts where the topology is
clear but the linear structure is not. In 2 dimensions, any
finite set is homeomorphic to any other finite set of the same
cardinality, so there can't be an analogous notion of median without
more structure. Dan's definition assumes the notion of convexity,
which is equivalent to knowing what straight lines are. The only
homeomorphisms of the plane that preserve straightness of lines are
affine, so the mean would be preserved as well. You can't be that
agnostic about linearity and still get an answer! More often, though,
2-dimensional data has natural coordinates. In that case, you may as
well take the median along each of the two axes. The orderings along
two projections are a lot less structure than the affine structure, and
may be preferable for most applications.
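
To make the coordinate-wise suggestion concrete, here is a minimal Python sketch. The function name, the sample points and the warp are made up for illustration. It takes the median along each axis separately and checks that the answer commutes with an arbitrary increasing re-parametrization of each axis, i.e. that only the two orderings are being used. The sample has an odd number of points on purpose: for an even count, statistics.median averages the two middle values, which already leans on the linear structure.

```python
import math
from statistics import median

def coordinatewise_median(points):
    """Median taken separately along each of the two coordinate axes."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (median(xs), median(ys))

# Illustrative sample (odd count, so each median is an actual data value).
points = [(0.0, 5.0), (1.0, 0.0), (2.0, 1.0), (3.0, 9.0), (10.0, 2.0)]
m = coordinatewise_median(points)

def warp(p):
    """An increasing, non-affine change of scale on each axis."""
    x, y = p
    return (x ** 3, math.exp(y))

# Warping the data and then taking medians agrees with taking medians
# and then warping: only the order along each axis matters.
warped_median = coordinatewise_median([warp(p) for p in points])
print(m, warped_median == warp(m))
```

Note that the result need not be one of the data points, and it does change if the axes are rotated, which is the price of using less than the affine structure.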

Second: the convex peeling definition is interesting, but how much
can it jump around? I suspect it can jump too far to be a very good
statistical measure. For example, imagine you have 50 points that are
nearly along the x-axis, and a bunch of other points scattered around
in the upper half plane. In one arrangement, the 50 could all be on the
boundary of the convex hull, but in a nearby arrangement (on an arc
curved downward, so that each peel strips off only the two end points)
it could take 25 levels of peeling to remove them all.
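
A small Python sketch of this instability, under stated assumptions: the 50 points sit on a very shallow parabola (height at most EPS = 0.01), and the "bunch of other points" is replaced by 25 points stacked on the y-axis, chosen only so the layers can be checked by hand. All names here are hypothetical. The hull routine is a monotone-chain variant that keeps points lying on hull edges, to match "remove data on the boundary".

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (> 0 for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_boundary(points):
    """Set of points on the convex hull boundary, including edge points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return set(pts)

    def chain(seq):
        out = []
        for p in seq:
            # Pop only on a strict right turn, so collinear points survive.
            while len(out) >= 2 and cross(out[-2], out[-1], p) < 0:
                out.pop()
            out.append(p)
        return out

    return set(chain(pts)) | set(chain(reversed(pts)))

def peel_depths(points):
    """Map each point to the peeling level at which it is removed."""
    depth, remaining, level = {}, set(points), 0
    while remaining:
        level += 1
        layer = hull_boundary(remaining)
        for p in layer:
            depth[p] = level
        remaining -= layer
    return depth

EPS = 0.01                                    # how far the arc leaves the x-axis
xs = [-1 + 2 * i / 49 for i in range(50)]     # 50 abscissae in [-1, 1]
upper = [(0.0, float(k)) for k in range(1, 26)]  # stand-in for the other points

def deepest_arc_level(sign):
    """Largest peeling level among the 50 near-axis points."""
    arc = [(x, sign * EPS * x * x) for x in xs]
    depth = peel_depths(arc + upper)
    return max(depth[p] for p in arc)

ends_up = deepest_arc_level(+1)    # arc bulging below its chord
ends_down = deepest_arc_level(-1)  # arc curved downward
print(ends_up, ends_down)
```

In the first arrangement all 50 points go in the first peel; in the second, no point has moved by more than 0.02, yet each peel only strips off the two end points.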

Third: If you really have data where the affine structure makes  
sense, how about looking at the median for every linear projection,  
and just taking the convex hull of all answers you get as the "median"?

     Bill

On Sep 29, 2005, at 3:51 PM, <dasimov@earthlink.net> wrote:
...
Joshua writes:
<<
. . .
Here is another question about the median: is there a median that makes
sense in two or more dimensions?
Suppose (X,Y) ~ f(x,y) where f(x,y) is the continuous joint pdf of the
random variables X and Y. Is there a reasonable quantity to call the
median?
...
...
For a *finite uniform* distribution, a reasonable way to generalize the
1D median (on R) to R^n is to use "convex peeling":

    do
        Take the convex hull of the data, then remove data on the
        boundary of its convex hull;
    while data remains.

When eventually a removal leaves no remaining data, put these last
points back and let their vector mean be the median.
(For a continuous distribution, I suspect that by taking an arbitrarily
large finite sample, the result of convex peeling should converge, with
probability 1, to a point depending only on the original distribution.
Better yet, there's probably some differential equation that implements
this limit without resorting to samples.  WPT ?)
--Dan
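
For reference, the peeling recipe quoted above can be sketched in Python for points in the plane roughly as follows. This is one reading of the recipe, not necessarily what Dan had in mind in detail: points lying on hull edges are treated as boundary points, the input is assumed non-empty, and the function names and the sample data are invented for the example.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (> 0 for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_boundary(points):
    """Set of points on the convex hull boundary, including edge points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return set(pts)

    def chain(seq):
        out = []
        for p in seq:
            # Pop only on a strict right turn, so collinear points survive.
            while len(out) >= 2 and cross(out[-2], out[-1], p) < 0:
                out.pop()
            out.append(p)
        return out

    return set(chain(pts)) | set(chain(reversed(pts)))

def convex_peeling_median(points):
    """Peel hull layers; return the vector mean of the last layer.

    Assumes a non-empty collection of (x, y) tuples.
    """
    remaining = set(points)
    while True:
        layer = hull_boundary(remaining)
        if layer == remaining:
            # Removing this layer would leave no data: these are the
            # "last points", so average them instead of discarding them.
            n = len(layer)
            return tuple(sum(coord) / n for coord in zip(*layer))
        remaining -= layer

# Illustrative data: a square with a triangle inside it.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
triangle = [(1, 1), (3, 1), (2, 3)]
result = convex_peeling_median(square + triangle)
print(result)
```

Here the first peel removes the square and the second would remove the whole triangle, so the answer is the centroid of the triangle's three vertices.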

Bill Thurston