[math-fun] Re: n-D median redefined
I should point out explicitly that Richard's definition (the minimum of the sum of absolute values of differences) and mine (the zero-crossing-like point of the sum of unit vectors of differences) are like the minimum of a function and the zero of its derivative: they tend to agree. (I'm waffling because I'm not equal to the hairiness.)
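Here is a minimal Python sketch of how the two definitions meet in the plane. It uses Weiszfeld's iteration, a standard method for finding the point that minimizes the sum of Euclidean distances; nobody in the thread proposed it. The gradient of f(x) = sum_i |x - a_i| is minus the sum of the unit vectors from x toward the a_i. So at a minimizer that is not itself a data point, the unit-vector sum vanishes. The function names and sample points are hypothetical.

```python
import math

def geometric_median(points, iters=10_000, tol=1e-12):
    """Weiszfeld iteration for the point minimizing the sum of Euclidean distances."""
    n = len(points)
    # Start from the ordinary mean of the data.
    x = sum(p[0] for p in points) / n
    y = sum(p[1] for p in points) / n
    for _ in range(iters):
        wx = wy = wsum = 0.0
        for px, py in points:
            # Clamp the distance so an iterate landing exactly on a data point
            # does not divide by zero (a simplification, not a full treatment).
            d = max(math.hypot(x - px, y - py), 1e-15)
            wx += px / d
            wy += py / d
            wsum += 1.0 / d
        nx, ny = wx / wsum, wy / wsum
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y

def unit_vector_sum(points, x, y):
    """Sum of unit vectors from (x, y) toward each data point (coincident points skipped)."""
    sx = sy = 0.0
    for px, py in points:
        d = math.hypot(px - x, py - y)
        if d > 0:
            sx += (px - x) / d
            sy += (py - y) / d
    return sx, sy

pts = [(0.0, 0.0), (6.0, 0.0), (5.0, 4.0), (1.0, 3.0)]
mx, my = geometric_median(pts)  # should be close to (18/7, 72/35)
print(mx, my, unit_vector_sum(pts, mx, my))  # unit-vector sum close to (0, 0)
```

For four points in convex position, the sum of distances is minimized where the diagonals cross. This holds because |PA| + |PC| >= |AC|, with equality only on segment AC, and likewise for BD. That gives an independent check on the iteration.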
From: "Fred lunnon" <fred.lunnon@gmail.com>
It's not at all obvious whether this function has multiple (local) minima, in which case it may be nontrivial to compute. Almost certainly, the sum of the squares of the distances would be more tractable.
More tractable, yes, but isn't the minimum of the sum of squares the average rather than the median? Maybe you could sum sqrt(distance^2 + c)... but for what c?
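Here is a one-dimensional sketch of the sqrt(distance^2 + c) suggestion, with hypothetical names and data. For c > 0 each term is smooth and convex, so the minimizer is the unique zero of the derivative and bisection can find it. As c shrinks toward 0, the minimizer moves toward a median. As c grows large compared with the squared spread of the data, it approaches the mean. This only illustrates the trade-off; it does not settle which c to use.

```python
import math

def smoothed_center(data, c, iters=200):
    """Minimize F(x) = sum_i sqrt((x - a_i)^2 + c) over real x, assuming c > 0."""
    def dF(x):
        # Derivative of F; nondecreasing because F is convex.
        return sum((x - a) / math.sqrt((x - a) ** 2 + c) for a in data)

    # dF(min) <= 0 <= dF(max), so the minimizer lies in [min(data), max(data)].
    lo, hi = min(data), max(data)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if dF(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [0.0, 1.0, 10.0]             # median 1, mean 11/3
print(smoothed_center(data, 1e-6))  # close to the median
print(smoothed_center(data, 1e8))   # close to the mean
```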
What relation might these definitions bear to the convex hull stripping algorithm mentioned earlier? WFL
Besides that, as Gareth McCaughan points out, they're identical in the 1D case: both ignore the distances to the points and pay attention only to their directions. I suspect the convex peel has a different purpose: it lets you strip away some fraction of the points as "outliers" and treat the remaining points as more reliable data. The convex peel can be used as a multidimensional version of the quantile (as in "in the top 10%") or as a way to estimate density.
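Here is a naive Python sketch of convex hull peeling (also called onion peeling) in the plane. It computes the hull, strips its vertices, and repeats. Each hull comes from Andrew's monotone chain, so the sketch aims at simplicity, not speed. It assumes the points are distinct (x, y) tuples. Points lying in the middle of a hull edge are left for the next layer; other conventions are equally reasonable. The function names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order.

    Points lying in the interior of a hull edge are not returned.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 for a counterclockwise turn o -> a -> b
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convex_peels(points):
    """Strip convex hulls repeatedly; returns the layers, outermost first."""
    remaining = set(points)
    layers = []
    while remaining:
        hull = convex_hull(remaining)
        layers.append(hull)
        remaining -= set(hull)
    return layers

grid = [(x, y) for x in range(3) for y in range(3)]
for depth, layer in enumerate(convex_peels(grid)):
    print(depth, layer)
```

Stripping layers until roughly the desired fraction of points remains gives the quantile-like trimming described above.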
From: Gareth McCaughan <gareth.mccaughan@pobox.com>
(quoting Richard Harter:)
and find the set of x's such that f(x) is a minimum. All of these are valid medians; however, the centroid of the set is in some sense the most central median.
The centroid needn't be a median in this sense at all in dimensions other than 1.
Richard is talking about the centroid of the set of median-like points, not the centroid of the original set.
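In one dimension, Richard's recipe can be checked directly. f(x) = sum_i |x - a_i| is minimized on the closed interval between the two middle order statistics, which is a single point when n is odd. The centroid of that interval is their midpoint, which is the conventional median. A minimal Python sketch with hypothetical names:

```python
def median_interval(data):
    """Interval [lo, hi] on which f(x) = sum |x - a| attains its minimum (1-D)."""
    s = sorted(data)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2], s[n // 2]
    return s[n // 2 - 1], s[n // 2]

def central_median(data):
    """Centroid (midpoint) of the set of minimizers of f."""
    lo, hi = median_interval(data)
    return (lo + hi) / 2

def f(x, data):
    return sum(abs(x - a) for a in data)

data = [1, 2, 7, 10]
lo, hi = median_interval(data)
# f takes the same value everywhere on [lo, hi] and is larger outside it.
print(lo, hi, central_median(data))
print(f(lo, data), f(central_median(data), data), f(hi, data), f(lo - 1, data))
```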
In two dimensions, suppose your points consist of the vertices of a regular 999-gon inscribed in |z|=1, together with a single point at (10^6,0). Then the centroid is (1000,0), but f(1000-h,0) is approximately f(1000,0) - 998h, so the centroid isn't even a local minimum of f.
Sure, this is the whole idea of the difference between an average and a median.
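Gareth's counterexample concerns the centroid of the data points themselves (the ordinary mean), and it is easy to check numerically. The Python sketch below uses a step of h = 1, chosen arbitrarily. The difference quotient should come out close to -998, meaning that moving left from the mean lowers the sum of distances.

```python
import math

def f(x, y, pts):
    """Sum of Euclidean distances from (x, y) to every point."""
    return sum(math.hypot(x - px, y - py) for px, py in pts)

n = 999
# Vertices of a regular 999-gon on the unit circle, plus one far-away point.
pts = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
pts.append((1e6, 0.0))

cx = sum(px for px, _ in pts) / len(pts)  # mean of the data, about 1000
cy = sum(py for _, py in pts) / len(pts)  # about 0

h = 1.0
slope = (f(cx - h, cy, pts) - f(cx, cy, pts)) / h
print(cx, cy, slope)  # slope near -998: f decreases as we move left from the mean
```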
Of course the corresponding property doesn't hold for f even in two dimensions. In fact, if g from R^2 to R has the property that sum_A g(x-a) is constant when x is a convex combination of the elements of A, then I claim g is constant; [...]
In any case my/Harter's median functions inside the convex hull aren't constant.
Is there some f of less simple form with some such property that would let us play the hull-stripping game?
I think of hull-stripping as a cheap substitute; it's like asking for a better approximation to a calculation error. But with the sum of the absolute differences, you can draw contours (I should improve my applet). By the way, there are also concave peels based on the Delaunay triangulation. The funny thing is, I don't think the convex peel is easier to calculate. J.L. Gastwirth, its inventor, has kindly sent a couple of papers, saying:
I am not sure that what you did is exactly what I did, but it is related. Here is my paper and an extension I did in 1985. The issue you discuss is also related to work of Prof. Regina Liu at Rutgers on defining the median in terms of depth in multivariate data.
Gastwirth, J.L. (1966). On Robust Procedures. Journal of the American Statistical Association, 61, 929-948. http://tinyurl.com/2xt3qp
Gastwirth, J.L. (1985). The Use of Maximin Efficiency Robust Tests in Combining Contingency Tables and Survival Analysis. Journal of the American Statistical Association, 80(390), 380-384. http://tinyurl.com/2y8t2b

Also found a paper that mentions...

"Barnett (1976) and Small (1990) offer comprehensive surveys of different proposed alternative definitions of multivariate medians."

Barnett, V. (1976). The ordering of multivariate data. Journal of the Royal Statistical Society A, 139(3), 318-344.
Small, C.G. (1990). A survey of multidimensional medians. International Statistical Review, 58(3), 263-277.

--Steve
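One depth notion due to Regina Liu is simplicial depth. In the plane, the depth of a point x is the fraction of triangles with vertices among the data points that contain x, and a deepest point serves as a median. Below is a brute-force Python sketch with hypothetical names. It assumes the data are in general position (no three collinear). For simplicity it searches only over the data points themselves rather than the whole plane. Each depth evaluation costs O(n^3).

```python
from itertools import combinations

def _orient(a, b, c):
    # Twice the signed area of triangle a, b, c (> 0 if counterclockwise).
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_closed_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of a nondegenerate triangle a, b, c."""
    d1, d2, d3 = _orient(a, b, p), _orient(b, c, p), _orient(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def simplicial_depth(p, data):
    """Fraction of data triangles containing p (assumes no three data points collinear)."""
    tris = list(combinations(data, 3))
    return sum(in_closed_triangle(p, *t) for t in tris) / len(tris)

def simplicial_median(data):
    """Deepest data point; a restriction of the search, not the full planar maximum."""
    return max(data, key=lambda p: simplicial_depth(p, data))

data = [(0, 0), (4, 0), (0, 4), (4, 4), (1, 2)]
for p in data:
    print(p, simplicial_depth(p, data))
print(simplicial_median(data))
```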
On Thursday 06 December 2007, Steve Witham wrote:
Richard is talking about the centroid of the set of median- like points, not the centroid of the original set.
Oops, I see. My apologies for the misunderstanding. -- g