[math-fun] n-D median?
I said:
o Rotating, translating or scaling the points should do the same to the result.
o Embedding in a higher-dimensional space shouldn't matter.
"Affine equivariant."
More generally, you want a function that finds the middle of the pack, tending to pay less attention to stragglers.
"Outliers". In statistics, trying to be insensitive to outliers or noise is called "robustness," and convex peeling (as well as later non-convex peeling) is one way to do this for "multivariate" data.

Gastwirth, J.L. (1966), On Robust Procedures. Journal of the American Statistical Association, 61, 929-948.

My first method (the second was nonsense; it could be patched, but that doesn't matter) starts out seeming simple and well-motivated, but when it has to iterate it gets complicated in a way that's ill-motivated anyway. Peelings seem more arbitrary at first, but the second through nth peels are at least arbitrary in the same way.

You could get more plausible by building a model that's a mapping of a normal distribution. Or you could get simpler by taking my method one step and then taking an average.

The nice thing about the 1D median is its grand indifference to details.

--Steve
More generally, you want a function that finds the middle of the pack
There's an alternative interpretation of median that you could conceivably want instead: In 1-D the median elements split the set into two chunks, one "lesser" and one "greater", with at most half the elements in each chunk. In n-D we might generalize "lesser/greater" to "inner/outer", and only peel half-way. That is, instead of finding the innermost city core, we take as the median elements those on the boundary between downtown and the suburbs.

--------

A carefully implemented peeling algorithm must account for the population at each point. Each peel step decrements the multiplicity, removing the point from the hull candidates when it reaches zero. This naturally generalizes to point distributions with fractional densities: each step subtracts the minimum multiplicity from the points on the hull.

--------

I can even visualize how this metaphor could be applied to a continuous spatial distribution--kind of like a rock in acid--the higher densities dissolve slower.

--------

So, how does this all work for distributions with infinite domains? Do they have medians if they have finite integrals?

--------

What happens if we generalize the density further, say from fractional values to complex quantities?
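
For concreteness, here is a rough 2-D sketch of the multiplicity-aware peel and of the "half-way" reading described above. It is only one way to fill in the details, not a reference implementation. All names (`convex_hull`, `peel`, `innermost`, `halfway`) are made up for illustration. Assumptions: points are (x, y) tuples, "points on the hull" means hull vertices only (collinear boundary points wait for a later peel), and "half-way" is taken to mean the first layer whose removal brings the removed mass to at least half the total.

```python
from fractions import Fraction  # exact arithmetic for fractional densities


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices only."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def peel(weights):
    """Yield (layer, amount) per peel step.

    weights maps (x, y) -> positive multiplicity (int, Fraction or float).
    Each step subtracts the minimum multiplicity found on the hull from
    every hull vertex and drops the vertices that reach zero.
    """
    remaining = {p: w for p, w in weights.items() if w > 0}
    while remaining:
        hull = convex_hull(remaining)
        amount = min(remaining[p] for p in hull)
        for p in hull:
            remaining[p] -= amount
            if remaining[p] <= 0:  # "<=" also absorbs float round-off
                del remaining[p]
        yield sorted(hull), amount


def innermost(weights):
    """The last layer peeled: the 'city core' reading of the median."""
    layer = []
    for layer, _ in peel(weights):
        pass
    return layer


def halfway(weights):
    """First layer whose removal takes away at least half the total mass
    (an assumed reading of 'only peel half-way')."""
    total = sum(w for w in weights.values() if w > 0)
    removed = 0
    for layer, amount in peel(weights):
        removed += amount * len(layer)
        if 2 * removed >= total:
            return layer
    return []
```

On collinear input the hull degenerates to the two endpoints, so the peel reduces to the ordinary (weighted) 1-D median: `innermost({(0, 0): 5, (1, 0): 1, (2, 0): 1})` gives `[(0, 0)]`. For two nested squares around a centre point, `innermost` returns the centre and `halfway` returns the inner square. Using `Fraction` weights keeps the subtraction exact when densities are fractional.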
participants (2)
- Marc LeBrun
- Steve Witham