Here is a question that's bugged me for a while: Is there a standard or obviously-right way of defining the median for a bag (like a set but with possibly repeated members) of points in 2D or higher space? One way of closing in on it is to ask what properties I want it to have. I can think of a few: o The 1-D case should be the 1-D median. o Rotating, translating or scaling the points should do the same to the result. o Embedding in a higher-dimensional space shouldn't matter. More generally, you want a function that finds the middle of the pack, tending to pay less attention to stragglers. Here's one naive definition, for X, a bag of points in S: Define dir(x) = { 0 if x = 0, x / |x| if x != 0 } (Is there a standard name for dir()? It's like normalizing, only with a special case for zero. "The Sorting Hat") median_like( c, X ) if sum( { dir(x-c) for x in X } ) = 0. median_set( X ) = { c for c in S if median_like( c, X ) } median( X ) = { c if c is the only member of median_set( X ), median( median_set( X ) ) otherwise One nice thing about this recursive definition (given caviats below) is that it gives, e.g.: median( { 0, 1, 2, 3 } ) = 1.5 Here are some of it's naivetes: 1) There are sets of points in n-D space whose median_sets are continuous sets of points--but maybe (n-1) D curves in n-D space. (This happens with discrete(*) points in 1-D, so I assume it can happen with bags of discrete points in higher dimensions.) 2) If so, you need to take the median of a continuous set--which means summing over the highest-dimensional volume you have, in a higher- dimensional space (for instance over the length of a curve in 2D). Maybe this is implied by using special arithmetic where count < length < area < volume ...? 3) The sum of the dir vectors can have discontinuities that cross zero but don't have a zero. For instance, the bag { 0, 0, 1 } has sums ... +3 ... ( +1 ) ... -1 ... ( -2 ) ... -3 ... Should I count zero-crossings like zeroes and say there's a zero- crossing at zero although the value is 1 there? Is 0 the median of { 0, 0, 1 }? 4) Are there situations where the recursion never finishes? If so, must it have a limit at a point? --Steve (*) "individual?" "named?" like discrete but not necessarily distinct? Does the bag { 0, 0 } have two discrete points?
Consider points more-or-less equally spaced around a circle in 2D. What should be the "median" ? If I cut the set with hyperplanes, I'd like the hyperplanes to cut the set in half. Unfortunately, the intersection of these hyperplanes is likely to be empty. Perhaps you have to turn the problem around. "Having a median in n-D" might be an interesting property of a set, rather than a universal property. At 12:48 PM 11/15/2007, Steve Witham wrote:
Here is a question that's bugged me for a while: Is there a standard or obviously-right way of defining the median for a bag (like a set but with possibly repeated members) of points in 2D or higher space?
One way of closing in on it is to ask what properties I want it to have. I can think of a few: o The 1-D case should be the 1-D median. o Rotating, translating or scaling the points should do the same to the result. o Embedding in a higher-dimensional space shouldn't matter.
More generally, you want a function that finds the middle of the pack, tending to pay less attention to stragglers.
Here's one naive definition, for X, a bag of points in S:
Define dir(x) = { 0 if x = 0, x / |x| if x != 0 }
(Is there a standard name for dir()? It's like normalizing, only with a special case for zero. "The Sorting Hat")
median_like( c, X ) if sum( { dir(x-c) for x in X } ) = 0.
median_set( X ) = { c for c in S if median_like( c, X ) }
median( X ) = { c if c is the only member of median_set( X ), median( median_set( X ) ) otherwise
One nice thing about this recursive definition (given caviats below) is that it gives, e.g.: median( { 0, 1, 2, 3 } ) = 1.5
Here are some of it's naivetes:
1) There are sets of points in n-D space whose median_sets are continuous sets of points--but maybe (n-1) D curves in n-D space. (This happens with discrete(*) points in 1-D, so I assume it can happen with bags of discrete points in higher dimensions.) 2) If so, you need to take the median of a continuous set--which means summing over the highest-dimensional volume you have, in a higher- dimensional space (for instance over the length of a curve in 2D). Maybe this is implied by using special arithmetic where count < length < area < volume ...? 3) The sum of the dir vectors can have discontinuities that cross zero but don't have a zero. For instance, the bag { 0, 0, 1 } has sums ... +3 ... ( +1 ) ... -1 ... ( -2 ) ... -3 ... Should I count zero-crossings like zeroes and say there's a zero- crossing at zero although the value is 1 there?
Is 0 the median of { 0, 0, 1 }? 4) Are there situations where the recursion never finishes? If so, must it have a limit at a point?
--Steve (*) "individual?" "named?" like discrete but not necessarily distinct? Does the bag { 0, 0 } have two discrete points?
participants (2)
-
Henry Baker -
Steve Witham