Andy Latto wrote:
Suppose we take n independent samples from a normal distribution (let's fix the distribution as having mean 0 and variance 1). As n increases, what happens to the expected value of (largest sample - 2nd largest sample)?
How sensitive is this answer to the normality of the distribution? What do we need to know about the distribution to conclude that as n increases, this difference will behave the way it does for a normal distribution?
Let {X[i], i = 1 to n} be a set of n independent, identically distributed random variables. For each i, the cumulative distribution function is F(x) = Prob[X[i] <= x], and (assuming F(x) is differentiable) the probability density function is f(x) = F'(x). Let X and Y be, respectively, the largest and second largest numbers in this set. Then the joint probability density function of X and Y is

    f(x,y) = n(n-1) f(x) f(y) F(y)^(n-2)   for x >= y
           = 0                             for x < y

and from this

    E[X-Y] = C n Integral(F(x)^(n-1) - F(x)^n, all x)

My calculus is rusty, so I'm pretty sure I lost a constant factor somewhere!

...

When the random variables are normally distributed, my Monte Carlo estimates agreed with this formula using C = Sqrt(2 Pi).

For the normal distribution, the formula

    E[X-Y] = K / log(n)^q

works pretty well for q just above 1/2, though E[X-Y] actually shrinks a bit more slowly than the right-hand side.

Paul
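For what it's worth, a quick numerical sketch (standard library only; function names are my own) suggests the constant may simply be C = 1: since X >= Y, one can write X - Y = Integral(1[Y <= t < X], all t), and P(Y <= t < X) = P(exactly one sample exceeds t) = n (1-F(t)) F(t)^(n-1), which is exactly the integrand above with no extra factor. The sketch below checks this against a seeded Monte Carlo run at n = 10, then tabulates the integral against 1/Sqrt(2 log n) for large n (the tail 1 - F is computed with erfc to avoid cancellation):

```python
import math
import random

def spacing_integral(n, lo=-12.0, hi=12.0, steps=4000):
    """Trapezoidal estimate of n * Integral(F^(n-1) - F^n, all x)
    = Integral(n * (1-F) * F^(n-1), all x) for the standard normal."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - F(x), accurate in the tail
        F = 1.0 - tail
        w = 0.5 if i in (0, steps) else 1.0         # trapezoid endpoint weights
        total += w * n * tail * F ** (n - 1)
    return total * h

# Monte Carlo estimate of E[largest - 2nd largest] for n = 10
random.seed(1)
n, trials = 10, 200_000
acc = 0.0
for _ in range(trials):
    xs = sorted(random.gauss(0.0, 1.0) for _ in range(n))
    acc += xs[-1] - xs[-2]
mc = acc / trials
exact = spacing_integral(n)
print(f"n={n}: Monte Carlo {mc:.4f}, integral {exact:.4f}, ratio {mc/exact:.3f}")

# How the expected spacing shrinks with n, against 1/sqrt(2 log n)
for m in (10**2, 10**3, 10**4, 10**5, 10**6):
    e = spacing_integral(m)
    print(f"n={m:>8}: E[X-Y] ~ {e:.4f},  1/sqrt(2 ln n) = {1.0/math.sqrt(2.0*math.log(m)):.4f}")
```

If the ratio in the first line comes out near 1 rather than near 1/Sqrt(2 Pi), that would point to the stray factor being in the evaluation of the integral (e.g. a substitution u = F(x), which introduces a 1/f(x) = Sqrt(2 Pi) e^(x^2/2) factor) rather than in the formula itself. The second loop also lets you see directly how much more slowly the spacing shrinks than the 1/Sqrt(2 log n) reference.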