I'm just going to wing it here, but I think this is the problem that was important in W.W. II, where occasionally a German tank would be captured and its serial number noted. Eventually this information was used to estimate how many tanks the Germans had altogether. (Here I use a well-known mathematician's trick dating back millennia: if you don't know the answer to a problem, solve a different one.)

Suppose we pick n random points {x_j} = x_1, ..., x_n uniformly from the interval [0,1] in R, and rename the points according to their order:

   x_(1) < x_(2) < ... < x_(n).

The probability that the maximum point x_(n) is <= t (for t in [0,1]) is t^n, the probability that all n points lie at or to the left of t. So the density of the maximum x_(n) is

   d_max(t) = n t^(n-1),

and its expected value is

   E(x_(n)) = Integral_{0<=t<=1} t * n t^(n-1) dt = n/(n+1).

Symmetrically, the expected value of the minimum x_(1) = min{x_j} is

   E(x_(1)) = 1/(n+1).

Indeed, the probability that x_(1) >= t is (1-t)^n (all x_j are at least t), so Prob(x_(1) <= t) = 1 - (1-t)^n, and the density of the minimum is

(*)  d_min(t) = n (1-t)^(n-1).

The n+1 points consisting of the n random points plus the point 0 (with 0 ~ 1 identified) may be thought of as uniformly distributed on the resulting circle C = R/Z. So the setup is the same as n+1 points uniformly distributed on C. Therefore, by symmetry, the length of each interval between successive points x_(1) < x_(2) < ... < x_(n) has the same density (*) as does x_(1) = x_(1) - 0.

Now suppose we don't know the length L of the original interval, which we assume to be [0,L].

((( We want the joint distribution of x_(1) and x_(n), in order to infer the maximum-likelihood value of ML(n) = L/(x_(n) - x_(1)). This can then be used to estimate L as L =approx= ML(n) * (x_(n) - x_(1)). )))

BUT: for now, back to the unit interval [0,1]. Wikipedia states that the joint density of u = x_(1) and v = x_(n) is

   f(u,v) = (n!/(n-2)!) (v-u)^(n-2),   0 <= u <= v <= 1.

At this point I have to go; maybe more later.

—Dan
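The order-statistic expectations above are easy to spot-check numerically. Here is a minimal Monte Carlo sketch (my own illustration, not part of Dan's derivation) verifying E(x_(1)) = 1/(n+1) and E(x_(n)) = n/(n+1) for uniform points on [0,1]:

```python
# Monte Carlo check: for n uniform points on [0,1],
# E(min) should approach 1/(n+1) and E(max) should approach n/(n+1).
import random

def order_stat_means(n, trials=100_000, seed=1):
    """Estimate the mean of the minimum and maximum of n uniforms."""
    random.seed(seed)
    sum_min = sum_max = 0.0
    for _ in range(trials):
        pts = [random.random() for _ in range(n)]
        sum_min += min(pts)
        sum_max += max(pts)
    return sum_min / trials, sum_max / trials

n = 9
mean_min, mean_max = order_stat_means(n)
print(mean_min, 1 / (n + 1))   # both close to 0.1
print(mean_max, n / (n + 1))   # both close to 0.9
```

With n = 9 and 100,000 trials, both estimates land within a few parts in a thousand of the predicted 1/10 and 9/10.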
On Feb 3, 2016, at 9:13 AM, Fred Lunnon <fred.lunnon@gmail.com> wrote:
. . .
I don't know in advance how many distinct coupons are available, but I have collected m, among which just k are distinct.
What is the probability that n distinct coupons are available?
What is the most likely value of n ?
What are asymptotic expressions for large m ?
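A sketch of the maximum-likelihood part of Fred's question (my own framing; the fully Bayesian "probability that n coupons are available" would additionally need a prior on n). If n distinct coupons exist and m draws are uniform and independent, the chance of seeing exactly k distinct kinds is P(k | n, m) = C(n,k) * Surj(m,k) / n^m, where Surj(m,k) = sum_j (-1)^j C(k,j) (k-j)^m counts the surjections of the m draws onto the k kinds seen. Scanning n for the largest likelihood gives the most likely n:

```python
# Likelihood of observing exactly k distinct coupons among m draws,
# given n distinct coupons available, and the maximum-likelihood n.
from math import comb

def surjections(m, k):
    # Inclusion-exclusion count of maps from m draws onto all k kinds.
    return sum((-1) ** j * comb(k, j) * (k - j) ** m for j in range(k + 1))

def likelihood(n, m, k):
    # P(exactly k distinct kinds | n available, m uniform draws)
    return comb(n, k) * surjections(m, k) / n ** m

def most_likely_n(m, k, n_max=1000):
    # The likelihood rises then falls in n, so a simple scan suffices.
    return max(range(k, n_max + 1), key=lambda n: likelihood(n, m, k))

# Example: 20 draws showing 10 distinct kinds.
print(most_likely_n(m=20, k=10))
```

As a sanity check, for fixed n and m the likelihoods sum to 1 over k, since sum_k C(n,k) * Surj(m,k) = n^m.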