https://en.wikipedia.org/wiki/German_tank_problem On Wed, Feb 3, 2016 at 3:46 PM, Fred Lunnon <fred.lunnon@gmail.com> wrote:
I had heard Dan's serial-number anecdote as well, though my informant claimed it related to rockets. [ I did often wonder how one retrieved such data from the exploded fragments, but supposed the missiles might often fail to detonate ... ] And I had that in mind when posing this problem. But here we are sampling with replacement; and no distance measure seems available for coupons!
Alan doubts whether the problem is well posed. Well, the expected number of trials to acquire all n coupons is asymptotically n log n --- a result which, while still a student, I derived at the cost of several pages of convoluted algebra, and later proudly showed to an older colleague, who proceeded to derive the same result in a single line. A similar fate has overtaken most of my forays into probability.
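The n log n claim is easy to check numerically. A minimal Python sketch (function name is mine, not from the thread); the one-line derivation sums the geometric waiting times n/(n-k), giving an exact mean of n * H_n:

```python
import random

def collector_draws(n, rng=random):
    """Count draws (with replacement) until all n coupon types are seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

# Exact mean: n * H_n = n * (1 + 1/2 + ... + 1/n), asymptotically n log n.
```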
Ahem, I digress. Now I argue that if a trial reaches k distinct coupons in m > c k log k trials, where (say) c = 2, or 10, then it's a damned good bet that n = k . Anybody disagree?
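Fred's stopping rule can be put to a quick empirical test. A sketch, with names and parameter defaults of my own choosing: stop as soon as m exceeds c k log k (guarding k >= 2, since log 1 = 0), and record whether k = n at that moment:

```python
import math
import random

def stopping_rule_hits(n, c=10.0, trials=200, rng=random):
    """Draw coupons uniformly from n types; stop as soon as the draw count m
    exceeds c * k * log k, where k is the number of distinct coupons seen so
    far, and record whether k == n at that moment.  Returns the fraction of
    trials in which the bet n = k won."""
    hits = 0
    for _ in range(trials):
        seen = set()
        m = 0
        while True:
            seen.add(rng.randrange(n))
            m += 1
            k = len(seen)
            if k >= 2 and m > c * k * math.log(k):
                hits += k == n
                break
    return hits / trials
```

In quick experiments the bet wins essentially always for c = 10; with c = 2 the rule can fire early at small k (e.g. for n = 10, the third draw repeats one of the first two with probability 0.2 and triggers the k = 2 threshold), so the choice of constant matters.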
Fred Lunnon
On 2/3/16, Dan Asimov <dasimov@earthlink.net> wrote:
I'm just going to wing it here, but I think this is a problem that was important in W.W. II, where occasionally a German tank would be captured and its serial number noted. Eventually this information was used to estimate how many tanks the Germans had altogether.
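For reference, the standard frequentist estimator for that serial-number problem (sampling k serials without replacement from 1..N) takes the largest serial seen, m, and estimates N as m + m/k - 1. A one-function sketch, not anything proposed in the thread; the sample serials are hypothetical:

```python
def tank_estimate(serials):
    """Classic 'German tank' estimate of the total count N from k distinct
    serial numbers assumed drawn without replacement from 1..N:
    N_hat = m + m/k - 1, where m is the largest serial observed."""
    k = len(serials)
    m = max(serials)
    return m + m / k - 1

# e.g. tank_estimate([19, 40, 42, 60]) -> 60 + 60/4 - 1 = 74.0
```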
(Here I use a well-known mathematician's trick dating back millennia: If you don't know the answer to a problem, solve a different one.)
Suppose we pick n random points {x_j} = x_1,...,x_n from an interval [0,1] in R.
Also, rename the points according to their order:
x_(1) < x_(2) < ... < x_(n).
Then for t in [0,1], the probability that the maximum point x_(n) is <= t is t^n, the probability that all n points lie at or to the left of t.
So the density of the maximum x_(n) is
d_max(t) = n t^(n-1),
so its expected value is
E(x_(n)) = Integral_{0<=t<=1} t * n t^(n-1) dt = Integral_{0<=t<=1} n t^n dt
= n/(n+1).
Symmetrically, the expected value of x_(1) = min{x_j} is
E(x_(1)) = 1/(n+1).
The probability that x_(1) >= t is (1-t)^n (the probability that all x_j are at least t). So Prob(x_(1) <= t) = 1 - (1-t)^n, and its density is
(*) d_min(t) = n (1-t)^(n-1).
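Both expectations are easy to sanity-check by simulation. A small Monte Carlo sketch (function name is mine):

```python
import random

def minmax_means(n, samples=100_000, rng=random):
    """Monte Carlo estimates of E(min) and E(max) for n uniform points on [0,1]."""
    tot_min = tot_max = 0.0
    for _ in range(samples):
        pts = [rng.random() for _ in range(n)]
        tot_min += min(pts)
        tot_max += max(pts)
    return tot_min / samples, tot_max / samples

# For n = 4, the formulas above give E(x_(1)) = 1/5 and E(x_(4)) = 4/5.
```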
The n+1 points consisting of the n random points plus the point 0 (where we identify 0 ~ 1) may be thought of as uniformly distributed on the resulting circle C = R/Z.
So the setup is the same as n+1 points uniformly distributed on C.
Therefore by symmetry we can conclude that the length of each interval between successive points
x_(1) < x_(2) < ... < x_(n)
has the same density (*) as does x_(1) = x_(1) - 0.
Now suppose we don't know the length L of the original interval, which we assume to be [0,L].
((( We want the joint distribution of x_(1) and x_(n) to infer the maximum likelihood value of
ML(n) = L/(x_(n) - x_(1)).
This can be used to infer L as
L = approx. ML(n)* (x_(n) - x_(1)). )))
BUT: For now, back to the unit interval [0,1]:
Wikipedia states that the joint density of u = x_(1) and v = x_(n) is
f(u,v) = (n! / (n-2)!) (v-u)^(n-2) = n(n-1) (v-u)^(n-2), for 0 <= u <= v <= 1.
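Integrating that density gives E(x_(n) - x_(1)) = (n-1)/(n+1) on the unit interval, so one natural (method-of-moments, not maximum-likelihood) way to estimate an unknown L is to rescale the observed range by (n+1)/(n-1). A hedged sketch of that idea; the rescaling is my reading, not a formula from the thread:

```python
import random

def stretch_estimate(points):
    """Given n points assumed uniform on an unknown interval [0, L], rescale
    the observed range by (n+1)/(n-1), since E(max - min) = (n-1)/(n+1) * L."""
    n = len(points)
    return (max(points) - min(points)) * (n + 1) / (n - 1)

# Hypothetical usage: average the estimate over many samples drawn from [0, 7].
```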
At this point I have to go; maybe more later.
—Dan
On Feb 3, 2016, at 9:13 AM, Fred Lunnon <fred.lunnon@gmail.com> wrote:
. . .
I don't know in advance how many distinct coupons are available, but have collected m among which there are just k distinct.
What is the probability that n distinct coupons are available?
What is the most likely value of n ?
What are asymptotic expressions for large m ?
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
-- Mike Stay - metaweta@gmail.com http://www.cs.auckland.ac.nz/~mike http://reperiendi.wordpress.com