--- Michael Kleber <kleber@brandeis.edu> wrote:
Thomas Colthurst wrote:
This appears to be a variant of the coupon collector's problem, as discussed in http://www.math.uci.edu/~mfinkels/COUPON.PDF . They give a maximum likelihood estimate of N as the smallest integer j >= C_k satisfying
((j + 1) / (j + 1 - C_k)) * (j / (j + 1))^k < 1
where C_k is the number of distinct things you saw in your k samples.
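That smallest j can be found by direct search starting at j = C_k. A quick sketch in Python (the function name and the search cap are mine, not the paper's); note that when all k samples are distinct (C_k = k) the condition is never satisfied and the MLE is effectively infinite, so the search cap matters:

```python
def mle_population_size(k, c_k, max_j=10**6):
    """Smallest j >= c_k with ((j+1)/(j+1-c_k)) * (j/(j+1))**k < 1,
    i.e. the maximum likelihood estimate of N from the paper.
    Returns None if no such j exists up to max_j (e.g. all samples distinct)."""
    for j in range(c_k, max_j + 1):
        if (j + 1) / (j + 1 - c_k) * (j / (j + 1)) ** k < 1:
            return j
    return None
```

For example, seeing 3 distinct things in 5 samples gives an estimate of N = 3, while 4 distinct things in 4 samples gives no finite estimate.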
Thanks for the pointer; I'll read through the paper. Both in their maximum likelihood answer above, and in the paper in general, they seem to use only the total number of distinct things seen, and not the distribution of multiplicities with which you saw them. I wonder whether allowing the use of that extra data helps at all.
--Michael Kleber
You certainly do want to use all the available information. What you want is the probability P(n1, n2, ... | N) that you get n1 objects once, n2 objects twice, etc., where 1 n1 + 2 n2 + ... = n, n is the number of draws, and N is the unknown actual number of objects. By Bayes' theorem, this is also P(N | n1, n2, ...), the relative likelihood that there are N objects, given the observation n1, n2, ... . Multiply it by the prior probability P0(N), and you get the (relative) posterior probability for N.

I don't have time to work out the general case, so I'll just do an example with n = 3 draws:

P(0,0,1|N) = 1/N^2            (one object, drawn thrice)
P(3,0,0|N) = (N-1)(N-2)/N^2   (3 objects, once each)
P(1,1,0|N) = 3(N-1)/N^2       (1 object once, another twice)

If you get 3 different objects, you know that N can't be 1 or 2, and that N = 3 has become only 2/9 as likely as any one large value of N. But you can't distinguish between different large values of N, and for these you have no further information beyond that contained in your prior. If you get the same object three times, you get a convergent posterior, even if you pick the improper uniform prior.

Common sense predicts that when each object is observed multiple times, the posterior will sharply select a unique N. On the other hand, a distribution with a long tail of objects appearing just once will provide little information at large N. An observation (n1, n2, ...) that is atypical of any N may lead one, in a real-world situation, to entertain additional hypotheses, for example, that we have been disinformed.

Problems like this one, Bayes' theorem, scientific inference, probability and common sense, probability as an extension of deductive logic: these are the subject matter of Edwin T. Jaynes' book "Probability, the Logic of Science".

Gene
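Gene's three n = 3 probabilities all follow from one counting formula: choose which of the N objects land in each multiplicity class, then count the orderings of the n draws. A sketch in Python (the function name is mine), which reproduces the values above and can serve as the likelihood for any n:

```python
from math import factorial, prod

def pattern_prob(counts, N):
    """P(n1, n2, ... | N): probability that draws from N equally likely
    objects yield counts[0] objects seen once, counts[1] seen twice, etc.
    The number of draws n is implied by the counts."""
    n = sum((i + 1) * c for i, c in enumerate(counts))  # total draws
    d = sum(counts)                                     # distinct objects seen
    if d > N:
        return 0.0
    # ways to assign distinct objects to the multiplicity classes
    ways_objects = factorial(N) // (
        factorial(N - d) * prod(factorial(c) for c in counts))
    # ways to order the draws, given each object's multiplicity
    ways_sequences = factorial(n)
    for i, c in enumerate(counts):
        ways_sequences //= factorial(i + 1) ** c
    return ways_objects * ways_sequences / N ** n
```

For N = 5, this gives 1/25, 12/25, and 12/25 for the patterns (0,0,1), (3,0,0), and (1,1,0), matching 1/N^2, (N-1)(N-2)/N^2, and 3(N-1)/N^2. Multiplying pattern_prob(counts, N) by a prior P0(N) and normalizing over N gives the posterior; as N grows, pattern_prob([3,0,0], N) tends to 1, which is exactly the flat tail Gene describes.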