--- Michael Kleber <kleber@brandeis.edu> wrote:
Thomas Colthurst wrote:
This appears to be a variant of the coupon collector's problem, as discussed in http://www.math.uci.edu/~mfinkels/COUPON.PDF . They give a maximum likelihood estimate of N as the smallest integer j >= C_k satisfying
((j + 1) / (j + 1 - C_k)) * (j / (j + 1))^k < 1
where C_k is the number of distinct things you saw in your k samples.
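That smallest j can be found by direct search starting at j = C_k. A quick sketch in Python (the function name and the search cap are mine, not the paper's); note that when all k samples are distinct (C_k = k) the condition is never satisfied and the MLE is effectively infinite, so the search cap matters:

```python
def mle_population_size(k, c_k, max_j=10**6):
    """Smallest j >= c_k with ((j+1)/(j+1-c_k)) * (j/(j+1))**k < 1,
    i.e. the maximum likelihood estimate of N from the paper.
    Returns None if no such j exists up to max_j (e.g. all samples distinct)."""
    for j in range(c_k, max_j + 1):
        if (j + 1) / (j + 1 - c_k) * (j / (j + 1)) ** k < 1:
            return j
    return None
```

For example, seeing 3 distinct things in 5 samples gives an estimate of N = 3, while 4 distinct things in 4 samples gives no finite estimate.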
Thanks for the pointer; I'll read through the paper. Both in their maximum likelihood answer above, and in the paper in general, they seem to use only the total number of distinct things seen, and not the distribution of multiplicities with which you saw them. I wonder whether allowing the use of that extra data helps at all.
--Michael Kleber
You certainly do want to use all the available information. What you want is the probability P(n1, n2, ... | N) that you get n1 objects once, n2 objects twice, etc., where 1 n1 + 2 n2 + ... = n, n is the number of draws, and N is the unknown actual number of objects. By Bayes' theorem, this is also P(N | n1, n2, ...), the relative likelihood that there are N objects, given the observation n1, n2, ... . Multiply it by the prior probability P0(N), and you get the (relative) posterior probability for N.

I don't have time to work out the general case, so I'll just do an example with n = 3 draws:

P(0,0,1|N) = 1/N^2            (one object, drawn thrice)
P(3,0,0|N) = (N-1)(N-2)/N^2   (3 objects, once each)
P(1,1,0|N) = 3(N-1)/N^2       (1 object once, another twice)

If you get 3 different objects, you know that N can't be 1 or 2, and that N = 3 has become only 2/9 as likely as any one large value of N. But you can't distinguish between different large values of N, and for these you have no further information beyond that contained in your prior. If you get the same object three times, you get a convergent posterior, even if you pick the improper uniform prior.

Common sense predicts that when each object is observed multiple times, the posterior will sharply select a unique N. On the other hand, a distribution with a long tail of objects appearing just once will provide little information at large N. An observation (n1, n2, ...) that is atypical of any N may lead one, in a real-world situation, to entertain additional hypotheses, for example, that we have been disinformed.

Problems like this one, Bayes' theorem, scientific inference, probability and common sense, probability as an extension of deductive logic: these are the subject matter of Edwin T. Jaynes' book "Probability, the Logic of Science".

Gene
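Gene's three n = 3 probabilities all follow from one counting formula: choose which of the N objects land in each multiplicity class, then count the orderings of the n draws. A sketch in Python (the function name is mine), which reproduces the values above and can serve as the likelihood for any n:

```python
from math import factorial, prod

def pattern_prob(counts, N):
    """P(n1, n2, ... | N): probability that draws from N equally likely
    objects yield counts[0] objects seen once, counts[1] seen twice, etc.
    The number of draws n is implied by the counts."""
    n = sum((i + 1) * c for i, c in enumerate(counts))  # total draws
    d = sum(counts)                                     # distinct objects seen
    if d > N:
        return 0.0
    # ways to assign distinct objects to the multiplicity classes
    ways_objects = factorial(N) // (
        factorial(N - d) * prod(factorial(c) for c in counts))
    # ways to order the draws, given each object's multiplicity
    ways_sequences = factorial(n)
    for i, c in enumerate(counts):
        ways_sequences //= factorial(i + 1) ** c
    return ways_objects * ways_sequences / N ** n
```

For N = 5, this gives 1/25, 12/25, and 12/25 for the patterns (0,0,1), (3,0,0), and (1,1,0), matching 1/N^2, (N-1)(N-2)/N^2, and 3(N-1)/N^2. Multiplying pattern_prob(counts, N) by a prior P0(N) and normalizing over N gives the posterior; as N grows, pattern_prob([3,0,0], N) tends to 1, which is exactly the flat tail Gene describes.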