Allan complained that maximum likelihood breaks down when k = m (every coupon in the sample occurring once). However, this behaviour is perfectly appropriate, since such a sample is obviously too small to yield any further information, other than "m is too small"!

Now I (think that I) at last have a usable expression for the probability that just n-l (ell) from n coupons occur in m trials:

  p(n, l, m) = SUM_{l <= i <= n} (-1)^(l-i) (1 - i/n)^m ( n! / ((n-i)! (i-l)! l!) )

--- it would be much appreciated if some kind soul could validate this!

Feeding in my somewhat parsimonious data yields approx.

  p(4, 0, 18) = 0.98 ,
  p(5, 1, 18) = 0.088 ,
  p(6, 2, 18) = 0.0099 ,
  p(7, 3, 18) = 0.0014 , ...

The probabilities drop off exponentially at first, though eventually their successive ratio approaches unity. If I have correctly understood the maximum likelihood principle, this constitutes statistically clear evidence that n = 4 in this case; though at this stage I'm not entirely sure how best to attach a numerical indicator of significance to these numbers.

Fred Lunnon

On 2/4/16, Gareth McCaughan <gareth.mccaughan@pobox.com> wrote:
On 04/02/2016 02:44, Fred Lunnon wrote:
Without some explicit expressions to discuss, I'm grasping at air here. But I find it difficult to credit that assuming a uniform distribution of n in (say) [1..100] would result in an estimated n substantially different from [1..10^10] , when --- in my present experiment --- I had k = 4 coupons at m = 7 trials, and k remains unchanged at m = 17 .
I think this (suitably generalized) is probably correct. (I haven't actually done any of the relevant calculations.) It's equivalent to max likelihood.
But Allan's original claim wasn't that max likelihood doesn't give a definite answer, it was that since you may have prior opinions about how likely any given n is, you need to take them into account when choosing an n in light of the evidence. And for that there's no particular reason why you should use max-likelihood -- and the fact that for k=m max-likelihood says bigger n are always better seems sufficient to indicate that it isn't a good general answer.
-- g
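For anyone wanting to check the expression for p(n, l, m) numerically, here is a minimal sketch in Python using exact rational arithmetic. The function names p and likelihood are my own labels, not anything from the thread, and the code assumes all n coupons are equally likely. It rewrites n! / ((n-i)! (i-l)! l!) as the product of binomials C(n, i) C(i, l), which is the same quantity.

```python
from fractions import Fraction
from math import comb


def p(n: int, l: int, m: int) -> Fraction:
    """Probability that exactly l of n equally likely coupons are
    still missing (so n - l distinct coupons seen) after m trials."""
    total = Fraction(0)
    for i in range(l, n + 1):
        # n! / ((n-i)! (i-l)! l!) == comb(n, i) * comb(i, l)
        coeff = comb(n, i) * comb(i, l)
        total += (-1) ** (i - l) * coeff * Fraction(n - i, n) ** m
    return total


def likelihood(n: int, k: int, m: int) -> Fraction:
    """Likelihood of n given that k distinct coupons were seen in m trials
    (hypothetical helper: just p with l = n - k)."""
    return p(n, n - k, m)


if __name__ == "__main__":
    # The data from the thread: k = 4 distinct coupons in m = 18 trials.
    for n in range(4, 8):
        print(n, float(likelihood(n, 4, 18)))
    # The k = m case: the likelihood keeps growing with n.
    for n in range(4, 9):
        print(n, float(likelihood(n, 4, 4)))
```

Run as a script, the first loop gives values that agree with the approximate figures quoted for p(4, 0, 18) through p(7, 3, 18), and the second loop shows the k = m likelihood increasing with n, which is the behaviour under discussion.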
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun