Eugene Salamin wrote:
Common sense predicts that when each object is observed multiple times, the posterior will sharply select a unique N. On the other hand, a distribution with a long tail of objects appearing just once will provide little information at large N.
Unfortunately, I expect that my real-world application will turn out to be the latter case: my number of samples will be small enough that I'll be lucky to ever see anything three times. (In which case, I suppose, knowing the total number of distinct elements seen is all the information there is!) On Thomas's suggestion (off-list), I had already started trying to work out the Bayesian approach, but to say that my skills here are rusty would be misrepresenting the presence of some mettle (heh) in the first place. Maybe tomorrow I'll try to work out the case where each thing is seen at most three times.
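For concreteness, here is a minimal sketch of the Bayesian calculation under two assumptions that are not stated in this thread: that every draw is uniform over the N objects (with replacement), and that the prior on N is flat up to a cutoff n_max. Under the uniformity assumption the likelihood of seeing k distinct objects in m draws is proportional to N!/((N-k)! N^m), so it depends on the data only through m and k. The function names and the cutoff are hypothetical, chosen for illustration.

```python
from math import exp, lgamma, log


def log_likelihood(N, m, k):
    """Log-likelihood of N, up to a constant that does not depend on N.

    Assumes m uniform draws with replacement that showed k distinct objects;
    the likelihood is then proportional to N! / ((N - k)! * N**m).
    """
    if N < k:
        return float("-inf")  # cannot see more distinct objects than exist
    return lgamma(N + 1) - lgamma(N - k + 1) - m * log(N)


def posterior(m, k, n_max):
    """Posterior over N = 1..n_max under a flat prior (an assumption)."""
    if not (1 <= k <= m) or n_max < k:
        raise ValueError("need 1 <= k <= m and n_max >= k")
    logs = [log_likelihood(N, m, k) for N in range(1, n_max + 1)]
    peak = max(logs)
    weights = [exp(x - peak) for x in logs]  # subtract peak to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


def mode(post):
    """Most probable N; post[0] corresponds to N = 1."""
    return max(range(len(post)), key=post.__getitem__) + 1


sharp = posterior(m=100, k=10, n_max=200)  # every object seen many times
tail = posterior(m=10, k=10, n_max=500)    # every object seen exactly once
```

With m = 100 and k = 10 the posterior sits almost entirely on N = 10, while with m = k = 10 it keeps rising all the way to the cutoff, so the answer is set by the prior rather than by the data. That matches the two regimes described above.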
An observation (n1, n2, ...) that is atypical of any N may, in a real-world situation, lead one to entertain additional hypotheses, for example that we have been disinformed.
Yes, that's quite likely for me too. (In particular, my ability to recognize when two draws are actually the same or different is almost certainly imperfect.)
Problems like this one, Bayes' theorem, scientific inference, probability and common sense, probability as an extension of deductive logic: these are the subject matter of Edwin T. Jaynes' book "Probability Theory: The Logic of Science".
On that note, let me announce my change in employment. Having been a math professor at MIT and Brandeis for six years, I've just started doing something different. As of last month, I'm working at the Broad (rhymes with "road") Institute, an umbrella organization for research on genomics in medicine, affiliated with MIT, Harvard, and the Whitehead Institute (which the whole structure was a part of until last November). This is Eric Lander's institute, the birthplace of the Human Genome Project and probably the world's leading center for genome sequencing. You may have seen the news story two or three weeks ago that the dog genome had just been released; that was us.
I'm in the Whole Genome Assembly group. We do "shotgun assembly": the people in the laboratory take the DNA from an organism and chop it up into lots of little pieces, and read the sequences of letters near each end of each little piece; we take all the pieces and put them back together.
Immediate change: the old mathematician's dilemma of how to deal with the question "But what real use is your work?" is entirely gone. "Oh, we're going to cure cancer" is sort of an ace in the hole...
--Michael Kleber
kleber@broad.mit.edu