[math-fun] Salamin's "Bayesian" coin-tester
From: Eugene Salamin <gene_salamin@yahoo.com>

It's beginning to look like we don't agree on what Bayesian analysis is. I'm saying that the posterior probability after n tosses updates and supersedes the posterior after k < n tosses. The posterior provides everything that can be known about x, the probability that the coin comes up heads. Bayes can't tell you if the coin is fair; that's for you to decide knowing the posterior.
-- Gene
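(For concreteness, here is a minimal Wolfram-language sketch -- mine, not anything from Gene -- of the updating described above, assuming the uniform prior the rest of the thread uses. The function name posterior and the 1/0 coding of heads/tails are just illustrative choices.)

  (* Uniform Beta(1,1) prior on x = Pr(heads); each observed toss bumps one of the
     two Beta parameters.  The latest posterior supersedes all earlier ones, and
     updating toss-by-toss or all at once gives the same distribution. *)
  posterior[tosses_List] := BetaDistribution[1 + Count[tosses, 1],   (* heads *)
                                             1 + Count[tosses, 0]]   (* tails *)

  tosses = RandomInteger[1, 100];          (* 100 simulated tosses of a fair coin *)
  PDF[posterior[tosses], x]                (* everything Bayes can tell you about x *)
  CDF[posterior[tosses], 1/2]              (* posterior probability that x <= 1/2 *)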
--so: your idea is this(?): the initial ("prior") distribution of the coin-bias x is uniform on (0,1). After T+H coin tosses, the posterior is a beta(1+T, 1+H) distribution on (0,1). One keeps tossing coins, and as we do so the posterior keeps changing. If at any moment the CDF of the beta(1+T, 1+H) distribution at 1/2 is either less than K/2 or greater than 1-K/2, then we stop and declare "coin appears unfair, with confidence >= 1-K."

Is that right? If so, that is wrong. Why? Well, it is good in the sense that if the coin were unfair, then this test would (with probability=1) ultimately indeed terminate and say so (regardless of the K with 0<K<1). But it is no good in the sense that if the coin genuinely were fair, then by the "law of the iterated logarithm," with probability=1 an infinite number of days will come during which |T-H| > sqrt(1.99 * N * lnlnN), where N=T+H. Therefore it is guaranteed (with probability=1) that this procedure will terminate and announce the coin is unfair, regardless of the value of K with 0<K<1. Because (in Wolfram notation)

  Beta[ 1/2, N/2-Sqrt[N*0.49*Log[Log[N]]], N/2+Sqrt[N*0.49*Log[Log[N]]] ]

goes to 0 when N-->infinity. Here is a table for N=10^k:

  k | Beta value
  1 | 0.00369675
  2 | 1.73794*10^-30
  3 | 9.6119*10^-302
  4 | 2.1738412136*10^-3011
  5 | 1.715030375*10^-30104
  6 | 6.5603975*10^-301032
  7 | 2.6446880*10^-3010302
  8 | 2.344465*10^-30103002
  9 | 6.6534*10^-301029999

In short: this procedure always claims the coin is unfair, whether it is or not. After 10^9 tosses it typically will report enormous confidence that a fair coin is unfair, like 0.9999...9 with about 300 million nines. The root cause of the problem here was the standard "repeated test banana peel" that I'd mentioned previously.
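(To make the objection concrete, here is a simulation sketch -- mine, not Warren's code -- of the stopping rule described at the top of this message, run on a genuinely fair coin. The cap maxTosses is only there because the LIL guarantees termination just "eventually"; even with a modest cap, you will likely see some runs stop and declare unfairness.)

  flawedTest[K_, maxTosses_] := Module[{h = 0, t = 0, n = 0, c, verdict = "undecided"},
    While[n < maxTosses && verdict == "undecided",
     If[RandomInteger[1] == 1, h++, t++]; n++;          (* one toss of a fair coin *)
     c = CDF[BetaDistribution[1 + h, 1 + t], 0.5];      (* posterior CDF at 1/2 *)
     If[c < K/2 || c > 1 - K/2, verdict = "declared unfair"]];
    {verdict, n}]

  Table[flawedTest[0.05, 10000], {5}]   (* five fair coins; some runs will typically stop "unfair" anyway *)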
--Actually, I'm skeptical that this table, as output by the Wolfram software, is numerically accurate. But the beta value really does go to 0, which is what matters for my argument. -- Warren D. Smith http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)
Your values agree well with using 49/100`99 instead of .49.

Aside:
  In[168]:= 2.^-10^9
  Out[168]= 2.167797967617*10^-301029996

If we multiply your table by 2^N (N = 10^k), we get:

  3.785469816580231
  2.203100387484833
  1.02992329497252
  0.4336950427046037
  0.1713318934465506
  0.06495224068002137
  0.02393394372191234
  0.008638570596050886
  0.003069201032101502

--rwg
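(A sketch -- mine -- of one way to redo the table in arbitrary precision, following rwg's 49/100`99 suggestion, and to apply his rescaling by 2^N with N = 10^k; the rescaled column should come back to the order-1 numbers he lists. Only k up to 6 is shown to keep the run short.)

  row[k_] := Module[{n = 10^k, d},
    d = Sqrt[n*(49/100`99)*Log[Log[n]]];          (* 0.49 carried to 99 digits *)
    Beta[1/2, n/2 - d, n/2 + d]]                  (* Wolfram's incomplete beta *)

  Table[{k, N[row[k], 10], N[2^(10^k)*row[k], 10]}, {k, 1, 6}]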
From: Gareth McCaughan <gareth.mccaughan@pobox.com>

On 06/03/2016 01:34, Warren D Smith wrote [the stopping-rule proposal quoted above].
As you correctly observe, that would be a bad idea. But I don't think it's what Eugene meant -- and if it is, all that means is that he applied a good idea badly.

Your own proposed approach has a related property: it will never say "the coin is fair", so instead of always terminating at some point and saying the coin is unfair it will always either do that *or else run for ever until you die*. It's not clear that that's actually any better.

If you have a testing procedure that, with probability p>0, terminates within N tosses saying "fair" when the coin is perfectly fair, then for any q<p there's a range of Pr(heads) within which the test will say "fair" within N tosses with probability at least q. So *if you want a procedure that can ever actually tell you the coin is fair* I think you have to accept that it can only do so within epsilon; you get to choose epsilon but you can't make it zero.

Once you've chosen your epsilon, here's a version of Eugene's test that (I think) doesn't have a bias problem like the one you describe:

- Keep tossing until Pr(|p-1/2| < epsilon) is either >= 1-K or <= K. Then stop and report "fair" or "unfair" respectively.

Here Pr() denotes your posterior probability as given by the beta distribution.

-- g
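(A sketch -- mine, not Gareth's code -- of that stopping rule, under the same uniform-prior Beta posterior. trueX, the coin's actual heads probability, is used only to simulate the tosses; epsilon and K are the tolerances above, and maxTosses is just a safety cap.)

  epsilonBandTest[trueX_, epsilon_, K_, maxTosses_] :=
   Module[{h = 0, t = 0, n = 0, pBand, verdict = "undecided"},
    While[n < maxTosses && verdict == "undecided",
     If[RandomReal[] < trueX, h++, t++]; n++;
     pBand = CDF[BetaDistribution[1 + h, 1 + t], 0.5 + epsilon] -
       CDF[BetaDistribution[1 + h, 1 + t], 0.5 - epsilon];   (* Pr(|x-1/2| < epsilon) *)
     Which[pBand >= 1 - K, verdict = "fair to within epsilon",
      pBand <= K, verdict = "unfair"]];
    {verdict, n}]

  epsilonBandTest[0.5, 0.05, 0.05, 10^5]   (* fair coin: should usually stop "fair" *)
  epsilonBandTest[0.6, 0.05, 0.05, 10^5]   (* 60% coin: should stop "unfair" *)

Unlike the one-sided rule criticized above, this test can terminate on either side: "fair" once the posterior mass has concentrated inside the epsilon-band, "unfair" once it has concentrated outside it.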
Gene (replying to Gareth's proposal above): That's exactly what I said in an earlier post, except that I allowed Pr(0.5-e1 < p < 0.5+e2) > 1-e3.
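(Gene's variant only changes the band: in the sketch above, replace the symmetric pBand by an asymmetric one and compare it against 1 - e3. A hypothetical helper, mine:)

  asymmetricBandProb[h_, t_, e1_, e2_] :=
   CDF[BetaDistribution[1 + h, 1 + t], 0.5 + e2] -
    CDF[BetaDistribution[1 + h, 1 + t], 0.5 - e1]   (* posterior Pr(1/2-e1 < x < 1/2+e2) *)

  asymmetricBandProb[520, 480, 0.03, 0.05]          (* compare against 1 - e3 to report "fair" *)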
The posterior is P(x) = x^h (1-x)^t (up to normalization). Taking the log and differentiating, we see that it peaks at x = p = h/n, where n = h+t. Expanding about x = p,

  log P(x) = constant - (1/2) (n/pq) (x-p)^2, where q = t/n = 1-p.

For h,t >> 1, P(x) is approximately Gaussian with mean p, variance pq/n, and standard deviation s = sqrt(pq/n) = 1/(2 sqrt(n)) if we're fussing about a nearly fair coin.

For a fairness decision, we need two parameters k and e (and the opinions enter via their choice). We can say that the coin is (k,e)-fair if the center band of the Gaussian, the interval [p-ks, p+ks], is contained within the interval [1/2-e, 1/2+e]; (k,e)-unfair if these intervals are disjoint; and otherwise undecided. The larger is k or the smaller is e, the more stringent the test, and the longer it must run.

Remark: If a coin of known heads probability p is tossed n times, the mean number of heads is np and the standard deviation is sqrt(npq). Did I make a mistake when I said s = sqrt(pq/n)? Well, it's not quite the same situation: sqrt(npq) is the spread of the heads count, and dividing by n gives sqrt(pq/n) for the heads fraction; also, the direct probability involves a binomial coefficient, while the Bayesian posterior does not.

The law of the iterated logarithm states that the limsup of the deviation from the mean, measured in standard deviations, is sqrt(2 log log n). If we have the bad luck to take the posterior at one of these peak deviations, our test could give a false result. Once a large deviation occurs, there is no "restoring force" to undo it; it sticks.

Does the LIL undermine the utility of Bayes' theorem? Bayes' theorem uses only the final posterior. Is there extra information to be gleaned from earlier posteriors? I need to think about it. Bayes doesn't guarantee certainty, just a known confidence limit, so perhaps the LIL is one of the ways it can occasionally fail.

As a practical matter, the LIL may not even intrude into any feasible certification test. A popular criterion is 3-sigma. If sqrt(2 log log n) = 3, then n = 1.2e39.

-- Gene
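(A sketch -- mine, not Gene's code -- of the (k,e) decision just described, using the Gaussian approximation with s = sqrt(pq/n); the trailing line checks the closing 3-sigma estimate. The sample counts in the example call are made up.)

  keDecision[h_, t_, k_, e_] := Module[{n = h + t, p, q, s, lo, hi},
    p = N[h/n]; q = 1 - p; s = Sqrt[p*q/n];
    {lo, hi} = {p - k*s, p + k*s};                      (* center band of the Gaussian *)
    Which[
     1/2 - e <= lo && hi <= 1/2 + e, "(k,e)-fair",      (* band inside [1/2-e, 1/2+e] *)
     hi < 1/2 - e || lo > 1/2 + e, "(k,e)-unfair",      (* band disjoint from it *)
     True, "undecided"]]

  keDecision[5100, 4900, 3, 0.03]   (* a 3-sigma look at a 51%-heads sample *)

  N[Exp[Exp[9/2]]]                  (* the n with sqrt(2 log log n) = 3; about 1.2*10^39 *)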
participants (4):
- Eugene Salamin
- Gareth McCaughan
- rwg
- Warren D Smith