Multiple comparisons is an interesting statistical problem, but I wouldn't waste effort trying to correct p-values. P-values aren't probabilities, or at least not probabilities of any interest. The prior probability that some difference is exactly nil is usually zero anyway. Better to take a Bayesian approach and ask what model is best supported by the data.

http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2.pdf

Brent Meeker

On 11/18/2014 11:08 AM, Charles Greathouse wrote:
> this was a well known literature area that I was probably just rediscovering the wheel about. The magic key words he suggested are "multiple comparison problem."
I think most of us on math-fun are familiar with the problem of multiple comparisons. I searched for that exact phrase, as well as [Bonferroni correction], when this topic first came up last week, and I'm surely not the only one who has read Ioannidis' "Why Most Published Research Findings Are False".
I just don't think that you've clearly explained the conditions under which your formula is applicable. Your example deals with a problem essentially of this form: given n values, if n more values are picked uniformly at random from [0, 1), what are the odds that the largest, second-largest, ..., smallest of the random picks is at least as large as the corresponding given value? I don't doubt that your approach is valid there. But I don't think it can be applied blindly to a collection of n p-values, even when they are known to be independent, as indeed my earlier example with dice demonstrates. When I brought that up you said that the tests were dependent, then corrected yourself to say that they were independent but "not interchangeable". Whatever you might have meant, it seems to show that your formula isn't as general as you seem to be claiming -- even now I don't see any such condition stated in your paper.
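To make the distinction concrete, here is a minimal Monte Carlo sketch of the formulation I have in mind (Python; the function name and the three example values below are made up for illustration and are not taken from your paper):

import random

def sorted_exceedance_prob(p_values, trials=200_000, seed=0):
    # Estimate the event described above: sort the n given values and n fresh
    # Uniform[0, 1) draws, and count how often the k-th largest draw is at
    # least as large as the k-th largest given value, for every k.
    rng = random.Random(seed)
    target = sorted(p_values)
    n = len(target)
    hits = 0
    for _ in range(trials):
        draw = sorted(rng.random() for _ in range(n))
        if all(d >= t for d, t in zip(draw, target)):
            hits += 1
    return hits / trials

ps = [0.03, 0.20, 0.45]                  # made-up example values
print(sorted_exceedance_prob(ps))        # sorted-pairing (order-statistics) probability

naive = 1.0
for p in ps:
    naive *= 1.0 - p
print(naive)                             # fixed-pairing probability, prod(1 - p_i)

The fixed-pairing event (each individual draw exceeds its own given value) implies the sorted-pairing event but not conversely, so its probability is never larger. The point is only that these are different events, and a formula derived for one cannot be applied blindly to the other.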
Charles Greathouse
Analyst/Programmer
Case Western Reserve University
On Tue, Nov 18, 2014 at 11:40 AM, Warren D Smith <warren.wds@gmail.com> wrote:
So my preliminary paper "The chance that N independent statistical tests fail simultaneously" http://rangevoting.org/CombinedTestFail.html
has now had 5 mathematicians in a row fail to obtain the right formula for F2, followed by 2 disputants on the ElectionIntegrity email list, both claiming statistical expertise, disputing it... followed by Paul F. Velleman, an actual professor of statistics at Cornell, telling me that not only was I right, in fact this was a well known literature area that I was probably just rediscovering the wheel about. The magic key words he suggested are "multiple comparison problem."
Amazing.
Well, Velleman is certainly more right than the others. It turns out there are several entire books on this area of statistics, and the whole "multiple comparison problem" is, at least generally speaking, a well known phenomenon. Evidently, however, it is not well known enough. In particular, at least one election-integrity paper by Josh Mitteldorf and others is invalidated because he did not know about this effect, and it would not surprise me if every paper he ever wrote or co-wrote on that topic also is invalidated, plus quite likely some of Mitteldorf's other papers in non-election areas. (His response? He emailed me "I don't have time to continue this discussion." You know what, Josh? If a goodly fraction of my life's scientific work had just been invalidated, I'd find the effing time, especially if somebody was very generously and helpfully pointing it out to me.) And Mitteldorf is by no means the only victim -- a large number of medical experimental papers also contain wrong statistical calculations because their authors did not know about this effect, which probably means lives have been lost. I would guess thousands of papers are invalidated.
QUOTE from Yoav Benjamini & Yosef Hochberg: "Controlling the false discovery rate: a practical and powerful approach to multiple testing", J. Royal Statistical Society, Series B 57,1 (1995) 289-300:
"Even though MCPs have been in use since the early 1950s and in spite of advocacy for their use (e.g. mandatory for some journals, as well as in institutions like the FDA) researchers have not yet widely adopted these procedures. In medical research for example, Godfrey (1985), Pocock et al (1987) and Smith et al (1987) examined samples of reports of comparative studies from major medical journals. They found that researchers overlook various kinds of multiplicity, and as a result reporting tends to exaggerate treatment differences"
Want more? Here's a second QUOTE from the review paper D.A.Berry: The difficult and ubiquitous problems of multiplicities, Pharmaceutical Statistics 6 (2007) 155-160: "Most scientists are oblivious to the problems of multiplicities. Yet they are everywhere. In one or more of its forms, multiplicities are present in every statistical application. They may be out in the open or hidden. And even if they are out in the open, recognizing them is but the first step in a difficult process of inference. Problems of multiplicities are the most difficult that we statisticians face. They threaten the validity of every statistical conclusion."
Anybody get the picture yet?
So now, I am trying to look into the literature Velleman so helpfully pointed me toward. Books on this topic include:
Rupert G. Miller Jr.: Simultaneous Statistical Inference, 2nd ed., Springer-Verlag, 1981. QA276 .M474
Jason C. Hsu: Multiple Comparisons: Theory and Methods, Chapman and Hall, London, 1996.
Larry E. Toothaker: Multiple Comparisons for Researchers, Sage Publications, Newbury Park, Calif., 1991. Q180.55.M4 T66
Shanti Swarup Gupta & Deng-Yuan Huang: Multiple Statistical Decision Theory: Recent Developments, Springer-Verlag, New York, 1981. QA279.7 .G87
Peter H. Westfall & S. S. Young: Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment, Wiley, New York, 1993.
There also are several web pages devoted to this area, including Wikipedia's and "Beware of multiple comparisons":
http://www.graphpad.com/guides/prism/6/statistics/index.htm?beware_of_multip...
In addition to those books devoted entirely to this topic, apparently at least 10 general-purpose statistics guidebooks make at least some mention of the Multiple Comparisons Problem. E.g.
Statistics for Anthropology / Lorena Madrigal. Cambridge U.P., 2012
Using Statistical Methods in Social Work Practice: A Complete SPSS Guide / Soleman H. Abu-Bader. Lyceum Books, 2006
Modern Data Analysis / edited by Robert L. Launer & Andrew F. Siegel. Academic Press, New York, 1982
Handbook of Biological Statistics / John McDonald
Another good magic keyphrase is "false discovery rate."
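Since I've mentioned it: here is what the Benjamini-Hochberg step-up procedure from the 1995 paper quoted above actually does, as a minimal sketch (the six p-values in the example are made up):

def benjamini_hochberg(p_values, alpha=0.05):
    # Step-up FDR control: sort the m p-values in increasing order, find the
    # largest k with p_(k) <= (k/m)*alpha, and reject the hypotheses having
    # the k smallest p-values.  Valid for independent tests (and for certain
    # kinds of positive dependence).
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74]))
# -> [True, True, False, False, False, False]

Unlike Bonferroni-style corrections, which control the chance of making even one false rejection, this controls the expected fraction of false rejections among all rejections, which is a different (and often more useful) guarantee.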
What do I think of all this literature? I'm trying to figure that out... I'll let you know after I read some more of it.
A simple and safe idea which many sources recommend is: if you are doing T different tests, then your p-level cutoff for statistical significance (e.g. if seeking 99.9% confidence, it would be 0.001) should be divided by T for each test; then proceed. That'll protect you. (This is the "Bonferroni correction.") I've known that since I was a child, but it is a weak idea. If you want to wring the most confidence from your tests, you need stronger methods, i.e. a better understanding than just that.

A large fraction (in fact virtually all of the ones I have looked at so far) of the statistics theory papers on this topic DO NOT GIVE CLEARLY STATED THEOREMS, WITH PROOFS. I think for a topic as clearly tricky as this, that is unacceptable behavior. So I will say straight off that the workers in this area have, in the vast majority, done poor work.
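In code, that rule is tiny (a minimal sketch; the five p-values in the example are made up):

def bonferroni_significant(p_values, alpha=0.001):
    # With T tests, call a result significant only if its p-value is at most
    # alpha / T.  This bounds the chance of ANY false positive by alpha (even
    # for dependent tests), but it is conservative, which is why it wrings
    # less out of the tests than sharper procedures.
    T = len(p_values)
    return [p <= alpha / T for p in p_values]

print(bonferroni_significant([0.0001, 0.0003, 0.002, 0.01, 0.5]))
# -> [True, False, False, False, False]   (the cutoff is 0.001/5 = 0.0002)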
The allegedly strongest result in one line of work on this is Yosef Hochberg: A Sharper Bonferroni Procedure for Multiple Tests of Significance, Biometrika 75,4 (1988) 800-802, which is available electronically:
http://www-stat.wharton.upenn.edu/~steele/Courses/956/Resource/MultipleCompa... and
http://svn.donarmstrong.com/don/trunk/projects/research/linkage/papers/multi...
and it still seems to me to be quite a weak result.
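For comparison, Hochberg's step-up rule itself is short enough to write down (a minimal sketch based on my reading of the paper; the four p-values in the example are made up):

def hochberg_step_up(p_values, alpha=0.05):
    # Sort the m p-values in increasing order and find the largest k such that
    # p_(k) <= alpha / (m - k + 1); reject the hypotheses with the k smallest
    # p-values.  Valid for independent tests, and it never rejects fewer
    # hypotheses than plain Bonferroni at the same alpha.
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha / (m - rank + 1):
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

print(hochberg_step_up([0.010, 0.012, 0.030, 0.044]))
# -> [True, True, True, True]

On this made-up example, plain Bonferroni with cutoff 0.05/4 = 0.0125 would reject only the first two hypotheses, while the step-up rule rejects all four (because the largest p-value is already below alpha); that is the sense in which it is "sharper", though it is still a family-wise-error-rate method.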
-- Warren D. Smith http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun