Multiple comparisons is an interesting statistical problem, but I wouldn't waste effort trying to correct p-values. P-values aren't probabilities, or at least not probabilities of any interest. The prior probability that some difference is exactly nil is usually zero anyway. Better to take a Bayesian approach and ask what model is best supported by the data.

http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2.pdf

Brent Meeker

On 11/18/2014 11:08 AM, Charles Greathouse wrote:
> this was a well known literature area that I was probably just rediscovering the wheel about. The magic key words he suggested are "multiple comparison problem."
I think most of us on math-fun are familiar with the problem of multiple comparisons. I searched for that exact phrase, as well as [Bonferroni correction], when this topic first came up last week, and I'm surely not the only one who has read Ioannidis' "Why Most Published Research Findings Are False".
I just don't think that you've clearly explained the conditions under which your formula is applicable. Your example deals with a problem essentially of this form: given n values, if n more values are picked uniformly at random from [0, 1), what are the odds that the largest, second-largest, ..., smallest of the random picks is at least as large as the corresponding given value? I don't doubt that your approach is valid there. But I don't think it can be applied blindly to a collection of n p-values, even when they are known to be independent, as indeed my earlier example with dice demonstrates. When I brought that up you said that the tests were dependent, then corrected yourself to say that they were independent but "not interchangeable". Whatever you might have meant, it seems to show that your formula isn't as general as you seem to be claiming -- even now I don't see any such condition stated in your paper.
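To make the distinction concrete, here is a minimal Monte Carlo sketch of the formulation I have in mind (Python; the function name and the three example values below are made up for illustration and are not taken from your paper):

import random

def sorted_exceedance_prob(p_values, trials=200_000, seed=0):
    # Estimate the event described above: sort the n given values and n fresh
    # Uniform[0, 1) draws, and count how often the k-th largest draw is at
    # least as large as the k-th largest given value, for every k.
    rng = random.Random(seed)
    target = sorted(p_values)
    n = len(target)
    hits = 0
    for _ in range(trials):
        draw = sorted(rng.random() for _ in range(n))
        if all(d >= t for d, t in zip(draw, target)):
            hits += 1
    return hits / trials

ps = [0.03, 0.20, 0.45]                  # made-up example values
print(sorted_exceedance_prob(ps))        # sorted-pairing (order-statistics) probability

naive = 1.0
for p in ps:
    naive *= 1.0 - p
print(naive)                             # fixed-pairing probability, prod(1 - p_i)

The fixed-pairing event (each individual draw exceeds its own given value) implies the sorted-pairing event but not conversely, so its probability is never larger. The point is only that these are different events, and a formula derived for one cannot be applied blindly to the other.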
Charles Greathouse
Analyst/Programmer
Case Western Reserve University
On Tue, Nov 18, 2014 at 11:40 AM, Warren D Smith <warren.wds@gmail.com> wrote:
So my preliminary paper "The chance that N independent statistical tests fail simultaneously" http://rangevoting.org/CombinedTestFail.html
has now had 5 mathematicians in a row fail to obtain the right formula for F2, followed by 2 disputants on the ElectionIntegrity email list, both claiming statistical expertise, disputing it... followed by Paul F. Velleman, an actual professor of statistics at Cornell, telling me that not only was I right, in fact this was a well known literature area that I was probably just rediscovering the wheel about. The magic key words he suggested are "multiple comparison problem."
Amazing.
Well, Velleman is certainly more right than the others. It turns out there are several entire books on this area of statistics, and the whole "multiple comparison problem" is, at least generally speaking, a well known phenomenon. Evidently, however, it is not well known enough. In particular, at least one election-integrity paper by Josh Mitteldorf and others is invalidated because he did not know about this effect, and it would not surprise me if every paper he ever wrote or co-wrote on that topic also is invalidated, plus quite likely some of Mitteldorf's other papers in non-election areas. (His response? He emailed me "I don't have time to continue this discussion." You know what, Josh? If a goodly fraction of my life's scientific work had just been invalidated, I'd find the effing time, especially if somebody was very generously and helpfully pointing it out to me.) And Mitteldorf is by no means the only victim -- a large number of medical experimental papers also contain wrong statistical calculations because their authors did not know about this effect, which probably means lives have been lost. I would guess thousands of papers are invalidated.
QUOTE from Yoav Benjamini & Yosef Hochberg: "Controlling the false discovery rate: a practical and powerful approach to multiple testing", J. Royal Statistical Society, Series B 57,1 (1995) 289-300:
"Even though MCPs have been in use since the early 1950s and in spite of advocacy for their use (e.g. mandatory for some journals, as well as in institutions like the FDA) researchers have not yet widely adopted these procedures. In medical research for example, Godfrey (1985), Pocock et al (1987) and Smith et al (1987) examined samples of reports of comparative studies from major medical journals. They found that researchers overlook various kinds of multiplicity, and as a result reporting tends to exaggerate treatment differences"
Want more? Here's a second QUOTE from the review paper D.A.Berry: The difficult and ubiquitous problems of multiplicities, Pharmaceutical Statistics 6 (2007) 155-160: "Most scientists are oblivious to the problems of multiplicities. Yet they are everywhere. In one or more of its forms, multiplicities are present in every statistical application. They may be out in the open or hidden. And even if they are out in the open, recognizing them is but the first step in a difficult process of inference. Problems of multiplicities are the most difficult that we statisticians face. They threaten the validity of every statistical conclusion."
Anybody get the picture yet?
So now, I am trying to look into the literature Velleman so helpfully pointed me toward. Books on this topic include:
Rupert G. Miller Jr.: Simultaneous Statistical Inference, 2nd ed., Springer-Verlag, 1981. QA276 .M474
Jason C. Hsu: Multiple Comparisons: Theory and Methods, Chapman and Hall, London, 1996.
Larry E. Toothaker: Multiple Comparisons for Researchers, Sage Publications, Newbury Park, Calif., 1991. Q180.55.M4 T66
Shanti Swarup Gupta & Deng-Yuan Huang: Multiple Statistical Decision Theory: Recent Developments, Springer-Verlag, New York, 1981. QA279.7 .G87
Peter H. Westfall & S. S. Young: Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment, Wiley, New York, 1993.
There also are several web pages devoted to this area, including Wikipedia's and "Beware of multiple comparisons":
http://www.graphpad.com/guides/prism/6/statistics/index.htm?beware_of_multip...
In addition to those books devoted entirely to this topic, apparently at least 10 general-purpose statistics guidebooks make at least some mention of the Multiple Comparisons Problem. E.g.
Statistics for Anthropology / Lorena Madrigal. Cambridge U.P., 2012
Using Statistical Methods in Social Work Practice: A Complete SPSS Guide / Soleman H. Abu-Bader. Lyceum Books, 2006
Modern Data Analysis / edited by Robert L. Launer & Andrew F. Siegel. Academic Press, New York, 1982
Handbook of Biological Statistics / John McDonald
Another good magic keyphrase is "false discovery rate."
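Since I've mentioned it: here is what the Benjamini-Hochberg step-up procedure from the 1995 paper quoted above actually does, as a minimal sketch (the six p-values in the example are made up):

def benjamini_hochberg(p_values, alpha=0.05):
    # Step-up FDR control: sort the m p-values in increasing order, find the
    # largest k with p_(k) <= (k/m)*alpha, and reject the hypotheses having
    # the k smallest p-values.  Valid for independent tests (and for certain
    # kinds of positive dependence).
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74]))
# -> [True, True, False, False, False, False]

Unlike Bonferroni-style corrections, which control the chance of making even one false rejection, this controls the expected fraction of false rejections among all rejections, which is a different (and often more useful) guarantee.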
What do I think of all this literature? I'm trying to figure that out... I'll let you know after I read some more of it.
A simple and safe idea which many sources recommend is: if you are doing T different tests, then your p-level cutoff for statistical significance (e.g. if seeking 99.9% confidence, it would be 0.001) should be divided by T for each test; then proceed. That'll protect you. (This is the "Bonferroni correction.") I've known that since I was a child, but it is a weak idea. If you want to wring the most confidence from your tests, you need stronger methods, i.e. a better understanding than just that.

A large fraction (in fact virtually all of the ones I have looked at so far) of the statistics theory papers on this topic DO NOT GIVE CLEARLY STATED THEOREMS, WITH PROOFS. I think for a topic as clearly tricky as this, that is unacceptable behavior. So I will say straight off that the workers in this area have, in the vast majority, done poor work.
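In code, that rule is tiny (a minimal sketch; the five p-values in the example are made up):

def bonferroni_significant(p_values, alpha=0.001):
    # With T tests, call a result significant only if its p-value is at most
    # alpha / T.  This bounds the chance of ANY false positive by alpha (even
    # for dependent tests), but it is conservative, which is why it wrings
    # less out of the tests than sharper procedures.
    T = len(p_values)
    return [p <= alpha / T for p in p_values]

print(bonferroni_significant([0.0001, 0.0003, 0.002, 0.01, 0.5]))
# -> [True, False, False, False, False]   (the cutoff is 0.001/5 = 0.0002)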
The allegedly strongest result in one line of work on this is Yosef Hochberg: A Sharper Bonferroni Procedure for Multiple Tests of Significance, Biometrika 75,4 (1988) 800-802, which is available electronically:
http://www-stat.wharton.upenn.edu/~steele/Courses/956/Resource/MultipleCompa... and
http://svn.donarmstrong.com/don/trunk/projects/research/linkage/papers/multi...
and it still seems to me to be quite a weak result.
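For comparison, Hochberg's step-up rule itself is short enough to write down (a minimal sketch based on my reading of the paper; the four p-values in the example are made up):

def hochberg_step_up(p_values, alpha=0.05):
    # Sort the m p-values in increasing order and find the largest k such that
    # p_(k) <= alpha / (m - k + 1); reject the hypotheses with the k smallest
    # p-values.  Valid for independent tests, and it never rejects fewer
    # hypotheses than plain Bonferroni at the same alpha.
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha / (m - rank + 1):
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

print(hochberg_step_up([0.010, 0.012, 0.030, 0.044]))
# -> [True, True, True, True]

On this made-up example, plain Bonferroni with cutoff 0.05/4 = 0.0125 would reject only the first two hypotheses, while the step-up rule rejects all four (because the largest p-value is already below alpha); that is the sense in which it is "sharper", though it is still a family-wise-error-rate method.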
-- Warren D. Smith http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun