[math-fun] The chance that N statistical tests fail simultaneously

10 Nov 2014

      The chance that N statistical tests fail simultaneously
=========Warren D. Smith====Nov 2014========

Suppose you perform N statistical tests.
One fails with p-level a.
Another fails with p-level b.
...
Another fails with p-level z.

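What is the p-level for this combined event?  That is, if the N p-values
were really independent uniform01 randoms, what is the chance that after
sorting, the smallest is below a, the next is below b, ..., and the largest
is below z?
Call the answer F_N(a,b,c,...,z).
We may assume wlog that 0<a<=b<=c<=...<=z<1.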

F_1(a)=a,  of course.

It is NOT the case that F_2(a,b)=ab.  Actually,
F_2(a,b) = 2ab-a^2 = a(2b-a).

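That is because it is the chance that, if x and y are independent uniform01
randoms, either x<a and y<b, or x<b and y<a.  Each of those two events has
chance ab, and they overlap exactly when x<a and y<a, which has chance a^2.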
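
As a sanity check on F_2, here is a minimal Monte Carlo sketch in Python.
The names combined_event and estimate_F are made up for illustration, and
it assumes the combined event means "after sorting, the k-th smallest
variate is below the k-th smallest threshold", which is how the x,y
description above reads.

```python
import random

def combined_event(sample, thresholds):
    """True if, after sorting both lists, each sample value is below its threshold."""
    return all(x < t for x, t in zip(sorted(sample), sorted(thresholds)))

def estimate_F(thresholds, trials=100_000, seed=0):
    """Monte Carlo estimate of F_N: fraction of uniform01 samples in the combined event."""
    rng = random.Random(seed)          # fixed seed so repeated runs give the same estimate
    n = len(thresholds)
    hits = sum(combined_event([rng.random() for _ in range(n)], thresholds)
               for _ in range(trials))
    return hits / trials

a, b = 0.1, 0.3
# The estimate should land near a(2b-a), not near ab.
print(estimate_F([a, b]), a * (2 * b - a), a * b)
```

With 100000 trials the estimate is only good to a few parts in a thousand,
but that is enough to tell a(2b-a) apart from ab here.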

What about F_3(a,b,c), F_4(a,b,c,d), and so on?
It is possible to determine these by working out certain N-dimensional volumes
via something like an inclusion-exclusion argument.
But it gets more and more painful.  As an easy upper bound (a union bound
over the N! ways to assign the N tests to the N thresholds),

F_N(a,b,c,...,z) <= N! abc...z.

But this bound can be pretty weak.  For example it is easy to work out
exactly the special case where all the thresholds are equal, and there the
bound is too large by the full factor N!:
F_N(x,x,x,...,x) = x^N.
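
To see how loose the bound is on small cases, here is a brute-force sketch
in Python using exact rationals.  This is not the inclusion-exclusion
argument mentioned above, just an enumeration I am assuming is equivalent:
the thresholds cut (0,1) into N+1 cells, and the event only depends on
which cell each variate lands in.  The name exact_F is made up, and the
cost is (N+1)^N terms, so it is only usable for small N.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def exact_F(thresholds):
    """Exact F_N by summing over which threshold-cell each of the N variates falls in."""
    t = sorted(Fraction(v) for v in thresholds)
    n = len(t)
    edges = [Fraction(0)] + t + [Fraction(1)]
    widths = [edges[j + 1] - edges[j] for j in range(n + 1)]  # cell j is (edges[j], edges[j+1])
    total = Fraction(0)
    for cells in product(range(n + 1), repeat=n):
        # The k-th smallest variate is below t[k-1] iff at least k variates lie in cells 0..k-1.
        if all(sum(c < k for c in cells) >= k for k in range(1, n + 1)):
            total += prod(widths[c] for c in cells)
    return total

x, n = Fraction(1, 10), 4
# Equal thresholds: exact value, x^N, and the N! bound.
print(exact_F([x] * n), x ** n, factorial(n) * x ** n)

ts = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]
# Unequal thresholds: exact value versus the N! abc...z bound.
print(exact_F(ts), factorial(len(ts)) * prod(ts))
```

Pass the thresholds as Fractions (or integers), since a float like 0.1 is
converted to its exact binary value rather than to 1/10.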

I would like the exact answer in general.
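The following recurrence

F_{N+1}(a,b,c,...,z) = (N+1) * integral(from x=0..a)
F_N( (b-x)/(1-x), (c-x)/(1-x), ..., (z-x)/(1-x) ) * (1-x)^N  dx

seems an easier way to get the answer.  Here x is the smallest of the N+1
uniform01 randoms, which has density (N+1)(1-x)^N; given x, the other N are
independent and uniform on (x,1), so rescaling them to (0,1) turns the
thresholds b,...,z into (b-x)/(1-x),...,(z-x)/(1-x).  From this I compute: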

F_3(a,b,c) = [3b(2c-b)+a(a-3c)]a

F_4(a,b,c,d) = [4(b-3d)b^2+6c(4bd+(a-2b)c-2da)+(4d-a)a^2]a

F_5(a,b,c,d,e) =
[20bc^3-5b^4+30b^2d^2-60bcd^2+20b^3e-60bc^2e-60b^2de+120bcde
 -10c^3a+30cd^2a+30c^2ea-60cdea-10d^2a^2+20dea^2-5ea^3+a^4]a.
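
The recurrence can also be run numerically as a cross-check on the closed
forms.  Below is a rough Python sketch that evaluates it by nested
Simpson-rule integration; the function names are made up, F3 and F4 are
transcribed from the formulas above, and the thresholds are assumed to be
passed in increasing order.  It is a numerical approximation, not a
symbolic derivation, and its cost grows exponentially with N.

```python
def simpson(f, lo, hi, panels=8):
    """Composite Simpson rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / (2 * panels)
    s = f(lo) + f(hi)
    for i in range(1, 2 * panels):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def F(thresholds):
    """F_N via the recurrence, integrating over the smallest variate x (thresholds sorted)."""
    a, rest = thresholds[0], thresholds[1:]
    if not rest:
        return a                              # base case F_1(a) = a
    n = len(rest)                             # inner call is F_n, this call is F_{n+1}
    def integrand(x):
        rescaled = [(t - x) / (1 - x) for t in rest]
        return F(rescaled) * (1 - x) ** n
    return (n + 1) * simpson(integrand, 0.0, a)

def F3(a, b, c):
    return (3*b*(2*c - b) + a*(a - 3*c)) * a

def F4(a, b, c, d):
    return (4*(b - 3*d)*b**2 + 6*c*(4*b*d + (a - 2*b)*c - 2*d*a) + (4*d - a)*a**2) * a

# Each line prints the recurrence value next to the closed form.
print(F([0.1, 0.2, 0.3]), F3(0.1, 0.2, 0.3))
print(F([0.1, 0.2, 0.3, 0.4]), F4(0.1, 0.2, 0.3, 0.4))
```

Each level of the recurrence multiplies the work by the number of
quadrature points, so this is only practical for small N.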

These keep getting messier.  But there might be some simple general form
for the answer (e.g. if we go back to the original inclusion-exclusion
idea and do not get confused), and/or there might be some algorithm
for computing the answer whose runtime grows only polynomially with N.
Can you find them?

At present, the best algorithm I have thought of runs in time exponential
in N, roughly like 4^N in fact.
