Re: [math-fun] Scientific Method, Experiments and Causality
Bingo, Adam !!! So "Science" isn't my naive schoolboy intuition as laid out in my original post, but an *adversarial mathematical/crypto game*, with players of differing capabilities. Science must therefore make the same progression that Electrical Engineering recently made: from non-conscious adversaries like "AWGN" (Additive White Gaussian Noise) to "conscious" actors with varying computational capabilities like Alice, Bob, Eve and Mallory. Non-conscious adversaries like AWGN were defeated by now-classical *error-correcting codes*, whereas conscious adversaries require far more sophisticated crypto codes and crypto protocols.

Scientific adversaries are no longer just measurement "noise", with names like Nolan or Norman, but an adversarial Mother Nature, with a name like Naomi or Nadia; potentially malevolent scientists, with names like Sabrina, Samantha, Scott, Sly; and journal editors, with names like Jo/Joy/Joan, Joe/Joel/John/Jose.

One of the current suggestions for scientists is to take a pile of data, randomly shuffle it into two equal piles, use the first pile to construct a theory/hypothesis, and subsequently use the second pile to test that hypothesis. The current suggestion is that *new hypotheses require new data*. This protocol is an attempt to demonstrate the *future predictive power* of a theory/hypothesis. But this is far too strong a requirement, as it assumes that Mother Nature ("Naomi") is capable of *diagonalizing* -- i.e., looking at the previous data & theory, and then perversely choosing new data that *disagrees with* the current theory. It is conceivable that Naomi is that perverse, but most of us consider Naomi's talent to be perseverance (a la brute-force *evolution*), not perverseness (a la *a devil*).

Obviously, the most likely malevolent actors today are bad scientists (e.g., Dr. Sabrina, Dr. Sly) and possibly bad journals and journal editors. There are also bad funding agencies, but perhaps we're getting a little too far afield.

As Adam has already pointed out, crypto protocols may have a lot to offer this new *science-as-a-game* ("SaaG") regime. For example, *bit commitment protocols* can be used to pre-register hypotheses, and *zero knowledge protocols* can be used to communicate among the various parties: hypothesizer, experimenter, statistical analyst. We could even use the "B-word" here: "Bl**kch**n", to make sure that embarrassing/unpleasant data cannot be "corrected" (a la Newton's Moon data) or simply erased and forgotten, and that embarrassing/unpleasant published papers cannot be "disappeared", as in the Nazi cold tolerance tests, the Tuskegee tests, and the Cincinnati Radiation Tests.

At 01:50 PM 6/24/2018, Adam P. Goucher wrote:
It's cryptographically possible to prevent p-hacking, provided the experimenter does not have access to the (unencrypted) data on which the experiment is being performed.
Specifically:
(a) You 'pre-register' the experiment not with a journal, but with a fully-homomorphic-encrypted virtual machine.
(b) If your experiment has a probability threshold of p, then you need to pay p bitcoins to an invalid address (to 'destroy' them), where the address is a function of the hash of the algorithm you intend to run.
(c) The VM takes the proof of payment and the algorithm, runs it on the encrypted data, and returns a (digitally signed by the VM) certificate saying either 'true' or 'false'.
-- APG.
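Here is a rough Python sketch of that flow, for concreteness. It mocks everything hard: there is no real fully-homomorphic encryption (the "encrypted" data is handled in the clear), no real Bitcoin burn transaction, and the VM's signing key is a stand-in; the identifiers (commitment, burn_address, vm_run) are made up for illustration, so only the commit/pay/verify bookkeeping of steps (a)-(c) is shown. It also doubles as the sort of bit-commitment pre-registration mentioned at the top of the thread.

# Sketch of the pre-registration protocol; FHE, Bitcoin, and the VM's
# signing key are all mocked out.
import hashlib, hmac, json

VM_SIGNING_KEY = b"hypothetical-vm-secret"   # stand-in for the VM's real private key

def commitment(algorithm_source: str) -> str:
    """Step (a): commit to the analysis code before seeing any data."""
    return hashlib.sha256(algorithm_source.encode()).hexdigest()

def burn_address(commit: str) -> str:
    """Step (b): derive an unspendable 'address' from the commitment (toy construction)."""
    return "1BURN" + commit[:30]

def vm_run(algorithm_source, proof_of_payment, data):
    """Step (c): check the payment, run the committed code, sign the verdict.
    In the real protocol the data stays encrypted under FHE; here it is plain."""
    commit = commitment(algorithm_source)
    assert proof_of_payment["address"] == burn_address(commit), "payment not tied to this algorithm"
    assert proof_of_payment["amount"] >= proof_of_payment["p_threshold"], "must burn at least p bitcoins"
    namespace = {}
    exec(algorithm_source, namespace)          # run the pre-registered test function
    cert = {"commitment": commit, "verdict": bool(namespace["test"](data))}
    cert["signature"] = hmac.new(VM_SIGNING_KEY,
                                 json.dumps(cert, sort_keys=True).encode(),
                                 hashlib.sha256).hexdigest()
    return cert

# Usage: pre-register a test at threshold p = 0.01, burn 0.01 "bitcoins", run once.
algo = "def test(xs):\n    return sum(xs) / len(xs) > 0.5\n"
payment = {"address": burn_address(commitment(algo)), "amount": 0.01, "p_threshold": 0.01}
print(vm_run(algo, payment, [0.4, 0.9, 0.7, 0.8]))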
Sent: Sunday, June 24, 2018 at 9:23 PM
From: "Mike Stay" <metaweta@gmail.com>
To: math-fun <math-fun@mailman.xmission.com>
Subject: Re: [math-fun] Scientific Method, Experiments and Causality
https://plus.google.com/+DanPiponi/posts/dcGDyMgDtJ9
Researcher: I've just completed this amazing experiment that's given me results significant at the p<0.01 level. Can I publish it in your journal?

Journal: Did you pre-register?

R: No? Why do I need to do that?

J: For all I know you carried out 100 experiments and just picked the best result. In that case you'd likely get p<0.01 just by chance.

R: But I didn't do that.

J: I don't know that. My readers don't know that. So you need to pre-register any experiment with us first. If you pre-register 100 experiments but only write up one my readers will know you've just been trawling for significant results.

R: Oh, OK. I have another great experiment called A coming up and I'll pre-register that.

J: Can I help you with anything else?

R: Well, we've developed this new piece of hardware to automate experiments and we have a string of 100 statistically independent experiments B1-100 that we want to run on it.

J: Sure. Just register 100 experiments on our web site.

R: No way. If experiment A shows a significant result, but I then register 100 more experiments, people will think that A is just the result of trawling.

J: Well them's the rules. Register or go unpublished.

R: OK, I have an idea. I'll batch B1-100 together as one experiment. If the individual experiments have p-values p1-100 I'll compute a "meta" p-value, q = min(p1/0.01, p2/0.01, ..., p100/0.01). For small enough x, P(q<x) is around x. So I'll treat q like an ordinary p-value. If it's significant, I'll write a paper about the individual underlying pi that made it significant.

J: Um...well this is a bit irregular, but I have to admit it follows the letter of the rules, so I'll allow that.

R: But even this is going to dilute experiment A by a factor of two. I really care about A, whereas B1-100 are highly speculative. Your journal policy is going to tend to freeze speculative research.

J: I'm sorry you feel that way.

R: I have another idea. I'll batch everything together. If p0 is the significance level of experiment A, let me construct another meta p-value q = min(p0/0.9, p1/0.001, p2/0.001, ..., p100/0.001). I'll pre-register just one meta-experiment based on this single q-value. If I get q<0.01 we know that something significant happened even though I performed 101 experiments. And now experiments B1-100 only have a small dilution effect on A.

J: Um...I guess I have to accept that.

R: In fact, why don't you just give me a budget of 1 unit? I'll share out this unit between any experiments I like as I see fit. I'll choose wi so that w1+...+wn = 1. I'll then incrementally pre-register the individual parts of my one meta-experiment. For each individual experiment, instead of the usual p-value pi I'll use the modified p-value pi/wi to judge significance. [1]

J: OK

R: Even better, why don't you just give me some "currency" to spend as modifiers to my p-values. I'll pre-register all of my experiments but with this proviso: for each one you publish the amount I spent on it and I'll divide the p-value by what I spent on it. So even though I appear to have many experiments, people can see which ones really were just me trawling, and which ones are significant.

J: This will mean rewriting the policy. But it seems like a good scheme. You can do as many experiments as you like, even using combinatorial methods to trawl millions of possibilities. As long as you pay some of your budget for each experiment, and don't go over your budget, we can easily judge the significance of your work. And you get to choose which experiments you think are more important.
Question: Is this a reasonable policy for a journal?

[1] Suppose w1+...+wn = 1, wi > 0. Define q = min_i(pi/wi). Suppose the pi are independent and uniform on [0,1]. Then

  P(min_i(pi/wi) < x) = P(p1/w1 < x or ... or pn/wn < x)
                      = 1 - (1 - P(p1 < w1 x)) ... (1 - P(pn < wn x))
                      = 1 - (1 - w1 x) ... (1 - wn x)
                      = approx. w1 x + ... + wn x = x   for small x.

So we can treat q like an ordinary p-value for small enough x.
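A quick numerical check of [1], and of the final budget scheme in the dialogue, is easy to run. The sketch below assumes independent, uniformly distributed null p-values; the weights (0.9 for experiment A, 0.001 for each of B1-100) are the numbers from the dialogue, and the two "observed" p-values at the end are made up purely to show how a single result would be judged.

# Monte Carlo check of footnote [1]: with independent null p-values and
# weights summing to 1, q = min_i(p_i / w_i) behaves like a single p-value.
import random

def meta_p(pvals, weights):
    """The 'meta' p-value from the dialogue: q = min_i(p_i / w_i)."""
    return min(p / w for p, w in zip(pvals, weights))

# Weights from the dialogue: 0.9 on experiment A, 0.001 on each of B1..B100.
weights = [0.9] + [0.001] * 100
assert abs(sum(weights) - 1.0) < 1e-9

random.seed(1)
trials, x = 100_000, 0.01
hits = sum(meta_p([random.random() for _ in weights], weights) < x
           for _ in range(trials))
print(f"P(q < {x}) under the null is about {hits / trials:.4f} "
      f"(footnote [1] predicts about {x})")

# Judging an individual result: experiment i is significant at level x
# exactly when p_i / w_i < x.  (These observed p-values are made up.)
budget   = {"A": 0.9,   "B17": 0.001}
observed = {"A": 0.004, "B17": 5e-7}
for name, p in observed.items():
    print(name, "significant at 0.01?", p / budget[name] < 0.01)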
On Sun, Jun 24, 2018 at 11:59 AM, Henry Baker <hbaker1@pipeline.com> wrote:
I'm not an expert in statistical analysis, and I'm having a hard time reconciling all of the features of the modern scientific method.
In particular, the usual process goes something like the following:
1. A scientist observes some phenomena and detects some correlations between observations of type A and observations of type B.
2. The scientist *hypothesizes* some causality among the observations.
3. The scientist designs some *experiment* to try to determine causality.
4. But since the scientist already has preconceived notions about the causality, he/she is not the appropriate person to *perform* the experiment; better to *double blind* the study and have someone *completely ignorant of the experimental design* perform the experiment on subjects (in the case of animate subjects) who are also *completely ignorant of the experimental design*.
5. The data from the experiment can be analyzed by yet another party who is *completely ignorant of the experimental design*, so that his/her biases cannot affect the analysis.
In a perfect, causal world, such a proper experiment should show causality if and only if the causality exists. In particular, a "proper" experiment should have N large enough that the probabilities of false positives and false negatives are both unbelievably small.
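To put a number on "N large enough", here is a small Python sketch using the textbook normal-approximation sample-size formula for comparing two group means; the 0.5-standard-deviation effect size and the 1e-6 error targets are arbitrary illustrative choices, not anything from this thread.

# Back-of-the-envelope sample size for a two-sample comparison of means,
# using the standard normal-approximation formula
#     n per group = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d)^2,
# where d is the true effect size in standard-deviation units.
from statistics import NormalDist

def n_per_group(effect_size, alpha, beta):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # controls false positives
    z_b = NormalDist().inv_cdf(1 - beta)        # controls false negatives
    return 2 * ((z_a + z_b) / effect_size) ** 2

# Conventional error rates vs. "unbelievably small" ones, for an
# (arbitrarily chosen) effect size of 0.5 standard deviations.
for alpha, beta in [(0.05, 0.2), (1e-6, 1e-6)]:
    print(f"alpha = {alpha:g}, beta = {beta:g}: "
          f"about {n_per_group(0.5, alpha, beta):.0f} subjects per group")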
----- Here's my problem:
Scientists have been accused of *fitting to the facts* -- i.e., coming up with hypotheses *after the experiment* that match the experimental results. Furthermore, some have recommended that all such "a posteriori" papers be firmly rejected as scientific fraud.
My question is: "how can our universe possibly tell whether the hypothesis was suggested before or after the experiment?"
In a classically causal universe, the timing of the hypothesis and the timing of the experiment should make no difference, because the mental state of the scientist can't possibly affect the results of the experiment.
If an experiment is indeed performed completely blind by disinterested third parties, why should anyone care how or *when* the hypothesis was obtained?
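Whatever the metaphysics, the journal's worry upthread ("for all I know you carried out 100 experiments and just picked the best result") is purely statistical and easy to quantify. A small simulation, assuming only that each null experiment yields a well-calibrated (uniform) p-value; the 100-experiment count and the 0.01 threshold are taken from the dialogue.

# How often does the best of 100 null experiments look "significant"?
# Under the null a well-calibrated p-value is uniform on [0,1], so each
# experiment is modelled as a single uniform draw; any p < 0.01 here is
# pure selection effect, not causality.
import random

random.seed(2)
trials = 100_000
wins = sum(min(random.random() for _ in range(100)) < 0.01
           for _ in range(trials))
print(f"best of 100 null experiments reaches p < 0.01 in "
      f"{wins / trials:.1%} of trials")   # theory: 1 - 0.99**100, about 63%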