Re: [math-fun] Scientific Method, Experiments and Causality
Bingo, Adam !!! So "Science" isn't my naive schoolboy intuition as laid out in my original post, but an *adversarial mathematical/crypto game*, with players of differing capabilities. Science must therefore make the same progression that Electrical Engineering recently made: from non-conscious adversaries like "AWGN" (Additive White Gaussian Noise) to "conscious" actors with varying computational capabilities like Alice, Bob, Eve and Mallory. Non-conscious adversaries like AWGN were defeated by now-classical *error-correcting codes*, whereas conscious adversaries require far more sophisticated crypto codes and crypto protocols.

Scientific adversaries are no longer just measurement "noise", with names like Nolan or Norman, but an adversarial Mother Nature, with a name like Naomi or Nadia; potentially malevolent scientists, with names like Sabrina, Samantha, Scott, Sly; and journal editors, with names like Jo/Joy/Joan, Joe/Joel/John/Jose.

One of the current suggestions for scientists is to take a pile of data, randomly shuffle it into two equal piles, use the first pile to construct a theory/hypothesis, and subsequently use the second pile to test that hypothesis. The current suggestion is that *new hypotheses require new data*. This protocol is an attempt to demonstrate the *future predictive power* of a theory/hypothesis. But this is far too strong a requirement, as it assumes that Mother Nature ("Naomi") is capable of *diagonalizing* -- i.e., looking at the previous data & theory, and then perversely choosing new data that *disagrees with* the current theory. It is conceivable that Naomi is that perverse, but most of us consider Naomi's talent to be perseverance (a la brute-force *evolution*), not perverseness (a la *a devil*).

Obviously, the most likely malevolent actors today are bad scientists (e.g., Dr. Sabrina, Dr. Sly) and possibly bad journals and journal editors. There are also bad funding agencies, but perhaps we're getting a little too far afield.

As Adam has already pointed out, crypto protocols may have a lot to offer this new *science-as-a-game* ("SaaG") regime. For example, *bit commitment protocols* can be used to pre-register hypotheses, and *zero knowledge protocols* can be used to communicate among the various parties: hypothesizer, experimenter, statistical analyst. We could even use the "B-word" here: "Bl**kch**n", to make sure that embarrassing/unpleasant data cannot be "corrected" (a la Newton's Moon data) or simply erased and forgotten, and that embarrassing/unpleasant published papers cannot be "disappeared", as in the Nazi cold tolerance tests, the Tuskegee tests, and the Cincinnati Radiation Tests.

At 01:50 PM 6/24/2018, Adam P. Goucher wrote:
It's cryptographically possible to prevent p-hacking, provided the experimenter does not have access to the (unencrypted) data on which the experiment is being performed.
Specifically:
(a) You 'pre-register' the experiment not with a journal, but with a fully-homomorphic-encrypted virtual machine.
(b) If your experiment has a probability threshold of p, then you need to pay p bitcoins to an invalid address (to 'destroy' them), where the address is a function of the hash of the algorithm you intend to run.
(c) The VM takes the proof of payment and the algorithm, runs it on the encrypted data, and returns a (digitally signed by the VM) certificate saying either 'true' or 'false'.
-- APG.
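Here is a rough Python sketch of that flow, for concreteness. It mocks everything hard: there is no real fully-homomorphic encryption (the "encrypted" data is handled in the clear), no real Bitcoin burn transaction, and the VM's signing key is a stand-in; the identifiers (commitment, burn_address, vm_run) are made up for illustration, so only the commit/pay/verify bookkeeping of steps (a)-(c) is shown. It also doubles as the sort of bit-commitment pre-registration mentioned at the top of the thread.

# Sketch of the pre-registration protocol; FHE, Bitcoin, and the VM's
# signing key are all mocked out.
import hashlib, hmac, json

VM_SIGNING_KEY = b"hypothetical-vm-secret"   # stand-in for the VM's real private key

def commitment(algorithm_source: str) -> str:
    """Step (a): commit to the analysis code before seeing any data."""
    return hashlib.sha256(algorithm_source.encode()).hexdigest()

def burn_address(commit: str) -> str:
    """Step (b): derive an unspendable 'address' from the commitment (toy construction)."""
    return "1BURN" + commit[:30]

def vm_run(algorithm_source, proof_of_payment, data):
    """Step (c): check the payment, run the committed code, sign the verdict.
    In the real protocol the data stays encrypted under FHE; here it is plain."""
    commit = commitment(algorithm_source)
    assert proof_of_payment["address"] == burn_address(commit), "payment not tied to this algorithm"
    assert proof_of_payment["amount"] >= proof_of_payment["p_threshold"], "must burn at least p bitcoins"
    namespace = {}
    exec(algorithm_source, namespace)          # run the pre-registered test function
    cert = {"commitment": commit, "verdict": bool(namespace["test"](data))}
    cert["signature"] = hmac.new(VM_SIGNING_KEY,
                                 json.dumps(cert, sort_keys=True).encode(),
                                 hashlib.sha256).hexdigest()
    return cert

# Usage: pre-register a test at threshold p = 0.01, burn 0.01 "bitcoins", run once.
algo = "def test(xs):\n    return sum(xs) / len(xs) > 0.5\n"
payment = {"address": burn_address(commitment(algo)), "amount": 0.01, "p_threshold": 0.01}
print(vm_run(algo, payment, [0.4, 0.9, 0.7, 0.8]))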
Sent: Sunday, June 24, 2018 at 9:23 PM
From: "Mike Stay" <metaweta@gmail.com>
To: math-fun <math-fun@mailman.xmission.com>
Subject: Re: [math-fun] Scientific Method, Experiments and Causality
https://plus.google.com/+DanPiponi/posts/dcGDyMgDtJ9
Researcher: I've just completed this amazing experiment that's given me results significant at the p<0.01 level. Can I publish it in your journal?

Journal: Did you pre-register?

R: No? Why do I need to do that?

J: For all I know you carried out 100 experiments and just picked the best result. In that case you'd likely get p<0.01 just by chance.

R: But I didn't do that.

J: I don't know that. My readers don't know that. So you need to pre-register any experiment with us first. If you pre-register 100 experiments but only write up one my readers will know you've just been trawling for significant results.

R: Oh, OK. I have another great experiment called A coming up and I'll pre-register that.

J: Can I help you with anything else?

R: Well, we've developed this new piece of hardware to automate experiments and we have a string of 100 statistically independent experiments B1-100 that we want to run on it.

J: Sure. Just register 100 experiments on our web site.

R: No way. If experiment A shows a significant result, but I then register 100 more experiments, people will think that A is just the result of trawling.

J: Well them's the rules. Register or go unpublished.

R: OK, I have an idea. I'll batch B1-100 together as one experiment. If the individual experiments have p-values p1-100 I'll compute a "meta" p-value, q = min(p1/0.01, p2/0.01, ..., p100/0.01). For small enough x, P(q<x) is around x. So I'll treat q like an ordinary p-value. If it's significant, I'll write a paper about the individual underlying pi that made it significant.

J: Um...well this is a bit irregular, but I have to admit it follows the letter of the rules, so I'll allow that.

R: But even this is going to dilute experiment A by a factor of two. I really care about A, whereas B1-100 are highly speculative. Your journal policy is going to tend to freeze speculative research.

J: I'm sorry you feel that way.

R: I have another idea. I'll batch everything together. If p0 is the significance level of experiment A, let me construct another meta p-value q = min(p0/0.9, p1/0.001, p2/0.001, ..., p100/0.001). I'll pre-register just one meta-experiment based on this single q-value. If I get q<0.01 we know that something significant happened even though I performed 101 experiments. And now experiments B1-100 only have a small dilution effect on A.

J: Um...I guess I have to accept that.

R: In fact, why don't you just give me a budget of 1 unit? I'll share out this unit between any experiments I like as I see fit. I'll choose wi so that w1+...+wn = 1. I'll then incrementally pre-register the individual parts of my one meta-experiment. For each individual experiment, instead of the usual p-value pi I'll use the modified p-value pi/wi to judge significance. [1]

J: OK

R: Even better, why don't you just give me some "currency" to spend as modifiers to my p-values. I'll pre-register all of my experiments but with this proviso: for each one you publish the amount I spent on it and I'll divide the p-value by what I spent on it. So even though I appear to have many experiments, people can see which ones really were just me trawling, and which ones are significant.

J: This will mean rewriting the policy. But it seems like a good scheme. You can do as many experiments as you like, even using combinatorial methods to trawl millions of possibilities. As long as you pay some of your budget for each experiment, and don't go over your budget, we can easily judge the significance of your work. And you get to choose which experiments you think are more important.
Question: Is this a reasonable policy for a journal?

[1] Suppose w1+...+wn = 1, wi > 0. Define q = min_i(pi/wi). Suppose the pi are independent and uniform on [0,1]. Then

  P(min_i(pi/wi) < x) = P(p1/w1 < x or ... or pn/wn < x)
                      = 1 - (1 - P(p1 < w1 x)) ... (1 - P(pn < wn x))
                      = 1 - (1 - w1 x) ... (1 - wn x)
                      = approx. w1 x + ... + wn x = x   for small x.

So we can treat q like an ordinary p-value for small enough x.
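A quick numerical check of [1], and of the final budget scheme in the dialogue, is easy to run. The sketch below assumes independent, uniformly distributed null p-values; the weights (0.9 for experiment A, 0.001 for each of B1-100) are the numbers from the dialogue, and the two "observed" p-values at the end are made up purely to show how a single result would be judged.

# Monte Carlo check of footnote [1]: with independent null p-values and
# weights summing to 1, q = min_i(p_i / w_i) behaves like a single p-value.
import random

def meta_p(pvals, weights):
    """The 'meta' p-value from the dialogue: q = min_i(p_i / w_i)."""
    return min(p / w for p, w in zip(pvals, weights))

# Weights from the dialogue: 0.9 on experiment A, 0.001 on each of B1..B100.
weights = [0.9] + [0.001] * 100
assert abs(sum(weights) - 1.0) < 1e-9

random.seed(1)
trials, x = 100_000, 0.01
hits = sum(meta_p([random.random() for _ in weights], weights) < x
           for _ in range(trials))
print(f"P(q < {x}) under the null is about {hits / trials:.4f} "
      f"(footnote [1] predicts about {x})")

# Judging an individual result: experiment i is significant at level x
# exactly when p_i / w_i < x.  (These observed p-values are made up.)
budget   = {"A": 0.9,   "B17": 0.001}
observed = {"A": 0.004, "B17": 5e-7}
for name, p in observed.items():
    print(name, "significant at 0.01?", p / budget[name] < 0.01)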
On Sun, Jun 24, 2018 at 11:59 AM, Henry Baker <hbaker1@pipeline.com> wrote:
I'm not an expert in statistical analysis, and I'm having a hard time reconciling all of the features of the modern scientific method.
In particular, the usual process goes something like the following:
1. A scientist observes some phenomena and detects some correlations between observations of type A and observations of type B.
2. The scientist *hypothesizes* some causality among the observations.
3. The scientist designs some *experiment* to try to determine causality.
4. But since the scientist already has preconceived notions about the causality, he/she is not the appropriate person to *perform* the experiment; better to *double blind* the study and have someone *completely ignorant of the experimental design* perform the experiment on subjects (in the case of animate subjects) who are also *completely ignorant of the experimental design*.
5. The data from the experiment can be analyzed by yet another party who is *completely ignorant of the experimental design*, so that his/her biases cannot affect the analysis.
In a perfect, causal world, such a proper experiment should show causality if and only if the causality exists. In particular, a "proper" experiment should have N large enough that the probabilities of false positives and false negatives are both unbelievably small.
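To put a number on "N large enough", here is a small Python sketch using the textbook normal-approximation sample-size formula for comparing two group means; the 0.5-standard-deviation effect size and the 1e-6 error targets are arbitrary illustrative choices, not anything from this thread.

# Back-of-the-envelope sample size for a two-sample comparison of means,
# using the standard normal-approximation formula
#     n per group = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d)^2,
# where d is the true effect size in standard-deviation units.
from statistics import NormalDist

def n_per_group(effect_size, alpha, beta):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # controls false positives
    z_b = NormalDist().inv_cdf(1 - beta)        # controls false negatives
    return 2 * ((z_a + z_b) / effect_size) ** 2

# Conventional error rates vs. "unbelievably small" ones, for an
# (arbitrarily chosen) effect size of 0.5 standard deviations.
for alpha, beta in [(0.05, 0.2), (1e-6, 1e-6)]:
    print(f"alpha = {alpha:g}, beta = {beta:g}: "
          f"about {n_per_group(0.5, alpha, beta):.0f} subjects per group")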
----- Here's my problem:
Scientists have been accused of *fitting to the facts* -- i.e., coming up with hypotheses *after the experiment* that match the experimental results. Furthermore, some have recommended that all such "a posteriori" papers be firmly rejected as scientific fraud.
My question is: "how can our universe possibly tell whether the hypothesis was suggested before or after the experiment?"
In a classically causal universe, the timing of the hypothesis and the timing of the experiment should make no difference, because the mental state of the scientist can't possibly affect the results of the experiment.
If an experiment is indeed performed completely blind by disinterested third parties, why should anyone care how or *when* the hypothesis was obtained?
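Whatever the metaphysics, the journal's worry upthread ("for all I know you carried out 100 experiments and just picked the best result") is purely statistical and easy to quantify. A small simulation, assuming only that each null experiment yields a well-calibrated (uniform) p-value; the 100-experiment count and the 0.01 threshold are taken from the dialogue.

# How often does the best of 100 null experiments look "significant"?
# Under the null a well-calibrated p-value is uniform on [0,1], so each
# experiment is modelled as a single uniform draw; any p < 0.01 here is
# pure selection effect, not causality.
import random

random.seed(2)
trials = 100_000
wins = sum(min(random.random() for _ in range(100)) < 0.01
           for _ in range(trials))
print(f"best of 100 null experiments reaches p < 0.01 in "
      f"{wins / trials:.1%} of trials")   # theory: 1 - 0.99**100, about 63%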