[math-fun] Distinguishing cause from effect using observational data
FYI -- Perhaps this article has some bearing on WDS's questions:

http://science.slashdot.org/story/14/12/18/1810221/cause-and-effect-how-a-re...

Statisticians have long thought it impossible to tell cause and effect apart using observational data. The problem is to take two sets of measurements that are correlated, say X and Y, and to find out if X caused Y or Y caused X. That's straightforward with a controlled experiment in which one variable can be held constant to see how this influences the other. Take, for example, a correlation between wind speed and the rotation speed of a wind turbine. Observational data gives no clue about cause and effect but an experiment that holds the wind speed constant while measuring the speed of the turbine, and vice versa, would soon give an answer.

But in the last couple of years, statisticians have developed a technique that can tease apart cause and effect from the observational data alone. It is based on the idea that any set of measurements always contains noise. However, the noise in the cause variable can influence the effect but not the other way round. So the noise in the effect dataset is always more complex than the noise in the cause dataset. The new statistical test, known as the additive noise model, is designed to find this asymmetry.

Now statisticians have tested the model on 88 sets of cause-and-effect data, ranging from altitude and temperature measurements at German weather stations to the correlation between rent and apartment size in student accommodation. The results suggest that the additive noise model can tease apart cause and effect correctly in up to 80 per cent of the cases (provided there are no confounding factors or selection effects). That's a useful new trick in a statistician's armoury, particularly in areas of science where controlled experiments are expensive, unethical or practically impossible.
http://arxiv.org/abs/1412.3773

Distinguishing cause from effect using observational data: methods and benchmarks
Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, Bernhard Schölkopf
(Submitted on 11 Dec 2014)

The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. This was often considered to be impossible. Nevertheless, several approaches for addressing this bivariate causal discovery problem were proposed recently. In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different "cause-effect pairs" selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).
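For readers who want to see the additive-noise idea in action, here is a minimal sketch, not the authors' implementation. It assumes the model Y = f(X) + N with N independent of X, fits a regression in each direction, and prefers the direction whose residuals look more independent of the input. The cubic polynomial regression, the fixed kernel bandwidth, the simulated data, and all function names (`poly_fit_residuals`, `hsic`, `anm_direction`) are assumptions made for illustration; the paper's methods use more careful regression and independence tests. Independence is scored with a biased HSIC (Hilbert-Schmidt Independence Criterion) statistic, where smaller means "closer to independent".

```python
import math
import random


def standardize(v):
    """Rescale a list of numbers to zero mean and unit variance."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in v) / n)
    return [(a - mean) / sd for a in v]


def poly_fit_residuals(x, y, degree=3):
    """Least-squares polynomial fit of y on x; returns residuals y - f(x)."""
    m = degree + 1
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    A = [[sum(a ** (i + j) for a in x) for j in range(m)] for i in range(m)]
    b = [sum((a ** i) * t for a, t in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution for the polynomial coefficients.
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = s / A[i][i]
    return [t - sum(c * a ** k for k, c in enumerate(coef)) for a, t in zip(x, y)]


def gram(v, bandwidth=1.0):
    """Gaussian-kernel Gram matrix of a list of numbers."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in v] for a in v]


def center(K):
    """Double-center a symmetric Gram matrix (H K H with H = I - 11^T/n)."""
    n = len(K)
    row = [sum(r) / n for r in K]
    total = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]


def hsic(u, v):
    """Biased HSIC statistic trace(K H L H) / n^2 on standardized inputs."""
    u, v = standardize(u), standardize(v)
    n = len(u)
    Kc = center(gram(u))
    L = gram(v)
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / n ** 2


def anm_direction(x, y, degree=3):
    """Compare residual dependence in both directions; smaller score wins."""
    x, y = standardize(x), standardize(y)
    forward = hsic(x, poly_fit_residuals(x, y, degree))   # model Y = f(X) + N
    backward = hsic(y, poly_fit_residuals(y, x, degree))  # model X = g(Y) + N
    return ("X->Y" if forward < backward else "Y->X"), forward, backward


# Simulated data (an assumption, not from the paper): X causes Y.
random.seed(0)
n = 300
xs = [random.uniform(-2.0, 2.0) for _ in range(n)]
ys = [x ** 3 + random.uniform(-1.0, 1.0) for x in xs]

direction, forward, backward = anm_direction(xs, ys)
print("inferred direction:", direction)
print("dependence score X->Y:", forward, " Y->X:", backward)
```

In the simulated sample the true direction is X -> Y, so the forward residuals are just the injected noise, while the backward residuals have a spread that changes with Y; that is the asymmetry the method looks for. A raw comparison of two scores is cruder than a proper significance test, and, as the summary above notes, the approach presumes no confounding factors or selection effects.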
The first sentence "Statisticians have long thought it impossible to tell cause and effect apart using observational data" is very untrue. Economists, social scientists, and especially epidemiologists have long used statistics to try to determine cause and effect from observational data. It is true that pure statisticians have largely avoided this question because it is very hard and, so far, only partial progress has been possible. They prefer problems they can solve completely.

--Dan
On Dec 18, 2014, at 1:28 PM, Henry Baker <hbaker1@pipeline.com> wrote:
. . .
http://science.slashdot.org/story/14/12/18/1810221/cause-and-effect-how-a-re...
Statisticians have long thought it impossible to tell cause and effect apart using observational data. . . . Take for example, a correlation between wind speed and the rotation speed of a wind turbine. . . . That's a useful new trick in a statistician's armoury, particularly in areas of science where controlled experiments are expensive, unethical or practically impossible.
http://arxiv.org/abs/1412.3773
Distinguishing cause from effect using observational data: methods and benchmarks Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, Bernhard Schölkopf (Submitted on 11 Dec 2014)
The discovery of causal relationships from purely observational data is a fundamental problem in science. . . . In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).
participants (2): Daniel Asimov, Henry Baker