thomascolthurst wrote 2010-03-23 10:11 am

(no subject)

I really enjoyed Andrew Gelman's recent review essay on the relationship between causal inference and statistics. That might sound boring, but really it is the very heart of science: what can data tell us about causes?

As you might imagine, answers to that question vary a lot. Some people say that data can never tell us about causes. (I'm looking at you, David Hume.) Most statisticians would say that certain specially designed experiments can tell us about certain kinds of causes, but of course there is much disagreement about which kinds of experiments and causes fit the bill.

I come to statistics through AI, so my position is the opposite extreme from Hume, which Gelman ably summarizes as: "a computer should be able to discern causal relationships from observational data, based on the reasonable argument that we, as humans, can do this ourselves in our everyday life with little recourse to experiment."

Still, this isn't a very popular view, so I thought it could use some advertising. A catchy jingle, maybe?

The birds do it,
The bees do it,
Even the rich folks on 5th Avenue do it,
Let's do it,
Let's infer causes from purely observational data!


Comments are open, but this blog does sadly have a new comments policy: all insults and ad hominem attacks will be deleted. Keep it classy, internets.

dibalh.livejournal.com 2010-03-24 12:52 pm (UTC)
It seems likely to me that this is a data- and domain-dependent question. I forwarded this article to the whole research group I work in -- many of us try to infer causes from data while looking at data concerning the concentration of a small subset of biological molecules at a small number of time points in some cell line of interest. This is a *hard* problem, and the data may well not be sufficient to the task at all, *even though it's not hard to convince yourself you've learned something causal when you turn the crank on your mathematical methods*.

thomascolthurst.livejournal.com 2010-03-24 05:23 pm (UTC)
I agree that, as a statement about current best or accepted practice, inferring causes from data needs to be done in a domain-dependent manner. I have the hope and desire, however, that our mathematical methods can be greatly improved so that (a) they are better able to capture the sort of domain-relevant information that humans use to judge when data can support causal inference and (b) they lie less when folks turn the crank.

I don't think this is a naive or far-off hope, either -- I think you can get a lot of the way there by doing good old-fashioned Bayesian inference starting from priors over Pearl-style graphical models.
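
To make that a bit more concrete, here is a minimal sketch of the idea in Python. Everything in it is an illustrative assumption rather than anyone's actual method: three binary variables A, B, C, three hand-picked candidate graphs, a uniform prior over those graphs, Beta(1, 1) priors on every conditional probability, and simulated data in which A and B jointly cause C. The function names (`simulate`, `log_marginal_likelihood`, `posterior_over_graphs`) are made up for this example. Note also that observational data alone cannot separate graphs that imply the same conditional independencies, so the candidates below were chosen to differ in that respect.

```python
import math
import random

def simulate(n, seed=0):
    """Hypothetical ground truth: A and B are independent coins, C is a noisy OR of them."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.5)
        b = int(rng.random() < 0.5)
        p_c = 0.9 if (a or b) else 0.1
        c = int(rng.random() < p_c)
        rows.append({"A": a, "B": b, "C": c})
    return rows

def log_marginal_likelihood(data, parents, alpha=1.0):
    """Log P(data | graph) for binary variables, integrating out each
    conditional probability under a Beta(alpha, alpha) prior."""
    total = 0.0
    for child, pars in parents.items():
        # Count child outcomes separately for each configuration of its parents.
        counts = {}
        for row in data:
            key = tuple(row[p] for p in pars)
            counts.setdefault(key, [0, 0])[row[child]] += 1
        for n0, n1 in counts.values():
            # Beta-Binomial evidence: B(alpha + n0, alpha + n1) / B(alpha, alpha)
            total += (math.lgamma(2 * alpha) - math.lgamma(2 * alpha + n0 + n1)
                      + math.lgamma(alpha + n0) - math.lgamma(alpha)
                      + math.lgamma(alpha + n1) - math.lgamma(alpha))
    return total

def posterior_over_graphs(data, graphs, prior):
    """Bayes' rule over a finite set of candidate graphs."""
    log_post = {name: math.log(prior[name]) + log_marginal_likelihood(data, g)
                for name, g in graphs.items()}
    top = max(log_post.values())  # subtract the max for numerical stability
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {name: math.exp(v - top) / norm for name, v in log_post.items()}

# Each graph maps a variable to the tuple of its parents.
GRAPHS = {
    "collider A->C<-B": {"A": (), "B": (), "C": ("A", "B")},
    "chain A->B->C": {"A": (), "B": ("A",), "C": ("B",)},
    "no edges": {"A": (), "B": (), "C": ()},
}
PRIOR = {name: 1.0 / len(GRAPHS) for name in GRAPHS}

posterior = posterior_over_graphs(simulate(500), GRAPHS, PRIOR)
for name, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {prob:.4f}")
```

With a few hundred simulated rows, the posterior should concentrate on the collider graph, since it is the only candidate that lets C depend on both A and B. Domain knowledge would enter through `PRIOR`: a sharper prior needs less data to reach a confident answer, and a flat prior over many graphs needs more.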

Of course, this might not help your particular problem at all; I get the impression that in intracellular biology, almost everything can potentially affect everything else, which leads to weak priors, which in turn leads to the data rarely being able to tell you anything about causes. Still, that's better than your analysis telling you the wrong thing.