[personal profile] thomascolthurst
I really enjoyed Andrew Gelman's recent review essay on the relationship between causal inference and statistics. That might sound boring, but really it is the very heart of science: what can data tell us about causes?

As you might imagine, answers to that question vary a lot. Some people say that data can never tell us about causes. (I'm looking at you, David Hume.) Most statisticians would say that certain specially designed experiments can tell us about certain kinds of causes, but of course there is much disagreement about which kinds of experiments and causes fit the bill.

I come to statistics through AI, so my position is the opposite extreme from Hume, which Gelman ably summarizes as: "a computer should be able to discern causal relationships from observational data, based on the reasonable argument that we, as humans, can do this ourselves in our everyday life with little recourse to experiment."

Still, this isn't a very popular view, so I thought it could use some advertising. A catchy jingle, maybe?

The birds do it,
The bees do it,
Even the rich folks on 5th Avenue do it,
Let's do it,
Let's infer causes from purely observational data!


Comments are open, but this blog does sadly have a new comments policy: all insults and ad hominem attacks will be deleted. Keep it classy, internets.

Date: 2010-03-23 06:32 pm (UTC)
From: [personal profile] randysmith
Isn't this a philosophical issue versus a practical one? I mean, yes, humans do it all the time, and they're wrong pretty often, too. I'd value computers getting to be as good as we are (heck, I suspect if we can get them that far, we can get them better than us) but I don't disagree with the philosophical absolute statement that data can't tell us about causes.

(Written without having read the article; I may come back and backpedal severely after I've done so.)

Date: 2010-03-24 12:38 am (UTC)
From: [identity profile] thomascolthurst.livejournal.com
I guess I view it as both a philosophical issue and a practical one. The way I phrased it above ("What can data tell us about causes?") may make it sound exclusively philosophical, but you could also phrase it in a purely practical way: What can data tell you about what to do, given that what you care about is almost always different from what you can directly effect? (Here I'm construing "directly effect" very narrowly -- basically just muscle control -- and what you care about very broadly: health, wealth, happiness, etc. for yourself and others.)

But here's a question for you: if data can't tell us about causes, what can? If humans infer causes incorrectly pretty often as opposed to all of the time, then sometimes they must get causes right -- how?

Date: 2010-03-24 12:52 pm (UTC)
From: [identity profile] dibalh.livejournal.com
It seems likely to me that this is a data- and domain-dependent question. I forwarded this article to the whole research group I work in -- many of us try to infer causes from data while looking at data concerning the concentration of a small subset of biological molecules at a small number of time points in some cell line of interest. This is a *hard* problem, and the data may well not be sufficient to the task at all, *even though it's not hard to convince yourself you've learned something causal when you turn the crank on your mathematical methods*.

Date: 2010-03-24 05:23 pm (UTC)
From: [identity profile] thomascolthurst.livejournal.com
I agree that, as a statement about current best or accepted practice, inferring causes from data needs to be done in a domain-dependent manner. I have the hope and desire, however, that our mathematical methods can be greatly improved so that (a) they are better able to capture the sort of domain-relevant information that humans use to judge when data can support causal inference and (b) they lie less when folks turn the crank.

I don't think this is a naive or far-off hope, either -- I think you can get a lot of the way there by doing good old-fashioned Bayesian inference starting from priors over Pearl-style graphical models.

Of course, this might not help your particular problem at all; I get the impression that in intracellular biology, almost everything can potentially affect everything else, which leads to weak priors, which in turn leads to the data rarely being able to tell you anything about causes. Still, that's better than your analysis telling you the wrong thing.
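
To make "Bayesian inference starting from priors over graphical models" a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a method taken from Gelman's essay or from any particular library: three binary variables, a hand-picked list of three candidate graphs, a Beta(1, 1) prior on every conditional probability, and a made-up data set built to look like a noisy collider. All of the names (`log_marginal_likelihood`, `graph_posterior`, the graph labels) are hypothetical.

```python
from collections import Counter
from math import exp, lgamma, log


def log_marginal_likelihood(data, parents, alpha=1.0):
    """Log P(data | graph) for binary variables.

    `parents` maps each variable index to a tuple of its parent indices.
    Assumption: an independent Beta(alpha, alpha) prior on each
    conditional probability, so the parameters integrate out exactly.
    """
    total = 0.0
    for node, pa in parents.items():
        # Count (parent configuration, node value) pairs.
        counts = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        for config in {key[0] for key in counts}:
            n0 = counts[(config, 0)]
            n1 = counts[(config, 1)]
            # log of Beta(n0 + alpha, n1 + alpha) / Beta(alpha, alpha)
            total += (lgamma(n0 + alpha) + lgamma(n1 + alpha)
                      - lgamma(n0 + n1 + 2 * alpha)
                      - 2 * lgamma(alpha) + lgamma(2 * alpha))
    return total


def graph_posterior(data, graphs, prior):
    """Posterior over candidate graphs: prior times marginal likelihood."""
    log_post = {name: log(prior[name]) + log_marginal_likelihood(data, g)
                for name, g in graphs.items()}
    top = max(log_post.values())  # subtract the max to avoid underflow
    weights = {name: exp(v - top) for name, v in log_post.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


X, Y, Z = 0, 1, 2
graphs = {
    "collider": {X: (), Y: (), Z: (X, Y)},  # X -> Z <- Y
    "chain": {X: (), Z: (X,), Y: (Z,)},     # X -> Z -> Y
    "empty": {X: (), Y: (), Z: ()},         # no edges at all
}

# Made-up observational data: X and Y are balanced and independent,
# and Z equals (X or Y) in 90% of rows and is flipped in the other 10%.
data = []
for x in (0, 1):
    for y in (0, 1):
        z = x | y
        data += [(x, y, z)] * 90 + [(x, y, 1 - z)] * 10

prior = {name: 1.0 / len(graphs) for name in graphs}  # uniform over graphs
posterior = graph_posterior(data, graphs, prior)
for name, p in posterior.items():
    print(f"{name}: {p:.3f}")
```

The catch is the one described above. Graphs that imply the same conditional independencies (say X -> Z -> Y versus X <- Z <- Y) fit observational data essentially equally well, so between those the answer comes almost entirely from the prior over graphs. If that prior is weak, the posterior stays spread out, and the data tell you little about direction.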

Date: 2010-03-24 01:46 pm (UTC)
From: [identity profile] megmuck.livejournal.com
Do you mean "directly effect" -cause- or "directly affect" - change?

Date: 2010-03-24 05:05 pm (UTC)
From: [identity profile] thomascolthurst.livejournal.com
I think I mean/meant "directly affect", but that's just because I'm using it in a verb context, right?

Which is to say: if one were allowed to verbize "effect" the same way that other nouns are verbized, then I'm not sure what the semantic difference would be between affect and effect-as-a-verb.
