Image by Olimpia Zagnoli, New York Times
Back in Mr. Wardle's high school economics class, we were once asked to look for a correlation between
any two datasets. I searched through the small school library and stumbled across historical data on marijuana use in Canada.
So, for fun, I compared it to unemployment data and found a strong negative correlation. More marijuana use, less unemployment.
Mr. Wardle liked my assignment. Granted, his sense of humour was famous; on the weekly ten-point quizzes, we got a bonus mark for adding a caption to a
Far Side cartoon, and he announced the funniest caption in the following class.
When he handed back the assignment, he reminded me of one key rule of research:
Correlation does not imply causation.
Which brings us to an
op-ed in yesterday's New York Times. Seth Stephens-Davidowitz, an economist interning at Google, presented evidence from Google searches for possible causes of depression in the United States. After unemployment, what was the best predictor of searches for depression?
I tested dozens of variables in many different categories. The strongest
predictor by far: an area’s average temperature in January. Colder
places have higher rates of depression, with the correlation
concentrated in the colder months. The relationship between weather and
mental health has been debated, but those debates have generally relied
on “small” data. Google searches, the biggest data source we currently
have, are unambiguous: when it comes to our happiness, climate matters a
great deal.
Paging Mr. Wardle, wherever you are.
What else happens in January in cold places?
It is dark. You don't need to be a mental health expert to know about 'seasonal affective
disorder', a common condition in places where the winter days are short. Yet Stephens-Davidowitz misses this critically relevant correlate of temperature and goes on to provide temperature-based advice:
The striking correlation between temperature
and depression suggests they should consider moving to a more temperate
location. Of course, people at risk for depression should hesitate to
abandon a job in a cold-winter location for no job in a warm-winter
clime, and they should think twice about moving away from family and
friends.
The advice may be good, even though the op-ed is probably mistakenly attributing many cases of northern depression to lower temperatures rather than less sunlight. If colder places are also darker in winter, does it matter which variable you use?
Yes, it matters, because we are talking about a correlation, not a perfect relationship. There are cold, northern cities with glorious sunny winters as well as mild, northern cities with depressing grey winters.
For example, if you suffer from winter depression, should you move from Montreal, with its notoriously frigid winters, to more temperate Vancouver? Probably not, because in Vancouver you're likely to experience weeks on end without seeing the sun. I counted 22 days of non-stop rain a few Novembers back.
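To make the temperature-versus-sunlight point concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the data are simulated, not drawn from Google or from the op-ed, and the coefficients are arbitrary assumptions. In this toy world, winter sunlight drives depression, and temperature merely tends to go along with sunlight. Temperature still shows a clear negative correlation with depression, but once sunlight is accounted for (via a standard first-order partial correlation), that relationship largely disappears.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Simulated cities (hypothetical data; all coefficients are assumptions).
rng = random.Random(0)
n_cities = 2000

# Winter sunlight, in arbitrary standardized units.
sunlight = [rng.gauss(0, 1) for _ in range(n_cities)]

# Temperature tends to rise with sunlight, but imperfectly:
# some cold cities are sunny, some mild cities are grey.
temperature = [0.7 * s + rng.gauss(0, 0.7) for s in sunlight]

# Depression depends on sunlight only; temperature plays no causal role.
depression = [-0.8 * s + rng.gauss(0, 0.6) for s in sunlight]

r_temp = pearson(temperature, depression)
r_sun = pearson(sunlight, depression)
r_temp_given_sun = partial_corr(temperature, depression, sunlight)

print(f"temperature vs depression:            {r_temp:+.2f}")
print(f"sunlight vs depression:               {r_sun:+.2f}")
print(f"temperature vs depression | sunlight: {r_temp_given_sun:+.2f}")
```

In this simulation, someone who looked only at temperature would conclude that cold causes depression, even though temperature has no effect at all by construction. That does not prove sunlight is the real culprit in the Google data; it only shows that a strong correlation with temperature cannot tell the two stories apart.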
The availability of internet search data allows researchers to
probe questions previously answered only with high-effort, limited-sample-size
opinion polls. There can be real value in analyses with Google Trends or other
storehouses of search data. The Centers for Disease Control, for example,
works with Google because the number of people in an area
searching for information on the flu turned out to be the best available indicator of a
flu outbreak. On a simpler note, want to know whether people are more likely to use the term "climate change" or the term "global warming"? Try Google Trends,
and you'll see
the answer is clearly "global warming".
So by all means, examine data with Google Trends. Just remember Mr. Wardle's lesson: correlation does not imply causation. Even Stephens-Davidowitz seemed to understand this, at least in the case of one variable:
More Hispanic-Americans meant fewer searches (though this
might have been a result of language factors).
Might have. You think?