Work of the week: The Parable of Google Flu: Traps in Big Data Analysis

This paper’s[1] release in the spring of 2014 made a big splash, with ripples of media attention following from outlets including The New York Times[2], NPR[3], Slate[4], and The Guardian[5]. In it, David Lazer and his collaborators question the accuracy of Google Flu Trends (GFT), a highly celebrated algorithm that uses Google search queries to estimate flu activity in real time and that, since its development in 2008[6], has become a poster child for the promise of big data. The authors are not the first to criticize GFT[7], nor are they the first to receive media attention for doing so (1). The value of their argument lies in its scope. Using GFT as an example, the authors make a broader plea: that big data analysts carry out their work transparently, that they balance their analyses against traditional data sources, and that they design their algorithms carefully enough to remain accurate even when the way the data are generated changes.

Lazer et al. make the important point that the size of a data set can never replace sound, robust statistical thinking, though large data sets can greatly extend what such thinking can accomplish. The creators of GFT knew this, but in the excitement that followed its development there was, and there remains, a risk that others will proceed without due caution. Those of us who work with large data sets every day must remember that big data is powerful, but it cannot walk on water.

(1) This Nature News article[8] from 2013 received a similar cascade of popular media responses. 

References:

[1] Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203–5. doi:10.1126/science.1248506

[2] Lohr, S. (2014, March 28). Google Flu Trends: The Limits of Big Data. New York Times. Retrieved from http://bits.blogs.nytimes.com/2014/03/28/google-flu-trends-the-limits-of-big-data/?smid=pl-share

[3] Harris, R. (2014, March 13). Google’s Flu Tracker Suffers From Sniffles. NPR. Retrieved from http://www.npr.org/blogs/health/2014/03/13/289802934/googles-flu-tracker-suffers-from-sniffles

[4] Auerbach, D. (2014, March 19). The Mystery of the Exploding Tongue: How Reliable is Google Flu Trends? Slate. Retrieved from http://www.slate.com/articles/technology/bitwise/2014/03/google_flu_trends_reliability_a_new_study_questions_its_methods.html

[5] Arthur, C. (2014, March 27). Google Flu Trends is no longer good at predicting flu, scientists find. The Guardian. Retrieved from http://www.theguardian.com/technology/2014/mar/27/google-flu-trends-predicting-flu

[6] Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature, 457(7232), 1012–4. doi:10.1038/nature07634

[7] Olson, D. R., Konty, K. J., Paladini, M., Viboud, C., & Simonsen, L. (2013). Reassessing Google Flu Trends data for detection of seasonal and pandemic influenza: a comparative epidemiological study at three geographic scales. PLoS Computational Biology, 9(10), e1003256. doi:10.1371/journal.pcbi.1003256

[8] Butler, D. (2013). When Google got flu wrong. Nature, 494(7436), 155–6. doi:10.1038/494155a