Sharing Data May Make It Less Accurate for All of Us
Lots of you work with big data that gets ever bigger. Here’s something to chew on as you add more databases and studies to the pile.
[Scientists are] finding that computer systems are quite good at storing and easing access to the enormous quantities of information they generate. But comparing and synthesizing all that data, in differing formats and styles and methods, requires human skill and judgment. And even the best aren’t sure how to do it, raising questions of whether the nationwide rush toward open data will really mean a momentous revolution in scientific progress or just a whole new level of gnarly reproducibility issues.
Mr. Curran is a professor of psychology at the University of North Carolina at Chapel Hill who studies the effects of alcoholic parents on their children. He combines findings from multiple studies and sees a challenge lurking in the varied scientific meanings and assessments that professional colleagues apply to terms such as “anxiety” and “depression.”
“The thing that keeps me up at night,” he said, “is, Am I making a substantive theoretical conclusion that is based on some artifact of how we scored the scale?”
Such questions represent a much-less-discussed aspect of the push for open science: Some researchers believe that the intricacies of translating and synchronizing data accurately are getting too little attention, even though reproducibility is already a major struggle. And those details will only grow more important as scientists begin writing the code for a future in which computers routinely extract answers from data piles far too big for any human to handle.
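To make the scoring worry concrete, here is a minimal sketch of how a coding difference alone can look like a real finding. Everything in it is an assumption for illustration, not anything from Curran's work: the item responses are made up, the two "studies" are hypothetical, and rescaling items to a 0-to-1 range is just one simple way to put differently coded scales on a common footing.

```python
from statistics import mean

# Hypothetical data: the same three respondents answer a 4-item anxiety scale.
# Study A codes each item 0-3; Study B codes the identical answers 1-4.
study_a = [[0, 1, 2, 1], [3, 2, 3, 3], [1, 1, 0, 2]]
study_b = [[item + 1 for item in row] for row in study_a]


def sum_scores(rows):
    """Score each respondent by summing raw item codes."""
    return [sum(row) for row in rows]


def rescaled_scores(rows, lo, hi):
    """Map each item onto 0..1 using the scale's own range, then average."""
    return [mean([(item - lo) / (hi - lo) for item in row]) for row in rows]


# Pooling raw sums suggests Study B's sample is more anxious.
raw_gap = mean(sum_scores(study_b)) - mean(sum_scores(study_a))

# Rescaling by each study's coding range removes the apparent difference.
harmonized_gap = (
    mean(rescaled_scores(study_b, 1, 4)) - mean(rescaled_scores(study_a, 0, 3))
)

print("gap in raw sum scores:", round(raw_gap, 2))
print("gap after rescaling:", round(harmonized_gap, 2))
```

The raw gap here comes entirely from the coding convention, since the underlying answers are identical. Real harmonization is much harder than this, because studies usually differ in wording, items, and populations, not just in numeric codes, and that is where the human judgment described above comes in.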