If the Neuroscience Information Framework is any guide, we are certainly in an era of “Openness” in biomedical science. A search of the NIF Registry of tools, databases and projects for biomedical science for “Open” leads to over 700 results, ranging from open access journals, to open data, to open tools. What do we mean by “open”? Well, not closed or, at least, not entirely closed. These open tools are, in fact, covered by a myriad of licenses and other restrictions on their use. But, the general theme is that they are open for at least non-commercial use without fees or undue licensing restrictions.
So, is Open Science already here? Not exactly. Open Science is more than a subset of projects that make data available or share software tools, often because they received specific funding to do so. According to Wikipedia, “Open science is the umbrella term of the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practice open notebook science, and generally making it easier to publish and communicate scientific knowledge.” Despite the wealth of Open platforms, most of the products of science, including, most notably, the data upon which scientific insights rest, remain behind closed doors. While attitudes and regulations are clearly changing, as the latest attempts by PLoS to establish routine sharing of data illustrate (just Google #PLOSfail), we are not there yet.
Why are so many pushing for routine sharing of data and a more open platform for conducting science? I became interested in data sharing in the late 1990s as a microscopist, as we started to scale up the rate and breadth at which we could acquire microscopic images. Suddenly, due to precision stages and wide field cameras, we were able to image tissue sections at higher resolution over much greater expanses of tissue than before, when we were generally restricted to isolated snapshots or low magnification surveys. I knew that there was far more information within these micrographs and reconstructions than could be analyzed by a single scientist. It seemed a shame that they were not made more widely available. To help provide a platform, we established the Cell Centered Database (CCDB), which has recently merged with the Cell Image Library. Although we were successful in the CCDB in attracting outside researchers to deposit their data, we were rarely contacted by researchers wanting to deposit their data. Most of the time we had to ask, although many would release the data if we did. But I do distinctly remember one researcher saying to me: “I understand how sharing my data helps you, but not me”.
True. So in the interest of full disclosure, let me state a few things. I try to practice Open Science, but am not fanatical. I try to publish in open access journals, although I am not immune to the allure of prestigious closed journals. I do blog, make my slides available through SlideShare, and upload pre-prints to ResearchGate. But I remain sensitive to the fact that through my informatics work in the Neuroscience Information Framework and my advocacy for transforming scholarly communications through FORCE11 (the Future of Research Communications and e-Scholarship), I am now in a field where: A) I no longer really generate data. I generate ontologies and other information artefacts, and these I share, but not images, traces, sequences, blots, structures; B) I do benefit when others share their data, as I build my research these days on publicly shared data.
But do I support Open Science because I am a direct beneficiary of open data and tools? No. I support Open Science because I believe that Open Science = Good Science. To paraphrase Abraham Lincoln: “If I could cure Alzheimer’s disease by making all data open, I would do so; if I could cure Alzheimer’s disease by making all data closed, I would do so.” In other words, if the best way to do science is the current mode: publish findings in high impact journals that only become open access after a year, make sure no one can access or re-use your data, make sure your data and articles are not at all machine-processable, publish under-powered studies with only positive results, allow errors introduced by incorrect data or analyses to stay within the literature for years, then I’m all for it.
But we haven’t cured Alzheimer’s disease or much else in the neurosciences lately. That’s not to say that our current science, based on intense competition and opaque data and methods, has not produced spectacular successes. It surely has. But the current system has also led to some significant failures, as the retreat of pharmaceutical companies from neuroscience testifies. Can modernizing and opening up the process of science to humans and machines alike accelerate the pace of discovery? I think we owe the taxpayers, who fund our work in hope of advancing society and improving human health, an honest answer here. Are we doing science as well as it can be done?
I don’t believe so. And, as this is a blog and not a research article, I am allowed to state that categorically. I believe that at a minimum, Open Science pushes science towards increased transparency, which, in my view, helps scientists produce better data and helps weed out errors more quickly. I also believe that our current modes of scientific communication are too restrictive, and create too high a barrier for us to make available all of the products of our work, and not just the positive results. At a maximum, I believe that routine sharing of data will help drive biomedical sciences towards increased discovery, not just because we will learn to make data less messy, but because we will learn to make better use of the messy data we have.
Many others have written on why scientists hesitate or outright refuse to share their data and process (see #PLOSfail above), so I don’t need to go into detail here. But at least one class of frequent objections has to do with the potential harm that sharing will do to the researcher who makes data available. A common objection is that others will take advantage of data that you worked hard to obtain before you can reap the full benefits. Others say that there is no benefit to sharing negative results, detailed lab protocols or data, or blogging, saying that it is more productive for them to publish new papers than to spend time making these other products available. Others are afraid that if they make data available that might have errors, their competitors would attack them and their reputations would be tarnished. Some have noted that unlike in the Open Source Software community, where identifying and fixing a bug is considered a compliment, in other areas of scholarship, it is considered an attack.
All of these are certainly understandable objections. Our current reward system does not provide much incentive for Open Science, and changing our current culture, as I’ve heard frequently, is hard. Yes, it is. But if our current reward system is supporting sub-optimal science, then don’t we as scientists have an obligation to change it? Taxpayers don’t fund us because they care about our career paths. No external forces that I know of support, or even encourage, our current system of promotion and reward: it is driven entirely by research scientists. Scientists run the journals, the peer-review system, the promotion committees, the academic administration, the funding administration, the scientific societies and the training of more scientists. Given that non-scientists are beginning to notice, as evidenced by articles in The Economist (2013) and other non-science venues about lack of reproducibility, perhaps it’s time to start protecting our brand.
While many discussions on Open Science have focused on potential harm to scientists who share their data and negative results, I haven’t yet seen discussions on the potential harm that Opaque Science does to scientists. Have we considered the harm that is done to graduate students and young scientists when they spend precious months or years trying to reproduce a result that was perhaps based on faulty data or selective reporting of results? I once heard a heart-breaking story of a promising graduate student who couldn’t reproduce the results of a study published in a high impact journal. His advisor thought the fault was his, and he was almost ready to quit the program. When he was finally encouraged to contact the author, he found that they couldn’t necessarily reproduce the results either. I don’t know whether the student eventually got his degree, but you can imagine the impact such an experience has on young scientists. Beyond my anecdotal example above, we have documented examples where errors in the literature have significant effects on grants awarded or the ability to publish papers that are in disagreement (e.g., Miller, 2006). All of these have a very real human cost to science and scientists.
On a positive note, for the first time in my career, since I sipped the Kool-Aid back in the early days of the internet, I am seeing real movement by not just a few fringe elements, but by journals, senior scientists, funders and administrators, towards change. It is impossible to take a step without tripping over a reference to Big Data or metadata. Initiatives are underway to create a system of reward around data in the form of data publications and data citations. NIH has just hired Phil Bourne, a leader in the Open Science movement, as Associate Director for Data Science. And, of course, time is on our side, as younger scientists and those entering into science perhaps have different attitudes towards sharing than their older colleagues. Time will also tell whether Open Science = Good Science. If it doesn’t, I promise to be the first to start hoarding my data again and publishing only positive results.
Economist, How Science Goes Wrong, Oct 19, 2013
Miller, G. (2006) A scientist’s nightmare: software problem leads to five retractions. Science, 314 (22 December), pp 1856-1857.