The Reproducible Research Guilt Trip May Finally Be Paying Off
We might be closer to killing off the "Just take my word for it - I'm pretty sure I did this right" methods section
Why is this even an issue? Biologists in particular seem to be collectively and subconsciously reacting to those awful General Chemistry labs where they had you copy down pages of instructions verbatim into your lab notebook. It should come as no surprise that bioinformatics is ground zero for reproducibility activism.
It is unfortunate that reproducible research is tied up with all sorts of other holier-than-thou practices: open access, open source, open data, literate programming, blogging, functional programming. This all-encompassing evangelism tends to polarize people. While wonky über-programmers like C. Titus Brown lay out fundamental practices for reproducibility, most PIs have been publicly giving lip service to the idea of reproducible research, belying an "I don't wanna eat my vegetables"-type disdain. There are now "consortia" and an "initiative" to compel scientists to actually write their shit down, preferably with door prizes. If you think this has a "posture pals" (video) feel to it, you're not alone. Even as the number of pro-RR articles has steadily increased, few take them to heart.
This head-against-wall bashing has been the pattern for many years: better tools are now available (RStudio, knitr, Galaxy, cloud computing, figshare, github, bitbucket) and there is more rah-rah from the blogosphere, but little enforcement from major journals. But now a recent development has raised my hopes, because it indicates editors have been tightening the screws enough to cause discomfort:
People have actually started to argue against reproducible research!
Hearts and Minds
The founder of the irreplicability movement is Christopher Drummond, author of “Reproducible Research: a Dissenting Opinion”. I will attempt to paraphrase his arguments here:
- Richard Feynman never had a Github account.
- No one is really going to read your damn code anyway.
- Writing shit down == A big drag, man.
- The Anil Potti incident proves liars always lie about their Rhodes Scholarships first. We should crack down on curricula vitae, not veritas curat.
A precursor to the dissenting opinion article is Drummond's "Replicability is not Reproducibility: Nor is it Good Science". A distinction is drawn between reproducibility and replicability, the latter being what is advocated and the former being more generalizable or scientifically provable. The idea that we require researchers to submit their data and code, replicable research, is a narrow concept really only useful for ferreting out scientific misconduct.
I would argue that ignorance of biological sequence analysis, and even more so statistics, is a bigger threat than the outright fraud seen in the Duke case. Most bioinformatics manuscripts feature analysis which is not replicable, which is frightening to consider when GWAS and exome NGS variant papers implicate so many genes in disease, many of them residing along a razor-thin p-value threshold tweaked by several incomprehensible cherry-picked program parameters.
It is not clear science can efficiently self-correct. So while replicability is not reproducibility, reproducibility is too slow to substitute for replicability. A manuscript that describes real reproducible biological phenomena is essentially conjecture until it can be repeated. The greatest ferret-legger the world has ever known will live in obscurity until they buy a ferret. We have a culture of scientists who refuse to buy a ferret.
Accounting for Tastes
- It is difficult to make polished software for others to use and that is not the point of research.
- Replicability is not reproducibility.
Getting these sequence analysis workflows to be reproducible will not require a highly skilled platoon of developers. Any willing researcher can submit a shell script or a build script of commands provided they avoid these common pitfalls:
- Using bioinformatics web applications with no web service capability
- Using desktop bioinformatics software with no logging capability
- Relying on proprietary institutional databases, perhaps with stored procedures that prove too unwieldy to dump
- Using command line programs without a directory-based bash history
- Using Excel to manually manipulate data
- Being perfectly capable of submitting code but deciding to retain a competitive advantage
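For the command-line pitfall in particular, you don't have to rely on bash history at all: a tiny wrapper can write down every command as it runs. Here is a minimal sketch in Python, using only the standard library. The names `run_logged` and `sha256_of` and the `analysis_log.jsonl` file name are made up for illustration, and this is one possible way to do it, not a standard tool.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_logged(command, inputs=(), log_path="analysis_log.jsonl"):
    """Run one command and append a JSON record of it to log_path.

    command  -- list of program name and arguments, e.g. ["sort", "hits.txt"]
    inputs   -- paths of input files whose checksums should be recorded
    log_path -- hypothetical log file name; one JSON object per line
    """
    started = datetime.now(timezone.utc).isoformat()
    # Checksum the inputs before the command runs, in case it modifies them.
    checksums = {str(path): sha256_of(path) for path in inputs}
    result = subprocess.run(command, capture_output=True, text=True)
    record = {
        "started_utc": started,
        "command": list(command),
        "returncode": result.returncode,
        "inputs": checksums,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return result
```

Calling something like `run_logged(["sort", "hits.txt"], inputs=["hits.txt"])` in place of typing the command by hand leaves behind a log that can be submitted alongside the manuscript. It won't rescue a web app with no web service or a session of Excel clicking, but it does cover the "what did I actually run, and on which file" question.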
Replication does not prove a biological truth, but we often don't even have the fleeting proof that a scientist did what they said they did.
Which brings us back to those damn chemistry labs. While many public access talk shows find chemists willing to argue against evolution, you would be hard pressed to find one who would argue against writing shit down.
In other words: Not writing shit down is an even worse idea than creationism.
--
There, I blogged in 2012.
PLoS says the jig is up - no more irreproducible research - and now the real haters have come out in force: http://drugmonkey.wordpress.com/2014/02/25/plos-is-letting-the-inmates-run-the-asylum-and-this-will-kill-them/