From that title, you'd think that a bioinformatics person sharing their code was a rare event. It's true that raw data sharing is still a rarity amongst experimenters; the no-insider-information, real-time posting of raw experimental data pre-publication (i.e. Open Notebook Science) is an even greater rarity. However, if having public code repositories is the bioinformatics version of Open Notebook Science (ONS), then ONS is hardly new to bioinformatics. A search for "bioinformatics" on Sourceforge (the largest open source software repository, which is similar to the Google Code repository used for Pedro's project) yields 126 results with projects dating back to 2001. I personally have a bioinformatics project in Sourceforge dating back to 2003.
But the truth is that openness is not a rare event in computer science or bioinformatics. Although it is pretty common to read a publication where the authors don't provide their source code, such publications are generally looked down upon and are typically less cited (if you make me email you for your code and another person offers the same code for free and easily accessible, why should I email you?). Bioinformatics already benefits from the open atmosphere that pervades computer science. When I write a new bioinformatics algorithm, I almost always take advantage of the vast amount of publicly available tutorials and software (from C libraries and perl modules to bioinformatics-specific code like bioperl and bioconductor). Almost all of the large collaborative bioinformatics projects like Bioperl and Gbrowse provide live versioned repositories using cvs or subversion, and these projects have been around for years.
So is a project repository such as Sourceforge or Google Code the bioinformatics equivalent of Open Notebook Science? Yes, I believe it is. Congratulations, computer programmers and bioinformaticians of the world: you already experience the value created by open sharing. In fact, when I switched over from a pure bioinformatics job to become a hybrid scientist who spends half of the time doing experiments, I was shocked at how few tutorials there are on the internet to teach people experimental biology. All of these attitudes of data hiding seemed odd to me, so when I bumped into Jean-Claude's ONS article, it was great to finally see someone willing to dispel the fear-of-being-scooped myth and overcome the organizational hurdle requiring a level of annotation so that others could read and understand your code (i.e. experiments). Computer science removed those myths many years ago via the heroic efforts of now-famous names like Richard Stallman and Linus Torvalds. People in computer science don't fear being scooped; they typically praise it. How many variants of internet browsers derive their code from the Mozilla project? Computer science has also created standards and social norms for code annotation; poorly annotated computer code is very much looked down upon by true hackers. In the future, a poorly annotated open lab notebook from an experimental biologist will be viewed in a similar light.
Summary so far:
- Bioinformatics is already open
- Experimentalists can learn from bioinformatics (as bioinformatics learned from computer science) that:
  - openness does not lead to widespread, uncontrolled idea theft; furthermore, idea theft (if properly acknowledged) is actually the highest form of praise; if hundreds of people are using your data before you've even finished your project, congratulations, you're doing one hell-of-a-job as a scientist
  - annotation standards must become a part of the social atmosphere of the open notebook science community; we must praise notebooks written well enough that any scientist in a similar field could immediately understand, interpret, and replicate the experiment from the notebook.
So what is the role of bioinformatics in the future of Open Notebook Science?
First, bioinformatics programmers need to continue doing what they've been striving for since bioinformatics began:
- develop your code as open source projects on one of the standard code repositories, or at least put a link to your downloadable source code on a public website
- provide a README file, installation instructions, and a few example data files so that people can get up and running easily
- if possible, develop your code to work on a wide variety of platforms
- ensure that your code is annotated well enough that other programmers can read it (preferably using one of the standard formats like perldoc with perl or doxygen with C)
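To make the last point concrete, here is a minimal sketch of what a doxygen-annotated C function might look like. The function `gc_content` and its exact behavior are hypothetical, made up purely for illustration; the point is only that the comment block states what the function does, what it expects, and what it returns, so another programmer does not have to guess.

```c
#include <ctype.h>
#include <stddef.h>

/**
 * @brief Compute the GC fraction of a nucleotide sequence.
 *
 * Hypothetical example function. Counts G and C characters
 * (case-insensitive) and divides by the number of A, C, G and T
 * characters. Any other character (N, gaps, whitespace) is ignored.
 *
 * @param seq NUL-terminated sequence string; may be NULL.
 * @return GC fraction in [0.0, 1.0], or 0.0 when the sequence
 *         contains no A, C, G or T characters.
 */
double gc_content(const char *seq)
{
    size_t gc = 0;    /* number of G or C bases seen */
    size_t total = 0; /* number of A, C, G or T bases seen */

    if (seq == NULL) {
        return 0.0;
    }

    for (const char *p = seq; *p != '\0'; ++p) {
        switch (toupper((unsigned char)*p)) {
        case 'G':
        case 'C':
            ++gc;
            ++total;
            break;
        case 'A':
        case 'T':
            ++total;
            break;
        default:
            /* Ambiguity codes and gaps do not count either way. */
            break;
        }
    }

    return total == 0 ? 0.0 : (double)gc / (double)total;
}
```

Running doxygen over a file like this should produce browsable documentation from the comment block, which is the kind of annotation a stranger can pick up without emailing the author.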
2 comments:
The question of what constitutes Open Notebook Science in non-experimental fields is not simple to answer. You are right that software has been created openly for a long time now. In fact, when I started doing this work I was using the term "Open Source Science", thinking that the analogy would be obvious, but it was not. Too many people assumed that meant Open Source Scientific Software.
That's why I started to use the term Open Notebook Science, bringing the focus to the laboratory notebook. My assumption was that anyone doing experimental science must keep a lab notebook. If that notebook is completely public you are doing ONS - if it isn't then you're doing something else.
When we started collaborating with people doing non-experimental work, like docking, things got a lot more challenging. I've tried to maintain "experiment-like" pages on our wiki with links to the libraries, algorithms and result files. But because so much information gets generated I don't think it is possible to capture all the mistakes like we can with a chemistry experiment.
I think the key distinction to make is again that of "no insider information". When a student and PI get together to write a paper, are they using only public data and files to construct that paper? Could someone else not in that group (human or otherwise) construct the equivalent of that same paper using only files made public by the research group? If so then I think they are doing ONS. If not then they are doing something else - it might be a form of Open Science but not ONS.
Terminology aside, I think bioinformatics should be playing a strong role in ONS because of its open source culture and tech aptitude. Still, it is not that easy to gain critical mass in a particular field of research to get several people around a discovery project.
On a related note, I just gave a talk to a PhD program meeting in Portugal about web resources for scientists. I ended the talk by mentioning open notebook science, and the very first question was about scooping. Most people find the concept interesting, but they also think it is very risky.