From that title, you'd think that a bioinformatics person sharing their code was a rare event. It's true that raw data sharing as is still a rarity of amongst experimenters; the no-insider-information, real-time posting of raw experimental data pre-publication (i.e. Open Notebook Science) is an even greater rarity amongst experimenters. However, if having public code repositories is the bioinformatics version of Open Notebook Science (ONS), then ONS is hardly new to bioinformatics. A search for "bioinformatics" on Sourceforge (the largest open source software repository, which is similar to the Google Code repository used for Pedro's project) yields 126 results with projects dating back to 2001. I personally have a bioinformatics project in Sourceforge dating back to 2003.
But the truth is that openness is not a rare event in computer science or bioinformatics. Although, it is pretty common to read a publication where the authors don't provide their source code, such publications are generally looked down upon and are typically less cited (if you make me email you for your code and another person has the same code for free and easily accessible, why should I email you). Bioinformatics already benefits from the open atmosphere that pervades computer science. When I write a new bioinformatics algorithm, I almost always take advantage of the vast amount of publicly available tutorials and software (from C libraries and perl modules to bioinformatics-specific code like bioperl and bioconductor). Almost all of the large collaborative bioinformatics projects like Bioperl and Gbrowse provide live versioned repositorys like cvs or subversion, these projects have been around for years.
So is a project repository such as Sourceforge or Google Code the bioinformatics equivalent of Open Notebook Science? Yes, I believe it is. Congratulations computer programmers and bioinformaticians of the world, you already experience the value created by open sharing. In fact when I switched over from a pure bioinformatics job to become a hybrid scientist that spends half of the time doing experiments, I was shocked at how few tutorials there are on the internet to teach people experimental biology. All of these attitudes of data hiding seemed odd to me, so when I bumped into Jean-Claude's ONS article, it was great to finally see someone willing to dispel the fear-of-being-scooped myth and overcome the organizational hurdle requiring a level of annotation so that others could read and understand your code (i.e. experiments). Computer science removed those myths many years ago via the heroic efforts of those now famous names like Richard Stallman and Linus Torvalds. People in computer science don't fear being scoped, they typically praise it. How many variants of internet browsers derive their code from the Mozilla project? Computer science has also created standards and social norms for code annotation; poorly annotated computer code is very much looked down upon by true hackers. In the future, a poorly annotated open lab notebook from an experimental biologist will be viewed in a similar light.
Summary so far:
- Bioinformatics is already open
- Experimentalists can learn from bioinformatics (as bioinformatics learned from computer science) that:
- openness does not lead to widespread, uncontrolled idea theft; furthermore, idea theft (if properly acknowledged) is actually the highest form of praise; if hundreds of people are using your data before you've even finished your project, congratulations, you're doing one hell-of-a-job as a scientist
- annotation standards must become a part of the social atmosphere of the open notebook science community; we must praise notebooks written well enough that any scientist in a similar field could immediately understand, interpret, and replicate the experiment from the notebook.
So what is the role of bioinformatics in the future of Open Notebook Science?
First, bioinformatics programmers need to continue doing what they've been striving for since bioinformatics began:
- develop your code as open source projects on one of the standard code repositories or at least put a link to your downloadable sourcecode on a public website
- provide a README file, installation instructions, and a few example data files so that people can get up and running easily
- if possible develop your code to work on a wide variety of platforms
- ensure that your code is annotated well-enough that other programmers can read it (preferably using one of the standard formats like perldoc with perl or doxygen with C)