<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3472100433840968771</id><updated>2011-08-07T04:09:53.087-07:00</updated><category term='epistemology'/><category term='scientific ethics'/><category term='response surface methods'/><category term='published'/><category term='scientific publishing'/><category term='peer review'/><category term='books'/><category term='host microbe interactions'/><category term='microarrays'/><category term='in progress'/><category term='how the mind works'/><category term='biotechnology'/><category term='ChIP'/><category term='fractional factorial'/><category term='experimental design'/><category term='bioinformatics'/><category term='evolution'/><category term='untested ideas'/><category term='open notebook science'/><title type='text'>J's blog</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>23</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-3060482882697832367</id><published>2007-12-13T17:44:00.001-08:00</published><updated>2007-12-17T16:07:39.772-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='open notebook science'/><title type='text'>The role of bioinformatics in Open Notebook Science</title><content type='html'>&lt;span&gt;At one time in my life, I must have subscribed to "what's going on in the blogosphere" email updates from Genome Technology.  Recently they were &lt;a href="http://www.genome-technology.com/issues/blog/general/143788-1.html"&gt;promoting&lt;/a&gt; a blog post about an "open notebook" &lt;a href="http://code.google.com/p/domainevolution/"&gt;bioinformatics project&lt;/a&gt; by &lt;/span&gt;&lt;a href="http://pbeltrao.blogspot.com/2007/12/open-science-project-on-domain-family.html"&gt;&lt;span class="bodycopy"&gt;&lt;span class="bodycopy"&gt;Pedro Beltrao&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic;"&gt;. &lt;/span&gt; I think it's great that Pedro is testing out Google Projects as a repository for developing bioinformatics applications and providing a forum for discussion, code versioning, and code releases.  However, what surprised me a little was the title of the Genome Technology post: &lt;span style="font-style: italic; font-weight: bold;"&gt;"Yes, it's OK to share your results&lt;/span&gt;".&lt;br /&gt;&lt;br /&gt;From that title, you'd think that a bioinformatics person sharing their code was a rare event.  
It's true that raw data sharing is still a rarity amongst experimenters; the &lt;span style="font-style: italic;"&gt;no-insider-information&lt;/span&gt;, real-time posting of raw experimental data pre-publication (i.e. &lt;a href="http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html"&gt;Open Notebook Science&lt;/a&gt;) is an even greater &lt;a href="http://open-notebook-science.dabbledb.com/publish/open-notebook-science/54352d3e-b6d6-4c7b-bbbd-eaba3f1e7984/opennotebooksciencepeople.html"&gt;rarity amongst experimenters&lt;/a&gt;.  However, if having public code repositories is the bioinformatics version of Open Notebook Science (ONS), then ONS is hardly new to bioinformatics. A search for "bioinformatics" on Sourceforge (the largest open source software repository, which is similar to the Google Code repository used for Pedro's project) yields 126 results with projects dating back to 2001.  I personally have a &lt;a href="http://sourceforge.net/projects/lwgv/"&gt;bioinformatics project in Sourceforge dating back to 2003&lt;/a&gt;&lt;span style="font-style: italic;"&gt;.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;But the truth is that openness is not a rare event in computer science or bioinformatics.  Although it is pretty common to read a publication where the authors don't provide their source code, such publications are generally looked down upon and are typically less cited (if you make me email you for your code and another person has the same code for free and easily accessible, why should I email you?). &lt;/span&gt;Bioinformatics already benefits from the open atmosphere that pervades computer science. When I write a new bioinformatics algorithm, I almost always take advantage of the vast amount of publicly available tutorials and software (from C libraries and perl modules to bioinformatics-specific code like bioperl and bioconductor).  
Almost all of the large collaborative bioinformatics projects like &lt;a href="http://www.bioperl.org/wiki/Using_CVS"&gt;Bioperl&lt;/a&gt; and &lt;a href="http://www.gmod.org/wiki/index.php/CVS_Access"&gt;Gbrowse&lt;/a&gt;&lt;span style="font-style: italic;"&gt; &lt;/span&gt;&lt;span&gt;provide live versioned repositories like CVS or Subversion, and these projects have been around for years.&lt;br /&gt;&lt;br /&gt;So is a project repository such as &lt;a href="http://sourceforge.net/"&gt;Sourceforge&lt;/a&gt; or &lt;a href="http://code.google.com/hosting/"&gt;Google Code&lt;/a&gt; the bioinformatics equivalent of Open Notebook Science? Yes, I believe it is.  Congratulations, computer programmers and bioinformaticians of the world, you already experience the value created by open sharing.  In fact when I switched over from a pure bioinformatics job to become a hybrid scientist that spends half of the time doing experiments, I was shocked at how few tutorials there are on the internet to teach people experimental biology.  All of these attitudes of data hiding seemed odd to me, so when I bumped into &lt;a href="http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html"&gt;Jean-Claude's ONS article&lt;/a&gt;, it was great to finally see someone willing to &lt;span style="font-style: italic;"&gt;dispel&lt;/span&gt; the &lt;span style="font-style: italic;"&gt;fear-of-being-scooped myth&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;overcome&lt;/span&gt; the &lt;span style="font-style: italic;"&gt;organizational hurdle&lt;/span&gt; requiring a level of annotation so that others could read and understand your code (i.e. experiments).  Computer science removed those myths many years ago via the heroic efforts of those now famous names like Richard Stallman and Linus Torvalds. People in computer science don't fear being scooped; they typically praise it. How many variants of internet browsers derive their code from the Mozilla project?  
Computer science has also created standards and social norms for code annotation; poorly annotated computer code is very much looked down upon by true hackers.  In the future, a poorly annotated open lab notebook from an experimental biologist will be viewed in a similar light.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;&lt;span style="font-weight: bold;"&gt;Summary so far:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;ol&gt;&lt;li&gt;Bioinformatics is already open&lt;/li&gt;&lt;li&gt;Experimentalists can learn from bioinformatics (as bioinformatics learned from computer science) that:&lt;/li&gt;&lt;ol&gt;&lt;li&gt;openness does not lead to widespread, uncontrolled idea theft; furthermore, idea theft (if properly acknowledged) is actually the highest form of praise; if hundreds of people are using your data before you've even finished your project, congratulations, you're doing one hell-of-a-job as a scientist&lt;/li&gt;&lt;li&gt;annotation standards must become a part of the social atmosphere of the open notebook science community; we must praise notebooks written well enough that any scientist in a similar field could immediately understand, interpret, and replicate the experiment from the notebook.&lt;/li&gt;&lt;/ol&gt;&lt;/ol&gt;I think that bioinformatics has already done a great service for ONS by setting an example for experimentalists to follow.  Therefore, I feel we do a disservice to the future of ONS, by promoting open bioinformatics projects as great new contributions. Yes they are contributions, but they are hardly new.  
It's a little bit like patting America on the back for lowering HIV mortality rates for yet another year, when we should really be focused on whether it's possible to do the same lowering for Africa, where the heart of the problem lies.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;So what is the role of bioinformatics in the future of Open Notebook Science?&lt;/span&gt;&lt;br /&gt;First, bioinformatics programmers need to continue doing what they've been striving for since bioinformatics began:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;develop your code as open source projects on one of the standard code repositories or at least put a link to your downloadable source code on a public website&lt;br /&gt;&lt;/li&gt;&lt;li&gt;provide a README file, installation instructions, and a few example data files so that people can get up and running easily&lt;/li&gt;&lt;li&gt;if possible develop your code to work on a wide variety of platforms&lt;/li&gt;&lt;li&gt;ensure that your code is annotated well enough that other programmers can read it (preferably using one of the standard formats like perldoc with perl or doxygen with C)&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;Beyond this, bioinformatics will have a large role to play in the future of ONS for experimentalists.   Being a hacker myself, I can see that the current breed of software for ONS is far from ideal.  In an ideal world, the software (web interface, GUI or whatever) I'm running to log my ONS experimental work would check that the experiment I'm about to run is at least sensible (e.g. the software should warn me if the buffer in my protocol is not compatible with the reaction I'm trying to run; or if the annealing temperature I'm using for my PCR is too low for the primers in the reaction).  
When I run a digestion, I want to know the success rate of everyone else that's ever run a similar digestion, I want to know their success rate with different buffers, I want to know their success rate given how old their restriction enzyme is and the batch it is from.  I want the raw sequence data I enter into the ONS to be viewable in a traceviewer, with an interface that automatically BLASTs the sequence against the species I'm interested in to help me figure out what I've sequenced.  I want to upload all of my sequences in a single zip archive and have the software organize it for me &lt;a href="http://blog.openwetware.org/scienceintheopen/2007/12/17/the-problem-with-data/"&gt;rather than uploading my files one at a time&lt;/a&gt;.  I want scientists to be able to leave comments in my notebook, and I want them to be able to receive emails when someone replies to their comments (the lack of this feature kills many blog-based discussions).  Like a wiki, I want to have the entire history of every file accessible to all readers.  Like a blog, I want to provide RSS feeds so that all of the project's collaborators receive automagic updates when new experiments are added.  I want all of this to be easy and intuitive.  
And someone that understands biology has to write the code...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-3060482882697832367?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/3060482882697832367/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=3060482882697832367' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/3060482882697832367'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/3060482882697832367'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/12/role-of-bioinformatics-in-open-notebook.html' title='The role of bioinformatics in Open Notebook Science'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-8800466813104186662</id><published>2007-12-04T15:25:00.000-08:00</published><updated>2008-02-23T14:04:48.757-08:00</updated><title type='text'>Optimized ChIP Protocols</title><content type='html'>This page contains links to a &lt;a href="http://blog-di-j.blogspot.com/2007/10/factorial-and-response-surface.html"&gt;chromatin immunoprecipitation protocol optimized with factorial and response surface methods.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The optimization resulted in two protocols that both have more than 10-fold higher throughput than the original.  
One of the protocols was optimized for speed and requires only 1.5 days to complete with a 46% average improvement in signal-to-noise ratio over the original protocol.  The second protocol, optimized for signal-to-noise, requires 2.5 days to complete and achieves a 293% average improvement in signal-to-noise ratio over the original protocol.  The optimizations were done in &lt;span style="font-style: italic;"&gt;E. coli&lt;/span&gt; using three different transcription factors (two were primarily for validation).&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;a href="http://www.jeremiahfaith.com/blog_figs/new_protocol.pdf"&gt;&lt;br /&gt;Download the new ChIP Protocols (pdf)&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.jeremiahfaith.com/blog_figs/original_protocol.pdf"&gt;Download the original ChIP Protocol (pdf)&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The above links will always refer to the most recent version of the protocol if future improvements are added.  If you have any questions or comments on the protocol, please post them to this blog.&lt;br /&gt;&lt;br /&gt;All raw data and experiments that went towards this protocol optimization are in my &lt;a href="http://www.jeremiahfaith.com/open_notebook_science/"&gt;Lab Notebook&lt;/a&gt; in the Chapter entitled: &lt;span style="font-style: italic;"&gt;Towards a faster, more reliable ChIP protocol.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Older Versions&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-size:85%;"&gt;The protocol has a Change Log to describe what changes have occurred between versions of the protocol.  
Archived older versions are below:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.jeremiahfaith.com/blog_figs/new_protocol_v1.2.pdf"&gt;ChIP Protocol Version 1.2&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.jeremiahfaith.com/blog_figs/new_protocol_v1.1.pdf"&gt;ChIP Protocol Version 1.1&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.jeremiahfaith.com/blog_figs/new_protocol_v1.0.pdf"&gt;ChIP Protocol Version 1.0&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-8800466813104186662?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/8800466813104186662/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=8800466813104186662' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/8800466813104186662'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/8800466813104186662'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/12/optimized-chip-protocols.html' title='Optimized ChIP Protocols'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-3538091966611459161</id><published>2007-11-07T10:14:00.000-08:00</published><updated>2007-11-07T10:51:59.409-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific 
publishing'/><title type='text'>My first appearance in a scientific credit list</title><content type='html'>Tim Gardner (my PI) and Michael Molla wrote a guest blog post at PLoS on how &lt;a href="http://www.plos.org/cms/node/285"&gt;science can learn from the movie industry.&lt;/a&gt;  At the end of a movie, the role of everyone in the movie is clearly spelled out in the movie credits, while on scientific publications you only have a list of ordered names from which to try and infer the authors' role.&lt;br /&gt;&lt;blockquote&gt; &lt;span style="font-size:78%;"&gt;Excerpt from their &lt;a href="http://www.plos.org/cms/node/285"&gt;scientific credit list&lt;/a&gt; post:&lt;/span&gt;&lt;br /&gt;There is a better system, and it's already in use in the film industry -- a credit list. Each person who contributed to a movie has a specific credit describing his or her contribution. If one's contribution fills more than one role, that person's name can appear more than once.&lt;/blockquote&gt;&lt;br /&gt;Apparently, I contributed enough to this particular scientific endeavor of Tim and Michael to earn a spot in the credits:&lt;br /&gt;&lt;h1 style="font-style: italic;"&gt;&lt;span style="font-size:100%;"&gt;&lt;/span&gt;&lt;/h1&gt;&lt;a href="http://www.plos.org/cms/node/285"&gt;&lt;span style="font-size:100%;"&gt;&lt;/span&gt;&lt;/a&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;span style="font-size:78%;"&gt;Excerpt from their &lt;a href="http://www.plos.org/cms/node/285"&gt;scientific credit list&lt;/a&gt; post:&lt;/span&gt;&lt;span style="font-size:78%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-size:130%;"&gt;&lt;a href="http://www.plos.org/cms/node/285"&gt;Roll Credits: Sometimes the Authorship Byline Isn't Enough&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Concept:&lt;/span&gt; Michael Molla (1) and Tim Gardner (2)&lt;br /&gt;&lt;span style="font-weight: bold;"&gt; Writer:&lt;/span&gt; M. 
Molla&lt;br /&gt;&lt;span style="font-weight: bold;"&gt; Editor:&lt;/span&gt; T. Gardner&lt;br /&gt;&lt;span style="font-weight: bold;"&gt; Readers: &lt;/span&gt;Jeff Hasty (3) Jeremiah Faith (4)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;(1) Research Associate, Biomedical Engineering, Boston University&lt;br /&gt;(2) Assistant Professor, Biomedical Engineering, Boston University&lt;br /&gt;(3) Associate Professor, Department of Bioengineering, University of California, San Diego&lt;br /&gt;(4) Ph.D. Candidate, Bioinformatics Program, Boston University&lt;/blockquote&gt;&lt;br /&gt;I've written before about how our &lt;a href="http://blog-di-j.blogspot.com/2007/09/towards-richer-scientific-literature.html"&gt;current publishing systems are certainly falling behind their potential&lt;/a&gt;.  And while &lt;a href="http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html"&gt;revolutionary ideas about how science should be done&lt;/a&gt; may be the way of the future, I still think we can benefit now from these types of incremental improvements to our current system.&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-size:78%;"&gt;Excerpt from their &lt;a href="http://www.plos.org/cms/node/285"&gt;scientific credit list&lt;/a&gt; post:&lt;br /&gt;&lt;/span&gt;Such a research credit system would have huge benefits for one's career prospects; and it might encourage more effective collaborations. Moreover, these credits could easily be tracked by scientist or project in a database akin to the &lt;a href="http://www.imdb.com/" rel="nofollow"&gt;Internet Movie Database (IMDB)&lt;/a&gt;. It could provide an alternative to the ever-so-important citation factors as a means of assessing one's scientific impact. 
And maybe one day there will even be an Academy Awards of Science.&lt;/blockquote&gt;&lt;br /&gt;Here's hopin I win the &lt;span style="font-style: italic;"&gt;Best Reader&lt;/span&gt; award at the 2008 Academy Awards of Science.&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-3538091966611459161?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/3538091966611459161/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=3538091966611459161' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/3538091966611459161'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/3538091966611459161'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/11/my-first-appearance-in-scientific.html' title='My first appearance in a scientific credit list'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-6459769373840444028</id><published>2007-10-24T15:09:00.002-07:00</published><updated>2007-10-24T15:30:17.493-07:00</updated><title type='text'>How this blog works</title><content type='html'>I've been blogging for about 6 months now, and it has definitely been more interesting and productive than I initially thought it would be. 
One thing I've found I don't like about blogging is that most blogs just throw stuff out there. I understand this is part of the blogginess of blogging, but it makes it really hard to get oriented. When you go to someone's blog for the first time, it often feels like random stuff is just being tossed onto the web. Only after following the blog for a while will you really figure out if the author has an overarching point.&lt;br /&gt;&lt;br /&gt;So this post is just to help new folks orient themselves to my blog.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;J's blog is primarily focused on developing and abiding by an &lt;a href="http://3quarksdaily.blogs.com/3quarksdaily/2006/10/the_future_of_s_1.html"&gt;Open&lt;/a&gt; &lt;a href="http://3quarksdaily.blogs.com/3quarksdaily/2006/11/the_future_of_s.html"&gt;Science&lt;/a&gt; &lt;a href="http://3quarksdaily.blogs.com/3quarksdaily/2007/01/the_future_of_s.html"&gt;system&lt;/a&gt;. Since Open Science is a relatively new idea, things are changing as I go along, and no one really has any standards yet, because Open Science people (and in particular &lt;a href="http://open-notebook-science.dabbledb.com/publish/open-notebook-science/54352d3e-b6d6-4c7b-bbbd-eaba3f1e7984/opennotebooksciencepeople.html"&gt;Open Notebook Science People&lt;/a&gt;) are still trying to figure out best practices for science in the open.&lt;br /&gt;&lt;br /&gt;That said here is my current schema:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Open Ideas&lt;/span&gt;: I try to blog all of the experimental ideas that I'm considering pursuing. The hope is that I can find other folks interested in the same things as myself, and if I don't pursue the ideas, perhaps they'll be of use to someone else. 
I maintain an index of these Open Ideas on this blog.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Open Projects&lt;/span&gt;: The Open Ideas I decide to pursue become chapters in &lt;a href="http://www.jeremiahfaith.com/open_notebook_science/"&gt;J's Lab Notebook&lt;/a&gt;.  Following the basic idea of &lt;a href="http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html"&gt;Open Notebook Science&lt;/a&gt;, all of the raw data for the projects I pursue is publicly available in real time (updated nightly). The hope is that folks who might find my work useful don't have to wait two years until I publish it. I know following someone's experiments in raw form can be difficult, but similar to reading someone's computer code, I think we need &lt;a href="http://blog-di-j.blogspot.com/2007/07/tips-rules-for-open-notebook-science.html"&gt;some rules or general guidelines&lt;/a&gt; to make such tasks easier. I do &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; expect folks to read and follow the notebook as I go along. Rather I expect folks to stumble upon the notebook through internet searches and such, whereupon they can email me if they're interested in more information or clarification of anything. I maintain an index of these Open Projects on this blog.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Open Publishing&lt;/span&gt;: After I finish projects, I typically publish them in scientific journals. In the future, I hope to publish the failed or smaller experiments to this blog or to an archive. 
Because the &lt;a href="http://www.doaj.org/"&gt;current set of open access journals&lt;/a&gt; doesn't yet cover the entire range of experimental and computational biology subjects, I do not publish exclusively in Open Access journals (&lt;span style="font-size:85%;"&gt;though I think Jonathan Eisen has some &lt;a href="http://phylogenomics.blogspot.com/2006/09/top10-novel-ways-to-contribute-to-open.html"&gt;interesting &lt;/a&gt;&lt;a href="http://phylogenomics.blogspot.com/2007/02/why-i-am-ashamed-to-have-paper-in.html"&gt;ideas&lt;/a&gt; on this topic, I think it's a little early to limit yourself to only open access journals unless you're already well known [which Eisen is]&lt;/span&gt; ). I maintain an index of these completed projects on this blog.&lt;/li&gt;&lt;/ol&gt;So if you're new to my blog and you'd like to learn more, you might start by skimming the short descriptions available in the indexes: &lt;a href="http://blog-di-j.blogspot.com/2007/10/js-untested-project-ideas.html"&gt;J's Open Ideas index&lt;/a&gt;, &lt;a href="http://blog-di-j.blogspot.com/2007/10/js-projects-in-progress.html"&gt;J's Open Projects index&lt;/a&gt;, &lt;a href="http://blog-di-j.blogspot.com/2007/10/js-published-projects.html"&gt;J's Open Publishing index&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-6459769373840444028?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/6459769373840444028/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=6459769373840444028' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/6459769373840444028'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3472100433840968771/posts/default/6459769373840444028'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/10/how-this-blog-works.html' title='How this blog works'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-1600804545067033790</id><published>2007-10-24T15:09:00.001-07:00</published><updated>2007-10-24T15:25:36.188-07:00</updated><title type='text'>J's Open Projects index</title><content type='html'>This post is an index with a one-line description of the projects I'm working on and the relevant chapter of the work in &lt;a href="http://www.jeremiahfaith.com/open_notebook_science/"&gt;J's Lab Notebook&lt;/a&gt;. Read &lt;a href="http://blog-di-j.blogspot.com/2007/10/how-this-blog-works.html"&gt;how this blog works&lt;/a&gt; for more general information.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;1) Cheaper, faster, better ChIP&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Description: &lt;/span&gt;Using statistical experimental design methods to shorten, cheapen, and optimize a Chromatin Immunoprecipitation protocol for experimentally determining transcription factor binding sites.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Detailed description: &lt;/span&gt;&lt;span&gt;&lt;a href="http://blog-di-j.blogspot.com/2007/10/factorial-and-response-surface.html"&gt;Factorial and response surface optimization of a chromatin immunoprecipitation protocol&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Lab Notebook Chapter: &lt;/span&gt;&lt;span style="font-style: italic;"&gt;Towards a faster, more reliable ChIP protocol.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: 
bold;"&gt;Date Project Started: &lt;/span&gt;&lt;span style="font-style: italic;"&gt;Apr 26, 2007&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-1600804545067033790?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/1600804545067033790/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=1600804545067033790' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/1600804545067033790'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/1600804545067033790'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/10/js-projects-in-progress.html' title='J&apos;s Open Projects index'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-1275270542025932334</id><published>2007-10-24T15:06:00.000-07:00</published><updated>2007-10-24T15:25:17.491-07:00</updated><title type='text'>J's Open Publishing index</title><content type='html'>This post is an index with a one-line description of the projects I've published. 
Read &lt;a href="http://blog-di-j.blogspot.com/2007/10/how-this-blog-works.html"&gt;how this blog works&lt;/a&gt; for more general information.&lt;br /&gt;&lt;br /&gt;2) &lt;a href="http://nar.oxfordjournals.org/cgi/content/full/gkm815v1"&gt;Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Description:&lt;/span&gt; Microarray compendia for &lt;span style="font-style: italic;"&gt;E. coli&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;Shewanella&lt;/span&gt;, and yeast (currently 524, 530, and 14 arrays, respectively). The arrays are normalized together to allow all arrays for each species to be analyzed as a single group.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Blog post on this publication:&lt;/span&gt; &lt;a href="http://blog-di-j.blogspot.com/2007/10/when-will-gene-expression-data-become.html"&gt;When will gene expression data become collective knowledge?&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Date work was published: &lt;/span&gt;&lt;span style="font-style: italic;"&gt;Sept 18, 2007&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;1) &lt;a href="http://biology.plosjournals.org/perlserv/?request=get-document&amp;amp;doi=10.1371%2Fjournal.pbio.0050008"&gt;Large-Scale Mapping and Validation of &lt;span style="font-style: italic;"&gt;Escherichia coli&lt;/span&gt; Transcriptional Regulation from a Compendium of Expression Profiles&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Description:&lt;/span&gt; How well can we computationally infer regulatory interactions between transcription factors and their targets using microarray data? Predictions from several algorithms were validated using the 3500 experimentally determined interactions in RegulonDB, and an additional 300 interactions were tested with ChIP. 
Most importantly, once we've reliably inferred these networks, what can we do with them?&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Useful resources:&lt;/span&gt;&lt;span&gt; The microarray data used in this paper is available at &lt;a href="http://m3d.bu.edu/"&gt;M3D&lt;/a&gt;. In addition, we have a supplemental site containing links to the algorithms and the RegulonDB known interactions we used in the publication.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Date work was published:&lt;/span&gt;&lt;span style="font-style: italic;"&gt; January 9, 2007&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-1275270542025932334?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/1275270542025932334/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=1275270542025932334' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/1275270542025932334'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/1275270542025932334'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/10/js-published-projects.html' title='J&apos;s Open Publishing index'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-3662845170490600853</id><published>2007-10-24T15:05:00.000-07:00</published><updated>2007-10-24T15:24:56.005-07:00</updated><title type='text'>J's Open Ideas index</title><content type='html'>This post is an index with a one-line description of the projects I'm considering working on. Read &lt;a href="http://blog-di-j.blogspot.com/2007/10/how-this-blog-works.html"&gt;how this blog works&lt;/a&gt; for more general information.&lt;br /&gt;&lt;br /&gt;3) &lt;a href="http://blog-di-j.blogspot.com/2007/09/effect-of-sequence-level-mutations-on.html"&gt;Effect of sequence level mutations on transcription, translation, and noise&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Description:&lt;/span&gt; A technique I think would allow us to determine the effect of millions of promoter variants on the rate of transcription and translation in single-cells.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Date idea was blogged:&lt;/span&gt;&lt;span style="font-style: italic;"&gt; September 1, 2007&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;2)&lt;span&gt; &lt;a href="http://blog-di-j.blogspot.com/search/label/host%20microbe%20interactions"&gt;Mutations, gene passing, and the evolution of gut microbes&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Description: &lt;/span&gt;Can we use gnotobiotic mice to obtain experimental estimates of mutation rates and gene transfer rates in different intestinal environments (e.g. 
under different stresses and with different combinations of microbes)?&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Date idea was blogged: &lt;/span&gt;&lt;span style="font-style: italic;"&gt;June 26, 2007&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;1) &lt;a href="http://blog-di-j.blogspot.com/2007/06/two-photo-microscopy-live-imaging-of.html"&gt;Live imaging of host-microbe interactions&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Description:&lt;/span&gt; Can we apply the deep-imaging 2-photon microscopy techniques, which have been developed over the past few years for imaging neuronal systems, to imaging the spatial distribution of and interactions between different gut microbes and their host?&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Date idea was blogged: &lt;/span&gt;&lt;span style="font-weight: bold;"&gt; &lt;/span&gt;&lt;span style="font-style: italic;"&gt;June 12, 2007&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-3662845170490600853?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/3662845170490600853/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=3662845170490600853' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/3662845170490600853'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/3662845170490600853'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/10/js-untested-project-ideas.html' title='J&apos;s Open Ideas 
index'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-7794731857088224145</id><published>2007-10-23T08:07:00.001-07:00</published><updated>2007-11-08T16:48:51.491-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='microarrays'/><category scheme='http://www.blogger.com/atom/ns#' term='published'/><title type='text'>When will gene expression data become collective knowledge?</title><content type='html'>&lt;span style="font-style: italic; color: rgb(153, 0, 0);"&gt;Published research:&lt;/span&gt;&lt;span style="color: rgb(153, 0, 0);"&gt;&lt;span style="color: rgb(0, 0, 0);"&gt; this post describes some of my published research&lt;/span&gt;&lt;/span&gt;. The relevant publication is: &lt;span style="font-size:100%;"&gt;&lt;a href="http://nar.oxfordjournals.org/cgi/content/full/gkm815v1"&gt;Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata&lt;/a&gt;&lt;/span&gt;, which describes the &lt;a href="http://m3d.bu.edu/"&gt;M3D database&lt;/a&gt;. This post serves as a place for folks to provide comments and suggestions for the future of the database and the future of expression data in general.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;The focus of my PhD thesis has been network inference: &lt;a href="http://biology.plosjournals.org/perlserv/?request=get-document&amp;amp;doi=10.1371%2Fjournal.pbio.0050008"&gt;how can we efficiently determine regulatory networks in prokaryotes&lt;/a&gt; (i.e. which transcription factors regulate which genes). 
Since mRNA concentration is the only data we can easily measure for all genes simultaneously, my PhD has also been an exploration of the potential and limitations of expression data. In joining &lt;a href="http://gardnerlab.bu.edu/"&gt;Tim Gardner's&lt;/a&gt; lab at BU, I was a little reluctant to analyze microarray data, because I'd heard it was noisy and too easy to find whatever answer you were looking for. After 4+ years of working with gene expression data from microarrays, I'm convinced that it is a little noisy and that it is extremely easy to find whatever answer you're looking for. But with careful analysis, particularly regarding the statistics of large datasets and multiple hypothesis testing, microarrays hold an unparalleled wealth of knowledge about the dynamic, concerted actions of cells.&lt;br /&gt;&lt;br /&gt;Genome sequencing has provided the cell's nouns. Microarrays are providing the cell's verbs. Currently, most people use the nouns as collective knowledge while the verbs are generated and analyzed in-house. How are we ever going to understand the language of life when everyone has to generate a personal set of verbs?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;What I mean for &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;genome sequencing&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; is this&lt;/span&gt;: you want to knock out, clone, tag, or otherwise manipulate a gene in species X. You look up the location of that gene in the genome browser for species X, download the surrounding sequence, and use your intuition or some primer design software to help you construct the chemicals (typically oligonucleotide primers) you need to experiment on your gene. Or maybe you're one of these folks that likes to take fancy trips with your science budget, so you're out in Hawaii to collect sea water for metagenomic sequencing. 
When you are sufficiently tanned and you head back home to sequence the seawater, the first thing you do is compare your seawater DNA sequence to all DNA sequence available for any species on the planet to see if you can find anything to help you figure out what was in your seawater.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;What I mean for &lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;microarrays &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;is this:&lt;/span&gt; you want to figure out what the cell is &lt;span style="font-style: italic;"&gt;doing&lt;/span&gt; when you apply X to it (e.g. heat shock, acid shock, DNA damage, glucose growth, etc.). You run 3 chips in a standard condition and 3 chips in a standard condition plus X. You take the two conditions and run a statistical test - t-test, fold change, or FDR if you're getting fancy - to produce a list of N genes that changed expression when you did X. You write a 5-page paper where you publish the list and describe what those changes might mean.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Final summary of the microarray isolationist problem&lt;/span&gt;&lt;br /&gt;With &lt;span style="font-style: italic;"&gt;genome sequencing&lt;/span&gt; everyone is taking advantage of the wealth of collective sequencing knowledge to improve their own research.  For &lt;span style="font-style: italic;"&gt;microarrays&lt;/span&gt;, with few exceptions, knowledge is created and remains in isolation. 
One could make the case that microarrays are a relatively young technology and the collected knowledge has yet to accumulate, but microarrays were invented in 1995 - they're older than Google Inc.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;How can we promote gene expression as collective knowledge?&lt;/span&gt;&lt;br /&gt;For our network inference work, it was necessary to collect as many microarrays as possible for the species of interest (&lt;span style="font-style: italic;"&gt;E. coli&lt;/span&gt;). Since Tim's lab is full of computer nerds like myself, from the start we decided to collect all of the data &lt;/span&gt;&lt;span style="font-size:100%;"&gt;in a database. Originally, the database was just a storage dump that ensured that all the folks writing network inference algorithms (i.e. Boris Hayete and myself) would use the same starting dataset. Having the microarray data dump in a standard format and standard location certainly helped our network inference efforts. In the end, Boris developed a network inference algorithm - &lt;a href="http://gardnerlab.bu.edu/clr.html"&gt;CLR&lt;/a&gt; - that currently remains the top-performing algorithm on the &lt;span style="font-style: italic;"&gt;E. coli&lt;/span&gt; microarrays (&lt;span style="font-size:85%;"&gt;if you're interested in network inference and think you can top CLR - please try, &lt;a href="http://gardnerlab.bu.edu/netinfer_plos_2007/?page_id=2"&gt;here's a site to help you get started&lt;/a&gt;; and let me know how it goes!&lt;/span&gt;).&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;More recently, we improved our microarray database and published a piece, &lt;a href="http://nar.oxfordjournals.org/cgi/content/full/gkm815v1"&gt;Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata&lt;/a&gt;, that I hope starts to move gene expression into the collective knowledge space. 
In that paper we tried to address three problems that are currently hindering the wide-s&lt;/span&gt;&lt;span style="font-size:100%;"&gt;cale adoption of microarrays:&lt;br /&gt;&lt;/span&gt;&lt;ol&gt;&lt;li&gt;the presence of platform-specific biases in expression data due to the use of many different microarray platforms in a compendium&lt;/li&gt;&lt;li&gt;the lack of a uniformly applied normalization standard for&lt;sup&gt; &lt;/sup&gt;expression datasets, even within a single expression platform. Different&lt;sup&gt; &lt;/sup&gt;software algorithms are used by different labs for preprocessing and normalizing&lt;sup&gt; &lt;/sup&gt;the raw microarray intensity values&lt;/li&gt;&lt;li&gt;the incompleteness and inconsistency&lt;sup&gt; &lt;/sup&gt;in the curation of metadata describing the details of each experimental&lt;sup&gt; &lt;/sup&gt;condition.&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-size:100%;"&gt;To address point 1, we only allowed a single platform (Affymetrix) for each of the three species currently in the database (&lt;span style="font-style: italic;"&gt;E. coli&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;Shewanella&lt;/span&gt;, and yeast).  To address point 2, we collected unnormalized raw&lt;/span&gt;&lt;span style="font-size:100%;"&gt; CEL files for all of the experiments and uniformly normalized them as a group with RMA. In our previous work, we found that this RMA normalization makes comparisons possible on microarrays of the same platform that are run in different laboratories (&lt;span style="font-size:85%;"&gt;see the section "Verification of array data normalization and consistency" and Figure S5  in the supplement &lt;span style="font-style: italic;"&gt;Protocol S1&lt;/span&gt; to our &lt;a href="http://biology.plosjournals.org/perlserv/?request=get-document&amp;amp;doi=10.1371%2Fjournal.pbio.0050008#toclink5"&gt;network inference paper&lt;/a&gt;&lt;/span&gt;). 
And to address point 3, we generated huma&lt;/span&gt;&lt;span style="font-size:100%;"&gt;n curated (and computationally validated) &lt;/span&gt;experimental metadata for each microarray&lt;sup&gt; &lt;/sup&gt;publication—converting each chemical and growth attribute&lt;sup&gt; &lt;/sup&gt;into a structured and computable set of experimental features&lt;sup&gt; &lt;/sup&gt;with consistent naming conventions and units.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;I believe this Many Microbe Microarrays Database (M&lt;sup&gt;3D&lt;/sup&gt;) provides the essential starting point for moving towards the use of microarrays as collective knowledge: a set of curated microarray datasets that have already proven useful in a large-scale application.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;If you build it, some will come&lt;/span&gt;&lt;br /&gt;&lt;a href="http://m3d.bu.edu/"&gt;M&lt;sup&gt;3D&lt;/sup&gt;&lt;/a&gt; has been online for about a year now. The site currently gets around 200 unique visitors and 3000 hits a day. But based on the emails I get from folks, the audience i&lt;/span&gt;&lt;span style="font-size:100%;"&gt;s primarily computational folks like myself that are eager to test their new algorithm on a large, well-annotated experimental dataset.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;But will the biologists ever arrive?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;The wetlab biologists are the folks I would really like to begin adopting this resource. Since I do experimental work in &lt;span style="font-style: italic;"&gt;E. coli&lt;/span&gt;, whenever I want to find out what a particular gene does or what genes might regulate it, I dig around in &lt;a href="http://regulondb.ccg.unam.mx/"&gt;RegulonDB&lt;/a&gt; or &lt;a href="http://www.ecocyc.org/"&gt;EcoCyc&lt;/a&gt; to see what's known about the gene. 
How might expression data be incorporated into such a website to aid understanding the biology of &lt;span style="font-style: italic;"&gt;E. coli&lt;/span&gt;? For one, if RegulonDB has published evidence that lexA transcriptionally regulates recA, it would be nice to see if the collective microarray knowledge currently supports the published evidence. This type of informatio&lt;/span&gt;&lt;span style="font-size:100%;"&gt;n could be provided by including a scatterplot of the expression values of the transcription factor -vs- those of its target (see the image below generated from M3D). If you generate this plot on M3D, you can also mouseover each point to receive the details of the experiment represented by the datapoint.&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_ErjdhkTSmXk/RxQN9B3g15I/AAAAAAAAAAc/m7CaG9H3SlM/s1600-h/arrayPlotter.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_ErjdhkTSmXk/RxQN9B3g15I/AAAAAAAAAAc/m7CaG9H3SlM/s320/arrayPlotter.png" alt="" id="BLOGGER_PHOTO_ID_5121734018351552402" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_ErjdhkTSmXk/RyeDPIyEEFI/AAAAAAAAACA/JfdpiBxbnfg/s1600-h/arrayPlotter.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp1.blogger.com/_ErjdhkTSmXk/RyeDPIyEEFI/AAAAAAAAACA/JfdpiBxbnfg/s400/arrayPlotter.png" alt="" id="BLOGGER_PHOTO_ID_5127210996863340626" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:100%;"&gt;I think the key to adoption is probably integration with the currently available resources. Folks don't want to go to yet-another-website and figure out how to work it. 
Because of this, I've created a way for external websites to automatically include M3D-generated images on their own websites (&lt;a href="http://m3d.bu.edu/cgi-bin/web/array/index.pl?read=help#remote_pages"&gt;instructions&lt;/a&gt;).  Here are a couple of examples; these are drawn on the fly from M3D rather than uploaded to this blog:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://m3d.bu.edu/cgi-bin/web/arrayPlotter/arrayPlotter?db=E_coli_v4_Build_3&amp;amp;analysis=hist&amp;amp;allExp=1&amp;amp;width=200&amp;amp;height=200&amp;amp;method=1&amp;amp;gene_name_list=recA&amp;amp;title=test%20recA" height="200" width="200" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;iframe marginheight="0" marginwidth="0" border="0" src="http://m3d.bu.edu/cgi-bin/web/arrayPlotter/arrayPlotter?db=E_coli_v4_Build_3&amp;amp;title=interactive%20plot&amp;amp;analysis=oneToMany&amp;amp;width=450&amp;amp;height=450&amp;amp;method=1&amp;amp;html=1&amp;amp;iframe=1&amp;amp;gene_name_list=lexA*recA&amp;amp;allExp=1" frameborder="0" height="450" scrolling="no" width="450"&gt;&lt;/iframe&gt;&lt;br /&gt;The problem with this approach is that if M3D goes down, the automatically generated images fail on the remote website as well. So perhaps remote websites would want to automatically generate and locally &lt;span style="font-style: italic;"&gt;cache&lt;/span&gt; the images.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Hey microarrays, what can you do for me?&lt;/span&gt;&lt;br /&gt;So perhaps integration with other databases will increase the awareness of microarray data, but I really want people &lt;span style="font-style: italic;"&gt;using&lt;/span&gt; the data. So the question is &lt;span style="font-style: italic;"&gt;what would folks like to do with microarray&lt;/span&gt; data? Like a sequence database, can a microarray database allow scientists to better understand their &lt;span style="font-style: italic;"&gt;own&lt;/span&gt; data? 
In general, people don't want to just browse around NCBI; they want to BLAST their sequence to help them improve and publish their &lt;span style="font-style: italic;"&gt;own&lt;/span&gt; work.&lt;br /&gt;&lt;br /&gt;So I'd be really interested if anyone had ideas about applications that might allow more people to use the collective expression knowledge in M3D (or any other microarray database for that matter).&lt;br /&gt;&lt;br /&gt;Here are some things that are already available in M3D:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-size:100%;"&gt;what genes changed expression in condition X (z-test)&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:100%;"&gt;what genes changed relative expression between conditions X and Y (t-test or fold-change)&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:100%;"&gt;do genes in particular chromosomal regions tend to change expression as a group&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-size:100%;"&gt;Here are some things I'm considering adding:&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-style: italic;"&gt;array-blast&lt;/span&gt;: submit your raw microarray data (CEL file) and you get back a list of the most similar arrays in the database along with the condition information for those arrays&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-style: italic;"&gt;expression-based-function annotation&lt;/span&gt;: across the compendium, what is the effect of each experimental feature (e.g. 
glucose concentration is correlated with the expression of genes X,Y,Z)&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-style: italic;"&gt;array-changed-genes&lt;/span&gt;: submit your CEL file(s) and run a z-test against the entire compendium or a t-test against a particular subset of arrays in the compendium to determine the set of genes whose expression changed in your microarrays&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;How you can help!&lt;br /&gt;&lt;/span&gt;If you have any ideas/suggestions for tools, applications, or anything else that might be done to a website like M3D to help folks use this collective expression knowledge, I'd like to hear your ideas (just leave a comment on this post).&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-7794731857088224145?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/7794731857088224145/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=7794731857088224145' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7794731857088224145'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7794731857088224145'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/10/when-will-gene-expression-data-become.html' title='When will gene expression data become collective knowledge?'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp2.blogger.com/_ErjdhkTSmXk/RxQN9B3g15I/AAAAAAAAAAc/m7CaG9H3SlM/s72-c/arrayPlotter.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-1481685541432373288</id><published>2007-10-19T14:31:00.000-07:00</published><updated>2007-10-19T21:44:25.543-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ChIP'/><category scheme='http://www.blogger.com/atom/ns#' term='in progress'/><category scheme='http://www.blogger.com/atom/ns#' term='response surface methods'/><category scheme='http://www.blogger.com/atom/ns#' term='experimental design'/><category scheme='http://www.blogger.com/atom/ns#' term='fractional factorial'/><title type='text'>Factorial and response surface optimization of a chromatin immunoprecipitation protocol</title><content type='html'>&lt;span style="font-style: italic; color: rgb(153, 0, 0);"&gt;Research in progress:&lt;/span&gt;&lt;span style="color: rgb(153, 0, 0);"&gt;&lt;span style="color: rgb(0, 0, 0);"&gt; this post describes some of my ongoing research&lt;/span&gt;&lt;/span&gt;. The raw data and all experimental details are updated daily in &lt;a href="http://www.jeremiahfaith.com/open_notebook_science/"&gt;J's Lab Notebook&lt;/a&gt; in the chapter entitled: &lt;span style="font-style: italic;"&gt;Towards a faster, more reliable ChIP protocol.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The following text is largely taken from my PhD Oral Qualifier. I tried to blogify it a little, but it is still a little formal for a blog post. I also don't have many citations. 
Appropriate citations will be in the published version if I complete this project (if you have opinions about how we should deal with citations in very preliminary results, please post a comment - I'd like to hear your opinion).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Short Version of What I'm trying to do:&lt;/span&gt; Chromatin Immunoprecipitation (ChIP) is often used to experimentally verify or discover transcription factor binding sites. In my experience, ChIP is lengthy, costly, and noisy. I'm trying to use statistical experimental design techniques to shorten, cheapen, and reduce the noise of the ChIP procedure. I'd really like it if ChIP were simple enough to become a standard technique that all experimentalists learn (i.e. like a miniprep and PCR), so we can really start to determine the transcriptional regulatory network structure of many organisms.&lt;br /&gt;&lt;br /&gt;In general, I think there is a lot of unnecessary &lt;span style="font-style: italic;"&gt;folklore&lt;/span&gt; in our experimental procedures, and the methods I'm applying here would be applicable to almost any experimental protocol optimization - if broadly applied, experimental biology would be a much less time-consuming endeavor.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Longer Version of What I'm trying to do:&lt;/span&gt;&lt;br /&gt;We plan to optimize and shorten the chromatin immunoprecipitation (ChIP) protocol for &lt;em&gt;in vivo&lt;/em&gt; validation of transcription factor targets. 
Verifying a transcription factor's genomic binding regions with ChIP requires: 1) fixing the transcription factor to the regions of the genome it binds via a crosslinking agent like formaldehyde, 2) cell lysis, 3) chromatin shearing (to enable isolation of only small regions of DNA bound by the transcription factor), and 4) multiple washes to remove background noise &lt;span style="text-decoration: underline;"&gt;[&lt;span style="color: rgb(153, 0, 0);"&gt;1&lt;/span&gt;]&lt;/span&gt;. Once the ChIP procedure is complete, the DNA bound by the transcription factor should be enriched relative to unbound DNA. This enrichment can be assayed by qPCR, microarray, or DNA sequencing (less common), providing confirmation of the transcription factor binding sites (and therefore, presumably, the gene targets of the transcription factors).&lt;br /&gt;&lt;br /&gt;ChIP is used by numerous labs across many model organisms, yet the ChIP protocol is anything but standardized; ChIP protocols are as numerous as the investigators using the technique, suggesting that we are far from an optimal protocol. The ChIP protocol we previously used to validate network inference targets in &lt;em&gt;E. coli&lt;/em&gt; &lt;a href="file:///Users/faith/docs/techReports/J_prospectus/main.html#JJ:2007uq" name="CITEJJ:2007uq"&gt;&lt;/a&gt;&lt;span style="text-decoration: underline;"&gt;[&lt;span style="color: rgb(153, 0, 0);"&gt;2&lt;/span&gt;]&lt;/span&gt; required almost a week of long experimental days to go from cells to verified transcription factor targets. Because of this length, the procedure is error-prone and only tractable to the most experienced bench scientists. We aim to use modern statistical methods of experimental design to optimize the ChIP protocol [&lt;span style="color: rgb(153, 0, 0);"&gt;3&lt;/span&gt;]. In particular, we will use fractional factorial designs to screen for unnecessary steps that can be removed to shorten the protocol. 
In addition, we will optimize the protocol steps that have the most significant influence on the enrichment of known transcription factor targets to improve the signal to noise ratio of the ChIP procedure.&lt;br /&gt;&lt;br /&gt;&lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;  &lt;b&gt;Successful completion  &lt;/b&gt; of this work will result in a markedly shorter and more effective ChIP protocol for verifying transcription factor targets. The new protocol will make verification of transcription factor binding sites approachable and practical to a wider range of bench scientists, promoting the experimental validation of future network inference predictions. In addition, the knowledge gained by an in-depth analysis of the ChIP technique will help optimize the protocol for different tasks such as highly parallel sequencing of ChIP DNA for transcription factor target discovery. Finally, the ChIP protocol optimization highlights the untapped experimenter efficiency potential these statistical methods could unleash on molecular biology if these experimental design techniques were broadly applied to experimental protocols.&lt;br /&gt;&lt;h3&gt;&lt;br /&gt;&lt;/h3&gt;&lt;h3&gt;Background:&lt;/h3&gt; Most experimental protocols can be represented mathematically as &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; = f(θ) where &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; is the product resulting from the protocol and θ are the parameters of the protocol. In a PCR experiment for example, &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; would represent the yield of DNA (e.g. in micrograms), while θ represents the parameters of the reaction (e.g. concentrations of template, primers, magnesium chloride, etc.). 
The statistics of experimental design contains numerous methods to expedite the empirical optimization of &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; through the intelligent exploration of &lt;span style="font-family:symbol;"&gt;q&lt;/span&gt; (for two excellent books on experimental design see &lt;span style="text-decoration: underline;"&gt;[&lt;span style="color: rgb(153, 0, 0);"&gt;3,4&lt;/span&gt;&lt;/span&gt;&lt;a href="file:///Users/faith/docs/techReports/J_prospectus/main.html#box87" name="CITEbox87"&gt;&lt;/a&gt;]).&lt;br /&gt;&lt;br /&gt;&lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;  &lt;b&gt;Fractional factorial methods. &lt;/b&gt;     For each experimental protocol, there are thousands of parameters, (&lt;span style="font-family:symbol;"&gt;q&lt;/span&gt;), whose values could be altered in an infinite number of combinations to potentially optimize the protocol output (&lt;span style="font-style: italic;"&gt;y&lt;/span&gt;). For example, with PCR we could alter the melting temperature, the duration at the melting temperature, the amount of each primer, and the variant of Taq. On another level, changing the tubes, pipettes, the PCR machine, and the experimenter could also lead to changes in the output, &lt;span style="font-style: italic;"&gt;y&lt;/span&gt;, of our PCR reaction. The first step in experimental design is to identify the parameters that contribute most to the output, so that they can be further optimized.&lt;br /&gt;&lt;br /&gt;&lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt; Fractional factorial methods provide an efficient way to screen these parameters (&lt;span style="font-size:85%;"&gt;Note: parameters are termed &lt;em&gt;factors&lt;/em&gt; in experimental design&lt;/span&gt;)&lt;a href="file:///Users/faith/docs/techReports/J_prospectus/main.html#tthFtNtAAB" name="tthFrefAAB"&gt;&lt;/a&gt;. Traditional factor screening methods take a one-at-a-time approach. 
For example to optimize a PCR protocol, you might try the reaction with and without DMSO, with various concentrations of magnesium chloride, or with different annealing temperatures. Reliable determination of the effect of each of these factors (&lt;span style="font-family:symbol;"&gt;q&lt;/span&gt;&lt;sub&gt;i&lt;/sub&gt;) on the PCR output (&lt;span style="font-style: italic;"&gt;y&lt;/span&gt;) requires several replicates for each tested factor level. Because of this replication, a large number of experiments is required to test a small number of factors with a one-at-a-time approach. Fractional factorial methods screen many factors &lt;em&gt;at the same time&lt;/em&gt; and remove the need for time-consuming and expensive replication. An example fractional factorial design for optimizing a PCR protocol might look like:&lt;br /&gt;&lt;br /&gt;&lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;   &lt;table style="width: 445px; height: 182px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;&lt;b&gt;annealing temp&lt;/b&gt;&lt;/span&gt; &lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;&lt;b&gt;primer concentration&lt;/b&gt;&lt;/span&gt; &lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;&lt;b&gt;hot start&lt;/b&gt;&lt;/span&gt; &lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;&lt;b&gt;extension time&lt;/b&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;56C &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;150 nM &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;no &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;30 seconds &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;62C &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;150 nM &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;no &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span 
style="font-size:85%;"&gt;90 seconds &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;56C &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;600 nM &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;no &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;90 seconds &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;62C &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;600 nM &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;no &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;30 seconds &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;56C &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;150 nM &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;yes &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;90 seconds &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;62C &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;150 nM &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;yes &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;30 seconds &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;56C &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;600 nM &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;yes &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;30 seconds &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;62C &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;600 nM &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;yes &lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-size:85%;"&gt;90 
seconds&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;For efficiency reasons, factors in factorial designs are typically sampled at only two states. Experimenter intuition plays a role in these designs via the selection of the initial set of factors to screen and in the selection of the values of the two states to test for each factor. &lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;&lt;br /&gt;The result of a fractional factorial can be represented in a table listing the effect size and p-value for each tested factor. For example, an analysis of our PCR fractional factorial data might yield:&lt;br /&gt;&lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;  &lt;table&gt; &lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;b&gt;factor&lt;/b&gt; &lt;/td&gt;&lt;td&gt;&lt;b&gt;effect (change in &lt;span style="font-family:symbol;"&gt;m&lt;/span&gt;g)&lt;/b&gt; &lt;/td&gt;&lt;td&gt;&lt;b&gt;p-value&lt;/b&gt; &lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;annealing temp &lt;/td&gt;&lt;td&gt;27 &lt;/td&gt;&lt;td&gt;0.001 &lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;primer concentration &lt;/td&gt;&lt;td&gt;-1 &lt;/td&gt;&lt;td&gt;0.6 &lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;hot start &lt;/td&gt;&lt;td&gt;2 &lt;/td&gt;&lt;td&gt;0.5 &lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;extension time &lt;/td&gt;&lt;td&gt;10 &lt;/td&gt;&lt;td&gt;0.05 &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;From the results in the table above, the experimenter might decide to focus their efforts on further optimization of the annealing temperature to increase the PCR yield, rather than on the three other tested factors, which had smaller effects on our PCR output.&lt;br /&gt;&lt;br /&gt;&lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;  &lt;b&gt;Response surface methods.  
&lt;/b&gt;      In a localized region, our function of interest &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; = f(&lt;span style="font-family:symbol;"&gt;q&lt;/span&gt;) can be fit using first (linear) and second order models. Fitting these models allows us to obtain a prediction of the parameter landscape of our function. Response surface methods use these models to estimate the most efficient path to the peak of the model (i.e. the maximum value of &lt;span style="font-style: italic;"&gt;y&lt;/span&gt;). It is at this peak where our experimental protocol is optimized (or at least locally optimal). Response surface methods are relatively time consuming, so fractional factorial methods are typically used to screen for factors to be later optimized by response surface methods.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Research Plan: &lt;/h3&gt; For our ChIP protocol, we want to optimize the enrichment, &lt;span style="font-style: italic;"&gt;y&lt;/span&gt;, of DNA bound to our transcription factor of interest. At the same time, we want to shorten the protocol as much as possible, so that the laborious protocol becomes more manageable. For this study, we will calculate &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; as the change in enrichment of genes known to be bound by our transcription factor relative to the enrichment for randomly chosen genes (which are presumably not bound by our transcription factor). We calculate this relative enrichment from qPCR data. 
For each known target gene and random target gene, we first calculate their enrichment from an immunoprecipitation reaction with and without antibody as N = log((1+E&lt;sub&gt;i&lt;/sub&gt;)&lt;sup&gt;C&lt;sub&gt;i&lt;/sub&gt;+U&lt;sub&gt;i&lt;/sub&gt;&lt;/sup&gt;), where E&lt;sub&gt;i&lt;/sub&gt; is the median efficiency of the PCR primers for gene i, C&lt;sub&gt;i&lt;/sub&gt; is the qPCR Ct value for the DNA enriched using correct antibody for the transcription factor regulating gene i, and U&lt;sub&gt;i&lt;/sub&gt; is the qPCR Ct value for the DNA enriched without using an antibody for the transcription factor regulating gene i. We then calculate the increase in enrichment of our known targets relative to the random targets as &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; = mean(N&lt;sub&gt;k&lt;/sub&gt;) &lt;span style="font-family:symbol;"&gt;-&lt;/span&gt; mean(N&lt;sub&gt;r&lt;/sub&gt;) where N&lt;sub&gt;k&lt;/sub&gt; is the ChIP enrichment for the known targets and N&lt;sub&gt;r&lt;/sub&gt; is the ChIP enrichment for our random targets. Our goal is to maximize the value of &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; in the most directed manner possible using statistical methods coupled with intuition rather than simply intuition alone.&lt;br /&gt;&lt;br /&gt;&lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;We will initially use fractional factorial methods to screen a large number of factors of potential importance to the ChIP protocol. For tested factors that are not found to be significant, we will select the factor state that requires the shortest time. For example if a 10 min incubation and a 2 hr incubation produce insignificant changes in &lt;span style="font-style: italic;"&gt;y&lt;/span&gt;, we can save 1 hr 50 min by using a 10 min incubation. 
Factors found to be significant in the fractional factorial screen will be optimized using response surface methods.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Preliminary Results:&lt;/h3&gt;&lt;span style="font-size:85%;"&gt;Note: these should be taken with caution, since I've not written the paper yet and haven't really sat down to analyze all of the results in detail yet.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;em&gt;We will use fractional factorial experimental designs to screen for unnecessary steps and factors that can be removed or shortened in the ChIP procedure.&lt;br /&gt;&lt;br /&gt;&lt;/em&gt;  &lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;Thus far, we have screened twenty-three factors in the ChIP protocol. By choosing the fastest and cheapest alternatives for factors that did not significantly alter the enrichment of known targets relative to random targets (&lt;span style="font-style: italic;"&gt;y&lt;/span&gt; = mean(N&lt;sub&gt;k&lt;/sub&gt;) &lt;span style="font-family:symbol;"&gt;-&lt;/span&gt; mean(N&lt;sub&gt;r&lt;/sub&gt;)), we were able to reduce the cost of the protocol by three-quarters and to cut the total procedure time in half (from 5 work days to 2.5). The four most significant factors were formaldehyde concentration, shearing time, antibody concentration, and bead concentration.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Factors that have a significant influence on the enrichment of known transcription factor targets will be optimized using response surface methods.&lt;/em&gt;  &lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;&lt;br /&gt;We plan to optimize all four of the most significant factors in the ChIP protocol. As an initial step, we focused on the optimization of the antibody and bead concentrations. We assume that values of these parameters taken in a local area will result in smooth changes in &lt;span style="font-style: italic;"&gt;y&lt;/span&gt; that can be modeled with first and second order models (Figure 1).  
We can then use these models to efficiently direct us towards the optimal values of our bead and antibody concentrations.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_ErjdhkTSmXk/RxkrZx3g16I/AAAAAAAAABk/TCY5Yvz8BVs/s1600-h/responseSurface.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_ErjdhkTSmXk/RxkrZx3g16I/AAAAAAAAABk/TCY5Yvz8BVs/s320/responseSurface.png" alt="" id="BLOGGER_PHOTO_ID_5123173772993550242" border="0" /&gt;&lt;/a&gt;&lt;span style="color: rgb(51, 51, 51);font-size:85%;" &gt;&lt;span style="font-weight: bold;"&gt;Figure 1&lt;/span&gt;. A hypothetical response surface describing the enrichment of our ChIP procedure as a function of the antibody and bead concentrations. By sequential experimentation and model refinement, response surface methods can locally define this surface and efficiently lead to local optima of the parameters to maximize ChIP enrichment.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;From the fractional factorial screening experiments above, we have already obtained four initial points in our surface for bead and antibody concentration (i.e. LA+LB, LA+HB, HA+LB, HA+HB where L = low, H = high, A = antibody concentration, and B = bead concentration). Unfortunately, we do not yet know the surface, so we can't know where our points lie on the surface. However, we can fit a plane using the data for these four combinations of antibody and bead concentration (e.g. P = a&lt;sub&gt;0&lt;/sub&gt; + a&lt;sub&gt;1&lt;/sub&gt;x&lt;sub&gt;1&lt;/sub&gt; + a&lt;sub&gt;2&lt;/sub&gt;x&lt;sub&gt;2&lt;/sub&gt;, where x&lt;sub&gt;1&lt;/sub&gt; and x&lt;sub&gt;2&lt;/sub&gt; are the concentrations of antibody and bead respectively and a&lt;sub&gt;i&lt;/sub&gt; are the regression coefficients). 
If we assume that the local area around our points is approximately planar, we can use the a&lt;sub&gt;i&lt;/sub&gt; coefficients to estimate the direction of steepest ascent. For instance, in our hypothetical example in Figure 1&lt;span style="color: rgb(153, 0, 0);"&gt;&lt;/span&gt;, our four combinations might land us in the cyan region. A plane fit through these points can then be traversed in the direction of steepest ascent to efficiently direct our future parameter value selections towards the red peak. &lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;&lt;br /&gt;We fit such a plane to our bead and antibody concentration factorial data, and we chose new concentrations of these two factors along the direction of steepest ascent. These new concentrations led to a marked increase in the enrichment of our ChIP procedure (Figure 2a). It appeared that we had not yet reached saturation, so we tried an additional set of points further along the path of steepest ascent (Figure 2b). Although these new datapoints indicated that we might be close to the saturation point for these bead and antibody concentrations, the method of steepest ascent had pushed us into an expensive optimum, with almost three times the commonly used amount of beads for the ChIP procedure. We hypothesized that the amount of crosslinked DNA was saturating our bead and antibody at low concentration, necessitating the use of large amounts of bead and antibody. 
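&lt;br /&gt;&lt;br /&gt;To make the plane fit and the steepest-ascent step above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the four enrichment values are made up (they are not our data), the antibody and bead levels are expressed in coded units (-1 = low, +1 = high), and the function names fit_plane and steepest_ascent_path are hypothetical helpers, not part of any analysis code we used.

```python
import math

# Coded 2x2 factorial (-1 = low, +1 = high). The enrichment values are
# invented purely for illustration.
runs = [
    # (antibody x1, bead x2, enrichment y)
    (-1, -1, 1.0),  # LA+LB
    (-1, +1, 2.0),  # LA+HB
    (+1, -1, 1.5),  # HA+LB
    (+1, +1, 2.5),  # HA+HB
]


def fit_plane(runs):
    """Least-squares fit of P = a0 + a1*x1 + a2*x2 for a coded factorial.

    Assumes the x1 and x2 columns are balanced and orthogonal (as in a
    full two-level factorial), so each coefficient reduces to an average
    of the signed responses.
    """
    n = len(runs)
    a0 = sum(y for _, _, y in runs) / n
    a1 = sum(x1 * y for x1, _, y in runs) / n
    a2 = sum(x2 * y for _, x2, y in runs) / n
    return a0, a1, a2


def steepest_ascent_path(a1, a2, step_sizes):
    """Return coded points along the unit gradient, starting at the design center."""
    norm = math.hypot(a1, a2)
    if norm == 0:
        raise ValueError("flat plane: no direction of ascent")
    d1, d2 = a1 / norm, a2 / norm
    return [(s * d1, s * d2) for s in step_sizes]


a0, a1, a2 = fit_plane(runs)
path = steepest_ascent_path(a1, a2, [1.0, 2.0, 3.0])
for x1, x2 in path:
    # The prediction is only trustworthy near the original four points.
    predicted = a0 + a1 * x1 + a2 * x2
    print(f"x1={x1:.2f}  x2={x2:.2f}  predicted y={predicted:.2f}")
```

The points returned are in coded units, so before going to the bench each one has to be converted back to a real concentration (center value plus the coded value times half the low-to-high range). The linear prediction will also overshoot once the true surface starts to curve, which is why each new point has to be measured rather than trusted.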
&lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt;  &lt;div class="p"&gt;&lt;!----&gt;&lt;/div&gt; &lt;a name="tth_fIg9"&gt; &lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_ErjdhkTSmXk/RxksKB3g17I/AAAAAAAAABs/GYXQX1ox-fM/s1600-h/steepestAscent.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_ErjdhkTSmXk/RxksKB3g17I/AAAAAAAAABs/GYXQX1ox-fM/s400/steepestAscent.png" alt="" id="BLOGGER_PHOTO_ID_5123174601922238386" border="0" /&gt;&lt;/a&gt;&lt;span style="color: rgb(51, 51, 51);font-size:85%;" &gt;&lt;span style="font-weight: bold;"&gt;Figure 2: &lt;/span&gt;(A) Antibody and bead concentrations were optimized using the direction of steepest ascent determined by a linear model. (B) Further concentrations were tested to determine if we had reached a saturation point.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To test this saturation hypothesis, we ran a factorial design using bead concentration, antibody concentration, and sheared chromatin concentration as factors. By using one-fourth of the typical DNA concentration, we were able to improve our enrichment using lower amounts of bead and antibody (Figure &lt;span style="text-decoration: underline;"&gt;3&lt;/span&gt;). 
With this lower concentration of DNA, we should be able to estimate more cost-effective optima for the bead and antibody concentrations.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_ErjdhkTSmXk/RxksjR3g18I/AAAAAAAAAB0/2EkCvJ3_X9k/s1600-h/7_11_07factorial4results.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_ErjdhkTSmXk/RxksjR3g18I/AAAAAAAAAB0/2EkCvJ3_X9k/s400/7_11_07factorial4results.png" alt="" id="BLOGGER_PHOTO_ID_5123175035713935298" border="0" /&gt;&lt;/a&gt;&lt;span style="color: rgb(51, 51, 51);font-size:85%;" &gt;&lt;span style="font-weight: bold;"&gt;Figure 3: &lt;/span&gt;A factorial design was run using bead concentration, antibody concentration, and crosslinked chromatin concentration. For visualization purposes, the values for low chromatin concentration are plotted to the left of those with high chromatin (shifting them slightly along the x-axis), even though both experiments used the same concentration of bead. By using less crosslinked chromatin, we obtain larger enrichment using the standard concentrations of bead and antibody. The results suggest that at the standard values for these concentrations, the beads and antibody are saturated with chromatin.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;References&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;[1] Tong Ihn Lee, Sarah E Johnstone, and Richard A Young.  &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;amp;Cmd=ShowDetailView&amp;amp;TermToSearch=17406303"&gt;Chromatin immunoprecipitation and microarray-based analysis of   protein location&lt;/a&gt;.  &lt;em&gt;Nat Protoc&lt;/em&gt;, 1(2):729-748, 2006.&lt;br /&gt;[2] Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S,   Collins JJ, and Gardner TS.  
&lt;a href="http://biology.plosjournals.org/perlserv/?request=get-document&amp;amp;doi=10.1371%2Fjournal.pbio.0050008"&gt;Large-scale mapping and validation of escherichia coli   transcriptional regulation from a compendium of expression profiles&lt;/a&gt;.  &lt;em&gt;PLoS Biol&lt;/em&gt;, 5(1):e8, 2007.&lt;br /&gt;[3] GEP Box, Hunter JS, and Hunter WG.  &lt;a href="http://www.amazon.com/Statistics-Experimenters-Design-Innovation-Discovery/dp/0471718130/ref=pd_bbs_sr_1/103-4960295-8543004?ie=UTF8&amp;amp;s=books&amp;amp;qid=1192832316&amp;amp;sr=8-1"&gt;&lt;em&gt;Statistics for experimenters&lt;/em&gt;&lt;/a&gt;.  Wiley-Interscience, 2nd edition, 2005.&lt;br /&gt;[4]  Box G and Draper N.  &lt;em&gt;&lt;a href="http://www.amazon.com/Empirical-Model-Building-Response-Probability-Statistics/dp/0471810339/ref=sr_1_1/103-4960295-8543004?ie=UTF8&amp;amp;s=books&amp;amp;qid=1192832359&amp;amp;sr=1-1"&gt;Empirical Model-Building and Response Surfaces&lt;/a&gt;.&lt;/em&gt;  John Wiley and Sons, 1987.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-1481685541432373288?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/1481685541432373288/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=1481685541432373288' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/1481685541432373288'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/1481685541432373288'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/10/factorial-and-response-surface.html' title='Factorial and response surface optimization of a chromatin 
immunoprecipitation protocol'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp2.blogger.com/_ErjdhkTSmXk/RxkrZx3g16I/AAAAAAAAABk/TCY5Yvz8BVs/s72-c/responseSurface.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-584145932117424492</id><published>2007-09-23T10:09:00.000-07:00</published><updated>2007-10-02T07:31:15.518-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific ethics'/><category scheme='http://www.blogger.com/atom/ns#' term='peer review'/><category scheme='http://www.blogger.com/atom/ns#' term='scientific publishing'/><title type='text'>Center for Contributory Science</title><content type='html'>This article describes the &lt;span style="font-style: italic;"&gt;Center for Contributory Science&lt;/span&gt; (CFCS), an imaginary journal, which I envision as the &lt;span style="font-style: italic;"&gt;next generation&lt;/span&gt; scientific literature.  See my previous post for the &lt;a href="http://blog-di-j.blogspot.com/2007/09/towards-richer-scientific-literature.html"&gt;motivation for a next generation scientific literature&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;&lt;span style="font-weight: bold;"&gt;The CFCS submission process&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;h2 style="font-weight: normal; font-style: italic;"&gt;&lt;span style="font-size:130%;"&gt; Submitting a paper&lt;/span&gt;&lt;/h2&gt;Scientists are encouraged to submit rigorous scientific research for publication in the journal. Choose the subject category appropriate for your paper. 
Based on your chosen categories, an editor with appropriate expertise will be randomly assigned to your paper.&lt;br /&gt;&lt;h3 style="font-weight: normal; font-style: italic;"&gt;&lt;span style="font-size:130%;"&gt; Authors must first review / contribute&lt;/span&gt;&lt;/h3&gt;All authors must be registered CFCS users.  Before manuscript submission is finalized &lt;i&gt;every&lt;/i&gt; author on your manuscript (including the corresponding author) must do one of the following:&lt;br /&gt;&lt;ul&gt;&lt;li&gt; If there are any papers in &lt;span style="font-style: italic;"&gt;Limbo&lt;/span&gt; in a category the author feels qualified in, the author must review a paper. Authors on the same manuscript submission can't review the same paper. (Papers are presented oldest to newest to prevent any one paper from remaining in &lt;span style="font-style: italic;"&gt;Limbo &lt;/span&gt;for too long.  See the section below, "the status of a manuscript", for details on &lt;span style="font-style: italic;"&gt;Limbo&lt;/span&gt;) &lt;/li&gt;&lt;li&gt; If an author claims they are not qualified for any of the papers in their qualified category, the papers are placed in the author's public "not qualified for" list along with an optional comment by the author (this public acknowledgment prevents people from always claiming they aren't qualified to review papers). &lt;/li&gt;&lt;li&gt; If the author does not feel qualified to review any of the available manuscripts in &lt;span style="font-style: italic;"&gt;Limbo&lt;/span&gt;, the author must score any 3 papers in &lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;or &lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;with a thumbs up or down and a corresponding comment to each score (see below for definitions of &lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;and &lt;span style="font-style: italic;"&gt;Heaven&lt;/span&gt;). 
&lt;/li&gt;&lt;/ul&gt;The above features of CFCS aim to ensure  &lt;ol&gt;&lt;li&gt; there are at least as many reviewers as there are papers (and most likely many more) &lt;/li&gt;&lt;li&gt; authors along for the ride at least have to contribute to the review process &lt;/li&gt;&lt;li&gt;professors can't get out of reviews (and get credit for reviews) by sending work to their students &lt;ol&gt;&lt;li&gt; students get credit for their review work getting their name out early &lt;/li&gt;&lt;/ol&gt; &lt;/li&gt;&lt;li&gt; if you want to submit 100 papers in a year, you must be willing to review 100 as well &lt;/li&gt;&lt;/ol&gt;For details about how the review process works at CFCS, please see the section "The CFCS Reviewer Process" below.&lt;br /&gt;&lt;h3 style="font-style: italic; font-weight: normal;"&gt;&lt;span style="font-size:130%;"&gt;Authors decide a direction for their manuscript&lt;/span&gt;&lt;/h3&gt;Upon completion of the review/contribution requirement by all of the manuscript's authors, the manuscript submission will be finalized. Authors may then send their submitted manuscript on &lt;span style="font-style: italic;"&gt;Purgatory&lt;/span&gt; track or on &lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;track (see below for details on &lt;span style="font-style: italic;"&gt;Purgatory&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;Heaven&lt;/span&gt;).&lt;br /&gt;&lt;br /&gt;&lt;h3 style="font-weight: normal; font-style: italic;"&gt;&lt;span style="font-size:130%;"&gt; Editors decide paper status&lt;/span&gt;&lt;/h3&gt;Editors decide if a &lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;track paper goes to &lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;or if a &lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;track paper goes to &lt;span style="font-style: italic;"&gt;Limbo&lt;/span&gt;. This editorial step is simply to weed out complete rubbish before it goes to review. 
Almost every manuscript should pass this minor screening.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;&lt;span style="font-weight: bold;"&gt;The CFCS reviewer process&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;Reviewing a paper for CFCS works much as it does at most contemporary journals. However, reviews are not anonymous and are publicly visible with the manuscript upon submission. Reviewers do not have to be authors. Any user can do a review to get a credit, so they can later submit a manuscript without having to review. Ideally in the CFCS system, few if any reviewers must be &lt;span style="font-style: italic;"&gt;asked &lt;/span&gt;to review a manuscript by the editor.&lt;br /&gt;&lt;br /&gt;In general, reviewers choose the manuscripts they want to review from the set of all manuscripts in &lt;span style="font-style: italic;"&gt;Limbo &lt;/span&gt;they feel qualified to review.  Each manuscript in &lt;span style="font-style: italic;"&gt;Limbo &lt;/span&gt;requires four separate reviews. Upon receiving the authors' revised manuscript and response to the reviewer comments, each reviewer places a vote to decide if a manuscript belongs in &lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;or &lt;span style="font-style: italic;"&gt;Purgatory&lt;/span&gt;.  The reviewed manuscript goes to &lt;span style="font-style: italic;"&gt;Heaven&lt;/span&gt; if at least three of the four reviewers vote for &lt;span style="font-style: italic;"&gt;Heaven&lt;/span&gt;.  In the case of a tie, the editor holds the tie-breaking vote, which is cast after reading all four reviews.&lt;br /&gt;&lt;br /&gt;All reviews and the authors' responses to the reviews are publicly available alongside the final manuscript. 
Both the original and the revised manuscript drafts are available as well.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;&lt;span style="font-weight: bold;"&gt;The CFCS editor process&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;Editors are either  &lt;ol&gt;&lt;li&gt; reviewers whose quality reviews have gained them a large reviewer impact score and who agree to the job &lt;/li&gt;&lt;li&gt; invited editors (if there aren't enough highly ranked reviewers) &lt;/li&gt;&lt;/ol&gt; &lt;p&gt;Editors decide if a &lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;track paper goes to &lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;or is sent to &lt;span style="font-style: italic;"&gt;Earth&lt;/span&gt;. Editors decide if a &lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;track paper goes to &lt;span style="font-style: italic;"&gt;Limbo &lt;/span&gt;or is sent to &lt;span style="font-style: italic;"&gt;Earth&lt;/span&gt;. The main job of the editor is to eliminate rubbish (pseudoscience and just bad science). Editors must also decide if the subject categories selected by the authors are appropriate. Most importantly, editors hold the tie-breaking vote when there are two &lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;votes and two &lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;votes from the four reviewers.  In cases with no tie, the reviewers alone decide the final destination of the manuscript.&lt;br /&gt;&lt;/p&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span style="font-size:180%;"&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The CFCS user process&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; &lt;p&gt;Any registered user can score and comment on any paper, comment, or review other than their own. 
A reader cannot score a paper, comment, or review without leaving a comment to explain their score. Scores and comments are publicly available with the manuscript and on the users' CFCS page.&lt;/p&gt;All users have a reviewer impact score, a comment impact score, and an author impact score.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span style="font-size:180%;"&gt;&lt;span style="font-weight: bold;"&gt;The status of a manuscript&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;The status of a paper follows two of the key ideas of CFCS: 1) information is always public; and 2) information is never deleted. Everything that happens to a paper on its route to &lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;is recorded and posted for all to see. All reviewer comments, all responses to reviewer comments, and both versions of the manuscript are available for download. &lt;p&gt;A publication search in CFCS can be limited to certain types of papers (to allow for example only peer-reviewed work) or it can draw from all of the CFCS library.&lt;/p&gt;&lt;h3 style="font-weight: normal; font-style: italic;"&gt;&lt;span&gt;&lt;span style="font-size:130%;"&gt; Heaven&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;&lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;is the pinnacle of CFCS. Manuscripts in &lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;have been peer-reviewed by four reviewers, the authors have responded to the reviewer comments to improve their manuscript, and the manuscript received a majority &lt;span style="font-style: italic;"&gt;Heaven&lt;/span&gt; vote from the reviewers. Voting is carried out by the four reviewers plus the editor. The votes do not become public (to the reviewers or the editors) until all the votes are in (to prevent biased voting). 
Papers in &lt;span style="font-style: italic;"&gt;Heaven &lt;/span&gt;are charged a modest processing fee to allow them to be uniformly typeset in the style of the journal. Typeset papers are submitted to PubMed. The four reviewers set the initial paper impact score with their votes. These initial seed scores count double a normal reader-submitted score. Once in &lt;span style="font-style: italic;"&gt;Heaven&lt;/span&gt;, the manuscript can be scored and commented on by all readers of CFCS to adjust its impact score.&lt;br /&gt;&lt;h3 style="font-weight: normal; font-style: italic;"&gt;Limbo&lt;/h3&gt;Manuscripts for peer-review work are initially sent to &lt;span style="font-style: italic;"&gt;Limbo&lt;/span&gt;. A manuscript remains in &lt;span style="font-style: italic;"&gt;Limbo &lt;/span&gt;until it has received the necessary number of reviews, its authors have responded to those reviews, and it has been voted into &lt;span style="font-style: italic;"&gt;Heaven&lt;/span&gt;. Either failure to respond to the reviewers (within a fixed time) or failure to receive a majority vote results in the manuscript being sent to &lt;span style="font-style: italic;"&gt;Purgatory&lt;/span&gt;.&lt;br /&gt;&lt;h3 style="font-weight: normal; font-style: italic;"&gt;Purgatory&lt;/h3&gt;&lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;track manuscripts only need to pass the editor's inspection (otherwise they go to &lt;span style="font-style: italic;"&gt;Earth&lt;/span&gt;). &lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;is an option for works where the authors don't want to go through peer-review. Examples of good pieces for &lt;span style="font-style: italic;"&gt;Purgatory &lt;/span&gt;include reviews and reports of failed experiments. 
Upon entering &lt;span style="font-style: italic;"&gt;Purgatory&lt;/span&gt;, the manuscript can be scored by all CFCS readers to determine its paper impact score.&lt;br /&gt;&lt;h3 style="font-weight: normal; font-style: italic;"&gt;Earth&lt;/h3&gt;Manuscripts not passing the editor's initial quality screen go to &lt;span style="font-style: italic;"&gt;Earth&lt;/span&gt;. Authors get one petition to get out of &lt;span style="font-style: italic;"&gt;Earth &lt;/span&gt;and back into &lt;span style="font-style: italic;"&gt;Limbo &lt;/span&gt;or &lt;span style="font-style: italic;"&gt;Purgatory&lt;/span&gt;.&lt;h3 style="font-weight: normal; font-style: italic;"&gt;Hell&lt;/h3&gt;Manuscripts discovered to be fraudulent go to &lt;span style="font-style: italic;"&gt;Hell&lt;/span&gt; (perhaps papers where the equation-to-word ratio is greater than one belong here too?).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;&lt;span style="font-weight: bold;"&gt;Definitions&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;score: a vote by a CFCS reader; a score can be positive (thumbs-up) or negative (thumbs-down); reviewer comments, reader comments, and manuscripts can all be scored by all CFCS readers; all scores must be accompanied by a comment where the reader explains their reasoning for the score&lt;/li&gt;&lt;li&gt;review: similar to the current scientific literature, a review in CFCS aims to strengthen the quality, rigor, and focus of the submitted manuscript; reviews are publicly viewable with the manuscript, as are the authors' responses to the reviews&lt;/li&gt;&lt;li&gt;comment: a comment is a CFCS reader's written opinion of a manuscript, review, or another person's comment&lt;/li&gt;&lt;li&gt;reviewer impact score: for each individual, this metric is determined by the number of positive scores minus the number of negative scores from CFCS users for all of the reviews written by the individual&lt;/li&gt;&lt;li&gt;comment impact score: for each individual, this metric is determined by 
the number of positive scores minus the number of negative scores from CFCS users for all of the comments written by the individual&lt;/li&gt;&lt;li&gt;paper impact score: for each manuscript, this metric is determined by the number of positive scores minus the number of negative scores from CFCS users for that particular manuscript&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-584145932117424492?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/584145932117424492/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=584145932117424492' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/584145932117424492'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/584145932117424492'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/09/center-for-contributory-science.html' title='Center for Contributory Science'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-4679586148548275061</id><published>2007-09-23T10:07:00.000-07:00</published><updated>2007-09-23T10:13:31.221-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific ethics'/><category scheme='http://www.blogger.com/atom/ns#' term='peer review'/><category scheme='http://www.blogger.com/atom/ns#' term='scientific 
publishing'/><title type='text'>towards a richer scientific literature</title><content type='html'>The &lt;a href="http://en.wikipedia.org/wiki/Peer_review"&gt;scientific review  and publication process&lt;/a&gt; has &lt;a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=529"&gt;received &lt;/a&gt;&lt;a href="http://www.nature.com/nature/peerreview/debate/index.html"&gt;increasing&lt;/a&gt; &lt;a href="http://www.retrovirology.com/content/3/1/55"&gt;attention&lt;/a&gt; over the last ten years; internet technologies have changed the way we search and read science; open access has changed our ability to share science; and highly publicized fraud cases have reminded us that our system has inherent flaws that may prove difficult to fix. Articles in this domain discuss the positive impact of open access, the growing problem of gift-authorship, and the burden on the review system caused by scientists who increasingly opt for the top-down system of paper submission (i.e. submit to the top journals first and submit to increasingly lower impact factor journals as you get rejected and reedit your manuscript). I'm typically underwhelmed by the solutions proposed by such articles as they tend to send the message that grassroots revolution, "We're not gonna take it! No, we ain't gonna take it!", is needed to fix the system from the bottom-up. We should tell our deans, department chairs, PIs, etc. that we don't want to be ranked by our h-index, number of citations, and journal impact factors.&lt;br /&gt;&lt;br /&gt;But the reality is that we're all doing the best we can with the system that we have. If the system does not change, I guarantee that if I have my own lab someday, I'll submit my papers to the best journal I think they have a shot-in-hell of getting into. The truth is that when I submit to a good journal, &lt;a href="http://drugmonkey.wordpress.com/2007/08/17/my-research-rocks-and-yours-sucks/"&gt;I think that my paper belongs there&lt;/a&gt;. 
It's just the editors that incorrectly label my work as not novel enough. It's just the reviewers, defending their territory, that incorrectly label my work as lacking rigor because I don't know that the correct term for mismatches on the end of an RNA:RNA duplex is &lt;span style="font-style: italic;"&gt;dangling-ends&lt;/span&gt; not &lt;span style="font-style: italic;"&gt;shaggy-ends&lt;/span&gt; (I still like my term better, mystery reviewer man).&lt;br /&gt;&lt;br /&gt;In my opinion, we have three problems with our current system:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1) editors are not qualified to judge what will be a high impact paper&lt;br /&gt;&lt;/span&gt;I don't care if the editor is an active scientist or a full time editor. I don't care if he has two Nobel prizes. I don't care if he is related to Nostradamus. Besides the obvious, &lt;span style="font-style: italic;"&gt;have-to-be-cited&lt;/span&gt; papers like complete genome sequences, it's impossible to know what research done today will still be important 5 years from now. So why do we make this the first hurdle to publication?&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;2) reviewers are helpful but are too focused on self-preservation to do the best job&lt;br /&gt;&lt;/span&gt;Please don't make me cite your paper, because in some obscure way you thought of my idea first. Please don't steal my result, because I can't easily identify you. Please don't nail me to a cross and treat me like an idiot, because I'm wearing a blindfold. The temptation is too strong. 
I noticed this in my own reviews, so &lt;a href="http://blog-di-j.blogspot.com/2007/09/what-i-read-before-i-write-review.html"&gt;I read something to tame my ego before starting and submitting every review I write&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;3) we have no good way to quickly judge papers, journals, and scientists&lt;/span&gt;&lt;br /&gt;Impact factors and h-indexes were designed to help, not hinder, science. Particularly in the USA, we strive for a meritocracy. Thus, we need some metric for sorting journals and scientists. I think most people would agree that the GRE, LSAT, and MCAT are poor predictors of a person's graduate school potential, but what else can a medical school with 3000 applications for 30 spots do? &lt;a href="http://www.dcscience.net/goodscience/?p=4"&gt;Perhaps there's no metric we can invent that is better than the opinion of human experts&lt;/a&gt;, but expert panels and opinions also suck a lot of time that could be used to &lt;span style="font-style: italic;"&gt;do&lt;/span&gt; science.&lt;br /&gt;&lt;br /&gt;To me, most of the other issues with our current publishing process derive from these three problems. Professors schmooze with editors at conferences, so that the editors will hopefully predict the future more favorably on their next submission. Reviewers reject valuable papers, because impact factor leery editors stress their journal's high rejection rate and the importance of novelty. 
Professors provide and receive gift-authorship, because they need a high h-index, lots of citations, and visibility in big journals to keep their jobs, get higher pay, and retain the respect of their peers.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;We are only human&lt;/span&gt;&lt;br /&gt;The writers of the US Constitution and the great economists of the world accept our humanness and try to develop government and market systems that thrive &lt;span style="font-style: italic;"&gt;because of&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;despite&lt;/span&gt; our human attributes. Checks-and-balances keep the government's power in check, while elections provide change as a society's goals evolve. Free market economic ideas allow efficient prices and economic growth, while federal monetary policies keep things like inflation from getting out of hand.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;How can we integrate checks-and-balances into scientific review?&lt;br /&gt;&lt;/span&gt;Since transitions are often the trickiest part, let's assume we're starting over from scratch with the scientific publication process. I think we can adapt ideas from Amazon.com, &lt;a href="http://en.wikipedia.org/wiki/Slashdot"&gt;Slashdot&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/Digg"&gt;Digg&lt;/a&gt; to create a better system.  
People have already &lt;a href="http://www.nature.com/nature/peerreview/debate/nature04992.html"&gt;mentioned&lt;/a&gt; or even &lt;a href="http://www.plosone.org/static/ratingGuidelines.action"&gt;tried&lt;/a&gt; &lt;a href="http://www.biology-direct.com/info/about/"&gt;some&lt;/a&gt; &lt;a href="http://www.bmj.com/cgi/content/full/318/7175/4"&gt;of&lt;/a&gt; &lt;a href="http://www.nature.com/nature/peerreview/debate/nature05535.html"&gt;these&lt;/a&gt; &lt;a href="http://blogs.nature.com/wp/nascent/2006/12/nature_open_peerreview_trial_c.html"&gt;things&lt;/a&gt;, but so far nothing has struck me as likely to be successful. Journals are dabbling with these ideas, trying out one or two, but it is the combination of all of these in &lt;span style="font-style: italic;"&gt;one&lt;/span&gt; journal that I think has a chance of adoption and really modernizing the publication process.  For example, &lt;a href="http://phylogenomics.blogspot.com/2007/07/rated-my-first-paper-in-plos-one.html"&gt;few people&lt;/a&gt; are going to use the rating system at &lt;a href="http://www.plos.org/cms/node/241"&gt;PLoS One&lt;/a&gt;, because 1) it involves unnecessary work; and 2) it requires written public criticism of another scientist's work. Reason 2 alone will keep most people away, as flaky scientific egos are easily hurt, and science is a particularly bad field to accidentally burn your bridges. 
So a workable system would somehow need to compel people to comment and create an atmosphere where written criticism is the norm (and thus less dangerous; more like verbal criticism at a talk; note that &lt;a href="http://genefinding.blogspot.com/2007/09/plos-biology-vanity-publisher.html"&gt;good critical scientific debates do occur once in a while in the blogosphere - here's a good example on &lt;span&gt;Steven Salzberg's blog&lt;/span&gt;&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;In my opinion,&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1) the new system must be comment/ratings rich&lt;br /&gt;&lt;/span&gt;Readers can rate reviewers and papers. Similar to &lt;a href="http://digg.com/"&gt;digg.com&lt;/a&gt;, readers should be able to give an article a thumbs up or a thumbs down. The final score of a paper is just the number of thumbs up minus the number of thumbs down (e.g. if 126 people like your article and 26 don't, your article has a score of 100)&lt;span style="font-weight: bold;"&gt;.  &lt;/span&gt;With these scores you can find the papers receiving the most attention (total number of up and down thumbs), most positive attention, and most negative attention.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;2) reviewers and commenters are reviewed&lt;br /&gt;&lt;/span&gt;&lt;span&gt;If someone on Amazon.com writes an idiotic review, there's a nice &lt;span style="font-style: italic;"&gt;ReviewNotHelpful&lt;/span&gt; button you can click to make sure fewer people waste time reading the review in the future.  &lt;a href="http://slashdot.org/"&gt;Slashdot&lt;/a&gt; has a similar, though more advanced, commenter scoring system.  
We need a similar button to rate the ratings &lt;/span&gt;&lt;span&gt;in the scientific publication process&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;3) the best set of reviewers in each subject category are invited to be editors&lt;br /&gt;&lt;/span&gt;&lt;span&gt;Rather than having a good-ole-boy pass the editorial torch to his former student, we can allow the hard working thoughtful reviewers to be our judges.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;4) the new system must be completely open&lt;br /&gt;&lt;/span&gt;No one is anonymous and all information is public. As reviewers accept a paper for review, their name should become publicly associated with the article. When they submit their review, the review should become available for everyone to see. The reviewer's score (determined by other people rating the reviewer) and all of their previous reviews and comments should also be available.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;5) nothing is destroyed&lt;br /&gt;&lt;/span&gt;&lt;span&gt;There should be no such thing as a rejected paper that no one sees. Trash science should be labeled as such by the community review and commenting system but not deleted. One man's trash might be another man's treasure.&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;6) review or comment is a prerequisite to submission&lt;/span&gt;&lt;br /&gt;Before a paper goes to the editor, all authors on the paper must review another paper in the journal. A paper with 50 authors contributes 50 reviews &lt;span style="font-style: italic;"&gt;before&lt;/span&gt; going to review. A professor that slaps his name on 100 publications a year must be willing to write 100 reviews a year. 
If the professor has their student write the review for them, they will at least know they are putting their own reputation on the line, because the review is associated with the professor's name, and the review is public. If there are papers awaiting review in their subject area, they must choose one of them. Otherwise, they must comment on a certain number of reviews or papers (e.g. at least three). By forcing comments, you alleviate the laziness factor, which I think will cause other rating systems like that of PLoS One to fail. We barely have enough time as it is to read a paper, let alone leave a comment on it. But if doing so is a prerequisite to publication, we'll do so. And if we know that our comments will be publicly available and associated with our name, we'll make sure not to write rubbish.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;the ranking problems we don't need to worry about&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Two problems with internet ratings systems are that they thrive on sensationalism and that they collect rubbish comments (e.g. YouTube comments are often just idiots making fun of the people in the movie). Since a good reputation is vitally important to a scientist, we needn't worry too much about rubbish comments. 
I also think that scientists already have averse reactions to flashy papers driven more towards publicity than science, so perhaps a commenting system will actually &lt;span style="font-style: italic;"&gt;reduce &lt;/span&gt;sensationalism.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;I've written up the details of a &lt;a href="http://blog-di-j.blogspot.com/2007/09/center-for-contributory-science.html"&gt;hypothetical journal that incorporates these scientific publishing ideas&lt;/a&gt; in a separate blog article.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-4679586148548275061?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/4679586148548275061/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=4679586148548275061' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/4679586148548275061'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/4679586148548275061'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/09/towards-richer-scientific-literature.html' title='towards a richer scientific literature'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-4785053361614737722</id><published>2007-09-23T10:06:00.001-07:00</published><updated>2007-09-23T10:14:30.253-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific ethics'/><category scheme='http://www.blogger.com/atom/ns#' term='peer review'/><title type='text'>What I read before I write a review</title><content type='html'>Writing an anonymous scientific review can make even the tamest human take a jab or two at their blind-folded peer. Because of this, I'm a fan of moving towards open peer review where we can treat each other like humans.&lt;br /&gt;&lt;br /&gt;I noticed this aggressive tendency in myself when I first started being asked to write reviews four years ago. To make sure I don't step beyond where I'd like to be as a reviewer (i.e. critical and honest but not aggressive), I read the following text before starting and before submitting every review.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;&lt;span style="font-size:130%;"&gt;When reviewing papers&lt;/span&gt;&lt;/h2&gt;&lt;ul&gt;&lt;li&gt; don't be evil &lt;/li&gt;&lt;li&gt; start with a compliment &lt;ul&gt;&lt;li&gt; say the positive general comments before you say the negative general comments. If you don't have positive comments, read it again. The editor probably wouldn't give you total crap. &lt;/li&gt;&lt;/ul&gt; &lt;/li&gt;&lt;li&gt; don't nitpick too much just to feel powerful &lt;/li&gt;&lt;li&gt; try to say things you'd like to be told if it were your paper (i.e. 
comments to strengthen the manuscript not belittle the authors) &lt;/li&gt;&lt;li&gt;number the comments so the authors can easily refer to them if they resubmit &lt;/li&gt;&lt;li&gt; don't be evil &lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-4785053361614737722?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/4785053361614737722/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=4785053361614737722' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/4785053361614737722'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/4785053361614737722'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/09/what-i-read-before-i-write-review.html' title='What I read before I write a review'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-8324382249064934131</id><published>2007-09-01T15:49:00.001-07:00</published><updated>2007-09-03T09:43:02.280-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='untested ideas'/><category scheme='http://www.blogger.com/atom/ns#' term='biotechnology'/><title type='text'>Effect of sequence level mutations on transcription, translation, and noise</title><content type='html'>One of the main biological questions explored when DNA sequencing 
first became a practical laboratory technique was how the nucleotides in a gene's promoter and ribosomal binding site define the gene's interactions with transcription factors and the ribosome translation apparatus.  At least in prokaryotes, these interactions largely determine the levels of transcript and protein available for each gene, and thus provide the crucial information of how a genome regulates itself.&lt;br /&gt;&lt;br /&gt;This early work resulted in many of the promoter analysis tools that are still widely used today.  In particular, the promoters were often analyzed in terms of information content, and this information content was visualized using  &lt;a href="http://en.wikipedia.org/wiki/Sequence_logo"&gt;sequence logos&lt;/a&gt;.  These sequence logos are still the most popular way to display DNA binding sites.  I'm not sure why this field tapered off a little.  My guess is that the people in this field had maxed out the information that was affordably obtainable with the available technologies.&lt;br /&gt;&lt;br /&gt;But as most biotech loving biologists know, the times are a-changin' in biotech, and we have new sequencing technologies that enable drastically larger sequencing studies to be undertaken.  Importantly, we are faced with several quite-different sequencing methods (unlike the previous era in sequencing biotech which was almost exclusively driven by ABI's advances in Sanger sequencing).  I think with these new technologies coming online, it's time we dusted off our promoters and started figuring out how they work.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;What we still must learn about promoters&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;For several promoters (ideally for all promoters), we need to exhaustively determine how base-pair changes in the promoter lead to changes in the amount of transcription and translation.  
We must determine this information across time, so we can also determine the rates of transcription and translation.  Finally, we must determine these values at the level of single-cells, so that we can also obtain information about the noise inherent in each promoter sequence.&lt;br /&gt;&lt;br /&gt;In the early 90s, these types of analyses were at least partially undertaken with populations of cells and 100-200 promoter mutations.  Now we must study several promoters, with millions of different mutations for each promoter, and with multiple single-cell replicates of each mutation so we can estimate noise.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;What is this new level of promoter knowledge good for?&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;We need to understand to what extent it is possible to build a computational model to predict translation and transcription from sequence alone.  Such a model could act like a molecular biologist's version of Hardy-Weinberg equilibrium.  That is, if a promoter does not fit the model, it would suggest that there is some additional regulation (e.g. small RNA) that is not explained by the binding of transcription factors and the ribosome.  In addition, the ability to screen vast numbers of promoter variants could be of huge value to forward biological engineering.  Synthetic biologists often tune their human-created networks using directed evolution.  While directed evolution is a very powerful and massively parallel way to optimize a genetic system, the human that created the system in the first place has limited control over the final result.  
For example, it may be that the network evolved to generate ethanol from cellulose is extremely noisy and could be made more efficient by fine-tuning the promoters in a more intelligent design.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;An idea for a massively parallel method to determine the effect of sequence level mutations on transcription, translation, and noise&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_ErjdhkTSmXk/RtsIEU7t6sI/AAAAAAAAAAM/plPizSemKQ0/s1600-h/single_cell_transcription.png"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp3.blogger.com/_ErjdhkTSmXk/RtsIEU7t6sI/AAAAAAAAAAM/plPizSemKQ0/s400/single_cell_transcription.png" alt="" id="BLOGGER_PHOTO_ID_5105683472986270402" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I think the tools are already available to determine the effect of sequence level mutations on transcription, translation, and noise in single-cells.  One approach I've thought about is shown in the figure on the right (&lt;a href="http://www.jeremiahfaith.com/blog_figs/single_cell_transcription.pdf"&gt;&lt;span style="font-size:78%;"&gt;click to see an easier to read/print pdf version of the figure&lt;/span&gt;&lt;/a&gt;).  
The idea takes a cell-in-emulsion approach (see &lt;a href="http://www.pnas.org/cgi/content/full/98/8/4552?ijkey=3883104b42d1f5e4ee7fbc9c50368064b8b90610"&gt;&lt;span style="font-size:78%;"&gt;Directed evolution of polymerase function by compartmentalized self-replication&lt;/span&gt;&lt;/a&gt;) and combines it with the polony sequencing method pioneered in the Church lab (see &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;amp;TermToSearch=16081699&amp;ordinalpos=4&amp;amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum"&gt;&lt;span style="font-size:78%;"&gt;Accurate multiplex polony sequencing of an evolved bacterial genome&lt;/span&gt;&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;The first step (top right) is to synthesize a known promoter with a large number of random nucleotides.  This is very similar to the method used by Stormo's lab many years ago (see &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;amp;TermToSearch=8165145&amp;ordinalpos=2&amp;amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum"&gt;&lt;span style="font-size:78%;"&gt;Quantitative analysis of ribosome binding sites in E.coli&lt;/span&gt;&lt;/a&gt;), except that with modern sequencing methods we can drastically increase the number of random sites that we explore.  A GFP reporter is placed directly after the promoter so that we can measure the amount of protein generated.  
Since the vast majority of mutations will probably result in little to no expression, it may be useful to also add a bactericidal antibiotic resistance gene after the GFP to provide an easy way to get rid of unproductive promoters (for some studies, you would probably not want to remove these low output promoters).&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;The second step (top left) is to take a dynal bead and attach a primer to amplify the promoter, a second primer to amplify the GFP sequence, and an anti-GFP antibody.&lt;br /&gt;&lt;br /&gt;Next the bead and the bacteria are placed together into an emulsion.  In the emulsion solution, we also need to include reverse primers for the promoter and GFP sequence, reverse transcriptase, and PCR reagents.  You would need to mess around a little with the dilutions and concentrations of beads and cells to maximize the case where you have only one bead and one cell in each emulsion.&lt;br /&gt;&lt;br /&gt;Now we have the cells isolated into separate chambers, and we have one bead with each bacterium. This bead will provide the source of our future information read out.  Also remember that by synthesizing our promoters with N's, we actually have generated a huge library of different promoters, so each emulsion will have a different variant of our promoter. We then lyse the cells.  I'm not sure of the best way to lyse the cell.  But in the figure, I just assumed we used extreme heat.  Because of the next step, it may be wise to use a gentler method to lyse them, such as placing a protein that will cause cell lysis (e.g. lysozyme or ccdB) on a promoter that is heat inducible, so you'd only need to heat the cells up to 42C rather than 95C.  
Once the cells are lysed, the GFP expressed from the synthetic promoter should diffuse around the emulsion until it meets and binds the anti-GFP antibody attached to our dynal bead.&lt;br /&gt;&lt;br /&gt;We've got the protein on our bead; now we need to attach the DNA.  Since one of the things we want to measure is transcript (mRNA) concentration, we need to do a reverse transcription reaction.  Reverse transcriptase is not very heat stable, which is why I stressed above that we might want to lyse our cells more gently than by heating them to 95C.   However, SuperScript III from Invitrogen is pretty heat stable, so that might be worth a shot too.  Since we include a reverse primer for our GFP sequence in the emulsion, we should have a fairly specific reverse transcription.&lt;br /&gt;&lt;br /&gt;Finally, we need to attach the DNA to our bead; we can do so by running a multiplex PCR reaction for a few cycles.  Since the forward primers are on the dynal bead, the PCR reaction results in the DNA being stuck to the bead.&lt;br /&gt;&lt;br /&gt;And now for the fun part: let's measure protein concentration and transcript concentration, and determine the promoter sequence, for our single cells (bottom row of the figure).  To do this, we lay our beads out on a microscope slide or some type of microfluidic device.  We can measure the protein concentration directly as GFP fluorescence.   Next we measure the transcript concentration as the amount of GFP cDNA attached to the bead.  For increased accuracy, we might want to measure this concentration using cycled PCR reactions (like qPCR on a microscope).  The concentration of the promoter sequence and GFP sequence attached to the bead can be measured in each round using two different molecular beacons.  The concentration of promoter sequence can be used to normalize the concentration of the transcript (i.e. 
to try to remove artifacts due to the variance in emulsion sizes and emulsion PCR efficiency, and to remove the background transcript value that is due to amplification from the DNA rather than cDNA sequence).  This bead-based DNA quantitation can borrow some ideas from the BEAMing method (see &lt;a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;amp;TermToSearch=16791214&amp;ordinalpos=1&amp;amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum"&gt;&lt;span style="font-size:78%;"&gt;BEAMing: single-molecule PCR on microparticles in water-in-oil emulsions&lt;/span&gt;&lt;/a&gt;).&lt;br /&gt;Now that we've measured protein concentration and transcript concentration, it is time to determine the promoter sequence responsible for these concentrations.  In some ways, this is the most difficult step.  But in practice, it may be the easiest step, as the polony sequencing method does exactly that.&lt;br /&gt;&lt;br /&gt;With the size of the beads and the massively parallel nature of this protocol, it should be possible to have the same sequence appear multiple times, allowing the estimation of noise for the tested promoters.&lt;br /&gt;&lt;br /&gt;Again, I haven't tried any of this stuff, and I'm not sure it'll work.  I just wanted to throw the idea out there in case someone else is thinking about this problem too.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Open questions with this idea&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;ol&gt;&lt;li&gt;how quantitative is emulsion PCR, and how is noise influenced by the size of the emulsion?&lt;/li&gt;&lt;ol&gt;&lt;li&gt;can we increase the quantitative accuracy of our mRNA concentration by running very few emulsion PCR cycles and then running a microscope-based qPCR on our bead?&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;how strongly does the GFP bind to our bead (e.g. 
when we break the emulsions, can GFP move from one bead to the next?  we can test this by using mCherry in one sample, GFP in another, and then breaking the emulsions together to see if any beads have both proteins attached)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;is crowding on the bead going to cause problems?  that bead has a lot on it.  does this bias our results in unpredictable ways?&lt;/li&gt;&lt;li&gt;there are a &lt;span style="font-style: italic;"&gt;lot&lt;/span&gt; of steps.  long protocols can lead to excess experimenter-derived error and slow the technique's adoption&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-8324382249064934131?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/8324382249064934131/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=8324382249064934131' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/8324382249064934131'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/8324382249064934131'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/09/effect-of-sequence-level-mutations-on.html' title='Effect of sequence level mutations on transcription, translation, and noise'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://bp3.blogger.com/_ErjdhkTSmXk/RtsIEU7t6sI/AAAAAAAAAAM/plPizSemKQ0/s72-c/single_cell_transcription.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-7285885234277639489</id><published>2007-07-10T10:46:00.000-07:00</published><updated>2007-09-03T09:49:26.939-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific ethics'/><category scheme='http://www.blogger.com/atom/ns#' term='open notebook science'/><title type='text'>Tips (rules?) for Open Notebook Science</title><content type='html'>I recently decided to &lt;a href="http://blog-di-j.blogspot.com/2007/07/giving-open-notebook-science-try.html"&gt;give open notebook science a try&lt;/a&gt;.  In order for my lab notebook to be useful to others, I've got to put a little extra effort into making my notebook more understandable to outsiders.  I think a lab notebook will never, and perhaps should never, be as easy to understand as a paper, since you want to spend most of your time doing science rather than making beautiful figures and writing stunning introductions.  I would simply like to reach the point where someone in a field similar to mine could pick up my notebook and understand it without too much effort.&lt;br /&gt;&lt;br /&gt;I'm trying to catalog some basic ideas that would promote better open notebooks, with "better" defined as:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;searchable&lt;/li&gt;&lt;li&gt;understandable&lt;/li&gt;&lt;li&gt;dependable (i.e. small software failures won't forever zap all of your results)&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Here's what I've come up with so far.   
Please comment if you think of other tips.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;use some sort of version control system (wiki, CVS, Subversion)&lt;/li&gt;&lt;ul&gt;&lt;li&gt;this is particularly important if you have an electronic-only lab notebook, as it creates a time stamp for everything you enter into the notebook, which would be important for patents and other legal stuff&lt;/li&gt;&lt;li&gt;it also allows you to go back and look at previous versions&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;back up your notebook&lt;/li&gt;&lt;ul&gt;&lt;li&gt;with CVS or Subversion, back up your repository&lt;/li&gt;&lt;li&gt;with wikis this becomes wiki-specific, so check the documentation for your wiki&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;organize hierarchically&lt;/li&gt;&lt;ul&gt;&lt;li&gt;break the notebook into sections&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;break the sections into subsections&lt;/li&gt;&lt;li&gt;remember to include a time stamp in the text of your notebook at the beginning of each new experiment you do and at the beginning of each section you start&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;introduce every section by giving the bigger picture (not too long, just a paragraph or so on the big idea); a nice figure would be useful too, since many scientists prefer skimming figures to skimming text&lt;br /&gt;&lt;/li&gt;&lt;li&gt;if a section is complete or dead (i.e. you've abandoned the project), state so very prominently at the start of the section.  If the work was published, provide a reference.  
If the work was abandoned, perhaps explain why.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;also if a section hasn't been touched for a long while, you might add something like "This chapter is not being actively worked on"&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;link to raw data when and where you mention it in your notebook&lt;/li&gt;&lt;li&gt;remember the notebook is public, so be careful not to say stuff that might offend sensitive ears or sensitive scientists&lt;/li&gt;&lt;li&gt;include high-quality images in your documents;  things like agarose gels will need to be zoomed in a lot to be inspected in detail; if you convert your full-resolution TIFF to a low-quality JPEG, it'll just look like pixelated blah.  Then again, you can't always use full-size images, particularly from a high-megapixel camera, because the notebook will quickly become giant;  so here is my suggestion:&lt;br /&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;if the image is small (&lt;1mb)&gt;&lt;/li&gt;&lt;li&gt;if it is huge but detail doesn't matter, include a decent-resolution image that can be zoomed in 2-4x and still look nice&lt;/li&gt;&lt;li&gt;if it is huge and detail matters, include a decent-resolution image, but also include a link to the full-size image like you would for other raw data&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;construct the document in such a way that it is easily indexed by search engines (otherwise no one will find your results;  people probably won't read your lab notebook for fun)&lt;/li&gt;&lt;ul&gt;&lt;li&gt;the above is difficult to comply with if you use PDFs, because Google currently only indexes the first few hundred kbytes of a PDF; my lab manual is 30MB&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;br /&gt;Please let me know if you have any ideas or suggestions about these rules.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-7285885234277639489?l=blog-di-j.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/7285885234277639489/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=7285885234277639489' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7285885234277639489'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7285885234277639489'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/07/tips-rules-for-open-notebook-science.html' title='Tips (rules?) for Open Notebook Science'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-7423683702014694860</id><published>2007-07-10T10:45:00.002-07:00</published><updated>2007-09-03T09:48:58.029-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific ethics'/><category scheme='http://www.blogger.com/atom/ns#' term='open notebook science'/><title type='text'>Giving Open Notebook Science a try</title><content type='html'>One thing I didn't expect when I started blogging a month ago was to read other people's blogs.  But I did, and I've been pleasantly surprised at the quality of the writing in the science part of the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;blogosphere&lt;/span&gt;&lt;/span&gt;.   
I think the lack of top-down editorial control spurs more novel ideas.&lt;br /&gt;&lt;br /&gt;I've seen a number of posts in the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;blogosphere&lt;/span&gt;&lt;/span&gt; about different aspects of &lt;span style="font-style: italic;"&gt;Open Science&lt;/span&gt;.  I don't want to explain Open Science, particularly since it's not clear exactly what it is yet.  But Bill Hooker at 3 quarks daily wrote a nice three part series (&lt;a href="http://3quarksdaily.blogs.com/3quarksdaily/2006/10/the_future_of_s_1.html"&gt;I&lt;/a&gt;, &lt;a href="http://3quarksdaily.blogs.com/3quarksdaily/2006/11/the_future_of_s.html"&gt;II&lt;/a&gt;, &lt;a href="http://3quarksdaily.blogs.com/3quarksdaily/2007/01/the_future_of_s.html"&gt;III&lt;/a&gt;) on the subject, which you should read if you're interested in the details.  Here I'm only going to discuss &lt;a href="http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html"&gt;Open Notebook Science&lt;/a&gt;, which is a term coined by Jean-Claude Bradley.  The idea is simply that the heart of every person's research - their lab notebook - should be open to the world.&lt;br /&gt;&lt;br /&gt;Since most of our scientific work is funded by tax payers who expect their money to be well-spent, it's interesting that openness isn't required.  Science typically builds on the body of available knowledge - the more knowledge available the faster science goes.  It's striking when you visit other labs in person; you see all of their unpublished work, and you know that most of their results and data won't be available to the bulk of the scientific community until a year after each particular scientific project is &lt;span style="font-style: italic;"&gt;finished&lt;/span&gt;.  By the time papers are in print, it's old news to the insiders.  
More striking is when you visit labs whose work you've thought about replicating and expanding on.  It's not too uncommon to find that only one person in the entire lab is able to get the technique to work, and even for him the technique only works on Wednesdays.  This type of information would be useful to know &lt;span style="font-style: italic;"&gt;before&lt;/span&gt; you embark on a useless three months trying to adapt their method.  But scientific publications are covered in a thick coat of high-gloss finish, making these unacknowledged difficulties hard to detect.&lt;br /&gt;&lt;br /&gt;Lab notebooks on the other hand are flat black.  As long as people keep them regularly updated, they contain the good, the bad, and the completely &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_2"&gt;nonsensical&lt;/span&gt; results.&lt;br /&gt;&lt;br /&gt;Today I test the waters of Open Notebook Science.&lt;br /&gt;&lt;br /&gt;The latest version of my lab notebook is now automatically posted on &lt;a href="http://www.jeremiahfaith.com/open_notebook_science/"&gt;J's Lab Notebook Page&lt;/a&gt; each night.  
I've been using an electronic lab notebook for two years now, so there's quite a bit of data in there - good and bad (300+ pages).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;What I hope to gain by being Open Notebook:&lt;br /&gt;&lt;/span&gt;&lt;ol&gt;&lt;li&gt;a nice warm fuzzy feeling that I have nothing to hide&lt;br /&gt;&lt;/li&gt;&lt;li&gt;less likely to be accused of scientific fraud (though I really wasn't worried about this in the first place)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;potentially helping others by allowing early access to my results and failed experiments&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I really hope people will notice stuff I'm doing wrong and LET ME KNOW - would be a very big benefit if it were to occur&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-weight: bold;"&gt;Bad things I don't think will happen by being Open Notebook:&lt;br /&gt;&lt;/span&gt;&lt;ol&gt;&lt;li&gt;people will take little details of the results from my experiments and nitpick about conclusions I've published based on the results - claiming the results in my notebook don't support the results and conclusions in my publications.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;I don't think this will happen, since I'm pretty careful with what I publish and with doing proper stats and such.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;people will take my data and scoop me&lt;/li&gt;&lt;ul&gt;&lt;li&gt;I think people are busy enough with their own work that they don't need to publish mine.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;By putting my data on the web as soon as I make it, I have a pretty strong case to say I'm first (as long as other people see my results too; otherwise, you have the problem with the tree falling in the forest that may or may not make a sound)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/3472100433840968771-7423683702014694860?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/7423683702014694860/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=7423683702014694860' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7423683702014694860'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7423683702014694860'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/07/giving-open-notebook-science-try.html' title='Giving Open Notebook Science a try'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-2715824006678053416</id><published>2007-07-04T21:07:00.000-07:00</published><updated>2007-09-03T09:39:25.108-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific ethics'/><title type='text'>pushing the boundaries of advertising in science</title><content type='html'>&lt;span style="font-size:100%;"&gt;Soon after starting their first lab, most new professors are disappointed to find that they spend a disproportionate amount of time on marketing and fund raising.  There are a lot of smart  scientists doing interesting work, and in the end, the ideas that are shouted loudest and most frequently become the accepted doctrine at any given time.  
That's not to say that working hard towards solving important scientific problems doesn't matter.  But if you work hard and solve an important problem, it is very likely that no one will know unless you go out and advertise it (Mendel, Newton, and Einstein were all lost to the world initially because they were closet geniuses).  And since everyone is advertising the important problems that they've solved, science becomes something of a popularity contest.&lt;br /&gt;&lt;br /&gt;To be popular, you need to be a constant member of the lecture circuit.  Every field has its own set of important conferences.  I do bioinformatics research.  If you want people to know you in bioinformatics (unless you are old and have already established a reputation through many years on the lecture circuit), it would be very useful to give a lecture at RECOMB, PSB, and ISMB.  Enough people hear your story, enough people hear your story again, and again, and again, and they start believing.  They start telling their friends.  Next thing you know, people with pocket protectors are coming up to you on the street asking for your autograph.&lt;br /&gt;&lt;br /&gt;The problem with the lecture circuit is that it is pretty expensive.   After you pay for the flight, hotel, etc., you've spent $1000.  But you can't just bring yourself. You need to bring a lab member or two with a poster, so that your students can start learning how the lecture circuit works.  So the conference costs you $1000-5000.  Our tax dollars hard at work.&lt;br /&gt;&lt;br /&gt;I know, I'm a little sarcastic, and sometimes useful things like long fruitful collaborations get started at conferences.  But you get my point.  What we're doing with all of this is a roundabout version of what Coke, Pepsi, Nike, Apple, and State Farm do more directly.  We're securing name recognition and our place in the marketplace.&lt;br /&gt;&lt;br /&gt;I propose that we be more direct.  Why not skip one conference a year?  
That saves about $3000.  Take this money and invest it into Adwords, Google's text-based advertising product.  Let me give an example.  Recently I was involved in some network inference work that resulted in the PLoS Biology publication,&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt; &lt;/span&gt;&lt;a href="http://biology.plosjournals.org/perlserv/?request=get-document&amp;doi=10.1371%2Fjournal.pbio.0050008"&gt;&lt;span style="font-style: italic;"&gt;Large-Scale Mapping and Validation of &lt;/span&gt;&lt;em style="font-style: italic;"&gt;Escherichia coli&lt;/em&gt;&lt;/a&gt;&lt;span style="font-style: italic;"&gt;&lt;a href="http://biology.plosjournals.org/perlserv/?request=get-document&amp;doi=10.1371%2Fjournal.pbio.0050008"&gt; Transcriptional Regulation from a Compendium of Expression Profiles&lt;/a&gt;.  &lt;/span&gt;Along with our analysis, we collected 445 &lt;span style="font-style: italic;"&gt;E. coli&lt;/span&gt; Affymetrix microarrays, and we organized a set of software for benchmarking our algorithms and future network algorithms using the large amount of regulatory information that's already known for &lt;span style="font-style: italic;"&gt;E. coli&lt;/span&gt; (yes, I'm marketing now, so go check out the paper if you're interested in network inference).&lt;br /&gt;&lt;br /&gt;With Google Adwords, you bid on keywords; when those keywords are searched for and you are the winning bidder, your link and a little text go up on the side of the Google search.  For example, if someone searches for "network inference", I might want a link that says &lt;a style="color: rgb(51, 204, 255);" href="http://gardnerlab.bu.edu/netinfer_plos_2007/?page_id=5"&gt;&lt;span style="color: rgb(0, 204, 204);"&gt;Benchmark Your Network Inference Algorithm using our open source matlab scripts&lt;/span&gt;.&lt;/a&gt;  Or if they type "E. 
coli affymetrix", I might want a sponsored link ad that says, &lt;a href="http://m3d.bu.edu/"&gt;&lt;span style="color: rgb(0, 204, 204);"&gt;Download custom datasets from a publicly available E. coli Affymetrix compendium at M3D&lt;/span&gt;.&lt;/a&gt;  Since there's zero competition for those keywords, each click on my advertisement costs the Adwords minimum: 5 cents.  And unlike my $3000 conference where the lecture hall is empty because I got the 8AM slot on the last day of the conference, these people are actually interested enough to click on my ad, which means they'll probably have a decent look at what I have to say.   At 5 cents a click, my $3000 will give me 60,000 people who might find my data useful - more than the largest of conferences.&lt;br /&gt;&lt;br /&gt;We want to create useful science that others can build on. And we want to build on other people's useful science.  Why shouldn't we pay a dime to find each other?  I know some people will think this is going too far,  but we're doing it indirectly already.  And why are we doing it?  To a large extent, because no scientist has any free time.  So if you want another scientist to look at your science, you've got to stick it right in their face.   
Once it's there, they can decide if it's the science they're looking for.&lt;br /&gt;&lt;br /&gt;I think Adwords could free up a lot of time for professors, allowing them to spend less time marketing and more time on the original focus of their professional lives:  solving important scientific problems.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-2715824006678053416?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/2715824006678053416/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=2715824006678053416' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/2715824006678053416'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/2715824006678053416'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/07/pushing-boundaries-of-advertising-in_04.html' title='pushing the boundaries of advertising in science'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-4328806124597782299</id><published>2007-06-26T13:07:00.000-07:00</published><updated>2007-09-03T09:42:39.006-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='untested ideas'/><category scheme='http://www.blogger.com/atom/ns#' term='host microbe interactions'/><category 
scheme='http://www.blogger.com/atom/ns#' term='evolution'/><title type='text'>Mutations, gene passing, and the evolution of gut microbes</title><content type='html'>The gut is a particularly interesting niche for evolution because things change there so rapidly.  You eat an undercooked hamburger from a poorly butchered cow and all of a sudden your intestine is full of &lt;a href="http://en.wikipedia.org/wiki/Escherichia_coli_O157:H7"&gt;&lt;span style="font-style: italic;"&gt;E. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;coli&lt;/span&gt;&lt;/span&gt; O157:H7&lt;/a&gt;, you've got the worst stomach ache of your life, and you have bloody diarrhea.   But this doesn't just create a difficult life for you; it also causes a great deal of confusion for the bacteria that were happily living in your intestine.  Before the O157:H7 invaders arrived, your normal intestine residents (i.e. your normal flora) were grabbing the food you didn't eat and passing on some of the benefits to you.  Now all of a sudden the flow rate through your intestine is much faster.  Perhaps the normal flora are having trouble staying attached to your intestinal lining?  A new food source, blood, has arrived thanks to the &lt;span style="font-style: italic;"&gt;E. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;coli&lt;/span&gt;&lt;/span&gt; O157:H7 that so rudely invaded your intestine.   And most of your intestine's normal residents are probably not optimized to eat blood.&lt;br /&gt;&lt;br /&gt;Now let's assume you were misdiagnosed and the doctor gave you antibiotics to get rid of the new &lt;span style="font-style: italic;"&gt;E. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;coli&lt;/span&gt;&lt;/span&gt; O157:H7 residents in your intestine (unfortunately, the current best treatment for O157:H7 is to wait a week or two for it to go away).  
Now, the bacteria in your intestine are being bombarded by these antibiotic chemicals that kill off most of them.&lt;br /&gt;&lt;br /&gt;So how do bacteria survive, and often thrive, in such a complicated environment?&lt;br /&gt;&lt;ol&gt;&lt;li&gt;they are tiny, so lots of them can live in a small space; their large number allows for diversity; and diversity is largely why they survive drastic changes in their environment - only a few diverse individuals of a particular species need to survive each change in order for the species to remain a resident of the normal flora&lt;br /&gt;&lt;/li&gt;&lt;li&gt;the different species trade DNA in the gut;  this means that if one type of bacteria in the gut develops resistance to a particular antibiotic, another type of bacteria can develop resistance more easily, because it can just obtain the important piece of DNA from the bacteria that already figured out how to survive&lt;/li&gt;&lt;/ol&gt;Point 1 above deals with mutations and selective pressure.  Point 2, the gene-exchanging idea, deals with a phenomenon called horizontal gene transfer.  However, it's not clear how often any of these things happen, or if they occur more often in some situations than others (e.g. is there more horizontal gene transfer when there is a strong selecting force like an antibiotic?). Mutations and selective pressure have been studied for quite a while in the lab (see the NY Times article &lt;a href="http://www.nytimes.com/2007/06/26/science/26lab.html"&gt;"Fast-Reproducing Microbes Provide a Window on Natural Selection"&lt;/a&gt;) with pretty interesting results.  But I think it's time we moved these evolution studies into more complicated environments like the gut. 
We also need to further explore the extent to which horizontal gene transfer plays a role in these organisms' survival and adaptation, because previous studies (as far as I know) have focused more on the mutations in cultures of a single bacterial species put under some sort of selective pressure.&lt;br /&gt;&lt;br /&gt;Here's what I propose:&lt;br /&gt;&lt;br /&gt;Inoculate a &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;gnotobiotic&lt;/span&gt; mouse with N microbial species (probably keep N low, say 2-4;  also make the species diverse: one &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;Firmicute&lt;/span&gt;, one &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;Bacteroidetes&lt;/span&gt;, and one &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_6"&gt;Archaea&lt;/span&gt;).  You should probably place things under selective pressure to &lt;span style="font-style: italic;"&gt;push&lt;/span&gt; the organisms in different directions.  For example, give the mouse diets that have few of the nutrients necessary for the gut residents, or add one antibiotic-resistant strain of bacteria and give the mouse a weak but constant dose of the antibiotic (to see whether, and how quickly, the bacteria horizontally pass on the gene).&lt;br /&gt;&lt;br /&gt;Now pass on the microbial residents to new mice (either the children of the &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_7"&gt;inoculated&lt;/span&gt; mouse or another germ-free mouse; both would be interesting).  This passing could be done by mixing a little feces in their food, but it would probably just happen naturally if you put them in the same cage for a few days.  Now, at set times in each mouse's life, take a feces sample to freeze and sequence at a later date (might as well delay sequencing as long as possible, since the stuff gets so much cheaper with time).  
Then sequence the frozen samples to see the extent of the mutations and gene transfers over time and in different selective environments and genetic backgrounds of mice.  The sequencing will also show how the proportions of the normal flora change over time.&lt;br /&gt;&lt;br /&gt;The problem with this experiment is that it would take several years of work, and you'd always need to be careful to pass on the flora before the mouse died.  But according to that &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;NYTimes&lt;/span&gt; article I linked to above, most of the action in the single-species studies &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_9"&gt;occurred&lt;/span&gt; at the beginning, so even the early results might yield some interesting insights into gut ecology and evolution.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-4328806124597782299?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/4328806124597782299/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=4328806124597782299' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/4328806124597782299'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/4328806124597782299'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/06/mutations-gene-passing-and-evolution-of.html' title='Mutations, gene passing, and the evolution of gut microbes'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-272742930984556487</id><published>2007-06-15T09:39:00.000-07:00</published><updated>2007-07-01T09:14:51.795-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='how the mind works'/><title type='text'>why can you feel when you are close to remembering something?</title><content type='html'>There are plenty of models for how memory works at both a psychological and a physiological level.  What I want to talk about now is a curious thing that happens with what is typically referred to as &lt;span style="font-style: italic;"&gt;long-term&lt;/span&gt; memory.  Long-term memory refers to the stuff that we remember for more than a few minutes - things like our phone number, the names of our friends, and where we live.  These memories must be stored chemically in the brain.  I think the current working model is that memories are somehow encoded in the strength and pattern of synapses, which are junctions between the neurons in our brains.&lt;br /&gt;&lt;br /&gt;The curious thing about memory that I'd like to discuss now is (as you probably have inferred from the title) &lt;span style="font-style: italic;"&gt;why can you feel when you are close to remembering something&lt;/span&gt;? In case you aren't with me, let me give an example.  Let's say you're watching a movie; you see an actor that you've seen in many movies before, and your friend next to you says, "who's that actor?".   If you know the actor extremely well, the name will flow off your tongue like a reflex.   
If you don't know the actor's name instantly, you will almost immediately get a &lt;span style="font-style: italic;"&gt;feeling&lt;/span&gt; inside, like a gut instinct, that you can use to estimate if you'll probably come up with the actor's name if you &lt;span style="font-style: italic;"&gt;think harder&lt;/span&gt; about it.&lt;br /&gt;&lt;br /&gt;It's odd, right?  Somehow we can feel if further searching of our brain is likely to reveal that actor's name.  And the feeling is relatively accurate.   For example, I don't watch that many movies, but like many people I remember faces pretty well.  So it's pretty common that I'll recognize an actor's face in a movie, but I'll know that I have no idea what their name is.  However, if I had previously learned the actor's name and the name was still at least weakly burned into my memory, I would get a sorta gut feeling for how likely I'd be to dig that name out of my brain.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;Ok&lt;/span&gt;, let's say we had a good feeling that we'd be able to find the actor's name if we &lt;span style="font-style: italic;"&gt;think harder.  &lt;/span&gt;What happens next?&lt;br /&gt;&lt;br /&gt;We would start to &lt;span style="font-style: italic;"&gt;search&lt;/span&gt; our brains around the areas where the actor is stored.  Somehow we go to the memories we have for the actor, and we &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_1"&gt;dig&lt;/span&gt; around and look through them.  We'll see what other movies we can remember the actor in and what other actors we've seen him with.  We can often even think of friends we know who would be able to help us answer the question.   
And then to move towards the actor's name, we might toss around names in our heads that &lt;span style="font-style: italic;"&gt;feel right&lt;/span&gt;, so that we can listen to them and see if they &lt;span style="font-style: italic;"&gt;sound right&lt;/span&gt; too.  It is all a sorta fuzzy process that seems to move by intuition, but the intuition has a definite sense of direction.  We can feel when we've made progress towards the name in our head even if we still do not know the name yet (e.g. we say, "I've almost got it, just give me a second"). When we do find the right name, it's like &lt;span style="font-style: italic;"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;BAMM&lt;/span&gt;&lt;/span&gt; - after all that searching you've found the name, said it to yourself, and you get a sorta feeling of satisfaction that instantly lets you know that - yes that is the right name.&lt;br /&gt;&lt;br /&gt;Why does this happen?&lt;br /&gt;&lt;br /&gt;How does this work?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-272742930984556487?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/272742930984556487/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=272742930984556487' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/272742930984556487'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/272742930984556487'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/06/why-can-you-feel-when-you-are-close-to.html' title='why can you feel when you are close to remembering 
something?'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-7031701470770520016</id><published>2007-06-12T18:47:00.000-07:00</published><updated>2007-06-13T05:26:17.399-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='how the mind works'/><title type='text'>the brain isn't multithreaded</title><content type='html'>It is impossible to have more than one thought at a time.&lt;br /&gt;&lt;br /&gt;Try it.&lt;br /&gt;&lt;br /&gt;Try to use your brain to have two unrelated thoughts at the same time.  For example think about your kids and listen to a song on the radio - being careful to listen closely and understand all of the words.  You can either listen to the words or think about your kids.  You may be able to swap back and forth really fast to create the illusion that you are doing both.  The two may even get mixed up so that you think about how the words you're listening to relate to your kids. But if you really pay close attention, you'll notice you are either listening to the words or thinking about the kids and never doing both at &lt;span style="font-style: italic;"&gt;exactly&lt;/span&gt; the same time.&lt;br /&gt;&lt;br /&gt;I've asked lots of my friends to try the same thing - none could have two simultaneous thoughts.&lt;br /&gt;&lt;br /&gt;So in some ways our minds must function a little like a single-processor computer (up until about 2005 almost all personal computers only had one processing core).   Single-processor computers that are sufficiently fast create the illusion of multiple simultaneous computation.  
Although you can play your iTunes and type an email at the same time, the computer is actually going back and forth between the two so fast that you don't notice it.  The computer does this swapping by using memory and different buffers.  It puts all the information for your email and for your iTunes in a short-term memory, and then it simply goes back and forth between the two and computes the next things that need to be computed.  Similarly I imagine our brains are just sticking the things we are currently multitasking into a short-term memory, and they just swap them in and out to think about them.  This process creates the illusion of thinking about multiple things simultaneously.&lt;br /&gt;&lt;br /&gt;Our inability to have simultaneous thoughts does not preclude us from simultaneously &lt;span style="font-style: italic;"&gt;doing &lt;/span&gt;multiple things.   We must use the spinal cord or other neuronal cells throughout our body as the buffers to keep us running smoothly.  So we can think about our next tennis shot while our arm is busy hitting the current one.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-7031701470770520016?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/7031701470770520016/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=7031701470770520016' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7031701470770520016'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7031701470770520016'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/06/brain-isnt-multithreaded.html' 
title='the brain isn&apos;t multithreaded'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-8327834924150266914</id><published>2007-06-12T17:56:00.000-07:00</published><updated>2007-06-30T13:44:33.921-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific ethics'/><category scheme='http://www.blogger.com/atom/ns#' term='microarrays'/><category scheme='http://www.blogger.com/atom/ns#' term='epistemology'/><title type='text'>Microarrays: scientific indulgences</title><content type='html'>&lt;p&gt;Let me first admit that the analogy isn't perfect, but then let's move on (if you're curious) to see how the two are more  similar than they appear at first glance.  I also assume you know a little about religion and a bit about  microarrays, otherwise, why are you here?&lt;/p&gt;   &lt;b&gt;Indulgences&lt;/b&gt;  &lt;p&gt;I'm not much of a Roman Catholic aficionado, so I only know the bad side of indulgences (was/is there a good side?). Pope Leo X, true to his Medici lineage, needed lots of cash to build nice pieces of art (in this case primarily St. Peter's Basilica in Rome) to dazzle the public and to make him look like a powerful badass. When the cash flow got low, he decided to sell piles of indulgences (which were little pieces of paper offering forgiveness). This method worked well for a while. Unfortunately for Leo, the U.S.A. did not exist at this time, as this product would have been a hit in the land known for its people that throw money at quick fixes for all of their problems. 
More unfortunately for Leo, a German dude, Martin Luther, had the crazy idea to read the Bible and the even more preposterous idea to write the whole book out in a language that people could actually read (well, kinda; actually few people were literate back then, but more read German than Latin I'd assume). So when people started reading the Bible (particularly Luther), they soon realized that selling forgiveness was a load of crap. Many took this as an excuse to rebel, loot, etc. The Catholic Church was in shambles, the world was at war, and it was our first big step towards creating the more intellectually free society we (sorta) have today in modern science. &lt;/p&gt;  &lt;b&gt;Microarrays&lt;/b&gt;  &lt;p&gt;Science has a few sins of its own.  Not bathing, acting bizarre on purpose so people think you're  the out-there smart type, and asking stupid irrelevant questions to show people how clever you are, are all just inconveniences,  not sins.  Science has one &lt;i&gt;mortal&lt;/i&gt; sin: knowingly publishing false information; it brings instant fame,  but leads to certain excommunication if you're caught.  The venial sins are more common.  The two most frequent  are: not publishing enough papers and not having enough grants.  The punishment can be harsh for non-tenured scientists,  but the tenured amongst us still must suffer the psychological trauma of being shunned by their colleagues and being  considered a has-been.  Thanks to microarrays (which are  little pieces of glass with DNA on them that allow you  to measure the relative expression of &lt;i&gt;many&lt;/i&gt; genes simultaneously) there is help.  
There is enough information  in a microarray result that you're bound to find something interesting to publish, and &lt;i&gt;so far&lt;/i&gt;  there is no modern-day Martin Luther.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt; &lt;b&gt;Microarray Indulgence Quotes&lt;br /&gt;&lt;br /&gt;&lt;/b&gt;&lt;table border="0" cellpadding="5" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="doc" valign="top"&gt;My current professor doesn't ask me to go to conferences too often,   but when I was working at &lt;span style="font-size:-2;"&gt;&lt;i&gt;an unnamed famous university&lt;/i&gt;&lt;/span&gt;, I'd present posters all the time.   "Ten chips and a t-test and you got an Abstract".&lt;/td&gt;  &lt;td class="doc" valign="top"&gt;&lt;i&gt;a PhD student&lt;/i&gt; at the Boston University Pub&lt;/td&gt;&lt;/tr&gt;  &lt;tr&gt;&lt;td class="doc" valign="top"&gt;There are two major ways people use microarrays: hypothesis testing and hypothesis  generation. A lot of labs buy ten chips for hypothesis generation when they need to write a grant.  
&lt;/td&gt;  &lt;td class="doc" valign="top"&gt;Affymetrix field application specialist&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-8327834924150266914?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/8327834924150266914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=8327834924150266914' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/8327834924150266914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/8327834924150266914'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/06/microarrays-scientific-indulgences.html' title='Microarrays: scientific indulgences'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-7878155139684215707</id><published>2007-06-12T17:53:00.000-07:00</published><updated>2007-06-12T17:55:26.494-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='books'/><title type='text'>Bioinformatics books</title><content type='html'>Bioinformatics has been around for many years, and there's still only  one good book: &lt;i&gt;Biological Sequence Analysis&lt;/i&gt; by Durbin &lt;i&gt;et
al.&lt;/i&gt;, which summed up the state-of-the-art in 1999.    It's sad but true that almost  everything in bioinformatics is in that book. Since the book's writing, the number of bioinformatics  publications has skyrocketed, but the great majority of the publications are variations on a theme of Durbin &lt;i&gt;et al.&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-7878155139684215707?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/7878155139684215707/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=7878155139684215707' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7878155139684215707'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/7878155139684215707'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/06/bioinformatics-books.html' title='Bioinformatics books'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-4371608738171952289</id><published>2007-06-12T11:36:00.000-07:00</published><updated>2007-12-02T14:03:19.440-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='untested ideas'/><category scheme='http://www.blogger.com/atom/ns#' term='host microbe interactions'/><title type='text'>Live imaging of host-microbe interactions</title><content 
type='html'>&lt;a href="http://en.wikipedia.org/wiki/Two-photon_excitation_microscopy"&gt;Two-photon excitation microscopy&lt;/a&gt; allows imaging of living tissue up to a depth of one millimeter.  Karel Svoboda has used this to amazing effect to study neurons in mice.  I wonder if it would be possible to apply a similar approach to study host-microbe interactions in real time.  The idea would be to create a few strains of fluorescently labeled bacteria (or make a fusion protein or U. Alon style fluorescent promoter of a key protein or two) and watch them interact in real time with the gut of a mouse after you feed or inject the mouse with different drugs.&lt;br /&gt;&lt;br /&gt;Questions:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;how thick is the intestinal lining  (i.e. is it greater than 1 mm?)&lt;/li&gt;&lt;li&gt;how easy is it to do surgery on the mouse to get to the intestine?&lt;/li&gt;&lt;li&gt;how easy is it to sew the mouse back up and have sorta a ready-access flap into the intestine like Svoboda does with the mouse neurons?&lt;/li&gt;&lt;li&gt;how much will it alter the physiology of the mouse to have an open wound near the intestine?&lt;/li&gt;&lt;/ol&gt;&lt;a href="http://ryasuda.blogspot.com/2005/01/2-photon-papers-karels-recommendations.html"&gt;2-photon links&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Update&lt;/span&gt; &lt;span style="font-style: italic;"&gt;Dec 2, 2007&lt;/span&gt;&lt;br /&gt;&lt;strong style="font-weight: normal;"&gt;Jost Enninga, Philippe Sansonetti and Regis Tournebize wrote an excellent review that covers this idea and much more (as is always the case, this idea has already been worked on to some extent, though never in the context of an intestine as far as I know).&lt;br /&gt;&lt;br /&gt;&lt;/strong&gt;&lt;span style="font-style: italic;"&gt;Roundtrip explorations of bacterial infection: from single cells to the entire host and back.&lt;/span&gt;&lt;br /&gt;Trends in Microbiology, Nov 
2007&lt;br /&gt;&lt;a href="http://dx.doi.org/10.1016/j.tim.2007.10.006" target="doilink" onclick="var doiWin; doiWin=window.open('http://dx.doi.org/10.1016/j.tim.2007.10.006','doilink','scrollbars=yes,resizable=yes,directories=yes,toolbar=yes,menubar=yes,status=yes'); doiWin.focus()"&gt;doi:10.1016/j.tim.2007.10.006&lt;br /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-4371608738171952289?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/4371608738171952289/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=4371608738171952289' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/4371608738171952289'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/4371608738171952289'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/06/two-photo-microscopy-live-imaging-of.html' title='Live imaging of host-microbe interactions'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3472100433840968771.post-992814717804270128</id><published>2007-06-12T10:59:00.000-07:00</published><updated>2007-09-03T09:42:05.047-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='untested ideas'/><category scheme='http://www.blogger.com/atom/ns#' term='host microbe interactions'/><title type='text'>Do microbes directly sense 
host hormones?</title><content type='html'>Microbes have been living inside vertebrates for a long time now.  They and their hosts are constantly interacting.   I think it is not clear, however, on how many scales they interact.  For sure the host has things like an acidic stomach and bile secretions (which act like a detergent) to keep the microbe populations from growing outta control.  The innate immune system also has plenty of peptides that have antimicrobial properties.  For every host combatant action, there must be a subsequent microbe response for the microbe to survive (e.g. &lt;span style="font-style: italic;"&gt;E. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;coli&lt;/span&gt;&lt;/span&gt; can export detergent-like bile molecules to survive in the intestine).  But is it known if bacteria can respond to general non-combatant host properties like the presence of hormones circulating about?&lt;br /&gt;&lt;br /&gt;For sure if you have a big boost in adrenaline the bacteria populations will be stimulated, but this could occur simply as a side effect to the host's response to the hormone.  I want to know  if bacteria can directly sense any hormones.   What about &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;leptin&lt;/span&gt;?  Work in the &lt;a href="http://gordonlab.wustl.edu/"&gt;Gordon lab&lt;/a&gt; at &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;WashU&lt;/span&gt; suggests that low &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;leptin&lt;/span&gt; might send a signal to the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;microbiota&lt;/span&gt; to become more efficient at extracting calories from food.  
I'd like to know if this or any other hormonal signals are directly sensed by the microbial community.&lt;br /&gt;&lt;br /&gt;Could try running &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;microarrays&lt;/span&gt; of microbes in the presence/&lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_6"&gt;absence&lt;/span&gt; of different hormones.&lt;br /&gt;&lt;br /&gt;(see &lt;span style="font-style: italic;"&gt;Nature News &amp;amp; Views&lt;/span&gt; article "Obesity and gut flora", &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7"&gt;Bajzer&lt;/span&gt; M and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;Seeley&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_9"&gt;RJ&lt;/span&gt; and the two articles from the Gordon lab in the same issue "An obesity-associated gut &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_10"&gt;microbiome&lt;/span&gt; with increased capacity for energy harvest" and "Human gut microbes associated with obesity").&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3472100433840968771-992814717804270128?l=blog-di-j.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog-di-j.blogspot.com/feeds/992814717804270128/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3472100433840968771&amp;postID=992814717804270128' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/992814717804270128'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3472100433840968771/posts/default/992814717804270128'/><link rel='alternate' type='text/html' href='http://blog-di-j.blogspot.com/2007/06/do-microbes-response-to-host-hormones.html' title='Do microbes directly sense host 
hormones?'/><author><name>J</name><uri>http://www.blogger.com/profile/02818497841390003056</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
