Richard Roberts at BioVision Alexandria 2010: I give you the sequence and you give me the function!
Posted by: Mariam Rizkallah in Attended for you, Bioinformatics
I had the chance to attend the international conference BioVision Alexandria 2010, held at the Bibliotheca Alexandrina Conference Center in Alexandria, Egypt, from 12 to 15 April 2010. I want to share with you the more than 50 talks I attended, given by Nobel laureates and other remarkable scientists specializing in health-related topics.
I will start with this talk by Dr. Richard Roberts, who received the Nobel Prize in Physiology or Medicine in 1993 for the discovery, in 1977, of split genes and mRNA splicing. He is now joint Research Director at New England Biolabs. Dr. Roberts titled his talk "Collaborating to bridge the gap between computation and experimentation". I will try to sum it up for you.
I. Let's start with a fact: genomics is rapidly taking over biology, at least at the research level.
Examples:
- The sequencing of the human genome (the Human Genome Project) provides the basis for the emerging field of personalized medicine.
- Plant genomics is extremely important for food and, perhaps, for energy production (mostly from unicellular plants).
- Ocean organisms are very interesting, as they produce potential new antibiotics and many other useful substances.
- Bacteria and archaea make up as much as 50% of the living biomass.
Bacteria are everywhere: they live in the oceans, in the soil (plants require them for nitrogen fixation), in animals and in us, in our gut, skin, nose and mouth. We know absolutely nothing about most of these bacteria, because we can't grow them in culture.
But this is about to change now thanks to DNA sequencing.
II. So, the core of today’s science is DNA sequencing… but unfortunately, DNA sequencing has its drawbacks.
1) DNA sequencing is getting faster and cheaper at a rate that exceeds our ability to understand the function, or the biochemical pathways, of every gene sequenced. At best, if we're really lucky, we can guess from sequence similarity that a gene encodes, for example, a "hydrolase", but just "a hydrolase", with no clue about its substrate or the exact biochemical pathways it's involved in.
Dr. Roberts offered an interesting simile: getting more and more bacterial DNA sequences is like getting a car with a list of all its parts but no idea how they fit together or how they work. Biology is about understanding how life works. If we're talking about synthesizing life today, we first have to understand how life works. He dreams that, before he dies, he will understand how a very simple bacterium actually works: what chemistry is going on in there?
So, the first problem is the very rapid growth in DNA sequencing without a matching growth in annotation (naming genes and finding their function). Here's an older graph showing the growth of sequence databases and annotations from 1982 to 2006, similar to the one Dr. Roberts presented, which covered 1995 to 2009. If you can find a newer one, please don't hesitate to add its link in the comments.
2) The computer is not enough! Do the biochemistry in the lab! Despite the large amount of money spent on sequencing different organisms, we are still not making real progress in understanding them. This may be because, once we have a DNA sequence and have translated it into its amino acid sequence, our best shot is to compare it with the protein sequences already in the databases, see what it resembles, and predict its function from that. If two protein sequences look the same, there's a chance, not a guarantee, that they share the same function: a single amino acid difference may give them different substrates and thus different functions. How can we tell? Only by testing: the computer is not enough, so do the biochemistry in the lab! This leads us to the third problem.
3) Not all substrates are available to every lab all the time, so no single lab can determine the function of every gene on earth. He gave this example: to assay a specific disaccharide hydrolase and determine its substrate, you would need disaccharides made from every possible combination of sugars to test it on.
4) Lack of good funding for biochemistry. Funding agencies think biochemistry is an "old-fashioned" field! They fund the more appealing genome-wide studies instead, which are very superficial.
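To make the second problem above more concrete, here is a minimal Python sketch of annotation transfer by sequence similarity. It is a toy under loud assumptions: the sequences are invented and assumed to be already aligned to the same length (real pipelines align first, for example with BLAST), and the 40% identity threshold and the names `percent_identity` and `predict_function` are hypothetical choices for illustration. Note how a query differing at just one position still scores 90% identity and inherits the label, which is exactly the kind of prediction that only lab work can confirm.

```python
from __future__ import annotations

# Toy illustration only: real pipelines align sequences first (e.g. with BLAST);
# here every sequence is assumed to be already aligned to the same length.


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length protein sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal (aligned) length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def predict_function(query: str,
                     database: dict[str, tuple[str, str]],
                     threshold: float = 40.0) -> tuple[str, float] | None:
    """Transfer the annotation of the most similar database entry.

    Returns (annotation, identity) for the best hit at or above `threshold`
    (an arbitrary, hypothetical cutoff), or None if nothing is similar enough.
    The result is a prediction, not proof of function.
    """
    best: tuple[str, float] | None = None
    for seq, annotation in database.values():
        if len(seq) != len(query):
            continue  # skip entries that are not aligned to the query
        identity = percent_identity(query, seq)
        if identity >= threshold and (best is None or identity > best[1]):
            best = (annotation, identity)
    return best


# Made-up, pre-aligned toy sequences (not real proteins).
db = {
    "protA": ("MKTAYIAKQR", "hydrolase (substrate unknown)"),
    "protB": ("MSTNPKPQRK", "transporter"),
}
query = "MKTAYIAKQW"  # differs from protA at a single position
print(predict_function(query, db))  # ('hydrolase (substrate unknown)', 90.0)
```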
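The disaccharide hydrolase example in the third problem above is easier to appreciate with a quick count. The Python sketch below enumerates a hypothetical substrate panel under deliberately crude assumptions: only three aldohexoses, a glycosidic bond from C1 (alpha or beta) of the first sugar to the 2-, 3-, 4- or 6-hydroxyl of the second, and 1-to-1 links (as in trehalose) ignored. The function names are illustrative only. Even this tiny panel has 72 members, and the count grows with the square of the number of sugars.

```python
from itertools import product

# Hypothetical, deliberately tiny panel: three common aldohexoses.
SUGARS = ["glucose", "galactose", "mannose"]
ANOMERS = ["alpha", "beta"]   # configuration at C1 of the first (non-reducing) sugar
POSITIONS = [2, 3, 4, 6]      # hydroxyl of the second sugar that C1 links to (1->1 links ignored)


def disaccharide_panel(sugars, anomers=ANOMERS, positions=POSITIONS):
    """Enumerate candidate disaccharides as (first sugar, anomer, position, second sugar)."""
    return list(product(sugars, anomers, positions, sugars))


def panel_size(n_sugars, n_anomers=len(ANOMERS), n_positions=len(POSITIONS)):
    """Same count without building the list: grows with the square of the number of sugars."""
    return n_sugars * n_sugars * n_anomers * n_positions


panel = disaccharide_panel(SUGARS)
print(len(panel))                                    # 72
print(("glucose", "alpha", 4, "glucose") in panel)   # maltose: True
print(("galactose", "beta", 4, "glucose") in panel)  # lactose: True
print(panel_size(10))                                # 800
```

Every one of those candidates would have to be synthesized or bought before the assay could even start, which is why a shared, community-wide effort makes sense.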
III. Dr. Roberts’ suggestions for a solution: “COMBREX”
Identifying Protein Function—A Call for Community Action.
Dr. Roberts and colleagues received NIH funding in October 2009 to establish COMBREX (COMputational BRidge to EXperiments). The workflow will look roughly like this:
Step 1: Establishing a database. Starting from 1,200 complete bacterial and archaeal genome sequences, groups of computational biologists generate protein families and domains of unknown function (DUFs), predict their functions based on sequence similarity, and build a database.
Step 2: Coordinating the efforts of biochemistry labs. Experimentalists and biochemists (young graduates, even technicians) submit a proposal to test those predictions and, in return, gain exclusive access to the genes of interest for 6 months plus a small grant (5,000 to 10,000 USD) to carry out single-gene studies. If we know one protein's function, we know the function of the whole protein family.
Step 3: Creating a Wikipedia-style page for suggestions and predictions.
Step 4: Establishing a journal to publish the findings.
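The key idea in Step 2 above, that testing one protein tells us about its whole family, can be sketched as a simple propagation step. This is not COMBREX's actual database or schema, just a hypothetical Python illustration: the family and gene names are placeholders, and the sketch keeps experimentally verified functions separate from inferred ones, so that inferred labels remain visible as predictions (as the second problem above warns). If several members of a family were tested, it simply uses the first one found.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Annotation:
    function: str
    evidence: str  # "experimental" or "inferred from <gene>"


def propagate(families: dict[str, list[str]],
              verified: dict[str, str]) -> dict[str, Annotation]:
    """Give every member of a family the function verified for one of its members."""
    annotations: dict[str, Annotation] = {}
    for members in families.values():
        tested = [g for g in members if g in verified]
        if not tested:
            continue  # nobody in this family was tested: it stays a DUF
        source = tested[0]  # simplification: use the first tested member
        for gene in members:
            if gene in verified:
                annotations[gene] = Annotation(verified[gene], "experimental")
            else:
                annotations[gene] = Annotation(verified[source], f"inferred from {source}")
    return annotations


# Hypothetical families and a single lab result.
families = {
    "DUF_A": ["geneA1", "geneA2", "geneA3"],
    "DUF_B": ["geneB1", "geneB2"],
}
verified = {"geneA2": "sugar hydrolase"}
result = propagate(families, verified)
for gene, ann in sorted(result.items()):
    print(gene, ann.function, ann.evidence)
```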
IV. Which genes should we start with?
Dr. Roberts suggested this list, in descending order of priority:
1) Genes found in many different organisms: humans, animals, bacteria, etc. These are likely to have important, conserved functions.
2) E. coli, the most widely used and so-called "best-studied" organism, which we could fully characterize.
3) Helicobacter pylori, to understand the biochemistry of such an important pathogen, about which we know almost nothing.
4) The products of open reading frames (ORFs) that have already been cloned, translated and frozen, to identify their functions.
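The first criterion in the list above (genes found in many different organisms) translates naturally into a ranking. Here is a hypothetical Python sketch that orders gene families by the number of distinct organisms they occur in; the occurrence data and family names are invented, and the sketch covers only this first criterion, not the full priority list.

```python
from collections import defaultdict


def rank_by_spread(occurrences):
    """Rank gene families by how many distinct organisms they occur in.

    `occurrences` is an iterable of (family_id, organism) pairs; the result is
    a list of family ids, most widespread first (ties broken alphabetically).
    """
    organisms = defaultdict(set)
    for family, organism in occurrences:
        organisms[family].add(organism)
    return sorted(organisms, key=lambda fam: (-len(organisms[fam]), fam))


# Hypothetical occurrence data; family names are placeholders.
occurrences = [
    ("DUF_X", "Homo sapiens"),
    ("DUF_X", "Mus musculus"),
    ("DUF_X", "Escherichia coli"),
    ("DUF_X", "Helicobacter pylori"),
    ("DUF_Y", "Escherichia coli"),
    ("DUF_Y", "Escherichia coli"),  # a second copy in the same genome counts once
    ("DUF_Y", "Helicobacter pylori"),
    ("DUF_Z", "Helicobacter pylori"),
]
print(rank_by_spread(occurrences))  # ['DUF_X', 'DUF_Y', 'DUF_Z']
```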
V. Who can help?
According to Dr. Roberts, almost everybody: computational biologists to make predictions, biochemists to test them, geneticists, university students as personnel (even high school students, for whom it could be a genuine science project), retired professors to supervise and maybe get back to the lab, and funding agencies.
You can watch this talk and most of the conference’s talks via the Bibliotheca Alexandrina webcast.