Archive for the “Bioinformatics” Category
After the Human Genome Project was successfully completed in April 2003, confirming that humans are 99.9% identical in their genome sequence, researchers moved on to find out more about the remaining 0.1%. Although our genome is made of 3 billion bases (A’s, G’s, C’s and T’s), that 0.1% genetic difference is extremely significant: it holds the key to most of the consequences of variation between human beings (e.g. susceptibility to diseases and response to drugs). This brought single nucleotide polymorphisms (SNPs) to prominence; SNPs have been found to be heavily involved in the variable response to many drugs (like anti-cancer agents), and the list grows every day.
To focus on the human variation represented by that 0.1 percent, the NIH led the international HapMap project. The work was performed on four population groups: the Yoruba in Ibadan, Nigeria; the Japanese in Tokyo; the Han Chinese in Beijing; and Utah residents of western and northern European ancestry. The project began in October 2002 and ended successfully in October 2005. Those three years of work gave researchers a shortcut for studying SNPs. Scientists estimate that about 10 million SNPs are distributed among the 3 billion base pairs that make up our genome, so scanning the whole genome of millions of people for all of them would be extremely expensive. The HapMap project demonstrated that variants tend to cluster into neighborhoods (called haplotypes), so the number of SNPs that must be typed could be reduced to only about 300,000, cutting the workload by roughly 30-fold.
Genome-wide association (GWA) studies aim to pinpoint the genetic differences that cause a certain disease (or biological trait) by comparing a group of people who have the trait under study to a control group of people who are free of it. Using thousands of SNP markers, we can identify regions (loci) that differ statistically between the patient and control groups, and thus detect the genetic difference between affected and unaffected people even when that difference is subtle. This makes it possible to study combinations of slightly altered genes together with environmental factors. Conventional approaches to studying genetic differences mostly rely on selecting a candidate gene based on a known or suspected disease mechanism; GWA studies instead scan the whole genome in a comprehensive, unbiased manner, revealing the whole picture, including unexpected contributing genes. In this way, GWA studies will help us study multifactorial diseases (like cancer and diabetes) in a more rational way.
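To make the statistical comparison concrete, here is a minimal sketch in Python of a single-SNP case–control test; the allele counts are invented for illustration and do not come from any real study:

```python
# Minimal sketch of a single-SNP association test, using made-up
# allele counts rather than data from any real study.
from scipy.stats import chi2_contingency

# Rows: cases, controls; columns: counts of allele A and allele G
# observed at one SNP locus in each group.
table = [
    [640, 360],  # cases:    640 A alleles, 360 G alleles
    [510, 490],  # controls: 510 A alleles, 490 G alleles
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")

# In a real GWA study this test is repeated for hundreds of thousands
# of SNPs, so the significance threshold must be corrected for
# multiple testing (e.g. Bonferroni: 0.05 / number of SNPs).
```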
Another challenge has come up: what about genetic variation due to geographic ancestry? It is also a significant contributor to variation among humans, and with all the efforts directed towards a somewhat universal map of the human genome in support of individualized drugs, it has to be accounted for. A group of scientists led by David Reich, an assistant professor at Harvard Medical School, described a quantitative method to correct the errors introduced by geographic ancestry, errors known collectively as “population stratification”. Such a correction matters whenever the compared groups, despite sharing the same trait, differ in their geographic ancestry.
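As a rough sketch of the general idea behind such corrections (not Reich’s actual implementation), the dominant axes of genotype variation, which tend to track ancestry, can be extracted with PCA and used as covariates; the matrix sizes and number of components below are arbitrary illustrations:

```python
# Rough illustration of PCA-based ancestry correction (the general
# idea behind quantitative stratification corrections, not any
# specific published implementation).
import numpy as np

rng = np.random.default_rng(0)
# Toy genotype matrix: 200 individuals x 1000 SNPs, coded 0/1/2
# (copies of the minor allele). Real studies use far more SNPs.
genotypes = rng.integers(0, 3, size=(200, 1000)).astype(float)

# Center each SNP, then take the top principal components; in real
# data these axes tend to track geographic ancestry.
centered = genotypes - genotypes.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
top_pcs = centered @ vt[:10].T   # each row: one individual's 10 PC scores

# Including top_pcs as covariates in the per-SNP association test
# (or regressing them out of genotypes and phenotypes) corrects
# for population stratification.
print(top_pcs.shape)  # (200, 10)
```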
Tags: Genome wide association studies, Genome-wide association studies, GWA, GWAS, Hap-Map project, HGP, human variome project, SNPs, variome
No Comments »
Metagenomics is a culture-independent approach that has contributed extensively to the study and understanding of previously unidentified microbial communities. To better understand how metagenomics is employed in the study of the Red Sea microbial communities, we are pleased to interview Dr. Rania Siam, an Associate Professor in the Biology Department, the Director of the Biotechnology Graduate Program at the American University in Cairo (AUC) and an Investigator in the Red Sea Marine Metagenomics Project that is currently running at the AUC in collaboration with King Abdullah University of Science and Technology (KAUST), Woods Hole Oceanographic Institution and the Virginia Bioinformatics Institute at Virginia Tech. Dr. Siam holds a Ph.D. in Microbiology and Immunology from McGill University. In addition, she held several post-doctoral positions at the McGill Oncology Group, Royal Victoria Hospital, The Salk Institute for Biological Studies and The Scripps Research Institute. Since 2008, Dr. Hamza El Dorry (PI) and Dr. Siam (Co-PI) have been leading the Red Sea Metagenomics research team. The team is exploring novel bacterial communities in the Red Sea through actively participating in Red Sea expeditions for sampling and performing extensive molecular biology, genomics and computational analysis of the data.

1- Dr. Siam, thank you very much for accepting our request. Would you please explain to us the driving motives behind doing metagenomics research in the Red Sea?
The Red Sea is a unique environment in the region that remains to be explored. Thus, working on the Red Sea gives us the opportunity to perform essential research for the region. Furthermore, our main interest is the Red Sea brine pools, which are unique environments in terms of high temperature, high salinity, high metal content and low oxygen. Microbial communities living in such environments are known as extremophiles. The survival of extremophiles under such drastic conditions indicates that they possess genes with novel properties underlying these unique survival characteristics. Thus, we are highly motivated to explore these novel microbial communities and their unique properties. In addition, we are interested in extracting biotechnological products from the Red Sea that can be beneficial as antimicrobials and anticancer agents.
2- Would you please outline for us the objectives of the project and the main activities inside and outside the laboratory?
In our project, one of our main aims is establishing a Red Sea marine genomic database to be accessed by scientists worldwide. Additionally, we are screening this database for biotechnological and pharmaceutical products such as enzymes and anticancer agents. There are three main activities in the project: sampling, molecular biology/genomics work and computational analysis of the data. Sampling is a challenging process that requires rigorous planning, and samples must be subjected to proper processing and storage until arrival at the labs. In the labs, samples go through different procedures, starting from DNA extraction followed by Whole Genome Sequencing (WGS), to identify unknown or unique genes and help us understand novel microbial communities. This requires rigorous computational analysis to make sense of our data. Furthermore, we construct fosmid libraries for the isolation and purification of genes of biotechnological interest, such as lipases and cellulases. In addition, we carry out 16S rRNA phylogenetic analysis of the microbial communities present in the samples.
3- Since the idea is novel, we would like to know about the nature of samples, the parameters and the challenges imposed during the process of sampling.
Basically, the sampling process requires well-equipped research vessels such as the Woods Hole Oceanographic Institution’s ‘Oceanus’ and The Hellenic Center for Marine Research (HCMR) ‘Aegeo’. In addition, it is essential to have a team of physical oceanographers for adequate sampling.

http://www1.aucegypt.edu/publications/auctoday/AUCTodayFall09/Cure.htm
Image Source: AUC Today
We started with two different brine pools: Atlantis II Deep and Discovery Deep. As brine pools, these two regions are characterized by extremely harsh conditions, as I mentioned before. In the 2008 and 2010 KAUST Expeditions to the Red Sea, we collected two forms of samples: large-volume water samples and sediments. In both cases, we face challenges during sample collection. Collecting large-volume water samples can take up to 4 hours, and in bad weather it is nearly impossible to collect samples at all. Regarding the sediments, the heavy weight of the sediment core is the main challenge, since it may drag people into the water during the sampling process.
The water samples are collected using a CTD (an acronym for Conductivity, Temperature and Depth), a device fitted with 10-liter bottles that collect samples while sensors measure the conductivity, temperature and depth, in addition to other parameters. A CTD measures these physical properties at every meter of water, so it can retrieve 2200 readings per parameter over a 2200 m descent. This is very useful, as it allows us to correlate the physical and chemical parameters with the nature of the microbial communities obtained from each sample.
4- What are the reasons behind the choice of Atlantis II Deep and Discovery Deep brine pools for study?
These sites are unique. Atlantis II Deep lies 2200 meters below the water surface. It is characterized by high temperature (68 °C), high heavy metal content, high salinity and low oxygen. These extremely drastic conditions motivate us to explore the extremophilic microbial communities in this pool. Discovery Deep is adjacent to Atlantis II, but its conditions are less harsh and its heavy metal content differs. This encourages us to perform comparative genomic analyses of the two regions.
5- Did the relatively new metagenomic approach of sequencing multiple genomes, combined with the novelty of employing this approach to study microbial milieus in the Red Sea, impose unprecedented practical challenges in retrieving complete and authentic DNA sequences?
Yes, many challenges are present. For example, in the case of sediments, many cells die, which in turn can make the process of DNA extraction more difficult. However, we managed to cope with this problem by extracting the DNA from the sediments quickly upon their arrival at the labs and avoiding freeze–thaw cycles. Another problem with sediments is the presence of impurities that are co-extracted with the DNA and interfere with the analysis. Concerning water samples, the main challenge is that the amount of DNA in the samples is minute. We dealt with this by filtering large volumes of water, up to 500 liters per sample.
6- Is recognizing and validating sets of data that point to specific patterns of microbial diversity or to novel genes considered challenging?
Actually, we have retrieved an enormous amount of data, and a large percentage of it has no match to sequences in existing genomic databases. This has inspired us to think of new approaches to data analysis to identify the role of these novel sequences, and to seek collaborations with computational biologists.
7- Finally, do you think there are other environments in Egypt with unique properties that make them promising for employing metagenomics to discover novel genes and bacterial strains?
Yes, actually Egypt is very rich in environments with unique properties. For example, we have the deserts, Siwa‘s hot springs and the Nile. Many unique environments are yet to be explored, and a lot of research remains to be done. Our natural resources are limitless.
Tags: 16s rRNA phylogenetic analysis, Aegeo, Atlantis II, AUC, Brine pools, Computational analysis, Computational biologists, CTDs, Culture-independent, Discovery Deep, DNA extraction, Drastic conditions, Expeditions, Extremophiles, Fosmid libraries, genomics, KAUST, Metagenomics, microbial communities, Molecular biology, Oceanus, Red Sea, Virginia Bioinformatics Institute, Virginia Tech., WGS, Woods Hole Oceanographic Institution
2 Comments »
Man or machine? Bioinformaticians at McGill University are betting on man. They want to put previously wasted time on the internet to use; thus, Phylo was created. That is the name of an online interactive game aiming to tackle the problem of multiple sequence alignment, one that has been agonizing researchers for some time now. The human mind has evolved in a way that even computers supposedly can’t beat: we are capable of recognizing certain patterns and forming interrelations between them, a skill that numerous lines of code cannot easily match.
So what to do? Once you open the link, go ahead and sign up, although it is possible to play as a guest. But hey, if I am taking time off to contribute to science, I want to be able to brag about it later on. 🙂 The creators of the game have put together a very comprehensive tutorial explaining how the game works. They use down-to-earth terms and comparisons to simplify matters, so people from all walks of life can jump in as well.
The coloured blocks: These symbolize the nucleotides; correspondingly, there are four of them: orange, green, blue, purple. I wasn’t able to find out exactly which colour codes for which nucleotide, something that particularly intrigued me, since purple blocks were scanty in my alignment.
Aim of the game: Our job is to align these blocks as best as possible, so that the blocks’ colours in the first line match those in the second line. A matching block earns 1 point and a mismatched one loses 1 point. This should preferably be done WITHOUT creating gaps; gaps represent the mutations the sequences have incurred during evolution. In the easier stages, the sequences are provided on two lines, representing two different species. As the game gets more difficult, more lines are provided, related together through a mini phylogenetic tree that lets you pinpoint your priorities. Once you have reached “par”, the score a computer previously achieved, a star will blink to indicate that you are ready to move on, and your alignments are stored in a database for future use.
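For the curious, here is a toy encoding in Python of the scoring rules as described above; the gap penalty value is my own guess, not Phylo’s actual engine:

```python
# Toy scorer for a two-row Phylo-style alignment: +1 per matching
# column, -1 per mismatched column, and a penalty per gap opened.
# This encodes the rules as described above, not Phylo's exact engine.

def score_alignment(row1: str, row2: str, gap_open_penalty: int = 4) -> int:
    assert len(row1) == len(row2), "gapped rows must have equal length"
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            continue                 # gap columns are penalized below
        score += 1 if a == b else -1
    # Penalize each gap opening (each run of '-') in both rows.
    for row in (row1, row2):
        in_gap = False
        for ch in row:
            if ch == "-" and not in_gap:
                score -= gap_open_penalty
            in_gap = ch == "-"
    return score

# Letters stand in for the coloured blocks (O/G/B/P in the game).
print(score_alignment("OGB-P", "OGBPP"))  # 4 matches minus 1 gap opening
```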
My experience: I stumbled upon a feature where you can choose the type of sequences you want to work with. They are arranged according to disease, level ID, or simply at random. I chose blood and immune system disorders and was granted sequences related to essential thrombocytopenia.
Statistics: At the end, I was shown the following astonishing numbers: so far, 5344 users have submitted 70196 alignments for 2137 different levels. Personally, I find this quite surprising, given that the official launch was only on November 29th.
Interested in more: In the “about” page, the following sentence is provided: “For more information about any one of these topics, click here“.
Tags: alignment, Bioinformatics, contributions to science, essential thrombocytopenia, online games, sequence
No Comments »
The pursuit of renewable sources of energy just hit a crucial breakthrough. Since the stores of fossil fuel are diminishing as we speak, researchers are trying to exploit the machinery of microorganisms to produce diverse chemical compounds that can either be used directly or later be channeled into some form of combustible fuel. One major group of those compounds are the alkanes, the saturated organic compounds found abundantly in gasoline.
The study started off when ten out of eleven photoautotrophically cultured strains of cyanobacteria produced alkanes, mostly those with 15 and 17 carbon atoms (termed pentadecane and heptadecane, respectively). Logically, that indicates that the ‘alkane-producing genes’ are shared by all ten of them, yet absent in that unlucky 11th strain. So the search was launched.
To pinpoint the genes responsible for the production of alkanes, the study’s authors used a method referred to as subtractive genome analysis: they compared the ten genomes of the alkane-producing strains to figure out which genes they have in common. Next, any shared gene was immediately eliminated if it also showed up in the genome of the NON-alkane-producing strain. Eventually, the researchers were left with 17 genes in common, 10 of which had already been assigned a function. Through careful scrutiny of the families to which the proteins of the remaining 7 genes probably belong, two particularly stood out as likely participants in the alkane synthesis pathway.
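The subtraction step itself is plain set arithmetic. A minimal sketch with made-up strain gene sets (the gene names below are illustrative, not the study’s actual annotations):

```python
# Minimal sketch of subtractive genome analysis with made-up gene
# sets; each set holds the gene families annotated in one strain.

producers = [
    {"alkB", "aar", "ado", "ftsZ", "rbcL"},   # alkane-producing strain 1
    {"alkB", "aar", "ado", "ftsZ", "psbA"},   # alkane-producing strain 2
    {"aar", "ado", "ftsZ", "rbcL", "psbA"},   # alkane-producing strain 3
]
non_producer = {"alkB", "ftsZ", "rbcL", "psbA"}  # the alkane-free strain

# Genes shared by ALL producers...
shared = set.intersection(*producers)
# ...minus anything also present in the non-producer.
candidates = shared - non_producer
print(candidates)  # {'aar', 'ado'}
```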
And as always, there is no better way to test the hypothesis than to consult a microbiologist’s favorite lab microbe. To everyone’s pleasant surprise, extracts from colonies of Escherichia coli engineered to express both genes did in fact contain alkanes. So, although we are still far from large-scale production of alkanes by such microbes, this is definitely a gigantic leap in the right direction!
Tags: alkane, cyanobacteria, escherichia coli, fuel, renewable, subtractive genome analysis
No Comments »
I had the chance to attend the international conference BioVision Alexandria 2010, held at the Bibliotheca Alexandrina Conference Center in Alexandria, Egypt, from 12 to 15 April 2010. I really want to share with you the >50 talks that I attended, given by Nobel laureates and other remarkable scientists specializing in health-related topics.
 Dr. Richard J. Roberts
I will start with this talk by Dr. Richard Roberts, who received the Nobel Prize in Physiology or Medicine in 1993 for the discovery of split genes and mRNA splicing in 1977. He is now joint Research Director at New England Biolabs. Dr. Roberts entitled his talk “Collaborating to bridge the gap between computation and experimentation”; I will try to sum it up for you.
I. Let’s start by stating this fact: genomics is rapidly taking over the field of biology, at the research level at least.
Examples:
- Sequencing of the human genome or “The Human Genome Project” provides the basis of the emerging field “personalized medicine”.
- Plant genomics is unbelievably important for food and, maybe, for energy production purposes, mostly via unicellular plants.
- Ocean organisms are very interesting, as they produce potential new antibiotics and many other useful substances.
- Bacteria and archaea make up to 50% of the living biomass.
Bacteria are everywhere: they live in the oceans, the soil (plants require them for nitrogen fixation), animals, and us; our gut, skin, nose and mouth. We know absolutely nothing about most of these bacteria, because we can’t grow them in culture.
But this is about to change now thanks to DNA sequencing.
II. So, the core of today’s science is DNA sequencing… but unfortunately, DNA sequencing has its drawbacks.
1) DNA sequencing is getting faster and cheaper at a rate that exceeds our ability to work out the function or the biochemical pathway of every single gene sequenced. Or, if we’re really lucky, we can make a guess, based on sequence similarity, that a given gene encodes, say, a “hydrolase”; but just a hydrolase, with no clue about its substrate or the exact biochemical pathways it’s involved in.
Dr. Roberts gave this interesting simile: getting more and more bacterial DNA sequences is like getting a car with a list of all its parts but no idea about how they fit together or how they work. Biology is about understanding how life works; if we’re talking about synthesizing life today, we have to understand how life works first. He dreams that before he dies, he can understand how a very simple bacterium actually works: what is the chemistry that is going on there?
So, the first problem is the very rapid growth of DNA sequencing without a similar growth in annotation, i.e. naming genes and finding their functions. Here’s a somewhat older graph showing the growth of sequence databases and annotations from 1982 till 2006, close to the one Dr. Roberts presented, which ran from 1995 till 2009. If you can get to a newer one, please do not hesitate to comment on the post and add its link.
The growth of sequence databases and annotations (1982-2006) - Argonne National Laboratory
2) The computer is not enough: do the biochemistry in the lab! In spite of the large amounts of money spent on sequencing different organisms, we are still not making much progress in understanding them. When we get a DNA sequence, we translate it into its corresponding amino acid sequence, and our best shot is then to compare it to the existing protein sequences in the databases to see what it looks like and thus predict its function. If two protein sequences look the same, there’s a chance, not a guarantee, that they have the same function; even a single amino acid difference can mean different substrates and thus different functions. How to tell? The computer is not enough; do the biochemistry in the lab! This leads us to the third problem.
3) Not all substrates are available to all labs all the time, so one lab can’t determine the function of all genes on earth. He gave this example: to assay a specific disaccharide hydrolase and determine its substrate, you need disaccharide combinations of all possible sugars to test it on.
4) Lack of good funding for biochemistry. Funding agencies think that biochemistry is an “old-fashioned” field! They are funding the more appealing genome-wide studies, which remain very superficial.
III. Dr. Roberts’ suggestions for a solution: “COMBREX”
Identifying Protein Function—A Call for Community Action.
Dr. Roberts and colleagues got an NIH grant in October 2009 to establish COMBREX (maybe: COMputational Biology Reading EXperiments). The workflow will be very much like this:
Step 1: Establishment of a database. From 1200 complete bacterial and archaeal genome sequences, computational biology groups generate protein families/domains of unknown function (DUFs), predict functions based on sequence similarity, and establish a database.
Step 2: Coordination of efforts between biochemistry labs. Experimentalists/biochemists (young grads, even technicians) offer a proposal to test those predictions and gain exclusive access to the genes of interest for 6 months, plus a small grant (5,000-10,000 USD) to carry out single-gene studies. If we know one protein’s function, we know the function of the whole protein family.
Step 3: Making a Wikipedia-type page for suggestions and predictions.
Step 4: Establishment of a journal to publish the findings.
IV. What genes should we focus on/start with?
Dr. Roberts suggested this list, in descending order of priority:
1) Genes abundant in many different organisms; in humans, animals, bacteria… etc. Those are likely to have conserved, important functions.
2) E. coli, the most widely used and so-called “best studied” organism; we can fully characterize it.
3) Helicobacter pylori, to understand the biochemistry of such an important pathogen that we know so little about.
4) Products of open reading frames (ORFs) that have already been cloned, translated and frozen.
V. Who can help?
Dr. Roberts said almost everybody: computational biologists to predict, biochemists to test, geneticists, university students as personnel (even high school students; it can help them get a genuine science project), retired professors to supervise and maybe get back to the lab, and funding agencies.
You can watch this talk and most of the conference’s talks via the Bibliotheca Alexandrina webcast.
 Dr. Roberts' talk at BioVision Alexandria 2010
 Richard Roberts with BioVision Alexandria 2010 attendees
Tags: annotation, Bibliotheca Alexandrina, biochemistry, BioVision Alexandria 2010, COMBREX, database, DNA sequencing, domains of unknown function, drawbacks, DUFs, E. coli, Helicobacter pylori, human genome project, New England Biolabs, Nobel laureates, open reading frames, ORFs, Richard Roberts, sequence similarity, synthetic life
No Comments »
I had the chance to attend this interesting webinar hosted by Pubget, a new search engine for life-science PDFs. The webinar was held on Friday, December 11, 2009 (you can catch the recording here). There were 160 attendees and the GoToWebinar tool enabled live interaction with the speakers.
The webinar was meant to have speakers who are experts in their areas and to cover different segments dealing with searching, analyzing, and reusing scientific articles. It was moderated by Ryan Jones, President of Pubget, and the speakers represented:
- Publishers: Peter Binfield, Managing Editor, PLoS One
- Libraries: Marcus Banks, Manager of Education and Research Services, UCSF
- End Users: Ansuman Chattopadhyay, PhD, Bioinformatics, University of Pittsburgh
- Tools: Ramy Arnaout, MD PhD, Chairman and CEO, Pubget
Peter Binfield talked about his experience with PLoS ONE as a journal established in the digital era, with all of its content digital. He was much concerned with how to monitor the “reuse” of an article and the tools incorporated in PLoS to achieve that. PLoS uses multi-dimensional, article-level metrics rather than a monolithic measure like the impact factor. The PLoS metrics system lets everyone know the exact usage of an article, its downloads and views. PLoS also enables commenting, rating, discussing, selecting a part/line and writing a note about it, sharing/bookmarking, and showing trackbacks to blogs and citations.
Marcus Banks said that digital “libraries” still need a librarian to analyze, organize and link publications. He also talked about the need for a tool that enables researchers to highlight only the parts of a publication that they need, instead of spending time reading through the whole thing. He mentioned sharing tools like Zotero, Mendeley, Del.icio.us, RefShare, CiteULike, and Pubget.
Representing the end users, Ansuman Chattopadhyay took the stand. His presentation was entitled “Beyond PubMed: Next generation literature searching”. With PubMed, it’s difficult to narrow down a search and reduce the number of results/hits, but this can be achieved with newer Google-like tools such as:
- GoPubMed, which gives the users suggestions as they are typing
- Novoseek, which categorizes search results into: diseases, pharmacological substances, genes/proteins, procedures, organisms, etc.
and text-similarity tools like:
- eTBLAST, a web server to identify expert reviewers, appropriate journals and similar publications (the paper)
- JANE, Journal/Author Name Estimator
- DeepDyve
One point I didn’t get was the need for a “daily journal of negative results”.
Ramy Arnaout presented Pubget as a search tool that is:
- like an on-the-web Acrobat Reader (the search results are the PDFs of the papers)
- able to deliver science at speed
- legal and free, as researchers use their institution’s license to get to all publications including the non-open-access ones
- user-friendly, as a user chooses from a list of publications a paper that opens in the same window
The concerns that all four speakers expressed at the end of the webinar were mostly:
- How to achieve the balance between delivering science and preserving copyrights, a problem that is being partly solved by Open-Access journals.
- How to tell the end-user what is related to his/her field.
- Although everything is “online”, the challenge is how to get to it and use it.
- How to interact with end-users and help them discover the tools/features of search engines; this can be addressed by workshops and tutorials.
I do thank Pubget for giving me the chance to attend this very informative webinar by making it freely available.
Edited on Dec 22, 2009 09:31 p.m. CLT
Tags: Ansuman Chattopadhyay, CiteULike, DeepDyve, Delicious, double-matrix technology, eTBLAST, Google-like, GoPubMed, GoToWebinar, JANE, librarian, Marcus Banks, Mendeley, Novoseek, Peter Binfield, PLoS ONE, Pubget, Ramy Arnaout, RefShare, Ryan Jones, scientific articles, sharing tools, text-similarity, trackback, UCSF, University of Pittsburgh, webinar, Zotero
3 Comments »
What is bioinformatics?
Bioinformatics can simply be defined as a link between biology and computer science, in which biological data are processed by software to yield an output that is then interpreted in different ways.
Biological data means nucleic acid or protein sequences, in their simple or complicated forms, whereas the software is a computer program specially designed to process these data in a certain way using a certain algorithm (a recipe for solving a problem). The output is usually numerical or visual (often graphical), but it still needs to be well understood; that interpretation step is the key point of bioinformatics.
What is the need for bioinformatics?
In the research field, we need to be guided down a certain road, to choose one way or another, or to try many options until we define our research plan. Bioinformatics simply brings the solutions into your hands with a few mouse clicks.
One simple example makes it all clear: PCR (the Polymerase Chain Reaction). We always need to design a primer to trigger the reaction. If we did this the ordinary way, we would have to try out many primers in practice, and this would surely take a tremendous amount of time. Now, what if you are computer- and internet-literate? You can simply use software to get many primer options for the DNA piece under investigation. Doesn’t this save time, effort and money?
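As a taste of what primer-design software computes, here is a minimal sketch that checks two basic properties of a candidate primer: its GC content and an approximate melting temperature by the simple Wallace rule (real tools use more sophisticated nearest-neighbour models; the primer below is hypothetical):

```python
# Quick sanity checks for a candidate PCR primer: GC% and an
# approximate melting temperature via the simple Wallace rule
# (Tm = 2*(A+T) + 4*(G+C)); real tools use nearest-neighbour models.

def primer_stats(primer: str) -> tuple[float, int]:
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    gc_percent = 100.0 * gc / len(primer)
    tm = 2 * at + 4 * gc
    return gc_percent, tm

gc_percent, tm = primer_stats("ATGACCATGATTACGGATTC")  # hypothetical primer
print(f"GC = {gc_percent:.0f}%, Tm ~ {tm} C")          # GC = 40%, Tm ~ 56 C
```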
Can bioinformatics be useful in different ways, other than the PCR example?
Some people may think that the use of bioinformatics is limited to certain fields of biological research, and others might think it is only a matter of prediction, which always needs to be evaluated for accuracy, specificity and efficiency. But indeed, bioinformatics can be used broadly in the analysis of nucleic acids and proteins.
Analysis?!! That is a vague word, how can you analyze a protein using bioinformatics?
Now you’ll see what bioinformatics can do for protein analysis:
1. Retrieving protein sequences from different databases, either specialized or general ones; not as easy a job as you might think.
2. Computing a protein (amino acid) sequence to obtain (see the sketch after this list):
- Many of the physicochemical properties of your sequence, like the molecular weight, the isoelectric point, etc.
- The hydrophilicity/hydrophobicity ratio
Both of the above can tell us the probability of a protein acting as a receptor on the cell surface, being antigenic, or even being secreted outside the cell.
3. On the prediction side, we can predict:
- Sequences of special importance, e.g. signal peptides, whose prediction can lead us to the secretory proteins of an organism
- The secondary structure
- The tertiary (3-D) structure
The last two points are applications of what is called structural bioinformatics, through which the computer predicts the 2ry and 3ry (3-D) configuration of your protein using special programs with advanced algorithms and artificial intelligence. Amazingly, this may be useful in understanding receptor-substrate interactions.
4. Comparing sequences to obtain the best alignment (that is, comparing 2 or more sequences to find their relation to each other, i.e. finding similarities and differences), which will help in:
- Classifying your protein and relating it to its protein family
- Making evolutionary inferences about your protein, to define whether it descends from another protein or not. This is called phylogenetic analysis, in which the proteins under investigation are studied to determine which protein is the mother of the others, which are the daughters, the granddaughters, and so on
- Detecting common domains, which helps us understand the function of an unknown protein when it is compared to sequences of other proteins of known function
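To make point 2 concrete, here is a minimal sketch using Biopython’s ProtParam module, assuming Biopython is installed; the peptide is an arbitrary example:

```python
# Minimal sketch of point 2 above using Biopython's ProtParam module
# (pip install biopython); the peptide below is arbitrary.
from Bio.SeqUtils.ProtParam import ProteinAnalysis

peptide = "MKWVTFISLLFLFSSAYS"          # illustrative sequence only
analysis = ProteinAnalysis(peptide)

print(f"molecular weight:   {analysis.molecular_weight():.1f} Da")
print(f"isoelectric point:  {analysis.isoelectric_point():.2f}")
print(f"GRAVY (hydropathy): {analysis.gravy():.2f}")  # >0 = overall hydrophobic
```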
Then, what will we gain if we compute DNA? Or, put differently, what can bioinformatics do for DNA research?
On the same level as with proteins, though with different applications, we can use it for:
- Retrieving DNA sequences from different databases
- Computing a sequence to obtain information about its properties (as with proteins), e.g. the GC%, which can be used alongside other properties to identify a gene (see the sketch below)
- Assembling sequence fragments (DNA is usually sequenced in the form of fragments that need to be assembled in the best way; bioinformatics does this faster and more accurately than ordinary assembly)
- Designing a PCR primer
- Predicting DNA and RNA secondary structures (e.g. predicting the stems and loops of tRNA)
- Performing alignments between 2 or more sequences, which leads to many applications (like those mentioned above for protein alignments)
- Finding repeats, restriction sites, single nucleotide polymorphisms (SNPs), and/or open reading frames, all of which have huge applications in the medical and paramedical fields, and typically in research activities
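Two of the computations above, GC% and a naive ORF scan, fit in a few lines of plain Python; the sequence is a toy example:

```python
# A few of the DNA computations listed above, in plain Python.
# The sequence is a toy example, for illustration only.
dna = "ATGGCGTAATGCATGGCCTAA"

# GC% (one of the simple properties that help flag genes).
gc_percent = 100.0 * (dna.count("G") + dna.count("C")) / len(dna)
print(f"GC = {gc_percent:.1f}%")

# Naive open-reading-frame scan on the forward strand: report
# ATG...stop stretches in each of the three reading frames.
stops = {"TAA", "TAG", "TGA"}
for frame in range(3):
    codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
    start = None
    for idx, codon in enumerate(codons):
        if codon == "ATG" and start is None:
            start = idx
        elif codon in stops and start is not None:
            print(f"frame {frame}: ORF at {frame + 3 * start}..{frame + 3 * idx + 2}")
            start = None
```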
Tags: algorithms, alignment, analysis, artificial intelligence, Bioinformatics, biology, computer science, DNA, PCR, phylogenetic, phylogenetic analysis, protein, RNA, sequence, signal peptide, single nucleotide polymorphism, SNPs
1 Comment »
From a humble point of view: as I was attending a bioinformatics and genomics workshop held at FOPCU, the lecturer pointed out to us that, up until now, no one has managed to come up with a method capable of converting a fully functioning protein back into the original nucleotide sequence of its corresponding gene. At that instant, the following thought occurred to me: why would this ever be needed?
For starters, we already have the protein in hand; its 3D structure is, for many proteins, completely figured out, and for some even the orientation in space, the actions and the functions. Then, as far as I understand, being the mould from which a protein is later assembled is the only function a gene, or at least an expressed one, has. Knowing that, for instance, the 4th amino acid of the gastrin hormone is leucine, would it matter whether it was translated from the codon CUA rather than UUG?
Now three thoughts impose themselves. I could only imagine that the presence of SNPs (basically, single nucleotides that vary among individuals and are thought to influence certain traits) within the nucleotide sequence of the gene is the reason behind the researchers’ attention. However, this ultimately means that if such a method were to exist, it would have to produce a different nucleotide sequence for proteins coming from different people. Simple logic.
Another probable explanation that comes to mind would be the existence of a structural difference in the leucine amino acid held on tRNA molecules with different anticodons, where each would have some “characteristic” features distinguishing it from the other tRNAs. If that were the case, it has managed to fly below the radar for quite some time, as no matter which reference I turn to, it is taken for granted that these amino acids are carbon copies. Were they non-identical in any way, the resulting protein would function in a slightly different manner, which could explain the diversity of protein actions among individuals. Who knows?
Last, but not least, is the possibility of gaining fast insight into the genome of a previously undiscovered species, where one could quickly figure out all the expressed genes through this simple task of “reverse translation”. The sequences of the unexpressed genes, however, would still have to be obtained the old-fashioned way. No choice there!
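One way to see why such “reverse translation” is underdetermined is to count how many codon sequences could encode even a short peptide; a minimal sketch using the standard genetic code’s degeneracies (only the residues needed here are listed):

```python
# Why naive "reverse translation" is ambiguous: the number of DNA
# sequences encoding a peptide is the product of each residue's
# codon count. Degeneracies below are from the standard genetic code
# (only the residues needed for this example are listed).
codon_counts = {"L": 6, "G": 4, "S": 6, "E": 2, "W": 1, "M": 1}

peptide = "MLESGW"  # toy peptide, illustration only
n_sequences = 1
for residue in peptide:
    n_sequences *= codon_counts[residue]
print(f"{peptide} can be encoded by {n_sequences} distinct codon sequences")
# -> 1 * 6 * 2 * 6 * 4 * 1 = 288; the ambiguity explodes with length.
```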
Just wondering what the future has in store.
Tags: Bioinformatics, fopcu, gene sequence, genome, genomics, nucleotide, protein, research, reverse translation, sequence, translation, workshop
No Comments »
Microbiology, Immunology & Biochemistry Dept.*
Faculty of Pharmacy
Cairo University
Bioinformatics Practical Exam – Winter 2010**
Time allowed: Lab computers will automatically hibernate after 2 hours.***
Target: Assigning the function of the uncharacterized protein O67940_AQUAE from Aquifex aeolicus ****
A suggested procedure:
1- Get the amino acid sequence of the protein from UniProtKB
— Run it through BLAST to find homologs (related sequences). Do not forget to choose Blastp & PSI-BLAST
— Check the assigned hits (known function & solved crystal structure) which have the highest possible similarity (highest score / highest % identity) to your query.
2- Check obtained BLAST alignment of those proteins against your query.
3- Check if the protein belongs to any protein family using PIRSF & COGs
— Check if the protein shares any conserved domain with assigned function using Pfam.
— Using PROSITE, the functional-site database, check if the protein shares any sequence motifs with other proteins
4- Check if the protein belongs to a superfamily using SCOP database, which provides structural and evolutionary relationships between proteins.
5- As you don’t have the crystal structure of your Aquae protein but you do have the structure of the closest assigned protein, use VAST to search for & align protein structures related to yours.
6- Extract homologs.
7- Multiple alignment (structure-guided alignment) using Cn3D
— Neighbor-joining (NJ) phylogenetic analysis using CDTree
8- Use PDBSum to obtain an overview of the protein–ligand interactions available for your query.
9- Alignment of homologous sequences to identify conserved functional residues.
10- Evidence-based assignment of the biological function of the query O67940_AQUAE (a scripted version of step 1 is sketched below).
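For anyone who wants to script step 1, here is a minimal sketch assuming Biopython is installed and that UniProt’s REST FASTA endpoint is as shown; a web BLAST submission can take several minutes to return:

```python
# Minimal scripted version of step 1: fetch the sequence from
# UniProtKB and submit a blastp search against nr. Assumes Biopython
# is installed and that UniProt's REST FASTA URL is as shown.
from urllib.request import urlopen
from Bio.Blast import NCBIWWW, NCBIXML

fasta = urlopen("https://rest.uniprot.org/uniprotkb/O67940.fasta").read().decode()
sequence = "".join(fasta.splitlines()[1:])     # drop the FASTA header line

result_handle = NCBIWWW.qblast("blastp", "nr", sequence)  # may take minutes
record = NCBIXML.read(result_handle)
for alignment in record.alignments[:5]:        # top hits: titles and scores
    hsp = alignment.hsps[0]
    print(f"{alignment.title[:60]}  score={hsp.score}  e={hsp.expect:.2e}")
```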
🙂GOOD LUCK 🙂
* What have I got to lose?!
** I have faith.
*** I can provide that; I know a guy who knows a guy!
**** Frankly, I wanted to pick a different protein, but I hesitated.
Tags: alignment, Bioinformatics, Cn3D, COG, crystal structure, functional site, homolog, motif, PDBSum, Pfam, phylogenetic analysis, PIRSF, PROSITE, protein family, PSI-BLAST, related structures, SCOP, superfamily, uncharacterized protein, UniProtKB, VAST
9 Comments »
Hello, hello. You’re now tuned to your favorite blog: micro-writers.egybio.net. Tonight we have a very special guest, live, online. After two months of waiting, we finally got this exclusive interview with the newly emerged Streptococcus pyogenes strain, the most dangerous ever: M1T1. We have it here, with us, in the studio.
– Hello, M1T1. Welcome to our studio.
– Hey there.
– We knew from our sources, which are totally classified, that you got yourself into trouble recently.
– (Interrupting) I did NOT get myself into trouble. EID set me up.
– M1T1, would you please calm down & tell us a little more about yourself?
– Well, I belong to Group A streptococci (GAS), aka Streptococcus pyogenes. M1T1 is my serotype; I’m just a clonal strain. As you know, S. pyogenes colonizes human skin & throat, causing either non-invasive (sore throat, tonsillitis & impetigo) or invasive (necrotizing fasciitis NF, scarlet fever & streptococcal toxic-shock syndrome STSS) infections. Actually, NF gave me my nick: flesh-eating bacteria.
– So, do you cause NF & STSS in everyone?
– No, kid. It depends on their genetic susceptibility, what you call “host–pathogen interactions”. I was isolated from patients with invasive as well as non-invasive infections during 1992–2002. This is NOT entirely my fault; humans can make me extra virulent by selecting my most virulent members.
– Back to your history: when exactly were you isolated?
– M1 & her sisters were the worst nightmare of the US & UK in the 19th century, as they caused the famous pandemic of scarlet fever. “Nevertheless”, the early 1980s were the golden age of my strain, as well as of my very close sisters M3T3 & M18. We caused STSS & NF in different parts of the world. Great times, great times!
– Only for you, I suppose! So, what made you hypervirulent? What caused this “epidemiologic shift”?
– Dr Ramy K. Aziz identified two reasons that improved my fitness in humans: the new genes I got from phages & the “host-imposed pressure”. Both resulted in the selection & survival of me, M1T1, the hypervirulent strain. Dr Aziz’s work at Dr Kotb’s lab identified a group of genes I got from phages that changed my entire life.
– Interesting! Tell us more about that. How did phages “change your life”?
– Dr Aziz proved that I differ from my ancestral M1 when he found that I have 2 extra prophages (lysogenic phages that didn’t get the chance to lyse me and instead became integrated into my genome):
1. SPhinX, which carries a gene encoding the potent superantigen SpeA, or pyrogenic exotoxin A (the scarlet fever toxin).
2. PhiRamid, which carries another gene encoding the most potent streptococcal nuclease ever, Sda1.
3. He also found that phage conversion between the lytic and lysogenic states resulted in an exchange of toxins between our different strains (aka horizontal gene transfer). Phages are very good transporters of genetic material, which is why “strains belonging to the same serotype may have different virulence components carried by the same or highly similar phages & those belonging to different serotypes may have identical phage-encoded toxins.” What a quote from Rise and Persistence of Global M1T1 Clone of Streptococcus pyogenes!
– Well, it was not that interesting. So what? What’s the significance? How did that make you hypervirulent?
– You can’t get it? You’re not that smart, are you? Tell me: what made M1 hypervirulent, causing scarlet fever in the 1920s, and me hypervirulent, causing STSS in the 1980s, with a 50-year decline period in between?
– Superantigen?
– Exactly. You do have your moments! The superantigen-encoding gene was present in us and absent from the strains isolated in the period between. The interesting part, for me of course, is that after 50 years without hypervirulent strains, humans had absolutely no superantigen-neutralizing antibodies. That was the real invasive party. A superantigen triggers an extremely high inflammatory response because of its non-specific binding to immune system components (antibodies & complement). In fact, the SuperAg inflammatory response is “host-controlled”.
– So, what about Sda1?
– Streptodornase (a streptococcal extracellular nuclease) helps me degrade the neutrophil extracellular traps (NETs) that neutrophils use to entrap me. So, I can invade humans freely & efficiently and even survive their neutrophils. Dr Aziz proved in his paper “Post-proteomic identification of a novel phage-encoded streptodornase, Sda1, in invasive M1T1 Streptococcus pyogenes” that it’s all about the C-terminus of my Sda1: a frame-shift mutation increased my virulence, while its deletion decreased it.
– Now we know about your SuperAg & nuclease (DNase), what’s the “host-imposed pressure”?
– I have my own protease, SpeB, which I use to degrade my other proteins (virulence factors); this provides me with good camouflage & gives me access to blood. When the host immune system recognizes me, it traps me in NETs; at that point, I secrete Sda1 to degrade them. Actually, SpeB protects you humans from my Sda1 & my other toxins. When SpeB was compared between patients with severe & non-severe strep infections, it was found that SpeB wasn’t expressed in the severe infections. Expression of SpeB may be host-controlled, as the host selects mutants with a mutation in covS, part of the regulatory system that controls my gene expression, including the SpeB gene.
– Finally, M1T1. How do you see your future?
– More new phage-encoded genes, more selection of the hypervirulent strains by the host & more regulation of the expression of my virulence factors. A pretty good future! I also count on humans not developing immunity against me, like what happened in the 1980s, when I got new virulence factors or allelic variations in my old ones.
Thank you, M1T1. Pleasure talking to you…….M1T1? M1T1, where are you? Why do I feel this strange pain in my throat?
Image credits:
Streptococcus pyogenes: http://adoptamicrobe.blogspot.com/
Tags: covS, epidemiologic shift, flesh-eating bacteria, global M1T1 clone, group A streptococci (GAS), horizontal gene transfer, host-imposed pressure, host–pathogen interactions, impetigo, M1T1, Malak Kotb, necrotizing fasciitis, neutrophil extracellular traps (NETs), phage-encoded toxins, pyrogenic exotoxin A, Ramy K. Aziz, S. pyogenes, scarlet fever, sore throat, streptococcal nuclease Sda1, streptodornase, STSS, superantigen SpeA, tonsillitis
5 Comments »