Posts Tagged “phylogenetic analysis”

What is bioinformatics?

It can be defined simply as the link between biology and computer science: biological data are processed by software to yield an output, which is then interpreted in different ways.

Biological data here means nucleic acid or protein sequences, in simple or complex forms. The software is a computer program designed to process these data in a particular way, following a particular algorithm (a recipe for solving a computational problem). The output is usually numerical or visual (often graphical), but above all it needs to be understood correctly. That last point is the key to bioinformatics.

Why do we need bioinformatics?

In research, we need something to point us down a particular road, to help us choose one way over another, or to let us try many options until we settle on a research plan. Bioinformatics puts those answers in your hands with a few mouse clicks.

One simple example makes this clear: PCR (Polymerase Chain Reaction). Every reaction needs primers designed to start it. Done the ordinary way, this means testing many primers at the bench, which takes a tremendous amount of time. Now, what if you are computer- and internet-literate? You can simply use software to get many primer options for the DNA piece under investigation; doesn’t this save time, effort and money?
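To give a feel for what such primer software does behind the mouse clicks, here is a minimal sketch in Python. It slides a window along a template and keeps windows whose GC content and melting temperature look reasonable. The Wallace rule used for Tm is only a rough estimate for short oligos, and the acceptance ranges, function names and template are assumptions chosen for illustration; real primer design tools check much more (hairpins, dimers, specificity).

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule:
    Tm = 2*(A+T) + 4*(G+C). A crude estimate for short oligos only."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def candidate_primers(template, length=20, gc_range=(0.40, 0.60), tm_range=(52, 65)):
    """Slide a window along the template and keep windows whose GC fraction
    and Wallace Tm fall inside the (assumed) acceptable ranges."""
    template = template.upper()
    hits = []
    for start in range(len(template) - length + 1):
        window = template[start:start + length]
        gc = gc_fraction(window)
        tm = wallace_tm(window)
        if gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]:
            hits.append((start, window, gc, tm))
    return hits


if __name__ == "__main__":
    template = "ATGC" * 5 + "AAAA"  # made-up template, for illustration only
    for start, seq, gc, tm in candidate_primers(template):
        print(start, seq, f"GC={gc:.2f}", f"Tm~{tm}C")
```

Only forward primers on one strand are listed here; a reverse primer would be picked the same way from the reverse complement of the other end of the target.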

Can bioinformatics be useful in different ways, other than the PCR example?

Some people think that bioinformatics is limited to a few fields of biological research, and others think it is only a matter of prediction, which always has to be evaluated for accuracy, specificity and efficiency. In fact, bioinformatics can be used across the analysis of nucleic acids and proteins.

Analysis?! That is a vague word. How can you analyze a protein using bioinformatics?

Here is what bioinformatics can do for protein analysis:

  1. Retrieving protein sequences from different databases, either specialized or general. It is not as easy a job as you might think.
  2. Computing on a protein or amino acid sequence to obtain:
  • Many of the physicochemical properties of your sequence, such as molecular weight, isoelectric point, etc.
  • The hydrophilicity/hydrophobicity ratio

Both of the above can indicate how likely a protein is to act as a receptor on the cell surface, to be antigenic, or to be secreted outside the cell.
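As a rough illustration of this kind of computation, the sketch below estimates molecular weight from average residue masses and scores hydrophobicity with the Kyte-Doolittle hydropathy scale. The function names and the example peptide are made up, the masses are rounded to two decimals, and the isoelectric point is left out because it needs a pKa table and an iterative calculation.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in daltons, rounded to two decimals.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water is added back for the free N- and C-termini


def molecular_weight(seq: str) -> float:
    """Approximate molecular weight of a peptide in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> list:
    """Mean hydropathy of every window along the sequence. Long stretches
    of high values hint at membrane-spanning segments."""
    seq = seq.upper()
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    peptide = "MKTLLILAVVAAALA"  # made-up sequence, for illustration only
    print(f"MW ~ {molecular_weight(peptide):.1f} Da")
    print(f"GRAVY = {gravy(peptide):.2f}")
    print([round(x, 2) for x in hydropathy_profile(peptide)])
```

A positive GRAVY value suggests an overall hydrophobic protein and a negative one a hydrophilic protein; on its own this is only a hint, not proof of where the protein ends up.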

3. On the prediction aspect, we can predict:

The last two points are applications of what is called structural bioinformatics, in which the computer predicts the secondary and tertiary (3-D) structure of your protein using special programs with advanced algorithms and artificial intelligence. Amazingly, this can help in understanding receptor-substrate interactions.

4. Comparing sequences to obtain the best alignment (that is, comparing two or more sequences to find how they relate to each other through their similarities and differences). This helps in:

  • Classifying your protein and relating it to its protein family
  • Forming evolutionary hypotheses about your protein, to decide whether it descends from another protein or not. This is called phylogenetic analysis, in which the proteins under investigation are studied to work out which protein is the mother of the others, which are the daughters, the granddaughters, and so on
  • Detecting common domains, which helps us understand the function of an unknown protein when it is compared with sequences of proteins of known function
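To make "obtaining the best alignment" concrete, here is a small sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). The scoring values (+1 match, -1 mismatch, -1 gap) are assumptions picked for simplicity; real protein alignments use substitution matrices such as BLOSUM62 and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two sequences.
    Returns (aligned_a, aligned_b, score); '-' marks a gap."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of all alignment columns."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    top, bottom, s = needleman_wunsch("ACGT", "AGT")
    print(top)
    print(bottom)
    print("score:", s, "identity:", percent_identity(top, bottom))
```

Pairwise scores or identities like these, computed for every pair of sequences, are also the raw material that distance-based tree methods start from in phylogenetic analysis.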

Then, what do we gain if we compute on DNA? Or, put another way, what can bioinformatics do for DNA research?

At the same level as with proteins, though with different applications, we can use it for:

  • Retrieving DNA sequences from different databases
  • Computing on a sequence to obtain information about its properties (as with proteins), e.g. GC%, which can be used with other properties to identify a gene
  • Assembling sequence fragments (DNA is usually sequenced as fragments that need to be assembled in the best way; bioinformatics does this faster and more accurately than manual assembly)
  • Designing a PCR primer
  • Predicting DNA and RNA secondary structures (e.g. predicting the stems and loops of tRNA)
  • Performing alignments between two or more sequences, which leads to many applications (such as those mentioned above for protein alignments)
  • Finding repeats, restriction sites, single nucleotide polymorphisms (SNPs), and/or open reading frames, all of which have wide applications in the medical and paramedical fields and in research generally.
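A few of these DNA tasks fit in a handful of lines. The sketch below computes GC%, takes a reverse complement, locates a restriction site (EcoRI, GAATTC, is used as the default example) and scans the three forward reading frames for open reading frames. The function names, the minimum ORF length and the test sequence are illustrative assumptions; a real ORF finder would also scan the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = "GAATTC") -> list:
    """0-based start positions of every (possibly overlapping) occurrence
    of a recognition site. GAATTC is the EcoRI site."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def find_orfs(seq: str, min_codons: int = 2) -> list:
    """Scan the three forward frames for ATG...stop stretches.
    Returns (start, end) pairs, end exclusive and including the stop codon.
    min_codons counts the codons before the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"  # made-up sequence, for illustration only
    print("GC%:", gc_percent(dna))
    print("reverse complement:", reverse_complement(dna))
    print("EcoRI sites:", find_sites("AAGAATTCTT"))
    for start, end in find_orfs(dna):
        print("ORF:", start, end, dna[start:end])
```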


Microbiology, Immunology & Biochemistry Dept.*

Faculty of Pharmacy

Cairo University

Bioinformatics Practical Exam – Winter 2010**
Time allowed: Lab computers will automatically hibernate after 2 hours.***

Target: Assigning the function of the uncharacterized protein O67940_AQUAE from Aquifex aeolicus ****

A suggested procedure:
1- Get the amino acid sequence of the protein from UniProtKB
— Run it through BLAST to find homologs (related sequences). Do not forget to choose Blastp & PSI-BLAST
— Check the assigned hits (known function & solved crystal structure) which have highest possible similarity (highest score/ highest % id) to your query.
2- Check the BLAST alignments obtained for those proteins against your query.
3- Check if the protein belongs to any protein family using PIRSF & COGs
— Check if the protein shares any conserved domain with assigned function using Pfam.
— Using PROSITE the functional site database, check if the protein shares any sequence motifs with other proteins
4- Check if the protein belongs to a superfamily using the SCOP database, which provides structural and evolutionary relationships between proteins.
5- Since you don’t have the crystal structure of your AQUAE protein but do have the structure of the closest assigned protein, use VAST to search for related protein structures and align them to yours.
6- Extract homologs.
7- Multiple alignment (structure-guided alignment) using Cn3D
—  Neighbor-joining (NJ) phylogenetic analysis using CDTree
8- Use PDBSum to obtain an overview of the protein–ligand interactions available for your query.
9- Alignment of homologous sequences to identify conserved functional residues.
10- Evidence-based assignment of the biological function of the query O67940_AQUAE.
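Step 9 is normally done by eye in an alignment viewer, but the idea can be sketched in a few lines of Python: read an aligned FASTA file and report the columns where all (or most) sequences carry the same residue. The function names and the toy alignment below are invented for illustration and are not real sequences of the exam protein or its homologs.

```python
from collections import Counter


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {name: sequence}. The name is the first word
    of each header line; sequence lines are joined together."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def conserved_columns(aligned: dict, threshold: float = 1.0) -> list:
    """Return (column, residue) for every alignment column in which the
    most common residue occurs in at least `threshold` of all sequences.
    Gaps ('-') never count as a conserved residue."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    conserved = []
    for col in range(length):
        residues = [s[col] for s in seqs if s[col] != "-"]
        if not residues:
            continue
        residue, count = Counter(residues).most_common(1)[0]
        if count / len(seqs) >= threshold:
            conserved.append((col, residue))
    return conserved


if __name__ == "__main__":
    # Toy alignment with hypothetical names and residues.
    toy = """>query
MKHG-DS
>homolog1
MRHGADS
>homolog2
MKHGADT
"""
    alignment = parse_fasta(toy)
    print(conserved_columns(alignment))        # fully conserved columns
    print(conserved_columns(alignment, 0.6))   # mostly conserved columns
```

Columns that stay conserved across distant homologs, and that line up with known active-site or ligand-binding residues in the solved structure, are the kind of evidence step 10 asks for.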


* What have I got to lose?!
** I have faith.
*** I can provide that; I know a guy who knows a guy!
**** Frankly, I wanted to pick a different protein, but I hesitated.
