Metagenomics is a culture-independent approach that has contributed extensively to the study and understanding of previously unidentified microbial communities. To learn more about how metagenomics is being used to study the microbial communities of the Red Sea, we are pleased to interview Dr. Rania Siam, an Associate Professor in the Biology Department and Director of the Biotechnology Graduate Program at the American University in Cairo (AUC). She is also an Investigator in the Red Sea Marine Metagenomics Project, which is currently running at the AUC in collaboration with King Abdullah University of Science and Technology (KAUST), Woods Hole Oceanographic Institution and Virginia Bioinformatics Institute at Virginia Tech. Dr. Siam holds a Ph.D. in Microbiology and Immunology from McGill University. In addition, she has held several post-doctoral positions at McGill Oncology Group, Royal Victoria Hospital, The Salk Institute for Biological Studies and The Scripps Research Institute. Since 2008, Dr. Hamza El Dorry (PI) and Dr. Siam (Co-PI) have been leading the Red Sea Metagenomics research team. The team is exploring novel bacterial communities in the Red Sea by actively participating in Red Sea expeditions for sampling and by performing extensive molecular biology, genomics and computational analysis of the data.

1- Dr. Siam, thank you very much for accepting our request. Would you please explain to us the driving motives behind doing metagenomics research in the Red Sea?

The Red Sea is a unique environment in the region that remains to be explored. Thus, working on the Red Sea gives us the opportunity to perform essential research for the region. Furthermore, our main interest is the Red Sea brine pools, which are unique environments in terms of high temperature, high salinity, high metal content and low oxygen. Microbial communities living in these environments are known as extremophiles. The survival of extremophiles in such drastic conditions indicates that they possess genes with novel properties that underlie these unique survival characteristics. Thus, we are highly motivated to explore these novel microbial communities and their unique properties. In addition, we are interested in extracting biotechnological products from the Red Sea that can be beneficial, such as antimicrobials and anticancer agents.

2- Would you please outline for us the objectives of the project and the main activities inside and outside the laboratory?

In our project, one of our main aims is establishing a Red Sea marine genomic database to be accessed by scientists worldwide. Additionally, we are screening this database for biotechnological and pharmaceutical products such as enzymes and anticancer agents. There are three main activities in the project: sampling, molecular biology/genomics work and computational analysis of the data. Sampling is a challenging process that requires rigorous planning, since samples must be properly processed and stored until they arrive at the labs. In the labs, samples are subjected to different procedures, starting with DNA extraction followed by Whole Genome Sequencing (WGS) to identify unknown or unique genes and help us understand novel microbial communities. This requires rigorous computational analysis to make sense of our data. Furthermore, we construct fosmid libraries for the isolation and purification of genes of biotechnological interest, such as lipases and cellulases. In addition, we carry out 16S rRNA phylogenetic analysis on the microbial communities present in the samples.

3- Since the idea is novel, we would like to know about the nature of samples, the parameters and the challenges imposed during the process of sampling.

Basically, the sampling process requires well-equipped research vessels such as the Woods Hole Oceanographic Institution's ‘Oceanus’ and The Hellenic Center for Marine Research (HCMR) ‘Aegeo’. In addition, it is essential to have a team of physical oceanographers for adequate sampling.

We started with two different brine pools: Atlantis II Deep and Discovery Deep. As brine pools, these two regions are characterized by the extremely harsh conditions I mentioned before. In the 2008 and 2010 KAUST Expeditions to the Red Sea, we collected two types of samples: large-volume water and sediments. In both cases, we face challenges during sample collection. Collecting large-volume water samples can take up to 4 hours. In bad weather, it is nearly impossible to collect samples. Regarding the sediments, the heavy weight of the sediment core is the main challenge, since it may drag people into the water during the sampling process.

The water samples are collected using CTDs (an acronym for Conductivity, Temperature and Depth). A CTD is fitted with 10-liter bottles that collect the samples, and it measures conductivity, temperature and depth, in addition to other parameters. CTDs are capable of measuring these physical properties at each meter of water. Accordingly, they are able to retrieve 2200 readings for each parameter when lowered to a depth of 2200 m. This is very beneficial, as it allows us to correlate the physical and chemical parameters with the nature of the microbial communities obtained from each sample.

4- What are the reasons behind the choice of Atlantis II Deep and Discovery Deep brine pools for study?

These sites are unique. Atlantis II Deep is 2200 meters below the water surface. It is characterized by high temperature (68 °C), high heavy metal content, high salinity and little oxygen. These extremely drastic conditions motivate us to explore the extremophilic microbial communities in this pool. Discovery Deep is adjacent to Atlantis II Deep, but its conditions are less harsh and its heavy metal content differs. This encourages us to undertake comparative genomic analysis between the two regions.

5- Did the relatively new metagenomic approach of sequencing multiple genomes, combined with the novelty of employing this approach to study the microbial milieu of the Red Sea, pose unprecedented practical challenges in retrieving complete and authentic DNA sequences?

Yes, there are many challenges. For example, in the case of sediments, many cells die. This in turn can make the process of DNA extraction more difficult. However, we managed to cope with this problem by extracting the DNA from the sediments as soon as they arrive at the labs and by avoiding freezing and thawing. Another problem with sediments is the presence of impurities that are co-extracted with the DNA and interfere with the analysis. Concerning water samples, the main challenge is that the amount of DNA in the samples is very small. We dealt with this by filtering large volumes of water, up to 500 liters per sample.

6- Is it challenging to recognize and validate sets of data that point to specific patterns of microbial diversity or to novel genes?

Actually, we have retrieved an enormous amount of data, and a large percentage of it has no match to sequences present in other genomic databases. This has inspired us to think of new approaches to data analysis to identify the role of these novel sequences, and to seek collaborations with computational biologists.

7- Finally, do you think there are other environments in Egypt with unique properties that make them promising for employing metagenomics to discover novel genes and bacterial strains?

Yes, actually Egypt is very rich in environments with unique properties. For example, we have the deserts, Siwa's hot springs and the Nile. Many unique environments are yet to be explored, and a lot of research needs to be performed. Our natural resources are limitless.

In my humble opinion, this is worth a second look: while I was attending a bioinformatics and genomics workshop held at FOPCU, the lecturer pointed out to us that, up until now, no one has managed to come up with a method capable of converting a fully functioning protein back into the original nucleotide sequence of its corresponding gene. At that moment, the following thought occurred to me: why would this ever be needed?

For starters, we already have the protein in hand. For many proteins the 3D structure is completely figured out, and for some even their orientation in space, their actions and their functions. Then, as far as I understand, being the mould from which a protein is later assembled is the only function a gene has, or at least an expressed one. Knowing that, for instance, the 4th amino acid in the gastrin hormone is leucine, would it matter whether it was translated from the codon CUA and not UUG?
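To make the CUA versus UUG point concrete, here is a minimal Python sketch of why a protein cannot be converted back into one unique gene sequence: most amino acids are encoded by several codons in the standard genetic code, so the number of candidate coding sequences multiplies with every residue. The function name `count_coding_sequences` and the three-residue peptide `MLK` are hypothetical and chosen only for illustration; they are not taken from the workshop or from any real protein.

```python
from math import prod

# Number of codons that encode each amino acid in the standard genetic code
# (one-letter amino acid codes; the three stop codons are excluded).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

# The six RNA codons for leucine, including the CUA and UUG from the text.
LEUCINE_CODONS = ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"]


def count_coding_sequences(peptide: str) -> int:
    """Return how many distinct mRNA sequences could encode the peptide."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())


if __name__ == "__main__":
    # Hypothetical peptide Met-Leu-Lys: 1 * 6 * 2 candidate sequences.
    print(count_coding_sequences("MLK"))
```

Because the count grows multiplicatively, even a short peptide maps back to a large set of equally valid nucleotide sequences, which is why the protein alone cannot say whether a given leucine came from CUA or UUG.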

Now three thoughts impose themselves. The first is that the presence of SNPs (single nucleotide polymorphisms: nucleotides that vary among individuals and are thought to influence certain traits) within the nucleotide sequence of the gene is the reason behind the researchers' attention. However, this ultimately means that, if such a method were to exist, it would have to produce a different nucleotide sequence for proteins coming from different people. Simple logic.

Another possible explanation that comes to mind would be a difference in the structure of the leucine amino acid held on tRNA molecules with different anticodons, where each would have some “characteristic” features that distinguish it from the leucine carried by the other tRNAs. If that were the case, it has probably managed to fly below the radar for quite some time, since every reference I turn to takes it for granted that these amino acids are carbon copies. Being non-identical in any way would cause the resulting protein to function in a slightly different manner, which could explain the diversity of protein actions among individuals. Who knows?

Last, but not least, is the possibility of gaining fast insight into the genome of a previously undiscovered species, where one could quickly figure out all the expressed genes through this simple task of “reverse translation”. However, the sequences of the unexpressed genes would still have to be determined the old-fashioned way. No choice there!

Just wondering what the future has in store.

[Figure: A schematic diagram comparing conventional vaccinology to reverse vaccinology]

For many decades, conventional vaccinology has faced many obstacles. One major problem is that, among the several antigens of a microbe, you have to identify the most immunogenic (and thus protective) ones, such as virulence factors, toxins and surface-associated proteins, that are suitable for vaccine development. This process is very tedious and costly, as it relies mainly on traditional biochemical and microbiological methods. In summary, it is carried out as follows:

  • Firstly, you have to cultivate the microbe and harvest proteins.
  • Then you have to identify the antigens one by one.
  • After that you can move on to the vaccine development stage.

The introduction of genomics has given new impetus to the field of vaccinology. Its major role is in the antigen discovery stage. Now that the genome sequences of many microbes have been determined, the integration of sequence data, proteomics and microarrays has introduced what is called “reverse vaccinology”. Reverse vaccinology (RV) means identifying and characterising antigens using bioinformatics. In RV, you start from the genome and not from the pathogen itself, i.e. you start from the opposite direction, which is why it is called “reverse”.
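As a rough illustration of what "starting from the genome" can look like in practice, the Python sketch below scans a DNA sequence for open reading frames (ORFs) and translates them into candidate protein sequences. It is only a toy version of the very first step and rests on simplifying assumptions: it reads the forward strand only, uses the standard genetic code, and the names `find_orfs` and `min_codons` as well as the example sequence are hypothetical. Deciding which of the predicted proteins are promising antigens would require further analysis that is not shown here.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_orfs(dna: str, min_codons: int = 3) -> list[str]:
    """Translate forward-strand ORFs (ATG ... stop) in all three frames."""
    dna = dna.upper()
    proteins = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] != "ATG":
                i += 3
                continue
            protein = []
            j = i
            found_stop = False
            while j + 3 <= len(dna):
                # Codons with ambiguous bases (e.g. N) are translated as "X".
                aa = CODON_TABLE.get(dna[j:j + 3], "X")
                if aa == "*":
                    found_stop = True
                    break
                protein.append(aa)
                j += 3
            if not found_stop:
                break  # no stop codon left in this frame
            if len(protein) >= min_codons:
                proteins.append("".join(protein))
            i = j + 3  # resume scanning after the stop codon
    return proteins


if __name__ == "__main__":
    # Hypothetical toy sequence containing a single short ORF.
    print(find_orfs("CCATGAAATTTTAGGG"))
```

Each returned string is a predicted protein that could then be screened computationally, without ever culturing the microbe.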

RV addresses some problems that usually come up during vaccine development:

  • It provides fast access to almost all antigens, including less common antigens and antigens not expressed in vitro.
  • It offers a new approach for non-culturable microorganisms.

On the other hand, the major disadvantage of RV is that it cannot be applied to non-proteinaceous antigens such as lipopolysaccharides and glycolipids.

[Figure: Reconstruction of a Neanderthal child from Gibraltar]

Two years ago, the project of sequencing the Neanderthal genome started. The teams behind it (Max Planck Institute and 454 Life Sciences) promised to finish by this year. Well, they kept their promise. Admittedly, some mitochondrial DNA (mtDNA) sequences had already been published, but contamination was the major flaw in those published sequences. This time they collected more than 60 bone specimens from museums (we're talking about 38,000-year-old bone), and they repeated the sequencing 35 times in the same clean room used for extraction to avoid contamination with human DNA.


Of the 13 protein-encoding genes in the sequenced mtDNA, they identified only one with amino acid differences from the Homo sapiens version. It is cytochrome c oxidase subunit 2 (COX2, part of the respiratory chain), but even this difference has no significant effect on the functional domain of COX2. They hope to answer this question in a few months: why did Neanderthals die out while humans didn't?!

We already know that Neanderthals and humans share 99.5% of their sequence, but answering questions about a common ancestor and about extinction through absorption (interbreeding with humans) needs a great deal of research, including collecting and sequencing samples from different time periods in order to come up with hypotheses. The mtDNA is not enough, as Trinkaus (an expert on Neanderthal biology and human evolution) said: “The genome sequence data may tell us something about the selection of a couple of proteins, but it tells us nothing about language or social behavior.”
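For readers wondering what a figure like "99.5% of the sequence" means operationally, the Python sketch below computes percent identity between two sequences that have already been aligned. It is an illustration only, not the method used by the sequencing project: the function name `percent_identity`, the choice to skip gapped columns, and the short example sequences are all assumptions made for this example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, ungapped positions at which two sequences match.

    Both inputs must be the same length (already aligned); "-" marks a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # columns containing a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical 10-base sequences differing at one position.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

A single summary number like this says how similar two sequences are overall, but not where the differences fall or what they do, which is in line with the caution in the quote above.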

