Research

 

My research has been broad, ranging from the development of statistical methods for analyzing molecular data to simulation studies and real data analyses. My focus has been on estimating natural selection on protein-coding genes, estimating species divergence times, and the molecular epidemiology of human viruses such as HIV and HBV. Most of my empirical data analyses concern metazoan and mammalian species, and HIV and HBV.

 

 

Estimating natural selection on protein-coding genes

 

The field of molecular biology has grown explosively over the last few decades, driven by the accumulation of large genomic data sets from advanced sequencing technologies and by improved computer technology. However, powerful statistical methods and fast computational algorithms are essential for analyzing large genetic data sets and addressing interesting biological questions. One such question is the kind of selective pressure acting on a protein-coding gene. Selection can leave telltale signs in the genetic material of species, so by analyzing the genetic information of species we can make inferences about past patterns of selection. A number of approaches have been proposed to measure signs of past selective pressure. One such measure is the nonsynonymous/synonymous rate ratio (ω), which is used as an indicator of the mode and strength of natural selection acting on protein-coding genes. Homologous genes from different species with a high ω ratio (ω > 1) are said to be under positive selection, while genes with a low ω ratio (ω < 1) are under purifying selection.
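As a rough illustration of how ω is read, the sketch below computes the ratio from a nonsynonymous rate (dN) and a synonymous rate (dS) and labels the result. The function names and the example rates are hypothetical and chosen only for illustration, and the "neutral evolution" label for ω = 1 is the conventional reading rather than something estimated here.

```python
import math


def omega_ratio(d_n: float, d_s: float) -> float:
    """Return omega = dN / dS (hypothetical helper, not part of any package)."""
    if d_s == 0:
        # The plain ratio is unbounded when no synonymous change is observed.
        return math.inf if d_n > 0 else math.nan
    return d_n / d_s


def selection_mode(omega: float) -> str:
    """Map an omega value to the conventional interpretation."""
    if math.isnan(omega):
        return "undefined"
    if omega > 1:
        return "positive selection"
    if omega < 1:
        return "purifying selection"
    return "neutral evolution"


# Made-up (dN, dS) pairs, for illustration only.
for d_n, d_s in [(0.02, 0.10), (0.30, 0.12), (0.05, 0.05)]:
    omega = omega_ratio(d_n, d_s)
    print(f"dN={d_n}, dS={d_s}, omega={omega:.2f}: {selection_mode(omega)}")
```

In practice dN and dS are themselves estimates with sampling error, so a point estimate slightly above or below 1 is not, on its own, evidence of selection.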

Recently, I worked on the development of a new method and computer software for the Bayesian estimation of ω in pairwise sequence comparisons. Existing counting methods and maximum likelihood have poor statistical properties, as their estimates can be zero or infinite in some data sets. The Bayesian method has better statistical properties because the prior on ω shrinks the estimates away from extreme values. The method is fast and can be applied in genome-scale analyses. You can read more about this project here.
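The shrinkage effect can be illustrated with a deliberately simplified toy model; this is not the method or software described above. The sketch assumes Poisson-distributed counts of synonymous and nonsynonymous differences, a gamma prior on ω and a gamma prior on the synonymous distance dS. All function names, prior settings, site counts and difference counts are hypothetical.

```python
import math


def naive_omega(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Plain ratio (n_N / N) / (n_S / S); can be exactly 0 or infinite."""
    d_n = n_nonsyn / nonsyn_sites
    d_s = n_syn / syn_sites
    if d_s == 0:
        return math.inf if d_n > 0 else math.nan
    return d_n / d_s


def log_posterior_omega(omega, n_nonsyn, n_syn, nonsyn_sites, syn_sites,
                        omega_shape=2.0, omega_rate=2.0,
                        ds_shape=2.0, ds_rate=4.0):
    """Unnormalized log posterior of omega under the toy model.

    Assumed model: n_S ~ Poisson(S * dS), n_N ~ Poisson(N * dS * omega),
    omega ~ Gamma(omega_shape, omega_rate), dS ~ Gamma(ds_shape, ds_rate).
    dS is integrated out analytically (a gamma integral).
    """
    log_prior = (omega_shape - 1.0) * math.log(omega) - omega_rate * omega
    total = n_nonsyn + n_syn
    log_marginal_lik = (n_nonsyn * math.log(omega)
                        - (ds_shape + total)
                        * math.log(ds_rate + syn_sites + nonsyn_sites * omega))
    return log_prior + log_marginal_lik


def posterior_mean_omega(n_nonsyn, n_syn, nonsyn_sites, syn_sites,
                         omega_max=20.0, grid_size=20000):
    """Posterior mean of omega by midpoint-rule integration on (0, omega_max)."""
    step = omega_max / grid_size
    grid = [(i + 0.5) * step for i in range(grid_size)]
    logs = [log_posterior_omega(w, n_nonsyn, n_syn, nonsyn_sites, syn_sites)
            for w in grid]
    max_log = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - max_log) for value in logs]
    return sum(w * p for w, p in zip(grid, weights)) / sum(weights)


# Hypothetical counts: no synonymous differences, then no nonsynonymous ones.
for n_n, n_s in [(3, 0), (0, 5)]:
    print(n_n, n_s,
          naive_omega(n_n, n_s, 700, 300),
          posterior_mean_omega(n_n, n_s, 700, 300))
```

With no synonymous differences the plain ratio is infinite, and with no nonsynonymous differences it is exactly zero, whereas the posterior mean stays finite and positive in both cases. How strongly the estimate is pulled toward the prior depends on the assumed prior parameters.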

 

 

Species divergence time estimation using Bayesian methods

 

The history of evolution consists of two primary components: the relationships among organisms (phylogeny) and their divergence times. Together they form what is called a timetree: a phylogenetic tree with a time scale. Since the proposal of the molecular clock in the 1960s, there has been great interest in estimating species divergence times from molecular sequences. However, this is not a trivial task. Sequences provide information only about the distances (the product of rates and times) among the species on a phylogeny, and not about the geological ages of the nodes or the evolutionary rate separately. For this reason, information from the fossil record is used to specify priors on the ages of nodes in the phylogeny. MCMC methods can then be used to estimate both divergence times and evolutionary rates on the phylogeny. However, a range of analysis choices and sources of uncertainty, such as prior specification, model assumptions and phylogenetic uncertainty, may affect divergence time estimates.
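The confounding of rates and times can be seen in a small sketch. It assumes the JC69 substitution model purely for illustration (the text above does not name a model), and treats the time argument as the total time along the path separating two sequences; the function name and the numbers are hypothetical.

```python
import math


def jc69_expected_difference(rate: float, time: float) -> float:
    """Expected proportion of differing sites between two sequences under JC69.

    Only the distance d = rate * time (expected substitutions per site)
    enters the formula p = 3/4 * (1 - exp(-4d/3)).
    """
    distance = rate * time
    return 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))


# Two very different scenarios with the same product rate * time.
fast_and_young = jc69_expected_difference(rate=0.01, time=10.0)
slow_and_old = jc69_expected_difference(rate=0.001, time=100.0)

print(fast_and_young, slow_and_old)
print(math.isclose(fast_and_young, slow_and_old))
```

Because the two scenarios predict the same sequence divergence, the sequence data alone cannot distinguish a fast rate over a short time from a slow rate over a long time. Extra information, such as a fossil-based prior on a node age, is needed to separate the two.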

I am currently examining the effect that ancestral polymorphism and incomplete lineage sorting may have on divergence time estimates when those factors are ignored by the Bayesian model. Analytical Bayesian methods and computer simulations are used to examine the potential effect on deep and shallow phylogenies. You can read an abstract of this project here.

I also work on the impact of partitioning strategy on divergence time estimates from large genomic data sets. Different genomic regions may evolve under different evolutionary processes, so partitioning a genomic data set into groups may increase the accuracy of time estimates. However, there is no established method for partitioning a genomic data set, and researchers typically partition based on intuition, most often by gene and codon position. I am performing a simulation analysis to explore the accuracy and robustness of several partitioning strategies under different parameter settings.

I have also participated in a project on estimating the divergence times of Metazoa. Metazoa comprise all animal species, and dating their origin and diversification can provide important insight into the underlying processes of animal evolution. An abstract of this project is available here.

 

 

Molecular epidemiology

 

I am also interested in applications of statistical phylogenetic techniques to the molecular epidemiology of human viruses. In recent years I have been an external collaborator of Prof. Dimitrios Paraskevis at the Department of Hygiene, Epidemiology and Medical Statistics, Medical School, University of Athens (http://molepi.org/en/). Our main focus has been the origin, evolution and spatial dispersal (phylogeography, phylodynamics) of human pathogens such as HIV and HBV at local and global scales. We recently published a study on the global spread pattern of the HIV-1 CRF01_AE epidemic. You can read more about this here.

 

 

Miscellaneous

 

I have also reviewed papers for the journals MBE and PLOS ONE.