Defining Species When There is Gene Flow

Jiao X, Yang Z

We use the multispecies coalescent model with continuous-time migration or episodic introgression to study the impact of gene flow on genetic differences within and between species. We highlight a surprising but plausible scenario in which different population sizes and asymmetrical migration rates cause a genetic sequence to be, on average, more closely related to a sequence from another species than to a sequence from the same species.

Systematic Biology, 2020

A simulation study to examine the information content in phylogenomic datasets under the multispecies coalescent model

Huang J, Flouri T, Yang Z

This study examines the information content in multilocus datasets for inference under the multispecies coalescent model. The problems considered include estimation of evolutionary parameters, species tree estimation and species delimitation based on Bayesian comparison of delimitation models.

Molecular Biology and Evolution, 2020


Full list of publications


Jiao X, Yang Z. (2020) Defining Species When There is Gene Flow. Systematic Biology. doi: 10.1093/sysbio/syaa052

Huang J, Flouri T, Yang Z. (2020) A simulation study to examine the information content in phylogenomic datasets under the multispecies coalescent model. Molecular Biology and Evolution. doi: 10.1093/molbev/msaa166

Kapli P, Yang Z, Telford M. (2020) Phylogenetic tree building in the genomic age. Nature Reviews Genetics, 21(7):428-444. doi: 10.1038/s41576-020-0233-0

Flouri T, Rannala B, Yang Z. (2020) A tutorial on the use of BPP for species tree estimation and species delimitation. Phylogenetics in the Genomic Era. PDF

Rannala B, Yang Z. (2020) Species delimitation. Phylogenetics in the Genomic Era. PDF

Rannala B, Edwards SV, Leache AD, Yang Z. (2020) The multi-species coalescent model and species tree inference. Phylogenetics in the Genomic Era. PDF

Weber C, Yang Z, Goldman N. (2020) Ambiguity coding allows accurate inference of evolutionary parameters from alignments in an aggregated state-space. Systematic Biology. doi: 10.1093/sysbio/syaa036

Jiao X, Flouri T, Rannala B, Yang Z. (2020) The Impact of Cross-Species Gene Flow on Species Tree Estimation. Systematic Biology. PDF doi: 10.1093/sysbio/syaa001

Flouri T, Jiao X, Rannala B, Yang Z. (2020) A Bayesian implementation of the multispecies coalescent model with introgression for phylogenomic analysis. Molecular Biology and Evolution, 37(4):1211-1223. PDF Suppl. Mat. doi: 10.1093/molbev/msz296


Alvarez-Carretero S, Goswami A, Yang Z, and dos Reis M. (2019) Bayesian estimation of species divergence times using correlated quantitative characters. Systematic Biology, 68(6):967-986. PDF doi: 10.1093/sysbio/syz015

Yang Z. (2019) Adaptive molecular evolution. Handbook of Statistical Genomics, 4th Edition. Wiley, New York. PDF doi: 10.1002/9781119487845.ch13

Halliday TJD, Dos Reis M, Tamuri AU, Ferguson-Gow H, Yang Z, Goswami A. (2019) Rapid morphological evolution in placental mammals post-dates the origin of the crown group. Proc Biol Sci, 286(1898):20182418. PDF doi: 10.1098/rspb.2018.2418

Leache AD, Zhu T, Rannala B, Yang Z. (2019) The spectre of too many species. Systematic Biology, 68(1):168-181. PDF doi: 10.1093/sysbio/syy051


Morris JL, Puttick MN, Clark J, Edwards D, Kenrick P, Pressel S, Wellman CH, Yang Z, Schneider H, Donoghue PCJ. (2018) Accurate timetrees do indeed require accurate calibrations. Response to comment by Hedges et al. Proc Nat Acad Sci, 115(41):E9512-E9513. PDF doi: 10.1073/pnas.1812816115

Flouri T, Jiao X, Rannala B, Yang Z. (2018) Species tree inference with BPP using genomic sequences and the multispecies coalescent. Molecular Biology and Evolution, 35(10):2585-2593. PDF doi: 10.1093/molbev/msy147

Morris JL, Puttick MN, Clark J, Edwards D, Kenrick P, Pressel S, Wellman CH, Yang Z, Schneider H, Donoghue PCJ. (2018) The timescale of early land plant evolution. Proc Nat Acad Sci, 115(10):E2274-E2283. PDF doi: 10.1073/pnas.1719588115

Yang Z, Zhu T. (2018) Bayesian selection of misspecified models is overconfident and may cause spurious posterior probabilities for phylogenetic trees. Proc Nat Acad Sci, 115(8):1854-1859. PDF doi: 10.1073/pnas.1712673115

Yang Z, Zhu T. (2018) The good, the bad, and the ugly: Bayesian model selection produces spurious posterior probabilities for phylogenetic trees. (This is the arXiv version of the paper above.) PDF

Yang Z. (2018) Molecular Phylogenetics. Oxford Bibliographies in Evolutionary Biology. Ed. Karin Pfennig. New York: Oxford University Press. doi: 10.1093/obo/9780199941728-0098

Yang Z. (2018) AWF Edwards and the origin of Bayesian phylogenetics. In AWF Edwards (R. G. Winther, ed.), 352-362. Cambridge University Press, Cambridge, England. PDF doi: 10.1017/9781316276259.035

Thawornwattana Y, Dalquen DA, Yang Z. (2018) Coalescent analysis of phylogenomic data confidently resolves the species relationships in the Anopheles gambiae species complex. Molecular Biology and Evolution, 35(10):2512-2527. PDF doi: 10.1093/molbev/msy158

Thawornwattana Y, Dalquen DA, Yang Z. (2018) Designing simple and efficient Markov chain Monte Carlo proposal kernels. Bayesian Analysis, 13(4):1037-1063. PDF doi: 10.1214/17-BA1084

dos Reis M, Gunnell GF, Barba-Montoya J, Wilkins A, Yang Z, Yoder AD. (2018) Using phylogenomic data to explore the effects of relaxed clocks and calibration strategies on divergence time estimation: Primates as a test case. Systematic Biology, 67(4):594-615. PDF doi: 10.1093/sysbio/syy001

Barba-Montoya J, dos Reis M, Schneider H, Donoghue PCJ, Yang Z. (2018) Constraining uncertainty in the timescale of angiosperm evolution and the veracity of a Cretaceous terrestrial revolution. New Phytologist, 218(2):819-834. PDF doi: 10.1111/nph.15011

Angelis K, Alvarez-Carretero S, dos Reis M, Yang Z. (2018) An evaluation of different partitioning strategies for Bayesian estimation of species divergence times. Systematic Biology, 67(1):61-77. PDF (Publisher’s award for best student paper) doi: 10.1093/sysbio/syx061

Shi CM, Yang Z. (2018) Coalescent-based analyses of genomic sequence data provide a robust resolution of phylogenetic relationships among major groups of gibbons. Molecular Biology and Evolution, 35(1):159-179. PDF doi: 10.1093/molbev/msx277


Zeng L, et al. (2017) Discovery of a high-altitude ecotype and ancient lineage of Arabidopsis thaliana from Tibet. Science Bulletin, 62(24):1628-1630. PDF doi: 10.1016/j.scib.2017.10.007

Barba-Montoya J, dos Reis M, Yang Z. (2017) Comparison of different strategies for using fossil calibrations to generate the time prior in Bayesian molecular clock dating. Molecular Phylogenetics and Evolution, 114:386-400. PDF doi: 10.1016/j.ympev.2017.07.005

Nascimento FF, dos Reis M, Yang Z. (2017) A biologist’s guide to Bayesian phylogenetic analysis. Nature Ecology and Evolution, 1(10):1446-1454. PDF doi: 10.1038/s41559-017-0280-x

Warnock RC, Yang Z, Donoghue PCJ. (2017) Testing the molecular clock using mechanistic models of both fossil preservation and molecular evolution. Proc Biol Sci, 284(1857):20170227. PDF doi: 10.1098/rspb.2017.0227

Rannala B, Yang Z. (2017) Efficient Bayesian species tree inference under the multispecies coalescent. Systematic Biology, 66(5):823-842. PDF doi: 10.1093/sysbio/syw119

Yang Z, Rannala B. (2017) Bayesian species identification under the multispecies coalescent provides significant improvements to DNA barcoding analyses. Molecular Ecology, 26(11):3028-3036. PDF doi: 10.1111/mec.14093

Dalquen D, Zhu T, Yang Z. (2017) Maximum likelihood implementation of an isolation-with-migration model for three species. Systematic Biology, 66(3):379-398. PDF doi: 10.1093/sysbio/syw063


Xu B, Yang Z. (2016) Challenges in species tree estimation under the multispecies coalescent model. Genetics, 204(4):1353-1368. PDF doi: 10.1534/genetics.116.190173

Yang Z. (2016) Bayesian phylogenetic inference. Encyclopedia of Evolutionary Biology, 1:137-140. Elsevier. PDF doi: 10.1016/B978-0-12-800049-6.00208-0

Donoghue PCJ, Yang Z. (2016) The evolution of methods for establishing evolutionary timescales. Phil. Trans. R. Soc. B, 371(1699):20160020. PDF doi: 10.1098/rstb.2016.0020

Yang Z, Donoghue PCJ. (2016) Dating species divergences using rocks and clocks: An introduction. Phil. Trans. R. Soc. B, 371(1699):20150126. PDF doi: 10.1098/rstb.2015.0126

dos Reis M, Donoghue PCJ, Yang Z. (2016) Bayesian molecular clock dating of species divergences in the genomics era. Nature Reviews Genetics, 17(2):71-80. PDF doi: 10.1038/nrg.2015.8


dos Reis M, Thawornwattana Y, Angelis K, Telford MJ, Donoghue PCJ, Yang Z. (2015) Uncertainty in the timing of origin of animals and the limits of precision in molecular timescales. Current Biology, 25(22):2939-2950. PDF doi: 10.1016/j.cub.2015.09.066

Yang Z. (2015) The BPP program for species tree estimation and species delimitation. Current Zoology, 61(5):854-865. PDF doi: 10.1093/czoolo/61.5.854

Matsumoto T, Akashi H, Yang Z. (2015) Evaluation of ancestral sequence reconstruction methods to infer nonstationary patterns of nucleotide substitution. Genetics, 200(3):873-890. PDF doi: 10.1534/genetics.115.177386

Liu J, Zhang DX, Yang Z. (2015) A discrete-beta model for testing gene flow after speciation. Methods in Ecology and Evolution, 6(6):715-724. PDF doi: 10.1111/2041-210X.12356

Zhu T, dos Reis M, Yang Z. (2015) Characterization of the uncertainty of divergence time estimation under relaxed molecular clock models using multiple loci. Systematic Biology, 64(2):267-280. PDF doi: 10.1093/sysbio/syu109


Wang Y, Yang Z. (2014) Priors in Bayesian Phylogenetics. Bayesian Phylogenetics: Methods, Algorithms and Applications, 5-23. Chapman & Hall/CRC, London.

Yang Z, Rannala B. (2014) Unguided species delimitation using DNA sequence data from multiple loci. Molecular Biology and Evolution, 31(12):3125-3135. PDF doi: 10.1093/molbev/msu279

Zhang C, Rannala B, Yang Z. (2014) Bayesian species delimitation can be robust to guide tree inference errors. Systematic Biology, 63(6):993-1004. PDF doi: 10.1093/sysbio/syu052

Angelis K, dos Reis M, Yang Z. (2014) Bayesian estimation of nonsynonymous/synonymous rate ratios for pairwise sequence comparisons. Molecular Biology and Evolution, 31(7):1902-1913. PDF doi: 10.1093/molbev/msu142

dos Reis M, Zhu T, Yang Z. (2014) The impact of the rate prior on Bayesian estimation of divergence times with multiple loci. Systematic Biology, 63(4):555-565. PDF doi: 10.1093/sysbio/syu020

dos Reis M, Donoghue PCJ, Yang Z. (2014) Neither phylogenomic nor palaeontological data support a Paleogene origin of placental mammals. Biology Letters, 10(1):20131003. PDF doi: 10.1098/rsbl.2013.1003

Yoder AD, Chan LM, dos Reis M, Larsen PA, Campbell CR, Rasolarison R, Barrett M, Roos C, Kappeler P, Bielawski JP, Yang Z. (2014) Molecular evolutionary characterization of a V1R subfamily unique to Strepsirrhine primates. Genome Biology and Evolution, 6(1):213-227. PDF doi: 10.1093/gbe/evu006

Leache AD, Harris RB, Rannala B, Yang Z. (2014) The influence of gene flow on species tree estimation: a simulation study. Systematic Biology, 63(1):17-30. PDF doi: 10.1093/sysbio/syt049


Yang Z, Rodriguez CE. (2013) Searching for efficient Markov chain Monte Carlo proposal kernels. Proceedings of the National Academy of Sciences, 110(48):19307-19312. PDF and an extended version of the proof of equation (11) doi: 10.1073/pnas.1311790110

Xu B, Yang Z. (2013) PamlX: A graphical user interface for PAML. Molecular Biology and Evolution, 30(12):2723-2724. PDF doi: 10.1093/molbev/mst179

Rannala B, Yang Z. (2013) Molecular clock dating. The Princeton Guide to Evolution, 67-74. Princeton University Press, New York. PDF

dos Reis M, Yang Z. (2013) Why do more divergent sequences produce smaller nonsynonymous/synonymous rate ratios in pairwise sequence comparisons? Genetics, 195(1):195-204. PDF doi: 10.1534/genetics.113.152025

Stadler T, Yang Z. (2013) Dating phylogenies with sequentially sampled tips. Systematic Biology, 62(5):674-688. PDF doi: 10.1093/sysbio/syt030

Zou XH, Yang Z, Doyle JJ, Ge S. (2013) Multilocus estimation of divergence times and ancestral effective population sizes of Oryza species and implications for the rapid diversification of the genus. New Phytologist, 198(4):1155-1164. PDF doi:

Rannala B, Yang Z. (2013) Improved reversible jump algorithms for Bayesian species delimitation. Genetics, 194(1):245-253. PDF doi: 10.1534/genetics.112.149039

dos Reis M, Yang Z. (2013) The unbearable uncertainty of Bayesian divergence time estimation. Journal of Systematics and Evolution, 51(1):30-43. PDF doi: 10.1111/j.1759-6831.2012.00236.x


Schabauer H, Valle M, Pacher C, Stockinger H, Stamatakis A, Robinson-Rechavi M, Yang Z, Salamin N. (2012) SlimCodeML: An optimized version of CodeML for the branch-site model. IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 700-708. IEEE. doi: 10.1109/IPDPSW.2012.88

dos Reis M, Inoue J, Hasegawa M, Asher R, Donoghue PCJ, Yang Z. (2012) Phylogenomic data sets provide both precision and accuracy in estimating the timescale of placental mammal evolution. Proc Biol Sci, 279(1742):3491-3500. PDF Suppl. Mat. doi: 10.1098/rspb.2012.0683

Zhai W, Nielsen R, Goldman N, Yang Z. (2012) Looking for Darwin in genomic sequences - validity and success of statistical methods. Molecular Biology and Evolution, 29(10):2889-2893. PDF doi: 10.1093/molbev/mss104

Zhu T, Yang Z. (2012) Maximum likelihood implementation of an isolation-with-migration model with three species for testing speciation with gene flow. Molecular Biology and Evolution, 29(10):3131-3142. PDF doi: 10.1093/molbev/mss118

Zhang C, Rannala B, Yang Z. (2012) Robustness of compound Dirichlet priors for Bayesian inference of branch lengths. Systematic Biology, 61(5):779-784. PDF doi: 10.1093/sysbio/sys030

Yang Z, Rannala B. (2012) Molecular phylogenetics: principles and practice. Nature Reviews Genetics, 13(5):303-314. PDF doi: 10.1038/nrg3186

Parham J, Donoghue PCJ, Bell C, Calway T, Head J, Holroyd P, Inoue J, Irmis R, Joyce W, Ksepka D, Patane J, Smith N, Tarver J, van Tuinen M, Yang Z, Angielczyk K, Greenwood J, Hipsley C, Louis J, Makovicky P, Mueller J, Smith K, Theodor J, Warnock R, Benton M. (2012) Best Practices for Justifying Fossil Calibrations. Systematic Biology, 61(2):346-359. PDF doi: 10.1093/sysbio/syr107

Warnock RCM, Yang Z, Donoghue PCJ. (2012) Exploring uncertainty in the calibration of the molecular clock. Biology Letters, 8(1):156-159. PDF doi: 10.1098/rsbl.2011.0710

Rannala B, Zhu T, Yang Z. (2012) Tail paradox, partial identifiability and influential priors in Bayesian branch length inference. Molecular Biology and Evolution, 29(1):325-335. PDF doi: 10.1093/molbev/msr210


Zhang C, Zhang DX, Zhu T, Yang Z. (2011) Evaluation of a Bayesian coalescent method of species delimitation. Systematic Biology, 60(6):747-761. PDF doi: 10.1093/sysbio/syr071

Brown RP, Yang Z. (2011) Rate variation and estimation of divergence times using strict and relaxed clocks. BMC Evolutionary Biology, 11:271. PDF doi: 10.1186/1471-2148-11-271

Zang LL, Zou XH, Zhang FM, Yang Z, Ge S. (2011) Phylogeny and species delimitation of the C-genome diploid species in Oryza. Journal of Systematics and Evolution, 49(5):386-395. PDF doi: 10.1111/j.1759-6831.2011.00145.x

Groussin M, Pawlowski J, Yang Z. (2011) Bayesian relaxed clock estimation of divergence times in Foraminifera. Molecular Phylogenetics and Evolution, 61(1):157-166. PDF doi: 10.1016/j.ympev.2011.06.008

Yoshida I, Sugiura W, Shibata J, Ren F, Yang Z, Tanaka H. (2011) Change of positive selection pressure on HIV-1 envelope gene inferred by early and recent samples. PLOS One, 6(4):e18630. doi: 10.1371/journal.pone.0018630

dos Reis M, Yang Z. (2011) Approximate likelihood calculation for Bayesian estimation of divergence times. Molecular Biology and Evolution, 28(7):2161-2172. PDF doi: 10.1093/molbev/msr045

Zhu T, Hu Y, Ma Z, Zhang DX, Li T, Yang Z. (2011) Efficient simulation under a population genetics model of carcinogenesis. Bioinformatics, 27(6):837-843. PDF doi: 10.1093/bioinformatics/btr025

Yang Z, dos Reis M. (2011) Statistical properties of the branch-site test of positive selection. Molecular Biology and Evolution, 28(3):1217-1228. PDF doi: 10.1093/molbev/msq303

Wilkinson RD, Steiper ME, Soligo C, Martin RD, Yang Z, Tavare S. (2011) Dating primate divergences through an integrated analysis of palaeontological and molecular data. Systematic Biology, 60(1):16-31. PDF doi: 10.1093/sysbio/syq054


Yang Z. (2010) The Timetree of Life. Quarterly Review of Biology, 85(3):360-361. PDF doi: 10.1086/655063

Chen MS, Liu X, Yang Z, Zhao H, Shukle R, Stuart J, Hulbert S. (2010) Unusual conservation among genes encoding small secreted salivary gland proteins from a gall midge. BMC Evolutionary Biology, 10:296. PDF doi: 10.1186/1471-2148-10-296

Fletcher W, Yang Z. (2010) The effect of insertions, deletions and alignment errors on the branch-site test of positive selection. Molecular Biology and Evolution, 27(10):2257-2267. PDF doi: 10.1093/molbev/msq115

Yang Z, Rannala B. (2010) Bayesian species delimitation using multilocus sequence data. Proceedings of the National Academy of Sciences, 107(20):9264-9269. PDF doi: 10.1073/pnas.0913022107

Yang Z. (2010) A likelihood ratio test of speciation with gene flow using genomic sequence data. Genome Biology and Evolution, 2:200-211. PDF doi: 10.1093/gbe/evq011

Beaumont M, Nielsen R, Robert C, Hey J, Gaggiotti O, Knowles L, Estoup A, Panchal M, Corander J, Hickerson M, Sisson S, Fagundes N, Chikhi L, Beerli P, Vitalis R, Cornuet JM, Huelsenbeck J, Novembre J, Foll M, Yang Z, Rousset F, Balding D, Excoffier L. (2010) In defence of model-based inference in phylogeography. Molecular Ecology, 19(3):436-446. PDF doi: 10.1111/j.1365-294X.2009.04515.x

Brown RP, Yang Z. (2010) Bayesian dating of shallow phylogenies with a relaxed clock. Systematic Biology, 59(2):119-131. PDF doi: 10.1093/sysbio/syp082

Inoue J, Donoghue PCJ, Yang Z. (2010) The impact of the representation of fossil calibrations on Bayesian estimation of species divergence times. Systematic Biology, 59(1):74-89. PDF doi: 10.1093/sysbio/syp078


Yang Z, Nielsen R, Goldman N. (2009) In defense of statistical methods for detecting positive selection. Proceedings of the National Academy of Sciences, 106(36):E95. PDF doi: 10.1073/pnas.0904550106

Fletcher W, Yang Z. (2009) INDELible: A flexible simulator of biological sequence evolution. Molecular Biology and Evolution, 26(8):1879-1888. PDF doi: 10.1093/molbev/msp098

Ren F, Tanaka H, Yang Z. (2009) A likelihood look at the supermatrix-supertree controversy. Gene, 441:119-125. PDF doi: 10.1016/j.gene.2008.04.002

Ren F, Tanaka H, Yang Z. (2009) MtZoa: a general mitochondrial amino acid substitutions model for animal evolutionary studies. Molecular Phylogenetics and Evolution, 52(1):268-272. PDF doi: 10.1016/j.ympev.2009.01.011


Schmid KJ, Yang Z. (2008) The trouble with sliding windows. PLOS One, 3(11):e3746. PDF doi: 10.1371/journal.pone.0003746

Goldman N, Yang Z. (2008) Statistical and computational challenges in molecular phylogenetics and evolution. Phil. Trans. R. Soc. B, 363(1512):3889-3892. PDF doi: 10.1098/rstb.2008.0182

Yang Z. (2008) Empirical evaluation of a prior for Bayesian phylogenetic inference. Phil. Trans. R. Soc. B, 363(1512):4031-4039. PDF doi: 10.1098/rstb.2008.0164

Vamathevan J, Hasan S, Emes R, Amrine-Madsen H, Rajagopalan D, Topp S, Kumar V, Word M, Simmons M, Foord S, Sanseau P, Yang Z, Holbrook J. (2008) The role of positive selection in determining the molecular cause of species differences in disease. BMC Evolutionary Biology, 8:273. PDF doi: 10.1186/1471-2148-8-273

Burgess R, Yang Z. (2008) Estimation of hominoid ancestral population sizes under Bayesian coalescent models incorporating mutation rate variation and sequencing errors. Molecular Biology and Evolution, 25(9):1979-1994. PDF doi: 10.1093/molbev/msn148

Rannala B, Yang Z. (2008) Phylogenetic inference using whole genomes. Annual Review of Genomics and Human Genetics, 9:217-231. PDF doi: 10.1146/annurev.genom.9.081307.164407

Emes RD, Yang Z. (2008) Duplicated paralogous genes subject to positive selection in the genome of Trypanosoma brucei. PLOS One, 3(5):e2295. PDF doi: 10.1371/journal.pone.0002295

Furlong RF, Yang Z. (2008) Diversifying and purifying selection in the peptide binding region of DRB in mammals. Journal of Molecular Evolution, 66(4):384-394. PDF doi: 10.1007/s00239-008-9092-6

Yang Z, Nielsen R. (2008) Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Molecular Biology and Evolution, 25(3):568-579. PDF doi: 10.1093/molbev/msm284


Zhou R, Zeng K, Wu W, Chen X, Yang Z, Shi S, Wu CI. (2007) Population genetics of speciation in nonmodel organisms. Molecular Biology and Evolution, 24(12):2746-2754. PDF doi: 10.1093/molbev/msm209

Anisimova M, Bielawski JP, Dunn K, Yang Z. (2007) Phylogenomic analysis of natural selection pressure in Streptococcus genomes. BMC Evolutionary Biology, 7:154. PDF doi: 10.1186/1471-2148-7-154

Yang Z. (2007) Fair-balance paradox, star-tree paradox and Bayesian phylogenetics. Molecular Biology and Evolution, 24(8):1639-1655. PDF doi: 10.1093/molbev/msm081

Yang Z. (2007) PAML 4: Phylogenetic Analysis by Maximum Likelihood. Molecular Biology and Evolution, 24(8):1586-1591. PDF doi: 10.1093/molbev/msm088

Rannala B, Yang Z. (2007) Inferring speciation times under an episodic molecular clock. Systematic Biology, 56(3):453-466. PDF doi: 10.1080/10635150701420643

Anisimova M, Yang Z. (2007) Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites. Molecular Biology and Evolution, 24(5):1219-1228. PDF doi: 10.1093/molbev/msm042

Hurley IA, Mueller RL, Dunn KA, Schmidt RJ, Friedman M, Ho RK, Prince VE, Yang Z, Thomas MG, Coates MI. (2007) A new time-scale for ray-finned fish evolution. Proc. R. Soc. B, 274(1609):489-498. PDF Suppl. Mat. doi: 10.1098/rspb.2006.3749

Yang Z. (2007) Adaptive molecular evolution. Handbook of statistical genetics, 3rd Edition, 375-406. Wiley, New York. doi: 10.1002/9780470061619.ch12


Yang Z. (2006) On the varied pattern of evolution of two fungal genomes: a critique of Hughes and Friedman. Molecular Biology and Evolution, 23(12):2279-2282. PDF doi: 10.1093/molbev/msl122

Ren F, Tsubota A, Hirokawa T, Kumada H, Yang Z, Tanaka H. (2006) A unique amino acid substitution, T126I, in human genotype C of hepatitis B virus S gene and its possible influence on antigenic structural change. Gene, 383:43-51. PDF doi: 10.1016/j.gene.2006.07.018

Aguileta G, Bielawski JP, Yang Z. (2006) Proposed standard nomenclature for the α- and β-globin gene families. Genes and Genetic Systems, 81(5):367-371. doi: 10.1266/ggs.81.367

Aguileta G, Bielawski JP, Yang Z. (2006) Evolutionary rate variation among vertebrate beta globin genes: implications for dating gene family duplication events. Gene, 380(1):21-29. doi: 10.1016/j.gene.2006.04.019

Yang Z, Rannala B. (2006) Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Molecular Biology and Evolution, 23(1):212-226. PDF doi: 10.1093/molbev/msj024


Ren F, Tanaka H, Yang Z. (2005) An empirical examination of the utility of codon-substitution models in phylogeny reconstruction. Systematic Biology, 54(5):808-818. PDF doi: 10.1080/10635150500354688

Zhang J, Nielsen R, Yang Z. (2005) Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Molecular Biology and Evolution, 22(12):2472-2479. PDF doi: 10.1093/molbev/msi237

Bielawski JP, Yang Z. (2005) Maximum likelihood methods for detecting adaptive protein evolution. In: Statistical Methods in Molecular Evolution. Statistics for Biology and Health, 103-124. Springer, New York, NY. doi:

Yang Z, Rannala B. (2005) Branch-length prior influences Bayesian posterior probability of phylogeny. Systematic Biology, 54(3):455-470. PDF doi: 10.1080/10635150590945313

Sainudiin R, Wong WSW, Yogeeswaran K, Nasrallah J, Yang Z, Nielsen R. (2005) Detecting site-specific physicochemical selective pressures: applications to the class-I HLA of the human major histocompatibility complex and the SRK of the plant sporophytic self-incompatibility system. Journal of Molecular Evolution, 60(3):315-326. PDF doi: 10.1007/s00239-004-0153-1

Yang Z, Wong WSW, Nielsen R. (2005) Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular Biology and Evolution, 22(4):1107-1118. PDF doi: 10.1093/molbev/msi097

Yang Z. (2005) The power of phylogenetic comparison in revealing protein function. Proceedings of the National Academy of Sciences, 102(9):3179-3180. PDF doi: 10.1073/pnas.0500371102

Yang Z. (2005) Bayesian inference in molecular phylogenetics. Mathematics of Evolution and Phylogeny, 63-90. Oxford University Press, Oxford. poor-quality pdf (9.5MB) and book at OUP


Wong WSW, Yang Z, Goldman N, Nielsen R. (2004) Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics, 168(2):1041-1051. PDF doi: 10.1534/genetics.104.031153

Anisimova M, Yang Z. (2004) Molecular evolution of hepatitis delta virus antigen gene: recombination or positive selection? Journal of Molecular Evolution, 59(6):815-826. PDF (alignment is at the EMBL nucleotide sequence database - accession number ALIGN_000712) doi: 10.1007/s00239-004-0112-x

Aguileta G, Bielawski JP, Yang Z. (2004) Gene conversion and functional divergence in the β-globin gene family. Journal of Molecular Evolution, 59(2):177-189. PDF doi: 10.1007/s00239-004-2612-0

Yang Z. (2004) A heuristic rate smoothing procedure for maximum likelihood estimation of species divergence times. Acta Zoologica Sinica, 50(4):645-656. PDF (in English with a Chinese abstract)

Yoder AD, Yang Z. (2004) Divergence dates for Malagasy lemurs estimated from multiple gene loci: fit with climatological events and speciation models. Molecular Ecology, 13(4):757-773. PDF doi: 10.1046/j.1365-294X.2004.02106.x

Schein M, Yang Z, Mitchell-Olds T, Schmid KJ. (2004) Rapid evolution of a pollen-specific oleosin-like gene family from Arabidopsis thaliana and closely related species. Molecular Biology and Evolution, 21(4):659-669. doi: 10.1093/molbev/msh059

Bielawski JP, Yang Z. (2004) Likelihood analysis of the chalcone synthase genes suggests the role of positive selection in the morning glories (Ipomoea). Journal of Molecular Evolution, 59(1):121-132. PDF doi: 10.1007/s00239-004-2597-8

Yang Z. (2004) A probabilist’s account of modern molecular population genetics, Review of Probability Models for DNA Sequence Evolution (by Rick Durrett. Springer-Verlag, New York, 2002). Heredity, 92:474. PDF doi: 10.1038/sj.hdy.6800419


Aris-Brosou S, Yang Z. (2003) Bayesian models of episodic evolution support a late Precambrian explosive diversification of the Metazoa. Molecular Biology and Evolution, 20(12):1947-1954. PDF doi: 10.1093/molbev/msg226

Furlong RF, Yang Z. (2003) Comparative genomics coming of age. Heredity, 91(6):533-534. PDF doi: 10.1038/sj.hdy.6800372

Yang Z. (2003) Phylogenetics as applied mathematics. Review of Phylogenetics (by Charles Semple and Mike Steel. Oxford University Press, 2003). Trends in Ecology and Evolution, 18(11):558-559. PDF doi: 10.1016/S0169-5347(03)00194-0

Yang Z, Ro S, Rannala B. (2003) Likelihood models of somatic mutation and codon substitution in cancer genes. Genetics, 165(2):695-705. PDF

Yang W, Bielawski JP, Yang Z. (2003) Widespread adaptive evolution in the human immunodeficiency virus type 1 genome. Journal of Molecular Evolution, 57(2):212-221. PDF Data (HIVdata.YBY.tar2003.gz) doi: 10.1007/s00239-003-2467-9

Yang Z, Yoder AD. (2003) Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene loci and calibration points, with application to a radiation of cute-looking mouse lemur species. Systematic Biology, 52(5):705-716. PDF (Data files for the ML analysis are included in the paml release in the examples/MouseLemurs/ folder. Data files for the Bayesian analysis are in the T3 distribution in the MouseLemurs/ folder.) doi: 10.1080/10635150390235557

Rannala B, Yang Z. (2003) Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics, 164(4):1645-1656. PDF (The pdf file has a page glued at the end giving a detailed derivation of the Hastings ratio of the rubber-band algorithm described in the Appendix.) (Data files are included in the MCMCcoal program)

Nielsen R, Yang Z. (2003) Estimating the distribution of selection coefficients from phylogenetic data with applications to mtDNA. Molecular Biology and Evolution, 20(8):1231-1239. PDF doi: 10.1093/molbev/msg147

Anisimova M, Nielsen R, Yang Z. (2003) Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics, 164(3):1229-1236. PDF

Yang Z, Stephens D, Dawson KJ, Drummond A, Nicholls G, Griffiths RC, Wilkinson-Herbots HM, Beaumont MA, Baird SJE, Lascoux M, Leblois R, Estoup A, Nielsen R, Hey J, Stumpf MPH. (2003) Inference from DNA data: population histories, evolutionary processes and forensic match probabilities: Discussion. Journal of the Royal Statistical Society A, 166:188-201. PDF

Bielawski JP, Yang Z. (2003) Maximum likelihood methods for detecting adaptive evolution after gene duplication. Journal of Structural and Functional Genomics, 3(1-4):201-212. PDF

Bielawski JP, Yang Z. (2003) Maximum likelihood methods for detecting adaptive evolution after gene duplication. Genome Evolution: Gene and Genome Duplications and the Origin of Novel Gene Functions, 201-212. Kluwer Academic Publishers, Dordrecht. PDF (This is the same as the journal paper above.)

Yang Z. (2003) Adaptive Molecular Evolution. Handbook of statistical genetics, 2nd Edition, 229-254. Wiley, New York.


Yang Z. (2002) Inference of selection from multiple species alignments. Current Opinion in Genetics and Development, 12(6):688-694. PDF doi: 10.1016/s0959-437x(02)00348-9

Yang J, Huang J, Gu H, Zhong Y, Yang Z. (2002) Duplication and adaptive evolution of chalcone synthase genes in the genus Dendranthema (Asteraceae). Molecular Biology and Evolution, 19(10):1752-1759. PDF doi: 10.1093/oxfordjournals.molbev.a003997

Aris-Brosou S, Yang Z. (2002) The effects of models of rate evolution on estimation of divergence dates with a special reference to the metazoan 18S rRNA phylogeny. Systematic Biology, 51(5):703-714. PDF doi: 10.1080/10635150290102375

Yang Z. (2002) Likelihood and Bayes estimation of ancestral population sizes in Hominoids using data from multiple loci. Genetics, 162(4):1811-1823. PDF (The Bayes MCMC method described in this paper is superseded by the Rannala & Yang 2003 algorithm, which is implemented in the MCMCcoal program. The ML program, Ne3sML, is available in the MCMCcoal release as well; it uses Mathematica.)

Jiggins FM, Hurst GDD, Yang Z. (2002) Host-symbiont conflicts: positive selection on the outer membrane protein of parasite but not mutualistic Rickettsiaceae. Molecular Biology and Evolution, 19(8):1341-1349. PDF doi: 10.1093/oxfordjournals.molbev.a004195

Anisimova M, Bielawski JP, Yang Z. (2002) Accuracy and power of Bayes prediction of amino acid sites under positive selection. Molecular Biology and Evolution, 19(6):950-958. PDF doi: 10.1093/oxfordjournals.molbev.a004152

Yang Z, Nielsen R. (2002) Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Molecular Biology and Evolution, 19(6):908-917. PDF doi: 10.1093/oxfordjournals.molbev.a004148

Yang Z. (2002) Molecular clock. Oxford Encyclopedia of Evolution, 747-750. Oxford University Press, Oxford. PDF book at OUP (2 volumes)

Yang Z, Swanson WJ. (2002) Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Molecular Biology and Evolution, 19(1):49-57. PDF doi: 10.1093/oxfordjournals.molbev.a003981

Clote P, Naylor GJP, Yang Z. (2002) Proteins: structure, function and evolution. Pacific Symposium on BioComputing, 19(1):548-551. PDF


Jiggins CD, Linares M, Naisbit RE, Salazar C, Yang Z, Mallet J. (2001) Sex-linked hybrid sterility in a butterfly. Evolution, 55(8):1631-1638. PDF doi: 10.1111/j.0014-3820.2001.tb00682.x

Anisimova M, Bielawski JP, Yang Z. (2001) Accuracy and power of likelihood ratio test in detecting adaptive molecular evolution. Molecular Biology and Evolution, 18(8):1585-1592. PDF doi: 10.1093/oxfordjournals.molbev.a003945

Bielawski JP, Yang Z. (2001) Positive and negative selection in the DAZ gene family. Molecular Biology and Evolution, 18(4):523-529. PDF doi: 10.1093/oxfordjournals.molbev.a003831

Swanson WJ, Yang Z, Wolfner MF, Aquadro CF. (2001) Positive Darwinian selection in the evolution of mammalian female reproductive proteins. Proceedings of the National Academy of Sciences, 98(5):2509-2512. PDF (featured in New York Times 27 February 2001) doi: 10.1073/pnas.051605998

Dunn KA, Bielawski JP, Yang Z. (2001) Substitution rates in Drosophila nuclear genes: implications for translational selection. Genetics, 157(1):317-330. PDF

Yang Z. (2001) Maximum likelihood analysis of adaptive evolution in HIV-1 gp120 env gene. Pacific Symposium on BioComputing, 226-237. PDF

Yang Z. (2001) Adaptive molecular evolution. Handbook of statistical genetics, 327-350. Wiley, London. PDF


Thomas MG, Hagelberg E, Jones HB, Yang Z, Lister A. (2000) Molecular and morphological evidence on the phylogeny of the Elephantidae. Proc Biol Sci, 267(1461):2493-2500. PDF doi: 10.1098/rspb.2000.1310

Yang Z. (2000) Maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus A. Journal of Molecular Evolution, 51(5):423-432. PDF alignment, and tree doi: 10.1007/s002390010105

Yang Z, Bielawski JP. (2000) Statistical methods for detecting molecular adaptation. Trends in Ecology and Evolution, 15(12):496-503. PDF doi: 10.1016/s0169-5347(00)01994-7

Bielawski JP, Dunn K, Yang Z. (2000) Rates of nucleotide substitution and mammalian nuclear gene evolution: approximate and maximum-likelihood methods lead to different conclusions. Genetics, 156(3):1299-1308. PDF

Yang Z, Swanson WJ, Vacquier VD. (2000) Maximum likelihood analysis of molecular adaptation in abalone sperm lysin reveals variable selective pressures among lineages and sites. Molecular Biology and Evolution, 17(10):1446-1455. PDF doi: 10.1093/oxfordjournals.molbev.a026245

Yoder AD, Yang Z. (2000) Estimation of primate speciation dates using local molecular clocks. Molecular Biology and Evolution, 17(7):1081-1090. PDF (The local clock models here are superseded by the Yang & Yoder 2003 cute-looking paper.) doi: 10.1093/oxfordjournals.molbev.a026389

Yang Z, Nielsen R, Goldman N, Pedersen AM. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics, 155(1):431-449. PDF (Quite a few example data sets are included in the paml release to demonstrate methods implemented in this paper; look at the readme files in the folders lysin, lysozyme, HIVNSsites. Also the paper says on page 448 that the data and list of sites will be posted at an ftp site. The ftp site is now dead, but the files are in the archive here; save it into file name YNGP2000.tar.gz)

Holbrook JD, Birdsey GM, Yang Z, Bruford MW, Danpure CJ. (2000) Molecular adaptation of alanine:glyoxylate aminotransferase targeting in primates. Molecular Biology and Evolution, 17(3):387-400. PDF doi: 10.1093/oxfordjournals.molbev.a026318

Yang Z. (2000) Complexity of the simplest phylogenetic estimation problem. Proc Biol Sci, 267(1439):109-116. PDF doi: 10.1098/rspb.2000.0974

Yang Z. (2000) Relating physicochemical properties of amino acids to variable nucleotide substitution patterns among sites. Pacific Symposium on BioComputing, 81-92. PDF

Yang Z, Nielsen R. (2000) Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution, 17(1):32-43. PDF (This method is implemented in the yn00 program in paml.) doi: 10.1093/oxfordjournals.molbev.a026236


Excoffier L, Yang Z. (1999) Substitution rate variation among sites in the mitochondrial hypervariable region I of humans and chimpanzees. Molecular Biology and Evolution, 16(10):1357-1368. PDF doi: 10.1093/oxfordjournals.molbev.a026046

Yang Z, Yoder AD. (1999) Estimation of the transition/transversion rate bias and species sampling. Journal of Molecular Evolution, 48(3):274-283. PDF doi: 10.1007/pl00006470


Rannala B, Huelsenbeck JP, Yang Z, Nielsen R. (1998) Taxon sampling and the accuracy of large phylogenies. Systematic Biology, 47(4):702-709. PDF doi: 10.1080/106351598260680

Yang Z, Nielsen R, Hasegawa M. (1998) Models of amino acid substitution and applications to mitochondrial protein evolution. Molecular Biology and Evolution, 15(12):1600-1611. PDF (Example data set included in the paml release in the examples/mtCDNA/ folder.) doi: 10.1093/oxfordjournals.molbev.a025888

Hasegawa M, Cao Y, Yang Z. (1998) Preponderance of slightly deleterious polymorphism in mitochondrial DNA: replacement/synonymous rate ratio is much higher within species than between species. Molecular Biology and Evolution, 15(11):1499-1505. PDF doi: 10.1093/oxfordjournals.molbev.a025877

Yang Z. (1998) Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Molecular Biology and Evolution, 15(5):568-573. PDF (Example data are included in the paml release in the folder examples/lysozyme/.) doi: 10.1093/oxfordjournals.molbev.a025957

Yang Z. (1998) On the best evolutionary rate for phylogenetic analysis. Systematic Biology, 47(1):125-133. PDF doi: 10.1080/106351598261067

Nielsen R, Yang Z. (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics, 148(3):929-936. PDF

Yang Z, Nielsen R. (1998) Synonymous and nonsynonymous rate variation in nuclear genes of mammals. Journal of Molecular Evolution, 46(4):409-418. PDF doi: 10.1007/pl00006320


Yang Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS, 13(5):555-556. PDF Website doi: 10.1093/bioinformatics/13.5.555

Yang Z, Goldman N. (1997) Are big trees indeed easy? Trends in Ecology and Evolution, 12(9):357. PDF doi: 10.1016/s0169-5347(97)83196-5

Yang Z, Rannala B. (1997) Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Molecular Biology and Evolution, 14(7):717-724. PDF doi: 10.1093/oxfordjournals.molbev.a025811

Yang Z. (1997) On the estimation of ancestral population sizes of modern humans. Genetical Research Cambridge, 69(2):111-116. PDF doi: 10.1017/s001667239700270x

Huelsenbeck JP, Rannala B, Yang Z. (1997) Statistical tests of host-parasite cospeciation. Evolution, 51(2):410-419. PDF doi: 10.1111/j.1558-5646.1997.tb02428.x

Yang Z. (1997) How often do wrong models produce better phylogenies? Molecular Biology and Evolution, 14(1):105-108. PDF doi: 10.1093/oxfordjournals.molbev.a025695


Yang Z. (1996) Statistical properties of a DNA sample under the finite-sites model. Genetics, 144(4):1941-1950. PDF

Rannala B, Yang Z. (1996) Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. Journal of Molecular Evolution, 43(3):304-311. PDF doi: 10.1007/BF02338839

Yang Z. (1996) Among-site rate variation and its impact on phylogenetic analyses. Trends in Ecology and Evolution, 11(9):367-372. PDF doi: 10.1016/0169-5347(96)10041-0

Yang Z. (1996) Maximum-likelihood models for combined analyses of multiple sequence data. Journal of Molecular Evolution, 42(5):587-596. PDF doi: 10.1007/BF02352289

Yang Z. (1996) Phylogenetic analysis using parsimony and likelihood methods. Journal of Molecular Evolution, 42(2):294-307. PDF doi: 10.1007/BF02198856

Yang Z, Kumar S. (1996) Approximate methods for estimating the pattern of nucleotide substitution and the variation of substitution rates among sites. Molecular Biology and Evolution, 13(5):650-659. PDF (Implemented in the pamp program in the paml package.) doi: 10.1093/oxfordjournals.molbev.a025625


Yang Z, Kumar S, Nei M. (1995) A new method of inference of ancestral nucleotide and amino acid sequences. Genetics, 141(4):1641-1650. PDF (Example data file included in the paml release as stewart.aa.)

Yang Z, Goldman N, Friday AE. (1995) Maximum likelihood trees from DNA sequences: a peculiar statistical estimation problem. Systematic Biology, 44(3):384-399. PDF doi: 10.1093/sysbio/44.3.384

Yang Z, Roberts D. (1995) On the use of nucleic acid sequences to infer early branchings in the tree of life. Molecular Biology and Evolution, 12(3):451-458. PDF doi: 10.1093/oxfordjournals.molbev.a040220

Yang Z, Lauder IJ, Lin HJ. (1995) Molecular evolution of the hepatitis B virus genome. Journal of Molecular Evolution, 41(5):587-596. PDF doi: 10.1007/BF00175817

Yang Z, Wang T. (1995) Mixed model analysis of DNA sequence evolution. Biometrics, 51(2):552-561. PDF

Yang Z. (1995) On the general reversible Markov-process model of nucleotide substitution: a reply to Saccone et al. Journal of Molecular Evolution, 41:254-255. PDF doi: 10.1007/BF00170682

Yang Z. (1995) Evaluation of several methods for estimating phylogenetic trees when substitution rates differ over nucleotide sites. Journal of Molecular Evolution, 40:689-697. PDF doi: 10.1007/BF00160518

Yang Z. (1995) A space-time process model for the evolution of DNA sequences. Genetics, 139(2):993-1005. PDF (This describes the auto-discrete-gamma models.)


Yang Z. (1994) Statistical properties of the maximum likelihood method of phylogenetic estimation and comparison with distance matrix methods. Systematic Biology, 43(3):329-342. PDF doi: 10.1093/sysbio/43.3.329

Yang Z. (1994) Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution, 39(3):306-314. PDF (This is the discrete-gamma paper.) doi: 10.1007/BF00160154

Yang Z. (1994) Models of DNA substitution and the discrimination of evolutionary parameters. In Proceedings of the XVIIth International Biometrics Conference, Vol. I: Invited Papers, 407-420. International Biometrics Society, Hamilton, Ontario, Canada. PDF

Goldman N, Yang Z. (1994) A codon-based model of nucleotide substitution for protein-coding DNA sequences. Molecular Biology and Evolution, 11(5):725-736. PDF doi: 10.1093/oxfordjournals.molbev.a040153

Yang Z, Goldman N, Friday AE. (1994) Comparison of models for nucleotide substitution used in maximum likelihood phylogenetic estimation. Molecular Biology and Evolution, 11(2):316-324. PDF doi: 10.1093/oxfordjournals.molbev.a040112

Yang Z. (1994) Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution, 39(1):105-111. PDF doi: 10.1007/BF00178256

Yang Z, Goldman N. (1994) Evaluation and extension of Markov process models for the evolution of DNA. Acta Genetica Sinica, 21(1):17-23. (in Chinese with an English abstract)


Yang Z. (1993) Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Molecular Biology and Evolution, 10(6):1396-1401. PDF (The continuous gamma model is implemented in the basemlg program in the paml release; it is not often used because of the intensive computation involved.) doi: 10.1093/oxfordjournals.molbev.a040082