Computer software

Tutorials for the Windows and Mac OS X/Linux/Unix command line

The best way to run the programs here is from the command line. Since some students today have never seen the command line, I have written some notes to introduce the basics. I suggest that you go through the tutorial yourself before trying to run programs such as paml or bpp. I think the tutorial should take less than one hour. Please let me know if you find the tutorial confusing or if you have comments or suggestions.
Tutorial for Microsoft Windows command line, pdf, 4 pages
Tutorial for Mac OS X/Linux/Unix command line, pdf, 4 pages
A fairly extensive Introduction to Unix commands, written by Tim Massingham for the Workshop on Computational Molecular Evolution (CoME), pdf, 28 pages.

Phylogenetic analysis by maximum likelihood (PAML)

Yang, Z. 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555-556.
Yang, Z. 2007 PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586-1591.

BP&P: Bayesian analysis of genomic sequence data under the multispecies coalescent model

The BP&P program implements a series of Bayesian inference methods under the multispecies coalescent model with and without introgression. The analyses may include estimation of population sizes (thetas) and species divergence times (taus), species tree estimation, species delimitation, and estimation of cross-species introgression intensity. This page has links for bpp version 3.4a. We suggest that you get bpp version 4 instead. Version 4 has all the functionality of version 3.4, but runs much faster and supports multiple threads. Also, the multispecies-coalescent-with-introgression (MSci) model is implemented in bpp4 only, not in bpp3.4. The same control file works for both versions, although bpp4 does not accept a default control file; instead, you name the control file on the command line:

bpp -cfile=bpp.ctl
bpp -simulate=MCcoal.ctl
(The second command provides the functionality of MCcoal in bpp3.4.)
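
If you launch bpp from a script rather than by typing the command, the two command lines above can be assembled as in the following minimal Python sketch. The helper names `bpp_command` and `run_bpp` are hypothetical, the executable is assumed to be called `bpp` and to be on your PATH, and the option spelling is copied exactly as written on this page, so check the documentation of your bpp version if the program rejects it.

```python
import subprocess


def bpp_command(control_file, simulate=False, executable="bpp"):
    """Build the argument list for a bpp4 run.

    The option spelling (-cfile=, -simulate=) is copied from this page;
    the function and parameter names are illustrative only.
    """
    option = "-simulate" if simulate else "-cfile"
    return [executable, f"{option}={control_file}"]


def run_bpp(control_file, simulate=False):
    # Assumes a bpp executable is on the PATH.
    # check=True raises CalledProcessError if bpp exits with an error.
    subprocess.run(bpp_command(control_file, simulate), check=True)


if __name__ == "__main__":
    # Print the two command lines without running anything.
    print(" ".join(bpp_command("bpp.ctl")))
    print(" ".join(bpp_command("MCcoal.ctl", simulate=True)))
```

Calling `run_bpp("bpp.ctl")` from the folder that holds the control file would then start the analysis.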

The archive bpp3.4a.tgz includes the source code for all platforms, as well as executables for Windows.
(Make sure that you save the file using the correct file name. If Internet Explorer changes the file extension to .gz, you should change it back to .tgz before double-clicking.)
Download bpp3.4.macosx.tgz (source code and executables for mac osx 7 and later)
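
If you prefer not to rename and unpack the archive by hand, the steps in the note above can be sketched in Python as follows. This is only an illustration using the standard library; the function names `restore_tgz_name` and `unpack` are hypothetical, and the sketch assumes the downloaded file is a gzip-compressed tar archive whose extension was changed to .gz.

```python
import tarfile
from pathlib import Path


def restore_tgz_name(path):
    """Rename name.gz back to name.tgz; leave other names (and .tar.gz) alone."""
    path = Path(path)
    if path.suffix == ".gz" and not path.name.endswith(".tar.gz"):
        fixed = path.with_suffix(".tgz")
        path.rename(fixed)
        return fixed
    return path


def unpack(path, dest="."):
    """Extract a gzip-compressed tar archive into the folder dest."""
    # Newer Python versions accept filter="data", which refuses unsafe members.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(path, "r:gz") as archive:
        archive.extractall(dest, **kwargs)


if __name__ == "__main__":
    # Only act if the archive has actually been downloaded to this folder.
    for name in ("bpp3.4a.gz", "bpp3.4a.tgz"):
        if Path(name).exists():
            unpack(restore_tgz_name(name))
            break
```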

Please read the documentation in the package (bppDOC.pdf) before using the program. If you are new and want to get started with the program, you can go through the command-line tutorial at the top of this page (choose the right pdf file depending on whether you use Windows, Mac, or Linux), and then go through the tutorials in the following papers:

Yang, Z. 2015. A tutorial of BPP for species tree estimation and species delimitation. Current Zoology 61:854-865.

Flouris, T., et al. 2018. Species tree inference with BPP using genomic sequences and the multispecies coalescent. Mol. Biol. Evol. 35:2585-2593.

If you still have questions about either bpp3.4 or bpp4 after having struggled through the tutorial, please post them at the Google bpp discussion site.

BP&P replaces the old program MCMCcoal, which implements the Bayesian method of Rannala & Yang (2003) and Burgess & Yang (2008). Bo Xu has written a graphical user interface for BPP, called bppX. Compiled executables are available for Windows, Mac OS X, and Linux. First install (unpack and compile) the current version of bpp. Then unpack the GUI bppX, and go to Function-Configuration to specify the folder name for the bpp files. I think bppX may not work with the newer versions of bpp, such as 3.4 and 4.

Bruce Rannala has written another GUI, BPPgui. The executables are for Windows and Mac OS X. You can use it to prepare the bpp control files, and also to read in the species trees in the MCMC sample file produced by either bpp3.4 or bpp4. Please download it from Bruce's group web site.

  Source code  bppX1.2.2-src.tgz
  Windows  bppX1.2.2+bpp3.1-win-x86.tgz
  Mac OSX  bppX1.2.2+bpp3.1-osx-x86_64.dmg


The program 3s implements likelihood ratio tests to test for gene flow between two closely related species. Click on the link above for more information.

Zhu T, Yang Z. 2012. Maximum likelihood implementation of an isolation-with-migration model with three species for testing speciation with gene flow. Mol. Biol. Evol. 29:3131-3142.

Dalquen D, Zhu T, Yang Z. 2017. Maximum likelihood implementation of an isolation-with-migration model for three species. Syst. Biol. 66:379-398.

MrBayes 3.2.1 with Dirichlet priors on branch lengths

The archive name is mb3.2.1.Dir.tgz. This is a version of MrBayes 3.2.1 modified to use the Gamma-Dirichlet and Inverse Gamma-Dirichlet priors for branch lengths (Rannala et al. 2012; Zhang et al. 2012). It also includes the two exponential priors on internal and external branch lengths described by Yang & Rannala (2005) and Yang (2007). There is a brief description of the changes in the file mb3.2.Dirichlet.Notes.pdf inside the archive. If you use the modified program, please cite MrBayes as well as the papers that describe the modifications.

Yang, Z. 2007 Fair-balance paradox, star-tree paradox and Bayesian phylogenetics. Mol. Biol. Evol. 24, 1639-1655.
Yang, Z. & Rannala, B. 2005 Branch-length prior influences Bayesian posterior probability of phylogeny. Syst. Biol. 54, 455-470.
Rannala, B., T. Zhu, and Z. Yang. 2012. Tail paradox, partial identifiability and influential priors in Bayesian branch length inference. Mol. Biol. Evol. 29:325-335.
Zhang, C., B. Rannala, and Z. Yang. 2012. Robustness of compound Dirichlet priors for Bayesian inference of branch lengths. Syst. Biol. 61:779-784.
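
As a rough illustration of the idea behind a gamma-Dirichlet prior on branch lengths, the sketch below draws the tree length (the sum of branch lengths) from a gamma distribution and then splits it among the branches using a symmetric Dirichlet distribution. This is my own simplified Python sketch, not code from the modified MrBayes: the function name and the parameter names `alpha_T` and `beta_T` are illustrative, the values in the example are arbitrary, and the sketch ignores any different weighting of internal and external branches that the papers above allow.

```python
import random


def sample_gamma_dirichlet(n_branches, alpha_T, beta_T, rng=None):
    """Draw (tree_length, branch_lengths) from a simplified gamma-Dirichlet prior.

    Tree length ~ Gamma(shape=alpha_T, rate=beta_T); the proportions of the
    tree length assigned to branches ~ symmetric Dirichlet(1, ..., 1).
    """
    rng = rng or random.Random()
    # random.gammavariate takes shape and scale, so scale = 1 / rate.
    tree_length = rng.gammavariate(alpha_T, 1.0 / beta_T)
    # Normalised Gamma(1, 1) draws give a Dirichlet(1, ..., 1) vector.
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_branches)]
    total = sum(draws)
    branch_lengths = [tree_length * d / total for d in draws]
    return tree_length, branch_lengths


if __name__ == "__main__":
    # An unrooted binary tree for 5 species has 2*5 - 3 = 7 branches.
    # The parameter values here are arbitrary, chosen only for the example.
    length, branches = sample_gamma_dirichlet(7, alpha_T=1.0, beta_T=10.0)
    print(length, branches)
```

The point of the construction is that the prior is placed on the tree length as a whole, rather than independently on each branch.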

oncoSpectrum v1.2: ML program for estimating mutation rates using cancer mutation databases.

Yang Z, Ro S, Rannala B. 2003. Likelihood models of somatic mutation and codon substitution in cancer genes. Genetics 165:695-705.

Go back to Ziheng Yang's group web page