Computer Software

BP&P: Bayesian analysis of genomic sequence data under the multispecies coalescent model

The BP&P program implements a series of Bayesian inference methods under the multispecies coalescent model with and without introgression. The analyses may include estimation of population sizes (thetas) and species divergence times (taus), species tree estimation, species delimitation, and estimation of cross-species introgression intensity. We suggest that you get bpp version 4, as it contains all the functionality of version 3.4 but runs much faster and supports multiple threads. In addition, the multispecies-coalescent-with-introgression (MSci) model is implemented in bpp4 only, not in bpp3.4. The same control file works for both versions.

Downloads and installation

The source code for the previous BP&P v3.4 is available in the links below:

If you have questions about either bpp3.4 or bpp4, please post them at the Google bpp discussion site.

BP&P replaces the old program MCMCcoal, which implements the Bayesian method of Rannala & Yang (2003) and Burgess & Yang (2008). Bo Xu has written a graphical user interface for BP&P, called bppX. The compiled executables are here for Windows, Mac OSX, and Linux. First install (unpack and compile) the current version of bpp, then unpack the GUI bppX, and go to Function-Configuration to specify the folder name for the bpp files. bppX may not work with newer versions of bpp, such as 3.4 and 4.

Bruce Rannala has written another BPPgui. The executables are for Windows and Mac OSX. You can use this to prepare the bpp control files, and also read in the species trees in the MCMC sample file, produced by either bpp3.4 or bpp4. Please download from Bruce’s group web site.

Citing BP&P

Flouri T., Jiao X., Rannala B., Yang Z. (2018) Species Tree Inference with BPP using Genomic Sequences and the Multispecies Coalescent. Molecular Biology and Evolution, 35(10):2585-2593. doi:10.1093/molbev/msy147

Please also cite the following papers depending on the method you use:

If you use the MSci model: Flouri T., Jiao X., Rannala B., Yang Z. (2020) A Bayesian Implementation of the Multispecies Coalescent Model with Introgression for Phylogenomic Analysis. Molecular Biology and Evolution, 37(4):1211-1223. doi:10.1093/molbev/msz296

etc

Tutorials


3s

The program 3s implements likelihood ratio tests to test for gene flow between two closely related species.

The latest release (Dalquen et al. 2017) includes several improvements over previous versions:

  • It allows the use of loci with arbitrary configurations (such as 123, 112, 113, 12, etc.).
  • By default, it uses an asymmetric IM model with θ1 ≠ θ2 and M12 ≠ M21.
  • It can take advantage of multi-core architectures to speed up computation.

The source code is available here

Downloads and Installation

You can use 3s as it is, without any additional libraries installed. Use the following command (adapted to your system) to compile the source code:

   gcc -O3 -o 3s 3s.c tools.c lfun3s.c -lm

However, we recommend installing the GNU Scientific Library (GSL), as this will speed up computation of P(t) significantly, even if you run the program on a single core. Once GSL is installed, you can compile with something like this:

   gcc -O3 -DUSE_GSL -o 3s 3s.c tools.c lfun3s.c -lm -lgsl -lgslcblas

Depending on how your system is set up, you might or might not have to specify the location of GSL using the -I and -L options:

   gcc -O3 -DUSE_GSL -I/usr/local/include -L/usr/local/lib -o 3s 3s.c tools.c lfun3s.c -lm -lgsl -lgslcblas

Finally, if you want to run 3s on multiple cores, you can compile with OpenMP support:

   gcc -O3 -DUSE_GSL -I/usr/local/include -L/usr/local/lib -o 3s -fopenmp 3s.c tools.c lfun3s.c -lm -lgsl -lgslcblas

This should work out-of-the-box on current Linux systems with the GNU tool chain. On Mac OS X and Windows you need a compiler with OpenMP support.
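
The compile commands above differ only in whether GSL and OpenMP are switched on, so they can be generated from two yes/no switches. The following is a minimal sketch; build_3s_cmd is a hypothetical helper and not part of the 3s distribution. It leaves out the -I and -L options, which you would add by hand if GSL is not in a standard location, and it assumes that OpenMP can be enabled without GSL, which the notes above do not state.

```shell
# Sketch: assemble the gcc command for 3s from two switches (1 = on, 0 = off).
# build_3s_cmd is a hypothetical helper, not part of the 3s distribution.
# Usage: build_3s_cmd USE_GSL USE_OPENMP
build_3s_cmd() {
    use_gsl=$1
    use_openmp=$2
    cmd="gcc -O3"
    # -DUSE_GSL selects the GSL code path at compile time.
    if [ "$use_gsl" = 1 ]; then
        cmd="$cmd -DUSE_GSL"
    fi
    cmd="$cmd -o 3s"
    # -fopenmp enables multi-core support.
    if [ "$use_openmp" = 1 ]; then
        cmd="$cmd -fopenmp"
    fi
    cmd="$cmd 3s.c tools.c lfun3s.c -lm"
    # The GSL libraries are linked only when GSL is in use.
    if [ "$use_gsl" = 1 ]; then
        cmd="$cmd -lgsl -lgslcblas"
    fi
    printf '%s\n' "$cmd"
}

# Print the command for a build with both GSL and OpenMP.
build_3s_cmd 1 1
```

Printing the command rather than running it lets you check it first. Once it looks right for your system, copy it to the prompt or run it with eval "$(build_3s_cmd 1 1)".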

Citing 3s

Zhu T, Yang Z. 2012. Maximum likelihood implementation of an isolation-with-migration model with three species for testing speciation with gene flow. Mol. Biol. Evol. 29:3131-3142.

Dalquen D, Zhu T, Yang Z. 2017. Maximum likelihood implementation of an isolation-with-migration model for three species. Syst. Biol. 66:379-398.

Tutorials


Phylogenetic analysis by maximum likelihood (PAML)

PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained by Ziheng Yang and distributed under the GNU GPL v3. ANSI C source code is distributed for UNIX/Linux/Mac OSX, and executables are provided for MS Windows. PAML is not good for tree making. It may be used to estimate parameters and test hypotheses to study the evolutionary process, once you have reconstructed trees using other programs such as PAUP*, PHYLIP, MOLPHY, PhyML, RAxML, etc.

PAML manual in pdf

Downloads and Installation

PAML for Windows 9x/NT/2000/XP/Vista/7

Download the executable (paml4.9j.tgz) here

  1. Download and save the archive paml4.9j.tgz on your local disk. (Make sure the extension of the file did not change during the download.)
  2. Unpack the archive into a folder (e.g. with Winzip). Remember the name of the folder.
  3. The Windows executables are in paml4.9j/bin/. I suggest that you create a folder for local programs and move the PAML executables there. Here are some notes for doing that. Setting up a folder of local programs and changing the search path. You need to do this for your user account only once. Suppose your user folder is C:\Users\Ziheng. (Please replace this with your own user folder in the following examples.) This is the default user folder for me on Vista or Windows 7. On Windows XP, it is more unwieldy, something like C:\Documents and Settings\Ziheng. Use Windows Explorer to create a folder called bin inside your user folder, that is, C:\Users\Ziheng\bin. Or if you are the boss of your PC, you may prefer the folder C:\bin. Either way, this is the folder for holding executable programs.
  4. Next we will add this folder onto the search path, which the OS uses to search for executable programs. The following is for Windows Vista (the menu may be slightly different on Win 2000/XP but similar). Open Control Panel. Choose Classic View. Double-click on System. Choose Advanced System Settings, and click on the tab Advanced. Click on the button Environment Variables. Under User variables, double-click on the variable Path to edit. Click on the variable value field and move the cursor to the beginning. Insert the name of our program folder C:\Users\Ziheng\Bin; or C:\Bin; or whatever folder you have created. Note that the semicolon separates the folder names. Be careful not to introduce any errors. Click on OK.
  5. Copy the PAML executables. Copy or move the pre-compiled executables (baseml.exe, codeml.exe, evolver.exe, chi2.exe, etc.) from the paml4.9j\bin\ folder to the local programs folder C:\Users\Ziheng\Bin\. After this, you can execute any of these programs from a command prompt wherever you are. If you like, you can rename baseml.exe and codeml.exe as baseml4.exe and codeml4.exe respectively, to include the version number. (You will then run the program by the command codeml4 instead of codeml.)
  6. Running a PAML program. Avoid double-clicking the program names from Windows Explorer. That way you won’t see any error messages on the screen when the program crashes. Instead, start a command prompt box (for example, Start - Programs - Accessories - Command Prompt, or Start - Run - type cmd and OK). You can right-click on the title bar and choose Properties to change the size, font, and colour of the window. cd to the folder which contains your user files, and type the command name. Here we cd to the paml folder (suppose you have extracted the archive into C:\Programs\paml4.9j\) and run the program using the default files.
    C:
    cd \Programs\paml4.9j\
    codeml

Because there is no executable file called codeml.exe (or codeml.bat, etc.) in your current folder, the OS will look for it in the folders listed in the environment variable path. It will find and execute codeml.exe in the C:\Users\Ziheng\Bin folder. You can also specify the full path of the executable program, with something like the following:

    C:\Programs\paml4.9j\bin\codeml 

Some codeml analyses use an amino acid distance matrix (e.g., grantham.dat) or a substitution rate matrix (e.g., wag.dat). You will then need to copy the necessary file to your current folder. Otherwise the program will ask you to input the full path name for the file.

PAML for UNIX/Linux

  1. Download the Win32 archive from here
  2. Save and unpack it into a local folder
  3. Remove the Windows executables (.exe files) in the bin/ folder. Replace 4.9j with the appropriate version number in the following commands.
    tar -xf paml4.9j.tgz
    cd paml4.9j
    rm bin/*.exe
    cd src
    make -f Makefile
    ls -lF
    rm *.o
    mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin
    cd ..
    ls -lF bin
    bin/baseml
    bin/codeml
    bin/evolver
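
A quick way to confirm that the build above produced all the programs is sketched below. check_paml_bin is a hypothetical helper, not something shipped with PAML; the program list is simply the one used in the mv command above, so adjust it if your version builds a different set.

```shell
# Sketch: report which PAML programs are present and executable in a folder.
# check_paml_bin is a hypothetical helper; the list matches the mv command above.
check_paml_bin() {
    bindir=$1
    missing=0
    for prog in baseml basemlg codeml pamp evolver yn00 chi2; do
        if [ -f "$bindir/$prog" ] && [ -x "$bindir/$prog" ]; then
            echo "ok       $prog"
        else
            echo "missing  $prog"
            missing=$((missing + 1))
        fi
    done
    # The return status is non-zero when anything is missing.
    [ "$missing" -eq 0 ]
}

# Example: run from inside the paml4.9j folder after the build.
check_paml_bin bin || echo "some programs were not found; check the make output"
```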
  • Setting up a folder of local programs and changing your initialization file for the shell. First check that there is a bin/ folder inside your account. If not, create one.
    cd
    mkdir bin
  • Then modify your path to include the bin/ folder in the initialization file for the shell. You can use more /etc/passwd to see which shell you run. Below are notes for the C shell and bash shell. There are other shells, but these two are commonly used.
  • If you see /bin/csh for your account in the /etc/passwd file, you are running the C shell, and the initialization file is .cshrc in your home folder. You can use more .cshrc to see its content if it is present. Use a text editor (such as emacs, vi, SimpleText, etc.) to edit (or create, if one does not exist) the file, by something like
    emacs .cshrc
  • insert the following line
     set path = ($path . ~/bin)
    
  • The different fields are separated by spaces. Here . means the current folder, ~/ means your home folder, ~/bin means the bin folder you created, and $path is whatever folders are already in the path.

  • If you see /bin/bash in the file /etc/passwd for your account, you are running the bash shell, and the initialization file is .bashrc. Use a text editor to open .bashrc and insert the following line
   PATH=$PATH:./:~/bin/
  • This changes the environment variable PATH. The different fields are separated by colon : and not space. If the file does not exist, create one.

  • After you have changed and saved the initialization file, every time you start a new shell, the path is automatically set for you. You can then cd to the folder which contains your data files and run paml programs there. The following moves to the paml folder (suppose you have extracted the archive into Programs/paml4.9j/ on your account) and runs the program using the default files.

   cd
   cd Programs/paml4.9j
   codeml
  • As the path is set up properly, this is equivalent to
    ~/bin/codeml

Note that Windows uses \ while Unix uses /, and Windows is case-insensitive while Unix is case-sensitive.
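
To see whether the path change described above has taken effect, you can check the current PATH from a new shell. The following is a minimal sketch; path_has_dir is a hypothetical helper that compares the colon-separated PATH entries literally (with or without a trailing slash), so it will not recognise an entry written as an unexpanded ~/bin.

```shell
# Sketch: check whether a folder is on the search path.
# path_has_dir is a hypothetical helper; it matches PATH entries literally,
# accepting the folder with or without a trailing slash.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"* | *":$1/:"*) return 0 ;;
        *)                    return 1 ;;
    esac
}

if path_has_dir "$HOME/bin"; then
    echo "$HOME/bin is on the path"
else
    echo "$HOME/bin is not on the path; edit .bashrc or .cshrc as described above"
fi

# command -v prints the location the shell would use for a program name, if any.
command -v codeml || echo "codeml not found on the current path"
```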

PAML for Mac OSX

If you have a G5 or if you would like to compile the programs yourself, please follow the notes here.

I understand that Apple Xcode is now automatically installed on your Mac. Otherwise you will have to download and install Xcode, which includes the C compiler. Without a C compiler, you will get a “Command not found” error when you type gcc or cc at the command terminal.

  1. Download the Windows archive provided.
  2. Open a command terminal (Applications-Utilities-Terminal).
  3. Remove the .exe files from the bin/ folder.
  4. Compile and run the programs from the terminal as follows:
  • open up the file Makefile in the src/ folder and add # at the beginning of the following line to comment it out:

CFLAGS = -O4 -funroll-loops -fomit-frame-pointer -finline-functions

  • Delete the # at the beginning of the line for either G5 or intel, depending on your machine, to uncomment the line.

For MAC OSX G5 uncomment the following line:

CFLAGS = -mcpu=G5 -O4 -funroll-loops -fomit-frame-pointer -finline-functions

For MAC OSX intel uncomment the following line:

CFLAGS = -march=pentium-m -O4 -funroll-loops -fomit-frame-pointer -finline-functions

Save the file and compile. After the programs are successfully compiled, delete the .o files and move the executables to the bin/ folder as follows:

   rm *.o
   mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin

You may want to mv the executables into the bin/ folder on your account rather than the paml main folder. And finally, if your current folder is not on your search path, you will have to add ./ in front of the executable file name; that is, use ./codeml to run codeml. See the notes for Unix systems above.
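
The Makefile edit described above can also be done with sed instead of a text editor. The following is only a sketch under stated assumptions: that the default line begins with "CFLAGS = -O4" and that the alternative lines are commented out as "#CFLAGS = ...". Look at your own Makefile before relying on it. switch_cflags is a hypothetical helper that reads the Makefile on standard input and writes the edited copy to standard output, so the original file is never modified in place.

```shell
# Sketch: switch the active CFLAGS line in a Makefile (stdin to stdout).
# Assumptions: the default line starts with "CFLAGS = -O4" and the
# alternatives are commented out as "#CFLAGS = ...".
# switch_cflags is a hypothetical helper; its argument is the flag that
# identifies the line to enable, e.g. -march=pentium-m or -mcpu=G5.
switch_cflags() {
    # First expression: comment out the default line.
    # Second expression: strip the leading # from the chosen line.
    sed -e 's/^CFLAGS = -O4/#&/' \
        -e "s/^# *\(CFLAGS = $1 \)/\1/"
}

# Example (run in the src/ folder): write an edited copy and inspect it first.
# switch_cflags -march=pentium-m < Makefile > Makefile.new
```

After checking Makefile.new, you can compile with make -f Makefile.new, or rename it to Makefile once you are satisfied.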

Graphical user interface

A graphical user interface, called PAML-X, has been written by Bo Xu of the Institute of Zoology, Chinese Academy of Sciences in Beijing. It is written in Qt and should run on Windows, Mac OSX, and Linux, although the versions for OSX and Linux may not be well tested.

Download PAML (see above), and also PAML-X (links below). Use of pamlX1.3.1 requires paml4.9 or later. When you run PAML-X for the first time, you need to specify the PAML folder name.

Source code for pamlX

Windows executable for pamlX

Mac OSX / Linux executable for pamlX

Citing PAML

Yang, Z. 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555-556.

Yang, Z. 2007 PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586-1591.

Tutorials


Tutorials for Windows and MAC OSX/Linux/Unix command line

The best way of running the programs listed below is by the command line. Here are links for command line tutorials on different operating systems.

Tutorial for Microsoft Windows command line

Tutorial for MAC OSX/Linux/Unix command line

A fairly extensive introduction to Unix commands, written by Tim Massingham for the Workshop on Computational Molecular Evolution (CoME), is available here.