Phylogenetic Analysis by Maximum Likelihood (PAML)

Ziheng Yang

Introduction

PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. ANSI C source code is distributed for UNIX/Linux/Mac OSX, and executables are provided for MS Windows. PAML is not good for tree making. It may be used to estimate parameters and test hypotheses to study the evolutionary process, once you have reconstructed trees using other programs such as PAUP*, PHYLIP, MOLPHY, PhyML, RAxML, etc.

This document is about downloading and compiling PAML and getting started. See the manual (pamlDOC.pdf) for more information about running programs in the package.

Downloading and Setting up PAML

PAML-X: A GUI for PAML

A graphical user interface, called PAML-X, has been written by Bo Xu of the Institute of Zoology, Chinese Academy of Sciences in Beijing. It is written in Qt and should run on Windows, Mac OSX, and Linux, although the versions for OSX and Linux may not be well tested. You need to download both PAML and PAML-X. When you run PAML-X for the first time, you specify the PAML folder name. The links for downloading are listed below.

  Source code  pamlX-1.2-src.tgz
  Windows  pamlX-1.2-win32.tgz  pamlX-1.2+paml4.7-win32.tgz
  Mac OSX  pamlX-1.2-osx-x86_64.dmg  pamlX-1.2+paml4.7-osx-x86_64.dmg
  Linux  pamlX-1.2-x11-x86_64.tgz

The following is written for the naive user. If you know things like folders, executable files, and search path, there is no need for you to follow the instructions here.

PAML for Windows 9x/NT/2000/XP/Vista/7

Download and save the archive paml4.7a.tgz or paml4.8.tgz on your local disk. (Make sure you save the file using the correct name. Internet Explorer often changes the file name so that it cannot be opened anymore. Manually change the name back before double-clicking to extract the files.) Unpack the archive into a folder, using WinZip, say. Remember the name of the folder. The Windows executables are in paml4.7a/bin/. I suggest that you create a folder for local programs and move the PAML executables there. Here are some notes for doing that.

Setting up a folder of local programs and changing the search path. You need to do this for your user account only once. Suppose your user folder is C:\Users\Ziheng. [Please replace this with your own user folder in the following examples.] This is the default user folder for me on Vista or Windows 7. On Windows XP, it is more unwieldy, something like C:\Documents and Settings\Ziheng. Use Windows Explorer to create a folder called bin inside your user folder, that is, C:\Users\Ziheng\bin. Or if you are the boss of your PC, you may prefer the folder C:\bin. Either way, this is the folder for holding executable programs.

Next we will add this folder onto the search path, which the OS uses to search for executable programs. The following is for Windows Vista. The menu may be slightly different on Win 2000/XP, but you should have no problem finding your way. Open Control Panel. Choose Classic View. Double-click on System. Choose Advanced System Settings, and click on the tab Advanced. Click on the button Environment Variables. Under User variables, double-click on the variable Path to edit. Click on the "variable value" field and move the cursor to the beginning. Insert the name of our program folder C:\Users\Ziheng\Bin; or C:\Bin; or whatever folder you have created. Note that the semicolon separates the folder names. Be careful not to introduce any errors. Click on OK.

Copy the PAML executables. Copy or move the pre-compiled executables (baseml.exe, codeml.exe, evolver.exe, chi2.exe, etc.) from the paml4.7a\bin\ folder to the local programs folder C:\Users\Ziheng\Bin\. After this, you can execute any of these programs from a command prompt wherever you are. If you like, you can rename baseml.exe and codeml.exe as baseml4.exe and codeml4.exe respectively, to include the version number. (You will then run the program by the command codeml4 instead of codeml.)

You can also copy other command-line programs you downloaded into this folder, such as mb, RAxML, PhyML programs.

Running a PAML program. Avoid double-clicking the program names from Windows Explorer. That way you won't see any error messages on the screen when the program crashes. Instead start a "command prompt" box (for example, Start - Programs - Accessories - Command Prompt, or Start - Run - type cmd and OK). You can right-click on the title bar and choose Properties to change the size, font, and colour of the window. cd to the folder which contains your user files, and type the command name. Here we cd to the paml folder (suppose you have extracted the archive into C:\Programs\paml4.7a\) and run the program using the default files.

C:
cd \Programs\paml4.7a\
codeml

Because there is no executable file called codeml.exe (or codeml.bat, etc.) in your current folder, the OS will look for it in the folders listed in the environment variable Path. It will find and execute codeml.exe in the C:\Users\Ziheng\Bin folder. You can also specify the full path of the executable program, with something like the following:

C:\Programs\paml4.7a\bin\codeml

Some codeml analyses use an amino acid distance matrix (e.g., grantham.dat) or substitution rate matrix (e.g., wag.dat). You will then need to copy the necessary file to your current folder. Otherwise the program will ask you to input the full path name for the file.

UNIX/Linux and Mac OSX

UNIX, Linux, and other systems. Download the Win32 archive, save it, and unpack it into a local folder. Remove the Windows executables (.exe files) in the bin/ folder. (Replace 4.7a with the appropriate version number in the following commands.)

On a Mac, you open a command terminal by Applications-Utilities-Terminal.

tar xzf paml4.7a.tgz

Then cd to the paml folder (you have to remember where you saved the files) and again cd to the src/ folder and compile the programs.

cd paml4.7a
rm bin/*.exe
cd src
make -f Makefile
ls -lF
rm *.o
mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin
cd ..
ls -lF bin
bin/baseml
bin/codeml
bin/evolver

Setting up a folder of local programs and changing your initialization file for the shell. You need to do this for your user account only once. First check that there is a bin/ folder inside your account. If not, create one.

cd
mkdir bin

Then modify your path to include the bin/ folder in the initialization file for the shell. You can use more /etc/passwd to see which shell you run. Below are notes for the C shell and bash shell. There are other shells, but these two are commonly used.
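
As a rough alternative to reading /etc/passwd by eye, the lookup can be scripted. The sketch below assumes a POSIX shell with awk available; the variable names user_name, login_shell and shell_path are made up for illustration. On systems where accounts are not listed in /etc/passwd (some Macs and networked accounts), it falls back to the SHELL environment variable.

```shell
# Find the current user name (fall back to $USER if id cannot resolve it)
user_name=$(id -un 2>/dev/null || echo "${USER:-}")

# The login shell is the 7th colon-separated field of the /etc/passwd entry
login_shell=""
if [ -r /etc/passwd ]; then
  login_shell=$(awk -F: -v u="$user_name" '$1 == u { print $7; exit }' /etc/passwd)
fi

# Fall back to $SHELL when no entry (or an empty field) was found
shell_path=${login_shell:-${SHELL:-unknown}}
echo "Login shell: $shell_path"
```

A result ending in csh or tcsh points to the C shell notes in (1), and one ending in bash points to the bash notes in (2).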

(1) If you see /bin/csh for your account in the /etc/passwd file, you are running the C shell, and the initialization file is .cshrc in your home folder. You can use more .cshrc to see its content if it is present. Use a text editor (such as emacs, vi, SimpleText, etc.) to edit (or create, if one does not exist) the file, by something like
emacs .cshrc
and insert the following line
set path = ($path . ~/bin)

The different fields are separated by spaces. Here '.' means the current folder, ~/ means your home folder, ~/bin means the bin folder you created, and $path is whatever folders are already in the path.

(2) If you see /bin/bash in the file /etc/passwd for your account, you are running the bash shell, and the initialization file is .bashrc. Use a text editor to open .bashrc and insert the following line

PATH=$PATH:./:~/bin/

This changes the environment variable PATH. The different fields are separated by colon : and not space. If the file does not exist, create one.
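
If .bashrc is read more than once in a session, the line above appends the same folders again each time. A slightly more defensive variant, offered only as a sketch for bash or another POSIX shell, adds the bin folder when it is not yet listed. It handles only ~/bin (written as $HOME/bin) and leaves out the ./ entry.

```shell
# Append ~/bin to PATH only if it is not already listed,
# so that re-reading .bashrc does not add duplicates.
case ":$PATH:" in
  *":$HOME/bin:"*) ;;                 # already present, nothing to do
  *) PATH="$PATH:$HOME/bin" ;;        # append to the end of the path
esac
export PATH
```

Wrapping PATH in colons before matching means that ~/bin is recognized whether it sits at the start, middle, or end of the list.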

After you have changed and saved the initialization file, every time you start a new shell, the path is automatically set for you. You can then cd to the folder which contains your data files and run paml programs there. The following moves to the paml folder (suppose you have extracted the archive into Programs/paml4.7a/ on your account) and runs the program using the default files.

cd
cd Programs/paml4.7a
codeml

As the path is set up properly, this is equivalent to

~/bin/codeml

Note that Windows uses \ while Unix uses /, and Windows is case-insensitive while Unix is case-sensitive.
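
To confirm that the search path is set up as described, you can ask the shell where it would find each program. The following is a minimal sketch for a POSIX shell; check_prog is a helper name invented here, and the three program names are simply taken from the list compiled above.

```shell
# Report where the shell would find a program, or say that it is missing
check_prog() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at: $(command -v "$1")"
  else
    echo "$1 is not on the search path"
    return 1
  fi
}

# Check a few of the PAML programs; keep going even if one is missing
for prog in baseml codeml evolver; do
  check_prog "$prog" || true
done
```

If a program is reported as missing, recheck the initialization file and start a new shell so that the changed path takes effect.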

Mac OSX. If you have a G5 or if you would like to compile the programs yourself, please follow the notes here. I understand that Apple Xcode is now automatically installed on your Mac. Otherwise you will have to download and install Xcode, which includes the C compiler. Without a C compiler, you will get a "Command not found" error when you type gcc or cc at the command terminal.
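
Before running make, it can save time to check whether a compiler is present. This is a small sketch for a POSIX shell that looks for gcc and then cc, the two command names mentioned above; the variable names cc_path and candidate are illustrative only.

```shell
# Look for a C compiler under the two usual command names
cc_path=""
for candidate in gcc cc; do
  if command -v "$candidate" >/dev/null 2>&1; then
    cc_path=$(command -v "$candidate")
    break
  fi
done

if [ -n "$cc_path" ]; then
  echo "C compiler found: $cc_path"
else
  echo "No C compiler found; install one before running make"
fi
```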

Download the Windows archive. Open a command terminal (Applications-Utilities-Terminal) and compile and run the programs from the terminal. Remove the .exe files in the bin/ folder.

More specifically, open up the file Makefile in the src/ folder. Add # at the beginning of the following line to comment it out.

CFLAGS = -O4 -funroll-loops -fomit-frame-pointer -finline-functions

Delete the # at the beginning of the line for either G5 or intel, depending on your machine, to uncomment the line.

#MAC OSX G5:
#CFLAGS = -mcpu=G5 -O4 -funroll-loops -fomit-frame-pointer -finline-functions

#MAC OSX intel:
#CFLAGS = -march=pentium-m -O4 -funroll-loops -fomit-frame-pointer -finline-functions

Save the file. At the command line, type make and hit Enter. After the programs are successfully compiled, delete the .o files and move the executables to the bin/ folder.

rm *.o
mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin

You may want to mv the executables into the bin/ folder in your account rather than the one in the paml main folder. And finally, if your current folder is not on your search path, you will have to add ./ in front of the executable file name; that is, use ./codeml instead of codeml to run codeml. See the notes for UNIX systems above.

Some notes about running programs in PAML

A number of example datasets are included in the package. They are typically datasets analyzed in the original papers that described the methods. I suggest that you get a copy of the paper, and run the example datasets to reproduce our results first, before analyzing your own data. This should help to identify errors in the program and help you become familiar with the format of the data file and the interpretation of results.

Most programs in the PAML package have control files that specify the names of the sequence data file, the tree structure file, and models and options for the analysis. The default control files are baseml.ctl for baseml and basemlg, codeml.ctl for codeml, pamp.ctl for pamp, and mcmctree.ctl for mcmctree. The program evolver does not have a control file, and uses a simple user interface. All you need to do is type evolver and then choose the options. For other programs, you should prepare a sequence data file and a tree structure file, and modify the appropriate control files before running the programs. The formats of those files are detailed in the documentation.
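
One way to keep analyses tidy is to give each run its own folder with its own copy of the control file. The sketch below is only an illustration for a POSIX shell: PAML_DIR and WORK_DIR are hypothetical variables, the default paths are placeholders based on the examples above, and it assumes the default codeml.ctl sits in the paml main folder.

```shell
# Hypothetical locations; adjust both to your own setup
paml_dir="${PAML_DIR:-$HOME/Programs/paml4.7a}"
work_dir="${WORK_DIR:-./paml_run1}"

# Create a separate folder for this analysis
mkdir -p "$work_dir"

# Copy the default control file there so it can be edited for this run
if [ -f "$paml_dir/codeml.ctl" ]; then
  cp "$paml_dir/codeml.ctl" "$work_dir/"
  echo "Copied codeml.ctl; edit $work_dir/codeml.ctl before running codeml"
else
  echo "No codeml.ctl found in $paml_dir"
fi
```

After placing your sequence and tree files in the same folder and editing the control file, cd to that folder and type codeml as described above.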

PAML Resources on the web

Questions and Bug Reports

(a) If you discover a bug, please post an item at the discussion group or send me a message. I will try to visit the discussion group every week or every two weeks. When describing the problem, please mention the version number, what you did and what happened. In particular copy any error message on the screen into the message. Please try to make it easy for me to duplicate the problem on my own computers.

(b) If you have questions about using the programs, please try to find answers by reading the manual (doc/pamlDOC.pdf), the paml FAQ page (doc/pamlFAQ.pdf), or the postings at the Google discussion site. Use example data files included in the package to get to know the normal behavior of the programs. Often you should be able to tell from the screen output whether the program reads the sequence and tree files correctly.

If none of these is helpful, please post your question at the discussion group, for me or other users of paml to answer. The discussion group was set up to reduce the amount of time I have to spend in answering user questions. Please do not send me messages. I will most likely ignore emails asking questions about how to use the programs. I apologize for the inadequate support.


Webmaster: Ziheng Yang