Data in Published Papers
- Zhu, T., Z. Wang and Z. Yang, 2025. The power of coalescent methods
for inferring recent and ancient gene flow in endangered Bactrian
camels. Proc. Natl. Acad. Sci. U.S.A. (data)
- Kornai D, Jiao X, Ji J, Flouri T, Yang Z. 2024. Hierarchical
heuristic species delimitation under the multispecies coalescent model
with migration. Syst Biol. doi:10.1093/sysbio/syae050 (data and README)
- Flouri T, Jiao X, Huang J, Rannala B, Yang Z. 2023. Efficient Bayesian inference under the multispecies coalescent with migration. Proc Natl Acad Sci USA 120:e2310708120.
Data files for the Anopheles datasets (AnophelesData2020.tgz) and the README file.
Two datasets simulated using a three-species species tree under the saturated
migration model with eight migration rates: BPP and G-PhoCS files for the
2,000-locus dataset, and BPP files for the 16,000-locus dataset (Flouri2023-simulation-3s-mscm-saturated.tgz).
- Flouri T, Huang J, Jiao X, Kapli P, Rannala B, Yang Z. 2022. Bayesian phylogenetic inference using relaxed-clocks and the multispecies coalescent. Mol Biol Evol.
Gibbon and ratite data files, and control files for the simulations (Flouri2022MBE-data.tgz).
- Yang Z, Flouri T. 2022. Estimation of cross-species introgression rates using genomic data despite model unidentifiability. Mol Biol Evol. doi:10.1093/molbev/msac083
Data files for the Heliconius datasets (YangFlouri2022MBE-data.tgz).
- Flouri T, Jiao X, Rannala B, Yang Z. 2020. A Bayesian implementation of the multispecies coalescent model with introgression for phylogenomic analysis. Mol Biol Evol 37:1211-1223.
Data files for the Anopheles datasets (AnophelesData2020.tgz) and the README file.
- Flouri, T., B. Rannala, and Z. Yang. 2020. A tutorial on the use of BPP for species tree estimation and species delimitation, in Phylogenetics in the Genomic Era (N. Galtier, F. Delsuc, and C. Scornavacca, eds.). Creative Commons License.
Data files for the horned lizard dataset (HornedLizardsData.tgz).
- Thawornwattana Y, Dalquen DA, Yang Z. 2018. Coalescent analysis of phylogenomic data confidently resolves the species relationships in the Anopheles gambiae species complex. Mol. Biol. Evol. 35:2512-2527.
Sequence alignments and Python scripts for the Anopheles genomic data analyzed in the paper (AnophelesData2018.tgz).
- Shi CM and Yang Z. 2018. Coalescent-based analyses of genomic sequence data provide a robust resolution of phylogenetic relationships among major groups of gibbons. Mol Biol Evol. 35:159-179.
Six full datasets analyzed in the paper (two real and four simulated).
- dos Reis, M., J. Inoue, M. Hasegawa, R. Asher, P. C. Donoghue, and Z. Yang. 2012. Phylogenomic data sets provide both precision and accuracy in estimating the timescale of placental mammal evolution. Proc. R. Soc. Lond. B. Biol. Sci. 279:3491-3500.
Tree of 36 mammal species with fossil calibrations.
Alignment of ~14,000 genes (1st and 2nd codon positions) for the 36 Ensembl mammal genomes, concatenated into 20 partitions.
Tree of 274 mammal species with calibrations (these are the posteriors from the 36-species analysis).
Alignment of 12 mitochondrial protein-coding genes (1st and 2nd codon positions) from 274 mammal species.
An MCMCtree control file, mcmctree.ctl (see the notes about the rate prior rgene_gamma).
- Zhang, C., D.-X. Zhang, T. Zhu, and Z. Yang. 2011. Evaluation of a
Bayesian coalescent method of species delimitation. Syst. Biol. 60:747-761.
Data files for the butterfly dataset (Zhang2011demeter.tgz).
- Yang, Z., and B. Rannala. 2010. Bayesian species delimitation using multilocus sequence data. Proc. Natl. Acad. Sci. U.S.A. 107:9264-9269.
The rotifer and forest lizard data files (YangRannala2010PNAS.tgz).
- Inoue J, Donoghue PCH, Yang Z. 2010. The impact of the representation of fossil calibrations on Bayesian estimation of species divergence times. Syst Biol 59:74-89.
Data file (Inoue2010SB.tar.gz).
- Burgess R, Yang Z. 2008. Estimation of hominoid ancestral population
sizes under Bayesian coalescent models incorporating mutation rate
variation and sequencing errors. Mol Biol Evol 25:1979-1994.
Data file (ApeNeBY2008.tar.gz).
Note: I have changed the data file format, so the files no longer
work with MCMCcoal 1.2. They will work with the new version of the
program, called bpp; please see the software page to download a
copy. The data file names should be self-explanatory with reference
to Table 1 in the paper.
- Yang, Z. & Nielsen, R. 2008. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Mol. Biol. Evol. 25, 568-579. Data file (YangNielsen2008MBE.MutSel.tar.gz)
- Furlong, R. F. & Yang, Z. 2008. Diversifying and purifying selection in the peptide binding region of DRB in mammals. J. Mol. Evol. 66: 384-394. Data file (FurlongYang2008JMEdata.tar.gz)
- Rannala B, Yang Z. 2007. Inferring speciation times under an episodic
molecular clock. Syst Biol 56:453-466.
Cat alignment file (Johnson2006.AXY38s.txt) and tree file (Johnson2006.38s.trees).
- Bielawski, J.P. and Z. Yang, 2005. Maximum
likelihood methods for detecting adaptive protein evolution, in
Statistical Methods in Molecular Evolution, R. Nielsen, Editor.
Springer-Verlag: New York. p. 103-124.
Data files for this book chapter: example1and2_GstD1.zip, example3_Ldh.zip, example4_HIV2nef.zip. Also a PAML demo prepared by Joe Bielawski.
- Yang, W., J. Bielawski and Z. Yang 2003. Widespread adaptive
evolution in the human immunodeficiency virus type 1
genome. J. Mol. Evol. 57:212-221.
The HIV data analyzed in this paper (HIVdata.YBY2003.tar.gz).
- Yang, Z. and W.J. Swanson. 2002. Codon-substitution models to detect
adaptive evolution that account for heterogeneous selective pressures
among site classes. Mol. Biol. Evol. 19: 49-57.
Yang, Z., W.S.W. Wong, and R. Nielsen. 2005. Bayes empirical Bayes inference
of amino acid sites under positive selection. Mol. Biol. Evol. 22: 1107-1118.
The MHC data files: bigmhc.codeml.ctl, bigmhc.phy and bigmhc.trees.
- Yang Z, Nielsen R. 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917.
Zhang J, Nielsen R, Yang Z. 2005. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22:2472-2479.
BRCA1: BRCA1.codeml.ctl, BRCA1.8s.txt, BRCA1.trees
phyACF: phyACF.codeml.ctl, phyACF.txt, phyACF.trees
phyBDE: phyBDE.codeml.ctl, phyBDE.txt, phyBDE.trees
- Yang, Nielsen, Goldman, and Pedersen. 2000. Codon-substitution models for heterogeneous
selection pressure at amino acid sites. Genetics 155:431-449.
The data and result files (YNGP2000data.tgz).
- Yang Z. 2000. Maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus A. J Mol Evol 51:423-432.
Data files: fluHAcdc349.nuc and fluHAcdc349.trees.
- Yang and Rannala. 1997. Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method. Mol. Biol. Evol. 14:717-724.
The 9-species 888-bp primate dataset: mtprim9.txt.
BirthDeathSample.nb is a Mathematica notebook for calculating the kernel
density of coalescent times under the birth-death process model with
species sampling; it can be used to draw Figure 2 in the paper. See also
Rannala, Huelsenbeck, Yang, and Nielsen. 1998. Taxon sampling and the
accuracy of large phylogenies. Syst. Biol. 47:702-709.
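For readers without Mathematica, the kernel density can be sketched in Python. This is a minimal sketch, not a port of BirthDeathSample.nb: it assumes the standard birth-death-sampling formulas with birth rate λ (`lam`), death rate μ (`mu`), sampling fraction ρ (`rho`) and root age t1 (scaled to 1), namely g(t) = λ p1(t) / v(t1), where p1(t) = P(0,t)^2 e^{(μ-λ)t} / ρ, v(t) = 1 - P(0,t) e^{(μ-λ)t} / ρ and P(0,t) = ρ(λ-μ) / (ρλ + [λ(1-ρ) - μ] e^{(μ-λ)t}). Please check it against the notebook before relying on it. The function names `birth_death_kernel` and `trapezoid` and the parameter values are hypothetical and for illustration only.

```python
import math


def birth_death_kernel(t, lam, mu, rho, t1=1.0):
    """Assumed kernel density g(t) of a node age t, 0 <= t <= t1, under a
    birth-death process (birth rate lam, death rate mu) with species
    sampling fraction rho, conditioning on the root age t1.

    Sketch only: the case lam == mu needs a separate limiting formula,
    which is not implemented here.
    """
    if math.isclose(lam, mu):
        raise ValueError("this sketch does not handle lam == mu")
    if not 0.0 <= t <= t1:
        raise ValueError("t must lie in [0, t1]")
    a = lam - mu

    def p0(s):
        # P(0, s): probability that a lineage arising s time units ago
        # leaves at least one descendant in the present-day sample
        return rho * a / (rho * lam + (lam * (1.0 - rho) - mu) * math.exp(-a * s))

    def p1(s):
        # probability that such a lineage leaves exactly one sampled descendant
        return p0(s) ** 2 * math.exp(-a * s) / rho

    def v(s):
        # normalizing term: v(0) = 0 and dv/ds = lam * p1(s),
        # so g integrates to 1 over [0, t1]
        return 1.0 - p0(s) * math.exp(-a * s) / rho

    return lam * p1(t) / v(t1)


def trapezoid(f, lo, hi, n=2000):
    """Composite trapezoidal rule, used here only as a numerical check."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return total * h


if __name__ == "__main__":
    lam, mu, rho = 2.0, 1.0, 0.1  # illustrative values, not taken from the paper
    area = trapezoid(lambda t: birth_death_kernel(t, lam, mu, rho), 0.0, 1.0)
    print(f"integral of g over [0, 1]: {area:.6f}")  # should be close to 1
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, round(birth_death_kernel(t, lam, mu, rho), 4))
```

With ρ = 1 and μ = 0 (the Yule process) the sketch reduces to g(t) = λe^{-λt} / (1 - e^{-λt1}), which is a quick sanity check on the formulas. Plotting g(t) for several (λ, μ, ρ) values gives curves of the kind the notebook is described as producing.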
- Yang, Z., and D. Roberts. 1995. On the use of nucleic acid sequences
to infer early branchings in the tree of life. Mol. Biol. Evol. 12:451-458.
The four-species sequence alignment is here.
- Yang Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105-111.
The psi-eta globin pseudogene dataset: psieta.nuc
The 9-species 888-bp primate dataset: mtprim9.txt.