Lab 15: Phylogenetic tree reconstruction

Preparation

  • Install the R package phybase, which is hosted on GitHub. The devtools package provides install_github(), which installs R packages directly from GitHub.

    install.packages("devtools")
    library(devtools)
    install_github("lliu1871/phybase") 

Data files

Download the data files (lab15_primates.nex and lab15_primates.phy), which contain the DNA alignment in Nexus and Phylip format, respectively. You can convert between the two formats in R using the packages ape and phybase.

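library(phybase)
# Read the Nexus alignment; the result holds the sequences ($seq) and the taxon names ($name)
data = read.dna.seq("https://book.phylolab.net/binf8441/data/lab15_primates.nex")
# Write the same alignment to a local file in Phylip format
write.dna.seq(data$seq, data$name, file="primates.phy", format="phylip")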
Loading required package: ape
Loading required package: Matrix
Attaching package: ‘phybase’
The following objects are masked from ‘package:ape’:

    dist.dna, node.height
1

Distance methods

  1. Given the DNA alignment, calculate the pairwise distances in PHYLIP using the program dnadist (read the instructions).

  2. Reconstruct the UPGMA and NJ trees from the pairwise distances in PHYLIP using the program neighbor (read the instructions).

  3. Perform the bootstrap analysis using the program seqboot (read the instructions).

  4. Build a consensus tree from the bootstrap trees using the program consense (read the instructions).
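
Steps 1 and 3 can be illustrated with a small sketch. The Python code below is not what dnadist or seqboot run internally; it assumes the Jukes-Cantor (JC69) correction as a simple stand-in for the distance model (dnadist offers several models), and it resamples alignment columns with replacement, which is the idea behind a bootstrap replicate. The toy alignment and all function names are hypothetical.

```python
import math
import random

def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)

def jc69_distance(a, b):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alignment):
    """Return {(name_i, name_j): distance} for every pair of taxa."""
    names = sorted(alignment)
    return {(s, t): jc69_distance(alignment[s], alignment[t])
            for i, s in enumerate(names) for t in names[i + 1:]}

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

# Hypothetical toy alignment, not the primates data
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACGGAT"}
dm = distance_matrix(toy)
replicate = bootstrap_alignment(toy, random.Random(1))
```

Each replicate has the same taxa and the same number of sites as the original alignment; a tree is then built from every replicate, and the resulting trees are summarized by a consensus tree.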

Parsimony methods

Given the DNA alignment, find the most parsimonious tree in PHYLIP using the program dnapars (read the instructions).
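
To see what "most parsimonious" means, the sketch below scores a fixed tree with Fitch's algorithm, which counts the minimum number of substitutions the tree requires. It only scores trees that are handed to it; dnapars additionally searches over tree topologies. The nested-tuple tree encoding, the toy alignment, and the function names are assumptions made for this illustration.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one site on a rooted binary tree.

    tree   -- nested 2-tuples with leaf names as strings, e.g. (("A", "B"), "C")
    states -- dict mapping each leaf name to its character at this site
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common      # children agree: no change needed here
        changes += 1           # children disagree: one change on this node
        return left | right

    visit(tree)
    return changes

def parsimony_length(tree, alignment):
    """Sum the Fitch score over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch_score(tree, {k: s[i] for k, s in alignment.items()})
               for i in range(n_sites))

# Hypothetical toy alignment and two candidate topologies
toy = {"A": "AAGC", "B": "AAGT", "C": "ACGT", "D": "CCTT"}
tree1 = ((("A", "B"), "C"), "D")
tree2 = ((("A", "D"), "B"), "C")
print(parsimony_length(tree1, toy))  # 4
print(parsimony_length(tree2, toy))  # 5
```

Here tree1 needs fewer changes than tree2, so parsimony prefers tree1 for this toy alignment.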

Maximum likelihood

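Find the maximum likelihood tree with RAxML using the command line below, where -p sets the random seed for the parsimony starting tree, -s names the alignment file, -m selects the substitution model, and -n gives the run name that is appended to the output file names.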

raxmlHPC.exe -p 7635673 -s primates.phy -m GTRGAMMA -n outputfile

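Perform the bootstrap analysis with 100 replicates (-b sets the bootstrap random seed and -N the number of replicates). Choose a different run name after -n for each run, because RAxML stops if output files with the same run name already exist.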

raxmlHPC.exe -b 14635 -p 7635673 -N 100 -s primates.phy -m GTRGAMMA -n outputfile

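Build the extended majority-rule (MRE) consensus tree from the bootstrap trees. The file passed to -z is the file of bootstrap trees written by the previous run (RAxML_bootstrap followed by its run name).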

raxmlHPC.exe -J MRE -z bootstraptreefile -m GTRGAMMA -n outputfile

Summarizing bootstrap trees

The Python program sumtrees.py, which is available in the DendroPy package, can build a majority-rule consensus tree from a set of trees. The command line is

sumtrees.py -f 0 -o output.con.tre -p input_treefile --to-phylip --no-summary-metadata
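
The idea behind a majority-rule consensus can be sketched in a few lines: count how often each group of taxa appears across the input trees and keep the groups found in more than half of them. The code below is a simplified illustration, not the sumtrees.py implementation; it assumes rooted trees encoded as nested tuples and counts clades rather than unrooted bipartitions. All names and the toy trees are hypothetical.

```python
from collections import Counter

def clades(tree):
    """Return the non-trivial clades (frozensets of leaf names) of a rooted tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(leaves(child) for child in node))
        found.add(tips)
        return tips

    all_tips = leaves(tree)
    found.discard(all_tips)  # the clade of all taxa is present in every tree
    return found

def majority_clades(trees, min_freq=0.5):
    """Map each clade found in more than min_freq of the trees to its frequency."""
    counts = Counter(c for t in trees for c in clades(t))
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k / n > min_freq}

# Hypothetical set of three "bootstrap" trees
toy_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
support = majority_clades(toy_trees)
for clade, freq in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(sorted(clade), round(freq, 2))
```

In this toy set the clades {A, B} and {A, B, C} each occur in two of the three trees, so both enter the consensus with a support of about 0.67.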

Tree distance

Calculate the pairwise topological distances between trees using the R function dist.topo() in ape, which is loaded automatically with phybase.

library(phybase)
trees = read.tree("https://book.phylolab.net/binf8441/data/lab15_trees.tre")
dist.topo(trees)
Warning message in dist.topo(trees):
“Some trees were rooted: topological distances may be spurious.”
      tree1 tree2 tree3 tree4
tree2     6                  
tree3     6     2            
tree4     6     6     4      
tree5     6     2     0     4
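
A distance of 0, as between tree3 and tree5 in the output above, means the two trees have the same topology. The sketch below shows the general idea of such a topological distance: each internal branch splits the taxa into two groups, and the distance counts the splits present in one tree but not in the other. This is a simplified illustration assuming nested-tuple trees treated as unrooted; it is not ape's implementation, and the function names and toy trees are hypothetical.

```python
def bipartitions(tree):
    """Return the non-trivial splits of a tree, treating it as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same branch always yields the same frozenset.
    """
    tips_below = []

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(leaves(child) for child in node))
        tips_below.append(tips)
        return tips

    all_tips = leaves(tree)
    ref = min(all_tips)  # reference taxon used to normalize each split
    splits = set()
    for side in tips_below:
        if ref in side:
            side = all_tips - side
        # skip trivial splits (a single taxon, or everything versus nothing)
        if 1 < len(side) < len(all_tips) - 1:
            splits.add(side)
    return splits

def split_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

# Hypothetical five-taxon trees
t1 = ((("A", "B"), "C"), ("D", "E"))
t1_rerooted = (("A", "B"), ("C", ("D", "E")))  # same unrooted topology as t1
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ((("A", "D"), "B"), ("C", "E"))
print(split_distance(t1, t1_rerooted))  # 0
print(split_distance(t1, t2))           # 2
print(split_distance(t1, t3))           # 4
```

Because the splits are normalized against a reference taxon, moving the root does not change the result, which is why t1 and t1_rerooted are at distance 0.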