Placing a known WGD on a phylogeny

Below are the inputs, commands, and outputs to do an analysis with GRAMPA to place a known WGD on a phylogeny. The inputs are based on simulated data. For more detailed info on the simulations check our paper.

Note

The examples below call GRAMPA as grampa assuming it has been installed from bioconda. If you installed from source, see usage details in the README.

Inputs

Suppose you have a set of species, of which you have evidence that a some may be the result of a polyploidization event. You also may have an idea about the parental lineages of the polyploid species. So you build a species tree of your taxa and, since species tree reconstruction programs output singly-labeled trees, you get this as the result:

Your hypothesis is that species x, y, and z may be the result of polyploidization (but you're not sure if all of them are). The singly-labeled tree implicitly identifies one of the parental lineages of the suspected polyploids by placing the polyploid species sister to it (in this case lineage B seems to be one of the parents). You also think this tree may be the result of an allopolyploidy, and you think some lineage sister to C, D, the C,D clade, or at the root of the tree may have hybridized with some lineage related to B to form the x,y,z clade. Working with this prior knowledge GRAMPA can test your hypotheses of polyploidization.

The input files you would need are:

  1. Singly-labeled species tree: spec_tree_3a.tre
  2. Gene trees from your set of species (in this case 1000 gene trees simulated with gain and loss) : gene_trees_3a.txt

GRAMPA command

Since we have some idea of the lineages involved in the polyploidization event, we would want to limit GRAMPA's search to those lineages with the -h1 and -h2 search parameters.

grampa -s spec_tree_3a.tre -g gene_trees_3a.txt -h1 "x 1 2" -h2 "C D 5 6" -o ex2_output -f ex2_test

Above we have specified -h1 and -h2 by using the node labels in the tree. Alternatively, we could specify an equivalent -h1 and -h2 search by defining the labels based on the sets of tips that define them:

grampa -s spec_tree_3a.tre -g gene_trees_3a.txt -h1 "x x,y x,y,z" -h2 "C D A,x,y,z,B,C A,x,y,z,B,C,D" -o ex2_output -f ex2_test

These two commands are equivalent. The second method is slightly more cumbersome, but does not require you to have internal labels on your tree. Although, GRAMPA can easily add internal labels to your input tree with the --labeltree command.

Outputs

The above command would create the directory ex2-output with five output files

Since we are trying to determine the mode of polyploidy, we are interested in the ex2-test-scores.txt file. This file contains the total reconciliation scores for each MUL-tree considered, sorted from lowest scoring tree to highest scoring tree, and looks something like this:

mul.tree        h1.node h2.node score   labeled.tree
9       <2>     C       5242    ((((((x+,y+)<1>,z+)<2>,B)<3>,A)<4>,(C,((x*,y*)<5>,z*)<6>)<7>)<8>,D)<9>
11      <2>     <5>     7598    (((((((x+,y+)<1>,z+)<2>,B)<3>,A)<4>,C)<5>,((x*,y*)<6>,z*)<7>)<8>,D)<9>
0       NA      NA      8312    ((((((x,y)<1>,z)<2>,B)<3>,A)<4>,C)<5>,D)<6>
5       <1>     C       8506    ((((((x+,y+)<1>,z)<2>,B)<3>,A)<4>,(C,(x*,y*)<5>)<6>)<7>,D)<8>
10      <2>     D       8618    ((((((x+,y+)<1>,z+)<2>,B)<3>,A)<4>,C)<5>,(D,((x*,y*)<6>,z*)<7>)<8>)<9>
12      <2>     <6>     8845    (((((((x+,y+)<1>,z+)<2>,B)<3>,A)<4>,C)<5>,D)<6>,((x*,y*)<7>,z*)<8>)<9>
6       <1>     D       8854    ((((((x+,y+)<1>,z)<2>,B)<3>,A)<4>,C)<5>,(D,(x*,y*)<6>)<7>)<8>
7       <1>     <5>     8939    (((((((x+,y+)<1>,z)<2>,B)<3>,A)<4>,C)<5>,(x*,y*)<6>)<7>,D)<8>
2       x       D       8949    ((((((x+,y)<1>,z)<2>,B)<3>,A)<4>,C)<5>,(D,x*)<6>)<7>
1       x       C       9109    ((((((x+,y)<1>,z)<2>,B)<3>,A)<4>,(C,x*)<5>)<6>,D)<7>
8       <1>     <6>     9202    (((((((x+,y+)<1>,z)<2>,B)<3>,A)<4>,C)<5>,D)<6>,(x*,y*)<7>)<8>
3       x       <5>     9259    (((((((x+,y)<1>,z)<2>,B)<3>,A)<4>,C)<5>,x*)<6>,D)<7>
4       x       <6>     9304    (((((((x+,y)<1>,z)<2>,B)<3>,A)<4>,C)<5>,D)<6>,x*)<7>

GRAMPA tells us MUL-tree 9 is the lowest scoring tree:

((((((x+,y+)<1>,z+)<2>,B)<3>,A)<4>,(C,((x*,y*)<5>,z*)<6>)<7>)<8>,D)<9>

In other words, GRAMPA has identified the x,y,z clade as the polyploid clade and has identified the C lineage as the second parental lineage!