Count duplications and losses in the presence of polyploidy

Below are the inputs, commands, and outputs for a GRAMPA analysis that counts the duplications and losses in a set of gene trees in the presence of polyploidy. The inputs are based on simulated data. For details on the simulations, see our paper.

Note

The examples below call GRAMPA as grampa assuming it has been installed from bioconda. If you installed from source, see usage details in the README.

Inputs

Suppose a hybridization event between the B and C lineages led to the allopolyploid x,y,z clade. The MUL-tree representing this scenario is:

GRAMPA can accurately count duplications and losses with this MUL-tree as input.

  1. MUL-tree: mul_tree_74_3a.tre
  2. Gene trees from your set of species (in this case, 1000 gene trees simulated with gain and loss): gene_trees_3a.txt

GRAMPA command

With a known polyploidy scenario (MUL-tree), we can simply reconcile to that MUL-tree:

grampa -s mul_tree_74_3a.tre -g gene_trees_3a.txt -o ex3-output -f count-test --multree

The --multree flag is required in this case to let GRAMPA know that the input species tree is a MUL-tree.

Outputs

The above command creates the directory ex3-output, which contains five output files.

Since we are trying to count duplications and losses, we are interested in the count-test-detailed.txt file. This file contains the reconciliation score of each gene tree against the lowest-scoring MUL-tree (in this case, the only MUL-tree). The contents of the file look something like this:

mul.tree        gene.tree       dups    losses  total.score
1       1       5       2       7
1       2       8       4       12
1       3       3       0       3
1       4       1       4       5

Here you can see the exact number of duplications and losses for each gene tree when mapped to the lowest-scoring MUL-tree, with multiple copies of the polyploid species (x,y,z) properly counted as either paralogs or homoeologs. You can also add the --maps flag to the command above to add another column to the detailed output file that shows the maps and duplication nodes in each gene tree (see the README section --maps for more info).
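If you want totals across all gene trees rather than per-tree scores, the detailed file can be summed with a few lines of Python. The following is a minimal sketch, not part of GRAMPA: the function name summarize_detailed is hypothetical, and it assumes the file is whitespace-delimited with the header shown above, was produced without --maps, and that any lines starting with "#" can be skipped. The sample rows are the four rows shown above.

```python
import io

def summarize_detailed(lines):
    """Sum dups, losses, and total.score over all rows of a detailed-style table.

    Assumption: whitespace-delimited columns with a header row naming
    'dups', 'losses', and 'total.score'; lines starting with '#' are skipped.
    """
    totals = {"gene_trees": 0, "dups": 0, "losses": 0, "total.score": 0}
    header = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if header is None:
            header = fields  # first non-comment line is the column header
            continue
        row = dict(zip(header, fields))
        totals["gene_trees"] += 1
        for column in ("dups", "losses", "total.score"):
            totals[column] += int(row[column])
    return totals

# The four example rows from the detailed file shown above.
sample = """mul.tree        gene.tree       dups    losses  total.score
1       1       5       2       7
1       2       8       4       12
1       3       3       0       3
1       4       1       4       5
"""

summary = summarize_detailed(io.StringIO(sample))
print(summary)
```

To run it on real output, pass an open file handle instead of the sample, for example open("ex3-output/count-test-detailed.txt") for the command above.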

GRAMPA also counts the total number of duplications across all gene trees for each node in the six lowest-scoring trees. In this example, we searched only one tree, so it provides the duplication counts for that tree alone. These counts can be found in the count-test-dup-counts.txt file:

mul.tree        node    dups
1       x*      102
1       y*      88
1       z*      193
1       B       234
1       A       344
1       C       232
1       x+      104
1       y+      87
1       z+      212
1       D       386
1       <1>     94
1       <2>     82
1       <3>     102
1       <4>     73
1       <5>     110
1       <6>     101
1       <7>     143
1       <8>     108
1       <9>     0
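To see at a glance which nodes carry the most duplications, the table above can be loaded and ranked. This is a minimal sketch under assumptions rather than a GRAMPA feature: the function name read_dup_counts is hypothetical, and it assumes three whitespace-delimited columns with the header shown above and that lines starting with "#" can be skipped. The sample data are the rows shown above.

```python
import io

def read_dup_counts(lines):
    """Return {(mul_tree, node): dups} from a dup-counts-style table.

    Assumption: three whitespace-delimited columns (mul.tree, node, dups)
    after one header row; lines starting with '#' are skipped.
    """
    counts = {}
    header_seen = False
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if not header_seen:
            header_seen = True  # skip the column header
            continue
        mul_tree, node, dups = fields
        counts[(mul_tree, node)] = int(dups)
    return counts

# The rows from the dup-counts file shown above.
sample = """mul.tree        node    dups
1       x*      102
1       y*      88
1       z*      193
1       B       234
1       A       344
1       C       232
1       x+      104
1       y+      87
1       z+      212
1       D       386
1       <1>     94
1       <2>     82
1       <3>     102
1       <4>     73
1       <5>     110
1       <6>     101
1       <7>     143
1       <8>     108
1       <9>     0
"""

counts = read_dup_counts(io.StringIO(sample))

# Rank nodes by duplication count, highest first.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
top_three = [(node, dups) for (_, node), dups in ranked[:3]]
total_dups = sum(counts.values())

print(top_three)
print(total_dups)
```

Keying on both the MUL-tree ID and the node label keeps counts separate if a run reports more than one tree.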