The synergistic effect of concatenation in phylogenomics: the case in Pantoea
Authors
Palmer, Marike
Venter, S.N. (Stephanus Nicolaas)
McTaggart, Alistair R.
Coetzee, Martin Petrus Albertus
Van Wyk, Stephanie
Avontuur, Juanita Rayleen
Beukes, Chrizelle Winsie
Fourie, Gerda
Santana, Quentin C.
Van der Nest, Magrieta Aletta
Publisher
PeerJ
Abstract
With the increased availability of genome sequences for bacteria, it has become
routine practice to construct genome-based phylogenies. These phylogenies have
formed the basis for various taxonomic decisions, especially for resolving problematic
relationships between taxa. Despite the popularity of concatenating shared genes to
obtain well-supported phylogenies, various issues regarding this combined-evidence
approach have been raised. These include the introduction of phylogenetic error
into datasets, as well as incongruence due to organism-level evolutionary processes,
particularly horizontal gene transfer and incomplete lineage sorting. Because these
processes could substantially distort inferred phylogenies, we evaluated the impact of the
phylogenetic conflict they cause on the established species phylogeny for Pantoea, a
member of the Enterobacterales. We explored the presence and distribution of phylogenetic
conflict at the gene-partition and nucleotide levels, and identified putative inter-lineage
recombination events that might have contributed to such conflict. Furthermore, we
determined whether smaller, randomly constructed datasets had sufficient signal to
reconstruct the current species tree hypothesis, or whether their signal would be
overshadowed by phylogenetic incongruence. We found that no individual gene tree was
fully congruent with the species phylogeny of Pantoea, although many of the expected
nodes were supported by various individual genes
across the genome. Evidence of recombination was found across all lineages within
Pantoea, supporting organism-level evolutionary processes as a potential source of
phylogenetic conflict. The phylogenetic signal from at least 70 randomly selected genes
recovered robust, well-supported phylogenies for the backbone and most species
relationships of Pantoea, and was unaffected by phylogenetic conflict within the dataset.
Furthermore, although genes identified as having no signal provided limited resolution
among taxa in single-gene trees, a concatenated analysis of these genes yielded a
phylogeny that resembled the species phylogeny of Pantoea. This distribution of signal
and noise across the genome is ideal for phylogenetic inference, as the topology of a
≥70-gene concatenated species phylogeny is not driven by single genes, and our data
suggest that this finding may also hold for smaller datasets. We thus argue that a
concatenation-based approach in phylogenomics can yield robust phylogenies owing to
the synergistic effect of the combined signal obtained from multiple genes.
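The combined-evidence (concatenation) approach and the randomly constructed gene subsets described in the abstract can be illustrated with a minimal Python sketch. It assumes each gene is already aligned and stored as a dictionary mapping taxon name to aligned sequence; the function names, the gap-padding rule for taxa missing from a gene, and the toy data are illustrative assumptions, not the authors' pipeline.

```python
import random
from typing import Dict, List

# Hypothetical representation: one gene alignment = {taxon name: aligned sequence},
# with every sequence within a gene alignment having the same length.
Alignment = Dict[str, str]


def concatenate(genes: List[Alignment], taxa: List[str]) -> Dict[str, str]:
    """Join gene alignments end to end into a supermatrix (one row per taxon).

    Assumption: a taxon absent from a gene is filled with gaps ('-') of that
    gene's alignment length, so all supermatrix rows end up equally long.
    """
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in genes:
        length = len(next(iter(aln.values())))  # alignment length of this gene
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * length))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def random_subset(genes: Dict[str, Alignment], n: int, seed: int) -> List[str]:
    """Draw n distinct gene names without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so that a subset can be reproduced
    return rng.sample(sorted(genes), n)


# Toy data for illustration only (not from the study).
genes = {
    "geneA": {"P1": "ACGT", "P2": "ACGA", "P3": "TCGA"},
    "geneB": {"P1": "GG", "P2": "GC"},
    "geneC": {"P1": "TTA", "P2": "TTA", "P3": "TAA"},
}
taxa = ["P1", "P2", "P3"]

chosen = random_subset(genes, 2, seed=1)
matrix = concatenate([genes[g] for g in chosen], taxa)
for taxon, row in matrix.items():
    print(taxon, row)
```

In a real analysis the supermatrix would then be passed to a tree-inference program, and the subset size (for example 20 up to 120 genes) would be varied with many random draws per size; tree inference itself is not shown here.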
Description
Supplementary Information 1: Multi-species coalescent model phylogeny. DOI: 10.7717/peerj.6698/supp-1
Supplementary Information 2: Consensus network of 1,357 gene trees. DOI: 10.7717/peerj.6698/supp-2
Supplementary Information 3: Neighbour-Joining phylogeny from ANI-based distances. DOI: 10.7717/peerj.6698/supp-3
Supplementary Information 4: AML concatenated phylogenies constructed after the exclusion of backbone-supporting genes and genes with no signal. DOI: 10.7717/peerj.6698/supp-4
Supplementary Information 5: AML concatenated phylogenies of backbone-supporting genes and genes with no signal. DOI: 10.7717/peerj.6698/supp-5
Supplementary Information 6: Strict consensus trees of subset datasets. DOI: 10.7717/peerj.6698/supp-6
Supplementary Information 7: Python script. Raw data: FastTree Python script for the construction of individual gene trees. DOI: 10.7717/peerj.6698/supp-7
Supplementary Information 8: NeighborNet network. Raw data: a Nexus file for the NeighborNet network constructed from the concatenated nucleotide data matrix (Fig. 2). DOI: 10.7717/peerj.6698/supp-8
Supplementary Information 9: Consensus network. Raw data: a Nexus file for the consensus network constructed from the individual gene trees (Fig. S2). DOI: 10.7717/peerj.6698/supp-9
Supplementary Information 10: Backbone-supporting and no-signal gene trees. Datasets and trees for the individual gene trees marked as supporting the backbone and those with no signal. DOI: 10.7717/peerj.6698/supp-10
Supplementary Information 11: Nucleotides with conflicting signal. Nucleotide positions with conflicting signal as determined from the NeighborNet. DOI: 10.7717/peerj.6698/supp-11
Supplementary Information 12: Recombination detection data. Results obtained from the recombination detection program regarding potential recombination breakpoints. DOI: 10.7717/peerj.6698/supp-12
Supplementary Information 13: Randomised subset datasets. Data pertaining to the randomised subset datasets constructed from 20, 50, 60, 70, 80, 90, 100, 110 and 120 randomly selected genes. DOI: 10.7717/peerj.6698/supp-13
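Several of the analyses listed above, such as strict consensus trees of subset datasets and asking whether an individual gene tree is congruent with the species phylogeny, rest on comparing the clades that trees share. The minimal Python sketch below assumes rooted trees written as nested tuples of taxon names, all rooted the same way; this simple clade comparison is an illustrative stand-in and not necessarily how the authors scored congruence or node support.

```python
from typing import FrozenSet, List, Set, Union

# Assumed representation: a rooted tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, tuple]
Clade = FrozenSet[str]


def clusters(tree: Tree) -> Set[Clade]:
    """Return the non-trivial clades (leaf sets below internal nodes) of a rooted tree."""
    found: Set[Clade] = set()

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([node])  # a single leaf is never recorded as a clade
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)  # the clade of all taxa is shared by every tree
    return found


def strict_consensus(trees: List[Tree]) -> Set[Clade]:
    """Clades present in every input tree, i.e. the strict-consensus clusters."""
    per_tree = [clusters(t) for t in trees]
    return set.intersection(*per_tree) if per_tree else set()


def supported_clades(gene_tree: Tree, species_tree: Tree) -> Set[Clade]:
    """Species-tree clades that a gene tree also recovers."""
    return clusters(gene_tree) & clusters(species_tree)


def fully_congruent(gene_tree: Tree, species_tree: Tree) -> bool:
    """True only if the gene tree recovers exactly the species tree's clades."""
    return clusters(gene_tree) == clusters(species_tree)


# Toy trees for illustration only (not from the study).
species = ((("A", "B"), "C"), ("D", "E"))
gene1 = ((("A", "C"), "B"), ("D", "E"))

print(sorted(sorted(c) for c in strict_consensus([species, gene1])))
print(fully_congruent(gene1, species))
```

Here the toy gene tree supports some species-tree clades but is not fully congruent, mirroring in miniature the pattern the abstract reports for individual genes. For unrooted trees one would compare bipartitions (splits) rather than rooted clades, and real analyses would also weigh branch support rather than treating every clade as present or absent.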
Keywords
Phylogenetic conflict, Phylogenetic signal, Phylogenetics, Super trees, Concatenate, Phylogenomics
Citation
Palmer M, Venter SN, McTaggart AR, Coetzee MPA, Van Wyk S, Avontuur JR, Beukes CW, Fourie G, Santana QC, Van Der Nest MA, Blom J, Steenkamp ET. 2019. The synergistic effect of concatenation in phylogenomics: the case in Pantoea. PeerJ 7:e6698 http://doi.org/10.7717/peerj.6698.