Abstract:
The early diversification of angiosperms is thought to have been a rapid process, which may complicate phylogenetic analyses of early angiosperm relationships. Plastid and nuclear phylogenomic studies have raised several conflicting hypotheses regarding overall angiosperm phylogeny, but mitochondrial genomes have been largely ignored as a relevant source of information. Here we sequenced mitochondrial genomes from 18 angiosperms to fill taxon-sampling gaps in Austrobaileyales, magnoliids, Chloranthales, Ceratophyllales, and major lineages of eudicots and monocots. We assembled a data matrix of 38 mitochondrial genes from 107 taxa to assess how well mitochondrial genomic data address current uncertainties in angiosperm relationships. Although we recovered conflicting phylogenies based on different data sets and analytical methods, we also observed congruence regarding deep relationships of several major angiosperm lineages: Chloranthales were always inferred to be the sister group of Ceratophyllales, Austrobaileyales the sister group of mesangiosperms, and the unplaced Dilleniales was consistently resolved as the sister to superasterids. Substitutional saturation, GC compositional heterogeneity, and codon-usage bias are possible sources of the noise and conflict that may impact phylogenetic reconstruction; however, angiosperm mitochondrial genes may not be substantially affected by these factors. The third codon positions of the mitochondrial genes appear to contain more parsimony-informative sites than the first and second codon positions, and therefore produced better-resolved phylogenetic relationships with generally strong support. The relationships among these major lineages remain incompletely resolved, perhaps as a result of the rapidity of early radiations. Nevertheless, data from mitochondrial genomes provide additional evidence and alternative hypotheses for exploring the early evolution and diversification of the angiosperms.
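The codon-position comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the four-taxon toy alignment are hypothetical, and it assumes an in-frame, gap-aligned coding sequence. A site is counted as parsimony-informative when at least two character states each occur in at least two taxa.

```python
from collections import Counter


def split_codon_positions(aligned_cds):
    """Split an in-frame aligned coding sequence into (1st+2nd, 3rd) codon positions."""
    pos12 = "".join(base for i, base in enumerate(aligned_cds) if i % 3 != 2)
    pos3 = aligned_cds[2::3]
    return pos12, pos3


def is_parsimony_informative(column):
    """True if at least two nucleotide states each occur in at least two taxa."""
    # Gaps and ambiguity codes are ignored (an assumption of this sketch).
    counts = Counter(base for base in column.upper() if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative(seqs):
    """Count parsimony-informative columns in a list of equal-length sequences."""
    return sum(is_parsimony_informative("".join(col)) for col in zip(*seqs))


# Hypothetical toy alignment, for illustration only.
alignment = {
    "taxonA": "ATGGCAAAA",
    "taxonB": "ATGGCAAAA",
    "taxonC": "ATGGCGCAG",
    "taxonD": "ATGGCGCAG",
}

parts = [split_codon_positions(seq) for seq in alignment.values()]
pos12_seqs = [p[0] for p in parts]
pos3_seqs = [p[1] for p in parts]

print("informative sites, positions 1+2:", count_informative(pos12_seqs))
print("informative sites, position 3:", count_informative(pos3_seqs))
```

In this toy alignment the third positions carry two informative sites and the combined first and second positions carry one. The real data set would need the same partitioning applied gene by gene before concatenation.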
Description:
SUPPLEMENTARY MATERIAL 1:
FIG. S1. Percentage of variable sites and informative sites of 38 mt genes in 107 taxa. The colored lines under the gene names indicate the categories of the genes.
FIG. S2. The ML tree inferred by RAxML based on concatenated nt sequences of 38 mt genes of 107 species. Numbers on branches are bootstrap values.
FIG. S3. The ML tree inferred by RAxML based on concatenated aa sequences of 38 mt genes of 107 species. Numbers on branches are bootstrap values.
FIG. S4. The ML tree inferred by RAxML based on the combined first and second codon positions of 38 mt genes of 107 species. Numbers on branches are bootstrap values.
FIG. S5. The ML tree inferred by RAxML based on the third codon positions of 38 mt genes of 107 species. Numbers on branches are bootstrap values.
FIG. S6. The Bayesian tree inferred by MrBayes based on the concatenated nt sequences of 38 mt genes of 107 species. Numbers on branches are posterior probabilities.
FIG. S7. The Bayesian tree inferred by PhyloBayes with the CAT model based on the concatenated aa sequences of 38 mt genes of 107 species. Numbers on branches are posterior probabilities.
FIG. S8. The Bayesian tree inferred by MrBayes based on the combined first and second codon positions of 38 mt genes of 107 species. Numbers on branches are posterior probabilities.
FIG. S9. The Bayesian tree inferred by MrBayes based on the third codon positions of 38 mt genes of 107 species. Numbers on branches are posterior probabilities.
FIG. S10. The MP tree inferred by PAUP based on concatenated nt sequences of 38 mt genes of 107 species. Numbers on branches are bootstrap values.
FIG. S11. The MP tree inferred by PAUP based on concatenated aa sequences of 38 mt genes of 107 species. Numbers on branches are bootstrap values.
FIG. S12. The MP tree inferred by PAUP based on the combined first and second codon positions of 38 mt genes of 107 species. Numbers on branches are bootstrap values.
FIG. S13. The MP tree inferred by PAUP based on the third codon positions of 38 mt genes of 107 species. Numbers on branches are bootstrap values.
FIG. S14. Comparison of phylogenetic relationships of major angiosperm lineages between mt genomes (based on the nt data) and Angiosperm Phylogeny Group (APG) IV (APG IV, 2016).
FIG. S15. Comparison of eudicot phylogenetic relationships between mt genomes (based on the nt data) and APG IV (APG IV, 2016).
FIG. S16. Phylogenetic tree with concordance factors annotated. The three numbers beside each node represent the bootstrap value, gCF value, and sCF value.
TABLE S1. Eighteen topologies regarding the relationships of the five mesangiosperm lineages.
TABLE S2. List of 107 taxa sampled for the mitochondrial genomic data set in this study.
TABLE S3. PartitionFinder results for the concatenated nucleotide data set.
TABLE S4. Characteristics of 38 mitochondrial genes, including the number of taxa sampled in the data matrix, the number of total aligned characters, the percentage of gaps or missing data, and variable and informative sites.
TABLE S5. Nucleotide GC content and codon-usage bias of all mt protein-coding genes in 107 taxa.
TABLE S6. Nucleotide GC content and codon-usage bias of 38 mt genes, 79 pt genes, and 59 nuclear genes in 14 taxa.
TABLE S7. Genes with erroneous placements or extremely long branches in single-gene trees, or that could not be aligned with those of other taxa.
SUPPLEMENTARY MATERIAL 2: TABLE S8. Information on assembled mitochondrial contigs.