Gene duplicability of core genes is highly consistent across all angiosperms

Gene duplication is an important mechanism for adding to genomic novelty. Hence, which genes undergo duplication and are preserved following duplication is an important question. It has been observed that gene duplicability, or the ability of genes to be retained following duplication, is a nonrandom process, with certain genes being more amenable to survive duplication events than others. Primarily, gene essentiality and the type of duplication (small-scale versus large-scale) have been shown in different species to influence the (long-term) survival of novel genes. However, an overarching view of "gene duplicability" is lacking, mainly because previous studies usually focused on individual species and did not account for the influence of genomic context and the time of duplication. Here, we present a large-scale study in which we investigated duplicate retention for 9178 gene families shared between 37 flowering plant species, referred to as angiosperm core gene families. For most gene families, we observe a strikingly consistent pattern of gene duplicability across species, with gene families being either primarily single-copy or multicopy in all species. An intermediate class contains gene families that are often retained in duplicate for periods extending to tens of millions of years after whole-genome duplication, but ultimately appear to be largely restored to singleton status, suggesting that these genes may be dosage balance sensitive. The distinction between single-copy and multicopy gene families is reflected in their functional annotation, with single-copy genes being mainly involved in the maintenance of genome stability and organelle function and multicopy genes in signaling, transport, and metabolism. The intermediate class was overrepresented in regulatory genes, further suggesting that these represent putative dosage-balance-sensitive genes.
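As a rough illustration of the single-copy versus multicopy distinction described above, the following Python sketch computes a single-copy percentage (SCP) for one gene family from per-species copy numbers and checks a species-presence cut-off like the 32 out of 37 cut-off named in the supplemental figure list. It is not the authors' pipeline. The SCP definition used here (the share of species carrying the family that hold exactly one copy), the function names, and the class thresholds of 80% and 20% are all assumptions made for illustration.

```python
from typing import Dict, Optional


def single_copy_percentage(copy_numbers: Dict[str, int]) -> Optional[float]:
    """Percentage of species carrying the family that hold exactly one copy.

    Assumed definition for illustration; returns None if the family is
    absent from every species.
    """
    present = [n for n in copy_numbers.values() if n > 0]
    if not present:
        return None
    single = sum(1 for n in present if n == 1)
    return 100.0 * single / len(present)


def is_core(copy_numbers: Dict[str, int], min_species: int = 32) -> bool:
    """True if the family occurs in at least `min_species` species."""
    return sum(1 for n in copy_numbers.values() if n > 0) >= min_species


def classify(scp: float, single_min: float = 80.0, multi_max: float = 20.0) -> str:
    """Label a family by SCP; both thresholds are hypothetical."""
    if scp >= single_min:
        return "single-copy"
    if scp <= multi_max:
        return "multicopy"
    return "intermediate"


# Toy family over four made-up species labels (not real data).
toy_family = {"sp_a": 1, "sp_b": 1, "sp_c": 2, "sp_d": 0}
toy_scp = single_copy_percentage(toy_family)
toy_label = classify(toy_scp)
```

With real data, `copy_numbers` would hold one entry per species in the study, and `is_core` would be applied before computing the SCP so that only core gene families are classified.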

Description

SUPPLEMENTAL DATA:
SUPPLEMENTAL FIGURE 1. Motivation for the 32 out of 37 species cut-off to define core gene families.
SUPPLEMENTAL FIGURE 2. The distribution of single-copy percentages (SCPs) for all core gene families, with SCPs calculated upon removing the highly duplicated genomes of Glycine max, Linum usitatissimum, Brassica rapa, and Zea mays.
SUPPLEMENTAL FIGURE 3. Classification of species tree nodes as SSD or WGD.
SUPPLEMENTAL FIGURE 4. Core gene families mainly duplicate through WGD.
SUPPLEMENTAL FIGURE 5. Comparison of the number of duplications for core and noncore gene families at WGD and SSD nodes on a gene family base.
SUPPLEMENTAL FIGURE 6. Ks distributions of duplicated pairs from core and noncore gene families in 12 species.
SUPPLEMENTAL FIGURE 7. Duplicate gene retention in function of time since WGD.
SUPPLEMENTAL FIGURE 8. Criteria that we used to choose the optimal number of clusters for k-means clustering of the copy-number matrix.
SUPPLEMENTAL FIGURE 9. Consensus matrices obtained for different number of clusters k.
SUPPLEMENTAL FIGURE 10. Polar diagrams depicting the fraction of duplication events in each gene family group belonging to either the “recent,” “K-Pg boundary,” “ancient,” or “SSD” duplication classes.
SUPPLEMENTAL FIGURE 11. Over- and underrepresentation of an independent set of 2090 nuclear-encoded chloroplast-targeted genes obtained from The Chloroplast Function Database.
SUPPLEMENTAL FIGURE 12. Over- and underrepresentation of an independent set of 1795 putative transcription factors.
SUPPLEMENTAL FIGURE 13. Mapping of the whole-genome duplications and triplications on the species tree.
SUPPLEMENTAL FIGURE 14. Conflicting clades between the species tree used in this paper, which we inferred from 107 core gene families, and the APGIII tree.
SUPPLEMENTAL FIGURE 15. Explanation of how duplications were inferred for gene families with at least two species but no more than three genes or gene families that are only present in one species.
SUPPLEMENTAL FIGURE 16. The change in the total number of predicted duplication events in core gene families in function of the threshold on the duplication consistency score.
SUPPLEMENTAL FIGURE 17. Gaussian mixture models were fit to the Ks distribution of each species.
SUPPLEMENTAL FIGURE 18. Comparison of power-law fit and exponential fit to the data obtained from the Gaussian Mixture Modeling of Ks-based age distributions.
SUPPLEMENTAL TABLE 1. Comparison of the numbers of interacting protein pairs in each group to those obtained from randomized networks.
SUPPLEMENTAL TABLE 2. Description of all identified peaks inferred from the Ks-based age distributions.
SUPPLEMENTAL TABLE 3. Comparison of the power-law and the exponential fit.
SUPPLEMENTAL DATA SET 1. Concatenated multiple sequence alignment for 107 genes to reconstruct the species tree.
SUPPLEMENTAL DATA SET 2. Data source and accession numbers of 107 genes used for reconstruction of the species tree.

Keywords

Gene duplication, Nonrandom process, Essentiality, Angiosperms

Citation

Li, Z, Defoort, J, Tasdighian, S, Maere, S, Van de Peer, Y & De Smet, R 2016, 'Gene duplicability of core genes is highly consistent across all angiosperms', Plant Cell, vol. 28, no. 2, pp. 326-344.

URI

http://hdl.handle.net/2263/53093

Collections

Research Articles (Genetics)
Research Articles (University of Pretoria)
