Gene duplicability of core genes is highly consistent across all angiosperms

dc.contributor.author Li, Zhen
dc.contributor.author Defoort, Jonas
dc.contributor.author Tasdighian, Setareh
dc.contributor.author Maere, Steven
dc.contributor.author Van de Peer, Yves
dc.contributor.author De Smet, Riet
dc.date.accessioned 2016-06-14T06:02:34Z
dc.date.available 2016-06-14T06:02:34Z
dc.date.issued 2016-02-07
dc.description SUPPLEMENTAL DATA: SUPPLEMENTAL FIGURE 1. Motivation for the 32 out of 37 species cut-off to define core gene families. SUPPLEMENTAL FIGURE 2. The distribution of single-copy percentages (SCPs) for all core gene families, with SCPs calculated upon removing the highly duplicated genomes of Glycine max, Linum usitatissimum, Brassica rapa, and Zea mays. SUPPLEMENTAL FIGURE 3. Classification of species tree nodes as SSD or WGD. SUPPLEMENTAL FIGURE 4. Core gene families mainly duplicate through WGD. SUPPLEMENTAL FIGURE 5. Comparison of the number of duplications for core and noncore gene families at WGD and SSD nodes on a gene family base. SUPPLEMENTAL FIGURE 6. Ks distributions of duplicated pairs from core and noncore gene families in 12 species. SUPPLEMENTAL FIGURE 7. Duplicate gene retention in function of time since WGD. SUPPLEMENTAL FIGURE 8. Criteria that we used to choose the optimal number of clusters for k-means clustering of the copy-number matrix. SUPPLEMENTAL FIGURE 9. Consensus matrices obtained for different number of clusters k. SUPPLEMENTAL FIGURE 10. Polar diagrams depicting the fraction of duplication events in each gene family group belonging to either the “recent,” “K-Pg boundary,” “ancient,” or “SSD” duplication classes. SUPPLEMENTAL FIGURE 11. Over- and underrepresentation of an independent set of 2090 nuclear-encoded chloroplast-targeted genes obtained from The Chloroplast Function Database. SUPPLEMENTAL FIGURE 12. Over- and underrepresentation of an independent set of 1795 putative transcription factors. SUPPLEMENTAL FIGURE 13. Mapping of the whole-genome duplications and triplications on the species tree. SUPPLEMENTAL FIGURE 14. Conflicting clades between the species tree used in this paper and which we inferred from 107 core gene families and the APGIII tree. SUPPLEMENTAL FIGURE 15. Explanation of how duplications were inferred for gene families with at least two species but no more than three genes or gene families that are only present in one species. SUPPLEMENTAL FIGURE 16. The change in the total number of predicted duplication events in core gene families in function of the threshold on the duplication consistency score. SUPPLEMENTAL FIGURE 17. Gaussian mixture models were fit to the Ks distribution of each species. SUPPLEMENTAL FIGURE 18. Comparison of power-law fit and exponential fit to the data obtained from the Gaussian Mixture Modeling of Ks-based age distributions. SUPPLEMENTAL TABLE 1. Comparison of the numbers of interacting protein pairs in each group to those obtained from randomized networks. SUPPLEMENTAL TABLE 2. Description of all identified peaks inferred from the Ks-based age distributions. SUPPLEMENTAL TABLE 3. Comparison of the power-law and the exponential fit. SUPPLEMENTAL DATA SET 1. Concatenated multiple sequence alignment for 107 genes to reconstruct the species tree. SUPPLEMENTAL DATA SET 2. Data source and accession numbers of 107 genes used for reconstruction of the species tree. en_ZA
dc.description.abstract Gene duplication is an important mechanism for adding to genomic novelty. Hence, which genes undergo duplication and are preserved following duplication is an important question. It has been observed that gene duplicability, or the ability of genes to be retained following duplication, is a nonrandom process, with certain genes being more amenable to survive duplication events than others. Primarily, gene essentiality and the type of duplication (small-scale versus large-scale) have been shown in different species to influence the (long-term) survival of novel genes. However, an overarching view of "gene duplicability" is lacking, mainly due to the fact that previous studies usually focused on individual species and did not account for the influence of genomic context and the time of duplication. Here, we present a large-scale study in which we investigated duplicate retention for 9178 gene families shared between 37 flowering plant species, referred to as angiosperm core gene families. For most gene families, we observe a strikingly consistent pattern of gene duplicability across species, with gene families being either primarily single-copy or multicopy in all species. An intermediate class contains gene families that are often retained in duplicate for periods extending to tens of millions of years after whole-genome duplication, but ultimately appear to be largely restored to singleton status, suggesting that these genes may be dosage balance sensitive. The distinction between single-copy and multicopy gene families is reflected in their functional annotation, with single-copy genes being mainly involved in the maintenance of genome stability and organelle function and multicopy genes in signaling, transport, and metabolism. The intermediate class was overrepresented in regulatory genes, further suggesting that these represent putative dosage-balance-sensitive genes. en_ZA
dc.description.department Genetics en_ZA
dc.description.librarian am2016 en_ZA
dc.description.sponsorship Y.V.d.P. acknowledges the Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks” Project (01MR0310W) of Ghent University and the European Union Seventh Framework Programme (FP7/2007-2013) under European Research Council Advanced Grant Agreement 322739-DOUBLE-UP. This project is supported by The Research Foundation-Flanders (FWO) (G008812N). en_ZA
dc.description.uri http://www.plantcell.org en_ZA
dc.identifier.citation Li, Z, Defoort, J, Tasdighian, S, Maere, S, Van de Peer, Y & De Smet, R 2016, 'Gene duplicability of core genes is highly consistent across all angiosperms', Plant Cell, vol. 28, no. 2, pp. 326-344. en_ZA
dc.identifier.issn 1040-4651 (print)
dc.identifier.issn 1532-298X (online)
dc.identifier.other 10.1105/tpc.15.00877
dc.identifier.uri http://hdl.handle.net/2263/53093
dc.language.iso en en_ZA
dc.publisher American Society of Plant Biologists en_ZA
dc.rights © 2016 American Society of Plant Biologists. All Rights Reserved. en_ZA
dc.subject Gene duplication en_ZA
dc.subject Nonrandom process en_ZA
dc.subject Essentiality en_ZA
dc.subject Angiosperms en_ZA
dc.title Gene duplicability of core genes is highly consistent across all angiosperms en_ZA
dc.type Article en_ZA

