Abstract:
Background: The expanding world population is expected to double the worldwide demand for food by 2050. Eighty-eight
percent of countries currently face a serious burden of malnutrition, especially in Africa and south and southeast Asia.
About 95% of the food energy needs of humans are fulfilled by just 30 species, of which wheat, maize, and rice provide the
majority of calories. Therefore, to diversify and stabilize the global food supply, enhance agricultural productivity, and
tackle malnutrition, greater use of neglected or underutilized local plants (so-called orphan crops, but also including a few plants of special significance to agriculture, agroforestry, and nutrition) could be a partial solution.
Results: Here, we
present draft genome information for five agriculturally, biologically, medicinally, and economically important
underutilized plants native to Africa: Vigna subterranea, Lablab purpureus, Faidherbia albida, Sclerocarya birrea, and Moringa
oleifera. Assembled genomes range in size from 217 to 654 Mb. In V. subterranea, L. purpureus, F. albida, S. birrea, and M. oleifera,
we have predicted 31,707, 20,946, 28,979, 18,937, and 18,451 protein-coding genes, respectively. By further analyzing the
expansion and contraction of selected gene families, we have characterized root nodule symbiosis genes, transcription
factors, and starch biosynthesis-related genes in these genomes.
Conclusions: These genome data will be useful to identify
and characterize agronomically important genes and understand their modes of action, enabling genomics-based
evolutionary studies and breeding strategies to design faster, more focused, and more predictable crop improvement programs.
Description:
Availability of supporting data:
The raw data from our genome project were deposited in the NCBI Sequence Read Archive database with BioProject IDs PRJNA453822 and PRJNA474418. Assembly and annotation of the five genomes and other supporting data, including BUSCO results, are available in the GigaDB repository [85]. The data reported in this study are also available in the CNGB Nucleotide Sequence Archive (CNSA: https://db.cngb.org/cnsa; accession number CNP0000096). All genome annotations described here are also available at http://bioinformatics.psb.ugent.be/orcae/AOCC.
Additional files
Figure S1: K-mer (K = 17) analysis of five genomes.
Figure S2: Distribution of sequencing depths of the assembly data.
Figure S3: The GC content.
Figure S4: Comparison of GC content across closely related species.
Figure S5: Statistics of gene models in Vigna subterranea, Lablab purpureus, Faidherbia albida, Moringa oleifera and Sclerocarya birrea.
Figure S6: Expansion and contraction of gene families.
Table S1: Statistics of the raw and clean data of DNA sequencing.
Table S2: Summary statistics of the transcriptome data in four species.
Table S3: Estimation of genome size based on k-mer statistics in five species.
Table S4: BUSCO evaluation of the annotated protein-coding genes in five species.
Table S5: Analysis of gene families of different species.
Table S6: Enriched pathways of unique paralogous genes in families.
Table S7: Enriched GO terms (level 3) of unique paralogous genes in families.
Table S8: Enriched GO terms (level 3) of genes in families with expansion.
Table S9: Enriched pathways of genes in families with expansion.
Table S10: The copy numbers of protein biosynthesis-related genes in each species.
Table S11: The copy numbers of starch biosynthesis-related genes in each species.
Table S12: The copy numbers of fatty acid synthesis and storage-related genes in each species.
Table S13: The copy numbers of fatty acid degradation-related genes in each species.
Table S14: Numbers of transcription factors in the studied species.