Whole-genome sequencing for an enhanced understanding of genetic variation among South Africans

Authors

Choudhury, Ananyo
Ramsay, Michele
Hazelhurst, Scott
Aron, Shaun
Bardien, Soraya
Botha, Gerrit
Chimusa, Emile R.
Christoffels, Alan
Gamieldien, Junaid
Sefid-Dashti, Mahjoubeh J.

Publisher

Nature Publishing Group

Abstract

The Southern African Human Genome Programme is a national initiative that aspires to unlock the unique genetic character of southern African populations for a better understanding of human genetic diversity. In this pilot study the Southern African Human Genome Programme characterizes the genomes of 24 individuals (8 Coloured and 16 black southeastern Bantu-speakers) using deep whole-genome sequencing. A total of ~16 million unique variants are identified. Despite the shallow time depth since divergence between the two main southeastern Bantu-speaking groups (Nguni and Sotho-Tswana), principal component analysis and structure analysis reveal significant (p < 10⁻⁶) differentiation, and FST analysis identifies regions with high divergence. The Coloured individuals show evidence of varying proportions of admixture with Khoesan, Bantu-speakers, Europeans, and populations from the Indian sub-continent. Whole-genome sequencing data reveal extensive genomic diversity, increasing our understanding of the complex and region-specific history of African populations and highlighting its potential impact on biomedical research and genetic susceptibility to disease.

Description

M.R. and M.S.P. co-lead the SAHGP initiative, and the project was designed and coordinated by the core working group including M.R., M.S.P., S.B., H.S., R.R., J.R., K.S., P.V., N.M., F.J., S.H., and L.V. M.R. and H.S. obtained ethics approval for the study. The data analysis team was led by S.H. (PCA; STRUCTURE and Y chromosome analysis) and included A.C. (novel SNV characterization, LOF variant, f2, FST, SFS, and ROH analysis), N.M. (functional analysis), F.J. (variant calling), S.A. (variant calling), G.B. (functional annotation and data curation), E.R.C. (admixture), J.G. (functional annotation), M.J.S.D. (functional annotation), A.M. (functional annotation, SNV characterization, data curation, and mtDNA analysis), and D.S. (regional FST analysis, data visualization). All authors wrote the Methods section and notes on their analyses. M.R. and A.C. drafted the manuscript, and A.C. was responsible for coordinating Tables and Figures (including the Supplement).

Keywords

Whole-genome sequencing (WGS), Genetic variation, Human genetic diversity, Southern African population

Citation

Choudhury, A., Ramsay, M., Hazelhurst, S. et al. 2017, 'Whole-genome sequencing for an enhanced understanding of genetic variation among South Africans', Nature Communications, vol. 8, pp. 1-12.