Abstract:
BACKGROUND: The lack of HLA data for southern African populations hampers disease
association studies and limits our understanding of genetic diversity in these populations. We
aimed to determine HLA diversity in South African populations using high-resolution HLA-A,
-B, -C, -DRB1, -DQA1 and -DQB1 data from 3005 previously typed individuals.
METHODS: We determined allele and haplotype frequencies, deviations from Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD), and performed neutrality tests. South African
HLA class I data were additionally compared with other global populations using non-metric
multidimensional scaling (NMDS), genetic distances and principal component analysis (PCA).
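As a minimal sketch of the per-locus diversity statistics described above, the Python snippet below computes allele frequencies, observed heterozygosity (Ho) and expected heterozygosity (He = 1 minus the sum of squared allele frequencies) from a list of two-allele genotypes. It is an illustration only, not the study's pipeline: the function names and the toy genotypes are hypothetical, and a formal HWE test (for example, an exact test) would still be needed to obtain p-values like those reported.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return allele -> frequency from a list of (allele1, allele2) genotypes."""
    counts = Counter()
    for a1, a2 in genotypes:
        counts[a1] += 1
        counts[a2] += 1
    total = sum(counts.values())  # 2 alleles per individual
    return {allele: n / total for allele, n in counts.items()}


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for a single locus."""
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(1 for a1, a2 in genotypes if a1 != a2) / n
    # Expected under HWE: 1 - sum(p_i^2) over all allele frequencies.
    freqs = allele_frequencies(genotypes)
    expected = 1.0 - sum(p * p for p in freqs.values())
    return observed, expected


# Hypothetical toy genotypes for one locus (not data from this study).
toy_genotypes = [
    ("C*17:01", "C*06:02"),
    ("C*17:01", "C*07:01"),
    ("C*06:02", "C*07:01"),
    ("C*17:01", "C*17:01"),
]
ho, he = heterozygosity(toy_genotypes)
print(f"Ho={ho:.3f}, He={he:.3f}")  # prints Ho=0.750, He=0.625
```

In this toy example Ho exceeds He, which is the pattern referred to as excess heterozygosity.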
RESULTS: All loci deviated strongly from HWE (p < 0.0001), and most loci showed excess
heterozygosity. Two of the three most frequent alleles, HLA-DQA1*05:02
(0.2584) and HLA-C*17:01 (0.1488), had previously been reported in South African
populations at lower frequencies. NMDS showed that South African populations are genetically distinct. Phylogenetic analysis and PCA clustered our current dataset with previous
South African studies. Additionally, HLA class I allele frequencies suggest that South Africans are related to other sub-Saharan populations.
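The abstract does not name the genetic distance measure used. Assuming Nei's standard genetic distance purely for illustration, D = -ln(Jxy / sqrt(Jx * Jy)), where Jx and Jy are the sums of squared allele frequencies within each population and Jxy is the sum of allele-frequency products across the two populations. The hypothetical `nei_distance` function below computes D from two allele-frequency dictionaries; smaller values indicate more similar populations, the kind of signal that groups populations together in a distance-based tree.

```python
import math


def nei_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two allele-frequency dicts.

    Assumption: this measure is chosen for illustration; the study may use another.
    """
    alleles = set(freqs_x) | set(freqs_y)
    jx = sum(p * p for p in freqs_x.values())
    jy = sum(q * q for q in freqs_y.values())
    # Alleles absent from one population contribute 0 to Jxy.
    jxy = sum(freqs_x.get(a, 0.0) * freqs_y.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return math.inf  # no shared alleles: genetic identity is zero
    identity = jxy / (math.sqrt(jx) * math.sqrt(jy))
    return -math.log(identity)


# Hypothetical allele frequencies at one locus (not data from this study).
pop_x = {"allele_1": 0.5, "allele_2": 0.5}
pop_y = {"allele_1": 1.0}
print(f"{nei_distance(pop_x, pop_y):.4f}")  # prints 0.3466
```

In practice such per-locus quantities are usually averaged over loci before computing D; that step is omitted here to keep the sketch short.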
DISCUSSION AND CONCLUSION: Despite the retrospective nature of the study, missing data,
the imbalance in sample sizes across loci and haplotype pairs, and the resulting
methodological difficulties, this study provides a unique, large HLA dataset for
South Africans, which may be a useful resource for anthropological studies,
disease association studies, population-based vaccine development and donor
recruitment programs. We additionally provide simulated high-resolution HLA class I
data to augment the mixed-resolution typing results generated in this study.