ADDITIONAL FILE 1: TABLE S1.
Populations with whole-genome sequence data. Number of animals with whole-genome sequence data per breed/population used in this study, with data sources and/or project accession numbers.
TABLE S2. Accession numbers of the publicly available Boran samples used in this study.
ADDITIONAL FILE 2: FIGURE S1.
Variant Quality Score Recalibration (VQSR) of the WGS variants using GATK: tranches plot (a) and specificity versus tranche truth sensitivity (b). Quality metrics of the WGS data. Tranche-specific true positives (TPs) are the true-positive calls gained when a slice (tranche) is added; cumulative TPs are the true-positive calls contained in all the slices already added. This distinction makes it possible to evaluate how many additional TPs are gained, relative to the additional false positives (FPs) that must be accepted, when moving to the next tranche up. The ratio of transition (Ti) to transversion (Tv) SNPs (i.e., the Ti/Tv ratio) is a useful diagnostic of the quality of the WGS data: a high Ti/Tv ratio (> 2.0) often indicates a high-accuracy SNP set, whereas a low value (~0.5, the ratio expected for random substitutions, since there are twice as many possible transversions as transitions) implies low-quality SNP calling.
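The Ti/Tv diagnostic described above can be illustrated with a minimal sketch, assuming SNPs are available as (reference, alternative) allele pairs; the function name is hypothetical and this is not the GATK implementation. Enumerating all 12 possible single-base substitutions reproduces the ~0.5 ratio expected for random errors.

```python
from itertools import permutations

BASES = set("ACGT")
# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ti_tv_ratio(snps):
    """Ti/Tv ratio for an iterable of (ref, alt) allele pairs.

    Indels, multi-base alleles, non-ACGT symbols and ref == alt records are skipped.
    Returns NaN when no transversions are observed (ratio undefined).
    """
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("nan")


# All 12 possible single-base substitutions: 4 transitions and 8 transversions.
random_expectation = ti_tv_ratio(permutations("ACGT", 2))
print(random_expectation)  # 0.5
```

In practice the counts would come from the filtered VCF; the point of the sketch is only that a call set dominated by random errors drifts towards 0.5, whereas real bovine SNPs are enriched for transitions.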
ADDITIONAL FILE 3: FIGURE S2.
Principal component plot of the individuals included in the WGS data. Plots of principal component (PC) 1 versus PC2 and of PC1 versus PC3 for the 289 distinct individuals included in the WGS data. The data spanned a diverse range of breeds and geographic locations (55 populations: 13 European, 12 African, 28 Asian and 2 Middle Eastern). Individuals are coloured by population and location [34].
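The legend does not state which tool produced the PCA, so the following is only a generic sketch, assuming a complete individuals × SNPs matrix of 0/1/2 allele counts: each SNP is standardised by its binomial variance and the PC scores are obtained from a singular value decomposition. The function name and the toy data are hypothetical.

```python
import numpy as np


def genotype_pca(G, n_pcs=3):
    """PC scores for individuals from an individuals x SNPs matrix of 0/1/2 allele counts.

    Assumes no missing genotypes; monomorphic SNPs are dropped before standardising.
    """
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0                           # allele frequency per SNP
    keep = (p > 0) & (p < 1)
    G, p = G[:, keep], p[keep]
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # centre and scale each SNP
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]                    # scores on the first n_pcs PCs


# Toy data (not from this study): two populations with very different allele frequencies.
rng = np.random.default_rng(1)
pop_a = rng.binomial(2, 0.1, size=(20, 200))
pop_b = rng.binomial(2, 0.9, size=(20, 200))
scores = genotype_pca(np.vstack([pop_a, pop_b]))
print(scores.shape)  # (40, 3)
```

Plotting `scores[:, 0]` against `scores[:, 1]` and `scores[:, 2]` gives the PC1/PC2 and PC1/PC3 views shown in the figure.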
ADDITIONAL FILE 4: TABLE S3.
Populations represented in the combined HD data. Number of animals genotyped with the Illumina HD array per breed/population used in this study, and data source.
ADDITIONAL FILE 5: TABLE S4.
Coordinates (chromosome and position in bp) of the variants retained after lift-over on both UMD3.1 and ARS-UCD1.2 assemblies. Variants retained after combining and lifting-over the different HD array data. Coordinates (chromosome and position in bp) on both UMD3.1 and ARS-UCD1.2 assemblies are provided.
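Combining the HD datasets and keeping only variants that lift over can be sketched as below. This is an illustration under stated assumptions, not the procedure used in the study: the input dictionaries are hypothetical, the lift-over itself is assumed to have been run beforehand with an external tool, and dropping variants whose UMD3.1 coordinates disagree between sources is an assumed extra safeguard.

```python
def combine_and_lift(datasets, liftover_map):
    """Keep variants present in every HD dataset that also lift over to ARS-UCD1.2.

    datasets: list of dicts {variant_id: (chrom, pos_umd31)}, one per data source.
    liftover_map: {(chrom, pos_umd31): (chrom, pos_ars)} for successfully lifted positions
    (hypothetical input produced beforehand by a lift-over tool).
    Returns rows (variant_id, chrom_umd31, pos_umd31, chrom_ars, pos_ars).
    """
    common = set.intersection(*(set(d) for d in datasets))
    rows = []
    for vid in sorted(common):
        coords = {d[vid] for d in datasets}
        if len(coords) != 1:
            continue  # assumption: drop variants whose UMD3.1 coordinates disagree
        old = coords.pop()
        new = liftover_map.get(old)
        if new is not None:  # variants that failed lift-over are dropped
            rows.append((vid, *old, *new))
    return rows


# Hypothetical example with two sources and three shared variants.
source_1 = {"s1": ("1", 100), "s2": ("1", 200), "s3": ("2", 50)}
source_2 = {"s1": ("1", 100), "s2": ("1", 201), "s3": ("2", 50)}
lifted = {("1", 100): ("1", 150), ("2", 50): ("2", 60)}
print(combine_and_lift([source_1, source_2], lifted))
```

Each returned row corresponds to one line of a table with coordinates on both assemblies.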
ADDITIONAL FILE 6: FIGURE S3.
Comparison between imputation accuracies (ER2) when phasing was done with either BEAGLE or SHAPEIT4. Comparison between imputation accuracies (ER2, as estimated in Minimac4) when phasing was done with either BEAGLE or SHAPEIT4. Since the imputation accuracies were similar, BEAGLE-phased data were used for all subsequent analyses.
ADDITIONAL FILE 7: FIGURE S4.
Linkage disequilibrium (r2) decay in European and African cattle breeds. Comparison of linkage disequilibrium decay in taurine (both European and African) and African indicine breeds.
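An LD decay curve of this kind is typically obtained by computing r2 for SNP pairs and averaging it within distance bins. The sketch below assumes unphased 0/1/2 genotypes for one chromosome and uses the squared correlation of allele counts as the r2 estimator; the study's own tool and bin sizes are not stated here, so the names and parameters are hypothetical.

```python
import numpy as np


def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs' 0/1/2 allele counts."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.std() == 0 or y.std() == 0:
        return np.nan  # undefined for monomorphic SNPs
    return np.corrcoef(x, y)[0, 1] ** 2


def ld_decay(G, pos, bin_size, max_dist):
    """Mean r2 per distance bin for SNP pairs closer than max_dist (bp).

    G: individuals x SNPs array for one chromosome; pos: SNP positions sorted ascending.
    """
    n_bins = int(np.ceil(max_dist / bin_size))
    sums, counts = np.zeros(n_bins), np.zeros(n_bins)
    for i in range(G.shape[1]):
        for j in range(i + 1, G.shape[1]):
            d = pos[j] - pos[i]
            if d >= max_dist:
                break  # positions are sorted, so later SNPs are even farther away
            r2 = genotype_r2(G[:, i], G[:, j])
            if not np.isnan(r2):
                b = int(d // bin_size)
                sums[b] += r2
                counts[b] += 1
    with np.errstate(invalid="ignore"):
        return sums / counts  # NaN for bins without any SNP pair
```

Running `ld_decay` separately for each breed group (e.g. European taurine, African taurine, African indicine) and plotting the bin means against distance gives curves comparable in spirit to those in the figure.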
ADDITIONAL FILE 8: FIGURE S5.
Imputation accuracy (ER2) with leave-one-out cross-validation, using 100 of the 289 animals from the WGS data, for all bovine genotyping arrays considered. In this procedure, one individual was removed from the reference panel and its genotypes were imputed using the remaining animals as the reference panel. This was repeated for each of the 100 animals, randomly selected from the WGS data, and for each array. Results are presented only for the 16 arrays for which imputation was successful (i.e., those retaining more than 10,000 variants after QC). The number of variants retained from the WGS data for each array is given in brackets.
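The leave-one-out scheme can be sketched as follows. This is a minimal illustration, assuming genotypes are held as an animals × WGS-variants matrix: the imputation step is a deliberately simple nearest-neighbour stand-in (hypothetical, NOT Minimac4), and accuracy is measured as the squared correlation with the true WGS genotypes, which is not the same quantity as the ER2 reported by Minimac4. All function names are hypothetical.

```python
import numpy as np


def nearest_neighbour_impute(ref_G, target_obs, array_idx):
    """Toy stand-in for a real imputation tool (NOT Minimac4): copy the WGS genotypes
    of the reference animal with the fewest mismatches at the array variants."""
    mismatches = np.abs(ref_G[:, array_idx] - target_obs).sum(axis=1)
    return ref_G[np.argmin(mismatches)].astype(float)


def leave_one_out(G, array_idx, test_ids, impute=nearest_neighbour_impute):
    """Impute each test animal from a reference panel that excludes that animal.

    G: animals x WGS variants (0/1/2); array_idx: column indices of the array variants.
    """
    imputed = np.empty((len(test_ids), G.shape[1]))
    for k, i in enumerate(test_ids):
        reference = np.delete(G, i, axis=0)  # leave animal i out of the panel
        imputed[k] = impute(reference, G[i, array_idx], array_idx)
    return imputed


def per_variant_r2(true_G, dosages):
    """Squared correlation between true genotypes and imputed dosages, per variant."""
    r2 = np.full(true_G.shape[1], np.nan)
    for v in range(true_G.shape[1]):
        t, d = true_G[:, v], dosages[:, v]
        if t.std() > 0 and d.std() > 0:
            r2[v] = np.corrcoef(t, d)[0, 1] ** 2
    return r2


# Placeholder random genotypes (not real data): 289 animals, 100 randomly chosen test animals.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(289, 2000))
array_idx = np.sort(rng.choice(2000, size=200, replace=False))
test_ids = rng.choice(289, size=100, replace=False)
accuracy = per_variant_r2(G[test_ids], leave_one_out(G, array_idx, test_ids))
print(np.nanmean(accuracy))
```

Because the placeholder genotypes are random and unlinked, the printed mean accuracy is low; with real haplotype structure and a proper imputation tool, the same loop structure yields the per-array accuracies summarised in the figure.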
ADDITIONAL FILE 9: FIGURE S6.
Imputation accuracies ER2 (A) and dosage R2 (B) for the BOS1 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the BOS1 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S7. Imputation accuracies ER2 (A) and dosage R2 (B) for the GGPHDV3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPHDV3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S8. Imputation accuracies ER2 (A) and dosage R2 (B) for the GGPF250 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPF250 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S9. Imputation accuracies ER2 (A) and dosage R2 (B) for the IND90KH array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the IND90KH array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S10. Imputation accuracies ER2 (A) and dosage R2 (B) for the GGP90KT array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGP90KT array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S11. Imputation accuracies ER2 (A) and dosage R2 (B) for the ZMD2 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the ZMD2 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S12. Imputation accuracies ER2 (A) and dosage R2 (B) for the ZOETIS1 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the ZOETIS1 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S13. Imputation accuracies ER2 (A) and dosage R2 (B) for the BOVMD array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the BOVMD array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S14. Imputation accuracies ER2 (A) and dosage R2 (B) for the IDBV3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the IDBV3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S15. Imputation accuracies ER2 (A) and dosage R2 (B) for the SNP50V3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the SNP50V3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S16. Imputation accuracies ER2 (A) and dosage R2 (B) for the ANGGS array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the ANGGS array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S17. Imputation accuracies ER2 (A) and dosage R2 (B) for the BOVG50V1 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the BOVG50V1 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S18. Imputation accuracies ER2 (A) and dosage R2 (B) for the GGPIND35 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPIND35 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S19. Imputation accuracies ER2 (A) and dosage R2 (B) for the GGPLDV4 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPLDV4 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
FIGURE S20. Imputation accuracies ER2 (A) and dosage R2 (B) for the GGPLDV3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPLDV3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel. The results for the HD array when using the Global target set are also reported.
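The construction of the array-specific target sets, and one common way of estimating a dosage R2 from imputed dosages alone, can be sketched as below. Both functions are hypothetical illustrations under stated assumptions: genotypes are an animals × WGS-variants matrix with one ID per column, and `estimated_dosage_r2` is a diploid analogue of the Minimac-style Rsq (observed dosage variance divided by the binomial variance 2p(1 − p)). These legends do not spell out how "dosage R2" was computed, so this definition is an assumption rather than the study's exact formula.

```python
import numpy as np


def make_target_set(G, variant_ids, array_ids, continent_of, continent=None):
    """Mask WGS genotypes down to the variants of one array.

    G: animals x WGS variants; variant_ids: one ID per column; array_ids: IDs on the array;
    continent_of: continent label per animal; continent=None gives the Global target set.
    Returns the target genotype matrix and the row indices of the animals kept.
    """
    cols = np.flatnonzero(np.isin(variant_ids, list(array_ids)))
    if continent is None:
        rows = np.arange(G.shape[0])
    else:
        rows = np.flatnonzero(np.asarray(continent_of) == continent)
    return G[np.ix_(rows, cols)], rows


def estimated_dosage_r2(dosages):
    """Per-variant ratio of observed dosage variance to the binomial variance 2p(1-p).

    Needs no true genotypes; NaN for monomorphic variants (assumed Rsq-style definition).
    """
    D = np.asarray(dosages, dtype=float)
    p = D.mean(axis=0) / 2.0
    expected = 2.0 * p * (1.0 - p)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(expected > 0, D.var(axis=0) / expected, np.nan)


# Hypothetical toy panel: 3 animals, 4 WGS variants, 2 of which are on the array.
G = np.arange(12).reshape(3, 4)
target, kept = make_target_set(G, np.array(["a", "b", "c", "d"]), {"b", "d"},
                               ["African", "Asian", "African"], continent="African")
print(target.tolist(), kept.tolist())
```

Dosages that vary as much as hard genotype calls give a ratio near 1, whereas uninformative dosages (every animal assigned the population mean) give 0.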
ADDITIONAL FILE 10: FIGURE S21.
Dosage R2 for all imputed variants and for functional variants for the BOS1 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the BOS1 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the BOS1 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S22. Dosage R2 for all imputed variants and for functional variants for the GGPF250 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the GGPF250 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPF250 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S23. Dosage R2 for all imputed variants and for functional variants for the GGPHDV3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the GGPHDV3 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPHDV3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S24. Dosage R2 for all imputed variants and for functional variants for the GGP90KT array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the GGP90KT array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGP90KT array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S25. Dosage R2 for all imputed variants and for functional variants for the IND90KH array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the IND90KH array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the IND90KH array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S26. Dosage R2 for all imputed variants and for functional variants for the ZMD2 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the ZMD2 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the ZMD2 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S27. Dosage R2 for all imputed variants and for functional variants for the ZOETIS1 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the ZOETIS1 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the ZOETIS1 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S28. Dosage R2 for all imputed variants and for functional variants for the BOVMD array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the BOVMD array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the BOVMD array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S29. Dosage R2 for all imputed variants and for functional variants for the IDBV3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the IDBV3 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the IDBV3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S30. Dosage R2 for all imputed variants and for functional variants for the SNP50V3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the SNP50V3 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the SNP50V3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S31. Dosage R2 for all imputed variants and for functional variants for the ANGGS array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the ANGGS array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the ANGGS array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S32. Dosage R2 for all imputed variants and for functional variants for the BOVG50V1 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the BOVG50V1 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the BOVG50V1 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S33. Dosage R2 for all imputed variants and for functional variants for the GGPIND35 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the GGPIND35 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPIND35 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S34. Dosage R2 for all imputed variants and for functional variants for the GGPLDV4 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the GGPLDV4 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPLDV4 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S35. Dosage R2 for all imputed variants and for functional variants for the GGPLDV3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional variants (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the GGPLDV3 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPLDV3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
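Summarising dosage R2 for all variants versus the functional classes can be sketched as follows. This assumes, hypothetically, that per-variant dosage R2 values and VEP IMPACT labels have already been collected into dictionaries keyed by variant ID, with a single (e.g. most severe) impact per variant. VEP itself can report several consequences per variant, so how they are collapsed is a choice not specified in these legends.

```python
from statistics import mean

FUNCTIONAL_IMPACTS = ("LOW", "MODERATE", "HIGH")  # VEP impact classes counted as functional


def summarise_by_impact(r2_by_variant, impact_by_variant):
    """Mean dosage R2 and variant count for all variants and per functional impact class.

    r2_by_variant: {variant_id: dosage R2}; impact_by_variant: {variant_id: VEP IMPACT},
    assumed to hold a single (e.g. most severe) impact per variant.
    """
    groups = {"ALL": [], **{impact: [] for impact in FUNCTIONAL_IMPACTS}}
    for vid, r2 in r2_by_variant.items():
        groups["ALL"].append(r2)
        impact = impact_by_variant.get(vid)  # MODIFIER or unannotated: counted only in ALL
        if impact in FUNCTIONAL_IMPACTS:
            groups[impact].append(r2)
    return {name: (mean(vals) if vals else None, len(vals)) for name, vals in groups.items()}


# Hypothetical values for three variants.
summary = summarise_by_impact({"v1": 0.9, "v2": 0.5, "v3": 0.7},
                              {"v1": "HIGH", "v2": "MODIFIER", "v3": "HIGH"})
print(summary)
```

Computing such a summary per array and per target set (Global, African, Asian and European) gives the kind of comparison plotted in these figures.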
ADDITIONAL FILE 11: FIGURE S36.
Dosage R2 for all imputed indels and for functional indels for the HD array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the HD array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the HD array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S37. Dosage R2 for all imputed indels and for functional indels for the BOS1 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the BOS1 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the BOS1 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S38. Dosage R2 for all imputed indels and for functional indels for the GGPF250 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the GGPF250 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPF250 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S39. Dosage R2 for all imputed indels and for functional indels for the GGPHDV3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the GGPHDV3 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPHDV3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S40. Dosage R2 for all imputed indels and for functional indels for the GGP90KT array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the GGP90KT array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGP90KT array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S41. Dosage R2 for all imputed indels and for functional indels for the IND90KH array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the IND90KH array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the IND90KH array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S42. Dosage R2 for all imputed indels and for functional indels for the ZMD2 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the ZMD2 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the ZMD2 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S43. Dosage R2 for all imputed indels and for functional indels for the ZOETIS1 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the ZOETIS1 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the ZOETIS1 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S44. Dosage R2 for all imputed indels and for functional indels for the BOVMD array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the BOVMD array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the BOVMD array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S45. Dosage R2 for all imputed indels and for functional indels for the IDBV3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the IDBV3 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the IDBV3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S46. Dosage R2 for all imputed indels and for functional indels for the SNP50V3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the SNP50V3 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the SNP50V3 array, from the Global Reference Panel and from its subsets defined by continent of origin (African, 87 individuals; Asian, 106 individuals; European, 77 individuals). These target sets were then imputed to WGS level using the Global Reference Panel.
FIGURE S47. Dosage R2 for all imputed indels and for functional indels for the ANGGS array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (i.e., those annotated with a LOW, MODERATE or HIGH impact by the Ensembl VEP software) when imputing from the ANGGS array to WGS level.
The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the ANGGS array, from the Global Reference Panel as well as its subsets, generated according to the continent of origin (African (87 individuals), Asian (106 individuals) and European (77 individuals) subsets). These target sets (i.e. Global, African, Asian, and European) were then used to impute to WGS level using the Global Reference Panel. FIGURE S48. Dosage R2 for all imputed indels and for functional indels for the BOVG50V1 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (as annotated by the Ensembl VEP software (LOW, MODERATE and HIGH)) when imputing from the BOVG50V1 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the BOVG50V1 array, from the Global Reference Panel as well as its subsets, generated according to the continent of origin (African (87 individuals), Asian (106 individuals) and European (77 individuals) subsets). These target sets (i.e. Global, African, Asian, and European) were then used to impute to WGS level using the Global Reference Panel. FIGURE S49. Dosage R2 for all imputed indels and for functional indels for the GGPIND35 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (as annotated by the Ensembl VEP software (LOW, MODERATE and HIGH)) when imputing from the GGPIND35 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPIND35 array, from the Global Reference Panel as well as its subsets, generated according to the continent of origin (African (87 individuals), Asian (106 individuals) and European (77 individuals) subsets). These target sets (i.e. 
Global, African, Asian, and European) were then used to impute to WGS level using the Global Reference Panel. FIGURE S50. Dosage R2 for all imputed indels and for functional indels for the GGPLDV4 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (as annotated by the Ensembl VEP software (LOW, MODERATE and HIGH)) when imputing from the GGPLDV4 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPLDV4 array, from the Global Reference Panel as well as its subsets, generated according to the continent of origin (African (87 individuals), Asian (106 individuals) and European (77 individuals) subsets). These target sets (i.e. Global, African, Asian, and European) were then used to impute to WGS level using the Global Reference Panel. FIGURE S51. Dosage R2 for all imputed indels and for functional indels for the GGPLDV3 array when using the Global Reference Panel and four target sets (i.e., Global, African, Asian and European). Dosage R2 for all imputed variants and for functional indels (as annotated by the Ensembl VEP software (LOW, MODERATE and HIGH)) when imputing from the GGPLDV3 array to WGS level. The target sets were created by retaining only the WGS genotypes that overlapped with the variants of the GGPLDV3 array, from the Global Reference Panel as well as its subsets, generated according to the continent of origin (African (87 individuals), Asian (106 individuals) and European (77 individuals) subsets). These target sets (i.e. Global, African, Asian, and European) were then used to impute to WGS level using the Global Reference Panel.
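Each of the captions above combines two steps: masking the WGS genotypes down to the variants of one array to build a target set, and scoring the imputed result with dosage R2, overall and for the VEP LOW/MODERATE/HIGH subset. The sketch below illustrates both under stated assumptions. It assumes dosage R2 is the squared Pearson correlation between the imputed allele dosage (0.0 to 2.0) and the true WGS genotype coded 0/1/2; the captions do not say whether the reported values are this empirical correlation or an estimate produced by the imputation software itself. The function names, the (chromosome, position) variant key and the in-memory data layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical key for a variant: (chromosome, position in bp).
Variant = Tuple[str, int]


def make_target_set(
    wgs_genotypes: Dict[Variant, List[int]],
    array_variants: set,
) -> Dict[Variant, List[int]]:
    """Mask WGS data to an array: keep only genotypes at positions that the
    array also contains. Everything else becomes the imputation target."""
    return {v: g for v, g in wgs_genotypes.items() if v in array_variants}


def dosage_r2(true_genotypes: List[int], dosages: List[float]) -> float:
    """Assumed definition: squared Pearson correlation between true allele
    counts (0/1/2) and imputed dosages. Returns nan if either is constant."""
    n = len(true_genotypes)
    if n != len(dosages) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(dosages) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_genotypes, dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")  # correlation undefined for a constant vector
    return cov * cov / (var_t * var_d)


def mean_r2_by_class(
    r2_per_variant: Dict[Variant, float],
    impact: Dict[Variant, str],
    classes: Tuple[str, ...] = ("LOW", "MODERATE", "HIGH"),
) -> Dict[str, float]:
    """Average dosage R2 over all variants ("ALL") and per VEP impact class,
    skipping undefined (nan) values. Classes with no variants map to nan."""
    groups: Dict[str, List[float]] = {"ALL": []}
    for cls in classes:
        groups[cls] = []
    for variant, r2 in r2_per_variant.items():
        if math.isnan(r2):
            continue
        groups["ALL"].append(r2)
        cls = impact.get(variant)
        if cls in classes:
            groups[cls].append(r2)
    return {k: (sum(x) / len(x) if x else float("nan")) for k, x in groups.items()}
```

Squaring the correlation means a perfectly anti-correlated dosage would also score 1.0; in practice imputed dosages are oriented to the same allele as the true genotypes, so this case should not arise.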
ADDITIONAL FILE 12: FIGURE S52.
Plot for principal component (PC) 1 and PC2 for the individuals collected across four African countries, genotyped with the GeneSeek 50K array. Plot for PC1 and PC2 for the individuals collected across four African countries (namely Tanzania, Ghana, Nigeria, and Burkina Faso), genotyped with the GeneSeek 50K array. Only unrelated individuals were used (pairs with a relatedness value > 0.0625 from vcftools --relatedness2 were considered related). Coloured by country.
FIGURE S53. Plot for principal component (PC) 1 and PC2 for the individuals collected across four African countries, genotyped with the Illumina HD array. Plot for PC1 and PC2 for the individuals collected across four African countries (namely Tanzania, Ghana, Nigeria, and Burkina Faso), genotyped with the Illumina HD array. Only unrelated individuals were used (pairs with a relatedness value > 0.0625 from vcftools --relatedness2 were considered related). Coloured by country.
FIGURE S54. Plot for principal component (PC) 1 and PC2 for the combined data of 2,481 individuals genotyped with the Illumina HD array. Plot for PC1 and PC2 for the combined data of 2,481 individuals, genotyped with the Illumina HD array. Only unrelated individuals were used (pairs with a relatedness value > 0.0625 from vcftools --relatedness2 were considered related). Coloured by population, as reported in Additional file 4: Table S3.
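The captions above state that only unrelated individuals were kept, using a vcftools --relatedness2 threshold of 0.0625, but not how individuals were chosen for removal when a pair exceeded it. The sketch below is one possible way to apply such a threshold, not the authors' procedure. It assumes the tab-separated `.relatedness2` output with `INDV1`, `INDV2` and `RELATEDNESS_PHI` columns, and a greedy rule (repeatedly drop the individual involved in the most related pairs). The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

THRESHOLD = 0.0625  # pairs above this value are treated as related


def related_pairs(relatedness2_text: str, threshold: float = THRESHOLD) -> List[Tuple[str, str]]:
    """Parse a vcftools .relatedness2 table (assumed tab-separated with a
    header row) and return distinct pairs whose RELATEDNESS_PHI exceeds the
    threshold. Self-comparisons (INDV1 == INDV2) are skipped."""
    reader = csv.DictReader(io.StringIO(relatedness2_text), delimiter="\t")
    pairs = set()
    for row in reader:
        a, b = row["INDV1"], row["INDV2"]
        if a == b:
            continue
        if float(row["RELATEDNESS_PHI"]) > threshold:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


def greedy_unrelated(individuals: Iterable[str], pairs: List[Tuple[str, str]]) -> Set[str]:
    """Hypothetical pruning rule: drop one individual at a time, always the
    one with the most remaining related partners (ties broken by smallest
    ID), until no related pair is left. Returns the individuals kept."""
    keep = set(individuals)
    remaining = [p for p in pairs if p[0] in keep and p[1] in keep]
    while remaining:
        degree: Dict[str, int] = defaultdict(int)
        for a, b in remaining:
            degree[a] += 1
            degree[b] += 1
        # max() returns the first maximum, so iterating in sorted order
        # makes the tie-break deterministic.
        worst = max(sorted(degree), key=lambda ind: degree[ind])
        keep.discard(worst)
        remaining = [p for p in remaining if worst not in p]
    return keep
```

Removing the most-connected individual first usually retains more animals than discarding one member of each related pair at random, although it is not guaranteed to find the largest possible unrelated set.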
DATA AND MATERIALS AVAILABILITY: Some of the sequence data used in this study are from public databases, as detailed in Additional file 1: Tables S1 and S2. Whole-genome sequence variants (35,842,537 SNPs) from 120 Boran, N'Dama and Holstein cattle (40 samples per breed) have been deposited in Zenodo at https://doi.org/10.5281/zenodo.6855979, and raw Illumina HD genotypes (777,962 SNPs) mapped to the bovine UMD3.1 genome assembly for 3092 cattle from the four African countries (namely Burkina Faso, Ghana, Nigeria, and Tanzania) at https://doi.org/10.5281/zenodo.6791394.