Identification of multiple loci associated with social parasitism in honeybees

In colonies of the honeybee Apis mellifera, the queen is usually the only reproductive female, which produces new females (queens and workers) by laying fertilized eggs. However, in one subspecies of A. mellifera, known as the Cape bee (A. m. capensis), worker bees reproduce asexually by thelytoky, an abnormal form of meiosis where two daughter nucleii fuse to form single diploid eggs, which develop into females without being fertilized. The Cape bee also exhibits a suite of phenotypes that facilitate social parasitism whereby workers lay such eggs in foreign colonies so their offspring can exploit their resources. The genetic basis of this switch to social parasitism in the Cape bee is unknown. To address this, we compared genome variation in a sample of Cape bees with other African populations. We find genetic divergence between these populations to be very low on average but identify several regions of the genome with extreme differentiation. The regions are strongly enriched for signals of selection in Cape bees, indicating that increased levels of positive selection have produced the unique set of derived phenotypic traits in this subspecies. Genetic variation within these regions allows unambiguous genetic identification of Cape bees and likely underlies the genetic basis of social parasitism. The candidate loci include genes involved in ecdysteroid signaling and juvenile hormone and dopamine biosynthesis, which may regulate worker ovary activation and others whose products localize at the centrosome and are implicated in chromosomal segregation during meiosis. Functional analysis of these loci will yield insights into the processes of reproduction and chemical signaling in both parasitic and non-parasitic populations and advance understanding of the process of normal and atypical meiosis.

Description

Conceived and designed the experiments: MTW. Performed the experiments: AWMTW. Analyzed the data: AW. Contributed reagents/materials/analysis tools: CWP MHA. Wrote the paper: AW MTW.
S1 Fig. Low-FST SNPs dominate the dataset and the western subpopulation shows more fixation against other populations. (A) FST was computed for every SNP segregating between the Cape bee (capensis, n = 10) and the scutellata + adansonii (SA; n = 20) background population (n = 6,245,176) and counted for FST bins of 0.05. The distribution is dominated by variants segregating at similar frequencies between the two groups: 93.3% of SNPs (n = 5.89 x 106) have FST values below 0.1 (blue area) whereas 0.33% of SNPs (n = 20,460) have FST values above 0.3 (green area). 45 SNPs in the dataset are fixed between the two groups. (B) FST was computed for every SNP between the western Cape Town subpopulation (CT; n = 5) and SA (n = 20) (n = 5,934,995; black line) and the eastern Port Elizabeth subpopulation (PE; n = 5) and SA (n = 5,907,010; grey line) individually, and binned as in (A). The PE vs SA comparison has significant overrepresentation of low FST SNPs (FST = 0.00–0.005; 4,880,183 vs 4,810,196; p<1e-5, chi-squared test), whereas the CT vs SA comparison have significantly more SNPs segregating at every bin with FST>0.05 (p<1e-5, chi-squared test). (C) The individual counts from (B) were combined and the proportion of CT vs SA variants segregating at each FST interval was computed. For low FST variants (FST<0.3), CT SNPs make up less than 60% of SNPs (proportion CT SNPs per bin ranges from 0.496 to 0.549), whereas they are increasingly overrepresented for higher FST variants (FST>0.5; proportion CT SNPs per bin 0.694–0.856).
S2 Fig. Extreme haplotype homozygosity and divergence is more pronounced in Cape bees than the background population and associate with high-FST SNPs. (A) The XP-EHH was estimated for SNPs with MAF>0.02 (n = 6,196,550) using the program selscan ([28]) and binned (units of 0.01; blue distribution). The average XP-EHH for was also computed for 1kbp windows and binned (n = 188,088; units of 0.01; red distribution). The distribution is centered around 0 (mean XP-EHH = 0.007). The upper and lower 99.9% and 99.5% percentiles were identified from the empirical distribution of SNP and window XP-EHH estimates. These are more extreme when associated with the Cape bees (XP-EHH>0), compared to the background population (scutellata + adansonii; XP-EHH<0). 99.9% percentiles for SNPs (blue solid lines): 4.12 vs -3.46. 99.5% percentiles for SNPs (blue dashed lines): 2.9 vs -2.74. 99.9% percentiles for windows (red solid lines): 4.19 vs -2.72. 99.5% percentiles for SNPs (blue dashed lines): 2.53 vs -2.06. (B) The population branch statistic (PBS) was estimated for 1kbp windows across the genome (n = 189,053). The distribution is slightly skewed towards higher PBS in the Cape bees than the background population (mean PBS = 0.039). Upper and lower 99.9% and 99.5% percentiles were identified from the empirical distribution of PBS estimates. These are more extreme when associated with the Cape bees (PBS>0), compared to the background population (PBS<0). 99.9% percentiles (green solid lines): 1.27 vs -0.34. 99.5% percentiles (blue dotted lines): 0.51 vs -0.16. (C) The XP-EHH of SNPs was cross-referenced with the FST computed from the allele frequency differences between the Cape bees and the scutellata + adansonii background population. SNPs were binned for FST (units of 0.05). For every FST class, the mean XP-EHH was computed and 95% confidence intervals were computed from 200 bootstrap replicates. Each class was compared to the upper percentiles retrieved from the empirical XP-EHH distribution compiled in (A), which are associated with long haplotypes in the Cape bees. The 99.9% percentile is 4.12 (grey solid line); the 99.5% percentile is 2.9 (grey dashed line); and the 99% percentile is 2.49 (grey dotted line). SNPs with FST>0.85 have haplotype homozygosity patterns that are strongly biased towards long haplotypes in the Cape bees and have XP-EHH values that exceed, on average, those of the top 0.5% of SNPs (p<0.01). (D) Cross-reference between the FST of SNPs and the average XP-EHH for the 1kbp window in which the SNP was located. FST classes and bootstraps used as in (C) and empirical percentiles from (A). The 99.9% percentile is 4.19 (grey solid line); the 99.5% percentile is 2.53 (grey dashed line); and the 99% percentile is 2 (grey dotted line). The result is similar to (C). SNPs with FST>0.85 occur in regions with long haplotypes in the Cape bees, exceeding, on average, those of the top 0.5% regions across the genome (p<0.01). (E) Cross-reference between the FST of SNPs and the PBS for the 1kbp window in which the SNP was located. FST classes and bootstraps used as in (C) and empirical percentiles from (B). The 99.9% percentile is 1.27 (grey solid line); the 99.5% percentile is 0.51 (grey dashed line); and the 99% percentile is 0.34 (grey dotted line). SNPs with FST>0.7 occur in regions with divergence patterns that are biased towards a long branch leading to the Cape bees, exceeding, on average, the divergence detected in the top 0.5% regions across the genome (p<0.01).
S3 Fig. Extreme haplotype divergence and homozygosity signals centers around outlier SNPs (FST>0.9) and decay after 40–60kbp. (A) Every SNP with FST>0.2 and minimal evidence for extended haplotype homozygosity (XP-EHH) in the Cape bees (XP-EHH>0) was put in FST bins of 0.10. The linked divergence was traced around each SNP by computing FST across 1kbp windows for up to 500kbp to either side of each SNP. 95% confidence intervals were computed from 200 bootstrap replicates for every SNP FST class and distance. High FST SNPs appear to be clustered together in peaks: 3kbp away from the most highly differentiated SNPs (FST>0.9), the window-based divergence is higher (0.6) than for SNPs with FST = 0.7–0.8 (0.3; p<0.01), indicative of higher density of highly differentiated SNPs around the former. The decay of linked signals was compared against genome-wide percentiles: top 0.1% (solid line), top 0.5% (dashed line), 1% (dotted line). Linked divergence signals extend longest around high FST SNPs: divergence drops to the top 1% level after about 50kbp for SNPs at FST>0.9, compared to about 20kbp for SNPs at FST = 0.8–0.9 and 4kbp for SNPs at 0.7–0.8. The most differentiated SNPs therefore appear to be located in regions with the highest and widest linked divergence. (B) Tracing the decay of the population branch statistic (PBS) using the same procedure and SNPs as in (A). SNPs with high FST are clustered for the PBS and correlated with the widest linked signals: 3kbp away from SNPs with FST>0.9, the PBS in linked regions is significantly elevated compared to other SNPs with FST = 0.7–0.8 (1.1 vs 0.45; p<0.01) and it takes about 60kbp for the PBS to drop to the top 1% level around FST>0.9 SNPs, compared to 40kbp for SNPs at FST = 0.8–0.9 and 15kbp for SNPs at 0.7–0.8. The most differentiated SNPs are hence located in regions that with a long branch against both other African and European bees. (C) Tracing the decay of the XP-EHH statistic using the same procedure and SNPs as in (A). XP-EHH was estimated for SNPs with MAF>0.02. The average XP-EHH was computed for every 1kbp window up to 500kbp away from each SNP. The general pattern is consistent with the estimates for divergence. SNPs with high FST typically occur in regions with the widest haplotype homozygosity signals: 3kbp away from SNPs with FST>0.9, the XP-EHH in linked regions is significantly elevated compared to SNPs with FST = 0.7–0.8 (4 vs 1.8; p<0.01). It takes about 40kbp for the XP-EHH to drop to the top 1% level around FST>0.9 SNPs, compared to 20kbp for SNPs at FST = 0.8–0.9 and only 2kbp for SNPs at 0.7–0.8.
S4 Fig. The Cape bees have a larger number of divergent regions across the genome than the other two subspecies. The fixation index (FST) was computed for every SNP segregating between each subspecies and a combined population consisting of the other two African subspecies (further described in Fig 2). (A) A. m. capensis vs adansonii + scutellata. The main Cape bee scan has FST peaks (FST>0.8) distributed across 13 chromosomes. (B) A. m. scutellata vs adansonii + capensis. The scutellata scan detects a single FST peak (FST>0.8) on chromosome 1. (C) A. m. adansonii vs capensis + scutellata. The adansonii scan detects FST peaks (FST>0.8) distributed across 6 chromosomes. Most of these SNPs are clustered on chromosomes 7 and 11.
S5 Fig. The western subpopulation is associated with more extensive selection signals than the eastern subpopulation. (A) SNPs segregating between the Cape bees and the African background population (scutellata + adansonii) were sorted for FST and the top 0.1% (n = 6245) were filtered for minimal evidence for extended haplotype homozygosity in the Cape bees (XP-EHH>0) to make a candidate set of variants with trending evidence for selection in the Cape population (n = 2917). The decay of the population branch statistic (PBS) and XP-EHH, respectively, was traced in 1kbp windows for the western Cape town (CT; black line) and eastern Port Elizabeth (PE; grey line) subpopulations for up to 500kbp to either side of every such SNP. 95% confidence intervals (grey) were computed from 200 bootstrap replicates at every distance. The mean PBS and XP-EHH signals are distinctly stronger in CT compared to PE close to these SNPs: at 3kbp away from a SNP, PBSCT is almost twice as high as PBSPE (0.70 vs 0.38; p<0.01) and XP-EHHCT is almost 2.5x times higher than for XP-EHHPE (1.80 vs 0.72; p<0.01). The linked signatures are significantly stronger in CT than in PE around these SNPs for hundreds of kilobasepairs: PBSCT>PBSPE for over 500kbp (p<0.05) and XP-EHHCT>XP- EHHPE for 350kbp (p<0.05). These haplotype patterns are consistent with stronger signatures of selection in the CT subpopulation. (B) Analysis as in (A) but tracing the PBS and XP-EHH for CT and PE around outliers (n = 3820) identified specifically between the CT subpopulation and the SA population. The pattern is consistent with (A) but the difference between the CT and PE subpopulation for linked signatures of selection is further amplified when focused around these outlier SNPs: at 3kbp away from a SNP, the mean PBS for CT is 0.66 vs 0.27 for PE (p<0.01) and the mean XP-EHH for CT is 1.76 vs 0.28 for PE (p<0.01). These haplotype patterns indicate that some signatures of selection in the CT subpopulation are considerably weaker in the PE subpopulation. (C) Analysis as in (A) but tracing the PBS and XP-EHH for CT and PE around outliers (n = 1684) identified specifically between the PE subpopulation and the SA population. Also in this comparison and set of outliers, the CT subpopulation have significantly stronger signatures of selection with regards to the PBS and XP-EHH statistics: at 3kbp away from a SNP, the mean PBS for CT is 0.62 vs 0.48 for PE (p<0.01) and the mean XP-EHH for CT is 1.36 vs 1.13 for PE (p<0.01). These haplotype patterns indicate that the signatures of selection in the PE subpopulation may have substantial overlap with or be a subset of those detected in the CT subpopulation. (D) Analysis as in (A) but tracing the PBS and XP-EHH for CT and PE around 3000 randomly sampled SNPs segregating between the Cape bees and the SA population. PBS and XP-EHH decay patterns around these SNPs are mostly absent and indistinguishable between the CT and PE subpopulations. The PBS is significantly biased towards slightly higher values in the CT subpopulation: at 3kbp away from a SNP, the mean PBS for CT is 0.083 vs 0.069 for PE (p<0.01). For XP-EHH, the 95% confidence intervals estimated for the CT and PE subpopulations overlap throughout the full range.
S6 Fig. Genes with putatively selected variants have elevated selection signals compared to the genomic background. (A) Genetic distance (FST estimator of Reynolds et al. [104]), the population branch statistic (PBS) and cross population extended haplotype heterozygosity (XP-EHH) between the Cape bees and the African background population were estimated for 1kbp windows (scutellata + adansonii) across the full genome and joined into a Composite Selection Score (CSS; [24]). The window-based CSS estimates were cross-referenced with accessions by taking the full gene-body coordinates adjusted to include 2kbp of upstream and downstream sequence. Across all 13,281 accessions, the mean CSS score is 0.383. The 99% percentile for CSS is 1.546 and the 99.5% percentile for CSS is 1.912. The 97 accessions identified by taking the top 1000 SNPs ranked for their CSS (FST and XP-EHH) have significantly elevated CSS across the full gene body (CSS = 1.654; p<0.05). This set was reduced to include only accessions with SNPs above a threshold level of fixation. As these thresholds became increasingly strict, we enriched for accessions with high overall CSS scores. The 25 accessions that include putative causative variants with FST>0.8 (99.99% percentile for FST) and have a mean CSS score of 2.175, significantly higher than the top 1% of genes (p<0.05). 95% confidence intervals were generated from 200 bootstrap replicates of each class of genes.
S7 Fig. Significantly enriched gene ontology (GO) terms among Drosophila orthologues of the Cape bee candidates. The top 1000 SNPs sorted for their Composite Selection Signal (CSS; FST+XP-EHH) located within 8kbp from the closest gene body were associated with 97 accessions in the honeybee genome (S3 Table), 73 of which had previously been matched to 68 unique Drosophila accessions using BLASTx with an e-value <0.5 (Wallberg et al. [21]). 61 of these accessions and a background set of 6253 honeybee-fly orthologues were recognized by the GOrilla platform and the candidate set was queried for significantly enriched GO-terms. (A) We detected 13 significantly enriched biological processes (p<0.05), 10 of which had a qvalue <0.05 after correcting for multiple testing. The significantly enriched terms with no nested GO-terms below them are “Ecdysteroid metabolic process” (GO:0045455) and “AcylcoA biosynthetic process” (GO:0071616). On the left: graph showing the interrelationships among GO-terms. On the right: table of GO-terms and their associated values and counts. Enrichment abbreviations are as follows: N = the total number of genes; B = the total number of genes associated with a specific GO term; n = the number of genes in in the target set; b = the number of genes in the intersection. Enrichment is taken as (b/n) / (B/N). (B) We detected 8 significantly enriched molecular functions (p<0.05), 5 of which had a q-value <0.05 after correcting for multiple testing. The significantly enriched terms with no nested GO-terms below them are “Flavin adenine dinucleotide binding” (GO:0050660), “Transmembrane signaling receptor activity” (GO:0004888), “UDP-glycosyltransferase activity”, (GO:0008194) and “Choline dehydrogenase activity” (GO:0008812).
S8 Fig. Twelve selective sweeps with high-FST variants detected in the Cape bees. Among the 97 candidate accessions associated with the top 1000 SNPs ranked for the composite selection score (CSS) were 25 accessions that include at least one SNP above the 99.99% FST percentile (FST>0.80). These are located in 12 selective sweeps (labeled from A to L) across the genome. (A) Sweep A; around accession GB40769 (LOC412458; Dehydrogenase/reductase SDR family member 11-like). Top plot: FST of all SNPs in the region (grey dots); FST of non-synonymous SNPs (red dots), if detected; blue line is the population branch statistic PBS measured over 1kbp non-overlapping windows; gene bodies in grey above graph; the main accession is highlighted in green. Middle plot: XP-EHH of all SNPs in the region (grey dots); XP-EHH of non-synonymous SNPs (red dots); blue line is the mean XP-EHH measured over 1kbp nonoverlapping windows. Genes as in the top plot. Bottom plot: the Composite Selection Score (CSS; from [24]) based on both FST and XP-EHH of all SNPs in the region (grey dots); CSS of non-synonymous SNPs (red dots); blue line is the mean CSS measured over 1kbp non-overlapping windows. Genes as in the top plot. (B) Sweep B; around accession GB46500 (Ethr; Ecdysis triggering hormone receptor). Plots as in (A). (C) Sweep C; around accession GB50742 (LOC102655146; Bardet-biedl syndrome 1 protein-like). Plots as in (A). (D) Sweep D; around accession GB54486 (LOC411978; β-glucosidase). Plots as in (A). (E) Sweep E; around accession GB54634 (LOC725260; Uncharacterized LOC725260). Plots as in (A). (F) Sweep F; around accession GB43519 (LOC411614; Neuropeptide y receptor-like). Plots as in (A). (G) Sweep G; around accession GB44980 (LOC409260; Epidermal retinol dehydrogenase 2-like). Plots as in (A). (H) Sweep H; around accession GB45239 (LOC100576557; Uncharacterized protein LOC100576557). Plots as in (A). (I) Sweep I; around accession GB40077 (LOC726040; Probable 4-coumarate—coA ligase 3-like). Plots as in (A). (J) Sweep J; around accession GB49919 (LOC724687; Uncharacterized LOC724687). Plots as in (A). (K) Sweep K; around accession 14 GB52757 (LOC408473; Uncharacterized LOC408473). Plots as in (A). (L) Sweep L; around accession GB45973 (Ddc; Dopa decarboxylase). Plots as in (A).
S9 Fig. The gemini deletion is not fixed in the Cape bees and is detected in other African subspecies. Short read data was mapped across the putative 9bp deletion (the thelytoky associated element 1; tae1) in one of the introns of the transcription factor Gemini and visualized using samtools console output. (A) Read data for 10 capensis samples. The deletion element is located to position 1,532,307–1,532,315 on chromosome 13 (5’-TTCCATCGT-3’ on the plus strand; highlighted in black). Samples #4 and #5 have tae1-consistent reads without the motif mapped across the region. These are aligned with -symbols indicating gaps in the alignment against the reference sequence. Samples #2 and #3 do not have well mapped data across the region and may also lack the element. (B) Read data for 10 scutellata samples. Samples #2 and #3 have tae1-consistent reads without the motif mapped across the region. Sample #10 does not have well mapped data across the region and may lack the element. (C) Read data for 10 adansonii samples. Samples #5 and #10 tae1-consistent reads without the motif mapped across the region. Samples #1 and #7 does not have well mapped data across the region and may lack the element. (D) Table summarizing the short read evidence for the occurrence of tae1 across the three subspecies. Reads aligned with gaps across the deletion motif were detected in 2 out 10 samples in all three subspecies. Additional samples may also have the deletion. The deletion is not fixed in the Cape bees.
S1 Table. Fixed SNPs between Cape bees and the African background population.
S2 Table. Top 1000 SNPs ranked for the Composite Selection Score (CSS) between Cape bees and the African background population.
S3 Table. Information about the 97 accessions associated with the top 1000 CSS SNPs.
S4 Table. Significantly enriched Gene Ontology terms detected among the Drosophilia orthologs of the candidate accessions.

Keywords

Cape bees, African populations, Thelytoky, Genetic variation, Honeybees(Apis mellifera)

Citation

Wallberg A, Pirk CW, Allsopp MH, Webster MT (2016) Identification of Multiple Loci Associated with Social Parasitism in Honeybees. PLoS Genet 12 (6): e1006097. DOI: 10.1371/journal.pgen.1006097.

URI

http://hdl.handle.net/2263/56460

Collections

Research Articles (Zoology and Entomology)

Full item page

Identification of multiple loci associated with social parasitism in honeybees

Files

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Sustainable Development Goals

Citation

URI

Collections