MethylToSNP : identifying SNPs in Illumina DNA methylation array data

dc.contributor.author: LaBarre, Brenna A.
dc.contributor.author: Goncearenco, Alexander
dc.contributor.author: Petrykowska, Hanna M.
dc.contributor.author: Jaratlerdsiri, Weerachai
dc.contributor.author: Bornman, Maria S. (Riana)
dc.contributor.author: Hayes, Vanessa M.
dc.contributor.author: Elnitski, Laura
dc.date.accessioned: 2020-04-17T07:35:17Z
dc.date.available: 2020-04-17T07:35:17Z
dc.date.issued: 2019-12-20
dc.description: Additional file 1. Supplemental Methods. Additional materials are provided for the determination of default thresholds (Figure S1), assessment of false negative rates (Figure S2), and inverse quantile weighting (Figure S3).
dc.description.abstract: BACKGROUND: Current array-based methods for measuring DNA methylation rely on sodium bisulfite conversion to differentiate between methylated and unmethylated cytosine bases in DNA. In the absence of genotype data, this process can lead to ambiguity in data interpretation when a sample has polymorphisms at a methylation probe site. A common way to minimize this problem is to exclude such potentially problematic sites, with some methods removing as much as 60% of array probes from consideration before data analysis. RESULTS: Here, we present an algorithm implemented in an R Bioconductor package, MethylToSNP, which detects a characteristic data pattern to infer sites likely to be confounded by polymorphisms. Additionally, the tool provides a stringent reliability score to allow thresholding on SNP predictions. We calibrated parameters and thresholds used by the algorithm on simulated and real methylation data sets. We illustrate findings using methylation data from YRI (Yoruba in Ibadan, Nigeria), CEPH (European descent) and KhoeSan (southern African) populations. Our polymorphism predictions made using MethylToSNP have been validated through SNP databases and through bisulfite and genomic sequencing. CONCLUSIONS: The benefits of this method are threefold. First, it prevents extensive data loss by considering only SNPs specific to the individuals in the study. Second, it offers the possibility to identify new polymorphisms in samples for which little is known about the genetic landscape. Third, it identifies variants as they exist in functional regions of a genome, such as in CTCF (transcriptional repressor) sites and enhancers, that may be common alleles or personal mutations with the potential to deleteriously affect genomic regulatory activities. We demonstrate that MethylToSNP is applicable to Illumina 450K and Illumina 850K EPIC array data and is also backward compatible with the 27K methylation arrays. Going forward, this kind of nuanced approach can increase the amount of information derived from precious data sets by considering samples of the project individually to enable more informed decisions about data cleaning.
dc.description.department: School of Health Systems and Public Health (SHSPH)
dc.description.librarian: am2020
dc.description.sponsorship: Intramural Program of the National Human Genome Research Institute to LE (Grant No. 1ZIAHG200323-14). This work was also supported by an Australian Research Council (ARC) Discovery Project Grant awarded to VMH (DP170103071) and sampling contributed by the Cancer Association of South Africa (CANSA) to MSRB and VMH. VMH is supported by the University of Sydney Foundation in a Petre Foundation chair position.
dc.description.uri: https://epigeneticsandchromatin.biomedcentral.com
dc.identifier.citation: LaBarre, B.A., Goncearenco, A., Petrykowska, H.M. et al. 2019, 'MethylToSNP : identifying SNPs in Illumina DNA methylation array data', Epigenetics & Chromatin, vol. 12, art. 79, pp. 1-14.
dc.identifier.issn: 1756-8935 (online)
dc.identifier.other: 10.1186/s13072-019-0321-6
dc.identifier.uri: http://hdl.handle.net/2263/74198
dc.language.iso: en
dc.publisher: BioMed Central
dc.rights: © The Author(s) 2019. This article is licensed under a Creative Commons Attribution 4.0 International License.
dc.subject: Bisulfite sequencing
dc.subject: Illumina methylation array
dc.subject: Data analysis
dc.subject: Methylation probes
dc.subject: Polymorphisms
dc.subject: Enhancers
dc.subject: CTCF sites
dc.subject: Single nucleotide polymorphism (SNP)
dc.title: MethylToSNP : identifying SNPs in Illumina DNA methylation array data
dc.type: Article

Files

Original bundle

Name:
LaBarre_MethylToSNP_2019.pdf
Size:
1.2 MB
Format:
Adobe Portable Document Format
Description:
Article
Name:
LaBarre_MethylToSNPAddfile_2019.docx
Size:
535.17 KB
Format:
Microsoft Word XML
Description:
Additional File 1

License bundle

Name:
license.txt
Size:
1.75 KB
Format:
Item-specific license agreed upon to submission
Description: