Illumina error correction near highly repetitive DNA regions improves de novo genome assembly

dc.contributor.author: Heydari, Mahdi
dc.contributor.author: Miclotte, Giles
dc.contributor.author: Van de Peer, Yves
dc.contributor.author: Fostier, Jan
dc.date.accessioned: 2020-07-11T07:14:02Z
dc.date.available: 2020-07-11T07:14:02Z
dc.date.issued: 2019-06-03
dc.description.abstract: BACKGROUND: Several standalone error correction tools have been proposed to correct sequencing errors in Illumina data in order to facilitate de novo genome assembly. However, in a recent survey, we showed that state-of-the-art assemblers often did not benefit from this pre-correction step. We found that many error correction tools introduce new errors in reads that overlap highly repetitive DNA regions such as low-complexity patterns or short homopolymers, ultimately leading to a more fragmented assembly. RESULTS: We propose BrownieCorrector, an error correction tool for Illumina sequencing data that focuses on the correction of only those reads that overlap short DNA patterns that are highly repetitive in the genome. BrownieCorrector extracts all reads that contain such a pattern and clusters them into different groups using a community detection algorithm that takes into account both the sequence similarity between overlapping reads and their respective paired-end reads. Each cluster holds reads that originate from the same genomic region and hence each cluster can be corrected individually, thus providing a consistent correction for all reads within that cluster. CONCLUSIONS: BrownieCorrector is benchmarked using six real Illumina datasets for different eukaryotic genomes. The prior use of BrownieCorrector improves assembly results over the use of uncorrected reads in all cases. In comparison with other error correction tools, BrownieCorrector leads to the best assembly results in most cases even though less than 2% of the reads within a dataset are corrected. Additionally, we investigate the impact of error correction on hybrid assembly where the corrected Illumina reads are supplemented with PacBio data. Our results confirm that BrownieCorrector improves the quality of hybrid genome assembly as well. BrownieCorrector is written in standard C++11 and released under the GPL license. BrownieCorrector relies on multithreading to take advantage of multi-core/multi-CPU systems. The source code is available at https://github.com/biointec/browniecorrector. [en_ZA]
dc.description.department: Genetics [en_ZA]
dc.description.librarian: am2020 [en_ZA]
dc.description.sponsorship: The Research Foundation - Flanders (FWO) (G0C3914N). Computational resources and services were provided by the Flemish Supercomputer Center, funded by Ghent University, the Hercules Foundation and the Flemish Government – EWI [en_ZA]
dc.description.uri: https://bmcbioinformatics.biomedcentral.com [en_ZA]
dc.identifier.citation: Heydari, M., Miclotte, G., Van De Peer, Y. et al. 2019, 'Illumina error correction near highly repetitive DNA regions improves de novo genome assembly', BMC Bioinformatics, vol. 20, art. 298, pp. 1-13. [en_ZA]
dc.identifier.issn: 1471-2105 (online)
dc.identifier.other: 10.1186/s12859-019-2906-2
dc.identifier.uri: http://hdl.handle.net/2263/75146
dc.language.iso: en [en_ZA]
dc.publisher: BioMed Central [en_ZA]
dc.rights: © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License. [en_ZA]
dc.subject: Illumina sequencing data [en_ZA]
dc.subject: De novo genome assembly [en_ZA]
dc.subject: Error correction [en_ZA]
dc.subject: De Bruijn graph [en_ZA]
dc.subject: Community detection [en_ZA]
dc.title: Illumina error correction near highly repetitive DNA regions improves de novo genome assembly [en_ZA]
dc.type: Article [en_ZA]

Files

Original bundle

Name: Heydari_Illumina_2019.pdf
Size: 1.64 MB
Format: Adobe Portable Document Format
Description: Article
