Comparison of SNP-based subtyping workflows for bacterial isolates using WGS data, applied to Salmonella enterica serotype Typhimurium and serotype 1,4,[5],12:i:-

dc.contributor.authorSaltykova, Assia
dc.contributor.authorWuyts, Veronique
dc.contributor.authorMattheus, Wesley
dc.contributor.authorBertrand, Sophie
dc.contributor.authorRoosens, Nancy H.C.
dc.contributor.authorMarchal, Kathleen
dc.contributor.authorDe Keersmaecker, Sigrid C.J.
dc.date.accessioned2018-03-28T09:39:12Z
dc.date.available2018-03-28T09:39:12Z
dc.date.issued2018-02-06
dc.descriptionS1 Fig. Comparison of the sequencing samples based on the read mapping statistics. Read mapping statistics were obtained from Qualimap reports of the raw reads mapped on LT2 and SL1344 reference genomes, and re-plotted in R to improve visualization.en_ZA
dc.descriptionS2 Fig. Original genome coverage plots generated by Qualimap with LT2 and SL1344 reference genomes.en_ZA
dc.descriptionS3 Fig. Comparison of CFSAN and PHEnix variant selection procedures.en_ZA
dc.descriptionS4 Fig. Phylogenetic trees generated with the tested SNP-based subtyping workflows using high-coverage dataset and LT2 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) adapted PHEnix-based workflow, (D) CFSAN-based workflow, (E) adapted CFSAN-based workflow. Isolates are coloured according to the MLVA-profile. The minimal and maximal SNP distances observed between the five outbreak isolates and the three isolates obtained from the same patient are indicated near the clusters. The trees are drawn to scale, with branch lengths measured in the number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.en_ZA
dc.descriptionS5 Fig. Phylogenetic trees generated with the successful SNP-based subtyping workflows using down-sampled dataset and LT2 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow. Isolates are coloured according to the MLVA-profile. The minimal and maximal SNP distances observed between the five outbreak isolates and the three isolates obtained from the same patient are indicated near the clusters. The trees are drawn to scale, with branch lengths measured in the number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.en_ZA
dc.descriptionS6 Fig. Phylogenetic trees generated with the successful SNP-based subtyping workflows using down-sampled dataset supplemented with replicate data and LT2 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow. The minimal and maximal SNP distances observed between the five outbreak isolates and the three isolates obtained from the same patient are indicated near the clusters. The trees are drawn to scale, with branch lengths measured in the number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.en_ZA
dc.descriptionS7 Fig. SNP distance matrices generated with the tested SNP-based subtyping workflows using high-coverage dataset and LT2 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) adapted PHEnix-based workflow, (D) CFSAN-based workflow, (E) adapted CFSAN-based workflow. Values and colour codes in the SNP distance matrices indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold and isolates obtained from the same patient are underlined.en_ZA
dc.descriptionS8 Fig. SNP distance matrices generated with the successful SNP-based subtyping workflows using down-sampled dataset and LT2 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow. Values and colour codes in the SNP distance matrices indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold and isolates obtained from the same patient are underlined. For the CSI-based workflow, the distances between isolates 12-3582 and 12-3583 versus isolates 12-2984, 12-2998, 12-3067 and 12-3558 dropped from 10-12 SNP positions observed with the normal (high-coverage) dataset to 4-6 positions with the down-sampled dataset. For the CFSAN-based workflow, the distances between isolates 12-2984, 12-2998, 12-3067 and 12-3558 increased strongly (by as much as from 3 to 17 SNPs) with the down-sampled dataset compared to the original data.en_ZA
dc.descriptionS9 Fig. SNP distance matrices generated with the successful SNP-based subtyping workflows using down-sampled dataset supplemented with replicate data and LT2 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow. Values and colour codes in the SNP distance matrices indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold and isolates obtained from the same patient are underlined.en_ZA
dc.descriptionS10 Fig. Phylogenetic trees generated with the tested SNP-based subtyping workflows using high-coverage dataset and SL1344 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) adapted PHEnix-based workflow, (D) CFSAN-based workflow, (E) adapted CFSAN-based workflow. The minimal and maximal SNP distances observed between the five outbreak isolates and the three isolates obtained from the same patient are indicated near the clusters. The trees are drawn to scale, with branch lengths measured in the number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.en_ZA
dc.descriptionS11 Fig. Phylogenetic trees generated with the successful SNP-based subtyping workflows using down-sampled dataset and SL1344 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow. The minimal and maximal SNP distances observed between the five outbreak isolates and the three isolates obtained from the same patient are indicated near the clusters. The trees are drawn to scale, with branch lengths measured in the number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.en_ZA
dc.descriptionS12 Fig. Phylogenetic trees generated with the successful SNP-based subtyping workflows using down-sampled dataset supplemented with replicate data and SL1344 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow. The minimal and maximal SNP distances observed between the five outbreak isolates and the three isolates obtained from the same patient are indicated near the clusters. The trees are drawn to scale, with branch lengths measured in the number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.en_ZA
dc.descriptionS13 Fig. SNP distance matrices generated with the tested SNP-based subtyping workflows using high-coverage dataset and SL1344 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) adapted PHEnix-based workflow, (D) CFSAN-based workflow, (E) adapted CFSAN-based workflow. Values and colour codes in the SNP distance matrices indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold and isolates obtained from the same patient are underlined.en_ZA
dc.descriptionS14 Fig. SNP distance matrices generated with the successful SNP-based subtyping workflows using down-sampled dataset and SL1344 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow. Values and colour codes in the SNP distance matrices indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold and isolates obtained from the same patient are underlined.en_ZA
dc.descriptionS15 Fig. SNP distance matrices generated with the successful SNP-based subtyping workflows using down-sampled dataset supplemented with replicate data and SL1344 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow. Values and colour codes in the SNP distance matrices indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold and isolates obtained from the same patient are underlined.en_ZA
dc.descriptionS1 Table. Performance metrics describing the output of tested SNP-based subtyping workflows and combinations thereof assessed using LT2 as a reference genome. Performance metrics of the workflows were measured using original dataset (OD) and dataset down-sampled to a 30X coverage (30X), with LT2 as a reference genome. ad. CFSAN-based workflow: adapted CFSAN-based workflow. PHEnix + CSI, PHEnix + CFSAN, etc.: refer to a combination of the variant calling rules from the first mentioned workflow with the SNP matrix construction rules of the second mentioned workflow. DP: discriminative power.en_ZA
dc.descriptionS2 Table. Performance metrics describing the output of tested SNP-based subtyping workflows and combinations thereof assessed using SL1344 as a reference genome. Performance metrics of the workflows were measured using original dataset (OD) and dataset down-sampled to a 30X coverage (30X), with SL1344 as a reference genome. ad. CFSAN-based workflow: adapted CFSAN-based workflow. PHEnix + CSI, PHEnix + CFSAN, etc.: refer to a combination of the variant calling rules from the first mentioned workflow with the SNP matrix construction rules of the second mentioned workflow. DP: discriminative power.en_ZA
dc.descriptionS1 File. Perl script used for down-sampling of the sequencing data.en_ZA
dc.description.abstractWhole genome sequencing represents a promising new technology for subtyping of bacterial pathogens. Besides the technological advances which have pushed the approach forward, recent years have been marked by considerable evolution of whole genome sequencing data analysis methods. Prior to application of the technology as a routine epidemiological typing tool, however, reliable and efficient data analysis strategies need to be identified among the wide variety of methodologies that have emerged. In this work, we have compared three existing SNP-based subtyping workflows using a benchmark dataset of 32 Salmonella enterica subsp. enterica serovar Typhimurium and serovar 1,4,[5],12:i:- isolates including five isolates from a confirmed outbreak and three isolates obtained from the same patient at different time points. The analysis was carried out using the original (high-coverage) dataset and a down-sampled (low-coverage) dataset, and two different reference genomes. All three tested workflows, namely the CSI Phylogeny-based workflow, the CFSAN-based workflow and the PHEnix-based workflow, were able to correctly group the confirmed outbreak isolates and the isolates from the same patient with all combinations of reference genomes and datasets. However, the workflows differed strongly with respect to the SNP distances between isolates and sensitivity towards sequencing coverage, which could be linked to the specific data analysis strategies used therein. To demonstrate the effect of particular data analysis steps, several modifications of the existing workflows were also tested. This allowed us to propose data analysis schemes most suitable for routine SNP-based subtyping applied to S. Typhimurium and S. 1,4,[5],12:i:-. Results presented in this study illustrate the importance of using correct data analysis strategies and of defining, benchmarking and fine-tuning the parameters applied within routine data analysis pipelines to obtain optimal results.en_ZA
dc.description.departmentGeneticsen_ZA
dc.description.librarianam2018en_ZA
dc.description.sponsorshipRP/PJ WIV-ISP (NeXSplorer.iph), the Federal Public Service of Health, Food Chain Safety and Environment. The National Reference Centre for Salmonella and Shigella is partially supported by the Belgian Ministry of Social Affairs through a fund within the Health Insurance System.en_ZA
dc.description.urihttp://www.plosone.orgen_ZA
dc.identifier.citationSaltykova A, Wuyts V, Mattheus W, Bertrand S, Roosens NHC, Marchal K, et al. (2018) Comparison of SNP-based subtyping workflows for bacterial isolates using WGS data, applied to Salmonella enterica serotype Typhimurium and serotype 1,4,[5],12:i:-. PLoS ONE 13(2): e0192504. https://DOI.org/10.1371/journal.pone.0192504.en_ZA
dc.identifier.issn1932-6203 (online)
dc.identifier.other10.1371/journal.pone.0192504
dc.identifier.urihttp://hdl.handle.net/2263/64330
dc.language.isoenen_ZA
dc.publisherPublic Library of Scienceen_ZA
dc.rights© 2018 Saltykova et al. This is an open access article distributed under the terms of the Creative Commons Attribution License.en_ZA
dc.subjectBacterial pathogensen_ZA
dc.subjectBacterium isolateen_ZA
dc.subjectPhylogenyen_ZA
dc.subjectSalmonella enterica serovar Typhimuriumen_ZA
dc.subjectSNP-based subtyping workflowen_ZA
dc.subjectWhole genome sequencing (WGS)en_ZA
dc.titleComparison of SNP-based subtyping workflows for bacterial isolates using WGS data, applied to Salmonella enterica serotype Typhimurium and serotype 1,4,[5],12:i:-en_ZA
dc.typeArticleen_ZA
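
The supplementary captions above describe SNP distance matrices whose values are pairwise SNP distances between isolates. As a rough illustration of that idea only, the sketch below counts differing positions between aligned sequences. It is not the procedure used by the CSI Phylogeny, PHEnix or CFSAN workflows, which apply their own variant filtering; the function names, the choice to skip `N` and `-`, and the toy isolate names are all assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(a: str, b: str, ignore: str = "N-") -> int:
    """Count aligned positions where two sequences differ.

    Positions where either sequence has an ambiguous base or a gap
    (characters listed in `ignore`) are skipped; this is an assumption
    of the sketch, not a rule taken from the compared workflows.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x not in ignore and y not in ignore
    )


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Build a symmetric pairwise SNP distance matrix keyed by name pairs."""
    names = sorted(alignment)
    matrix = {(name, name): 0 for name in names}
    for n1, n2 in combinations(names, 2):
        d = snp_distance(alignment[n1], alignment[n2])
        matrix[(n1, n2)] = d
        matrix[(n2, n1)] = d
    return matrix


# Hypothetical toy alignment; these are not isolates from the study.
toy_alignment = {
    "isolate_A": "ACGTACGT",
    "isolate_B": "ACGTACGA",
    "isolate_C": "ACNTTCGA",
}
toy_matrix = distance_matrix(toy_alignment)
```

In a matrix like this, isolates from the same outbreak would be expected to show small pairwise values relative to unrelated isolates, which is what the captions' per-cluster minimal and maximal distances summarize.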

Files

Original bundle

Now showing 1 - 5 of 19
Name:
Saltykova_Comparison_2018.pdf
Size:
4.51 MB
Format:
Adobe Portable Document Format
Description:
Article
Name:
Saltykova_ComparisonFigS1_2018.tiff
Size:
621.94 KB
Format:
Tag Image File Format
Description:
Figure S1
Name:
Saltykova_ComparisonFigS2_2018.tif
Size:
2.92 MB
Format:
Tag Image File Format
Description:
Figure S2
Name:
Saltykova_ComparisonFigS3_2018.tiff
Size:
195.77 KB
Format:
Tag Image File Format
Description:
Figure S3
Name:
Saltykova_ComparisonFigS4_2018.tif
Size:
1.3 MB
Format:
Tag Image File Format
Description:
Figure S4

License bundle

Now showing 1 - 1 of 1
Name:
license.txt
Size:
1.75 KB
Format:
Item-specific license agreed upon to submission
Description: