S1 Fig. Comparison of the sequencing samples based on the read mapping statistics. Read
mapping statistics were obtained from Qualimap reports of the raw reads mapped to the LT2 and
SL1344 reference genomes, and re-plotted in R to improve visualization.
S2 Fig. Original genome coverage plots generated by Qualimap with LT2 and SL1344 reference
genomes.
S3 Fig. Comparison of CFSAN and PHEnix variant selection procedures.
S4 Fig. Phylogenetic trees generated with the tested SNP-based subtyping workflows using
the high-coverage dataset and LT2 as a reference genome. (A) CSI-based workflow, (B) PHEnix-based
workflow, (C) adapted PHEnix-based workflow, (D) CFSAN-based workflow, (E)
adapted CFSAN-based workflow. Isolates are coloured according to the MLVA-profile. The
minimal and maximal SNP distances observed between the five outbreak isolates and the three
isolates obtained from the same patient are indicated near the clusters. The trees are drawn to
scale, with branch lengths measured in the number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.
S5 Fig. Phylogenetic trees generated with the successful SNP-based subtyping workflows
using the down-sampled dataset and LT2 as a reference genome. (A) CSI-based workflow, (B)
PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow.
Isolates are coloured according to the MLVA-profile. The minimal and maximal SNP distances
observed between the five outbreak isolates and the three isolates obtained from the
same patient are indicated near the clusters. The trees are drawn to scale, with branch lengths
measured in the number of substitutions per site. The scale axis is provided below each tree.
BS: bootstrap values.
S6 Fig. Phylogenetic trees generated with the successful SNP-based subtyping workflows
using the down-sampled dataset supplemented with replicate data and LT2 as a reference
genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow,
(D) adapted CFSAN-based workflow. The minimal and maximal SNP distances observed
between the five outbreak isolates and the three isolates obtained from the same patient are
indicated near the clusters. The trees are drawn to scale, with branch lengths measured in the
number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.
S7 Fig. SNP distance matrices generated with the tested SNP-based subtyping workflows
using the high-coverage dataset and LT2 as a reference genome. (A) CSI-based workflow, (B)
PHEnix-based workflow, (C) adapted PHEnix-based workflow, (D) CFSAN-based workflow,
(E) adapted CFSAN-based workflow. Values and colour codes in the SNP distance matrices
indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold and isolates
obtained from the same patient are underlined.
S8 Fig. SNP distance matrices generated with the successful SNP-based subtyping workflows
using the down-sampled dataset and LT2 as a reference genome. (A) CSI-based workflow,
(B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow.
Values and colour codes in the SNP distance matrices indicate pairwise SNP distances
between isolates. Outbreak isolates are shown in bold and isolates obtained from the same
patient are underlined. For the CSI-based workflow, the distances between isolates 12-3582
and 12-3583 versus isolates 12-2984, 12-2998, 12-3067 and 12-3558 dropped from 10–12
SNP positions observed with the normal (high-coverage) dataset to 4–6 positions with the
down-sampled dataset. For the CFSAN-based workflow, the distances between isolates 12-2984,
12-2998, 12-3067 and 12-3558 increased strongly (in the most extreme case from 3 to 17 SNPs)
with the down-sampled dataset compared to the original data.
S9 Fig. SNP distance matrices generated with the successful SNP-based subtyping workflows
using the down-sampled dataset supplemented with replicate data and LT2 as a reference
genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based
workflow, (D) adapted CFSAN-based workflow. Values and colour codes in the SNP distance
matrices indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold
and isolates obtained from the same patient are underlined.
S10 Fig. Phylogenetic trees generated with the tested SNP-based subtyping workflows
using the high-coverage dataset and SL1344 as a reference genome. (A) CSI-based workflow, (B)
PHEnix-based workflow, (C) adapted PHEnix-based workflow, (D) CFSAN-based workflow,
(E) adapted CFSAN-based workflow. The minimal and maximal SNP distances observed
between the five outbreak isolates and the three isolates obtained from the same patient are
indicated near the clusters. The trees are drawn to scale, with branch lengths measured in the
number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.
S11 Fig. Phylogenetic trees generated with the successful SNP-based subtyping workflows
using the down-sampled dataset and SL1344 as a reference genome. (A) CSI-based workflow,
(B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based workflow.
The minimal and maximal SNP distances observed between the five outbreak isolates
and the three isolates obtained from the same patient are indicated near the clusters. The trees
are drawn to scale, with branch lengths measured in the number of substitutions per site. The
scale axis is provided below each tree. BS: bootstrap values.
S12 Fig. Phylogenetic trees generated with the successful SNP-based subtyping workflows
using the down-sampled dataset supplemented with replicate data and SL1344 as a reference
genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow,
(D) adapted CFSAN-based workflow. The minimal and maximal SNP distances observed
between the five outbreak isolates and the three isolates obtained from the same patient are indicated
near the clusters. The trees are drawn to scale, with branch lengths measured in the number
of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.
S13 Fig. SNP distance matrices generated with the tested SNP-based subtyping workflows
using the high-coverage dataset and SL1344 as a reference genome. (A) CSI-based workflow,
(B) PHEnix-based workflow, (C) adapted PHEnix-based workflow, (D) CFSAN-based workflow,
(E) adapted CFSAN-based workflow. Values and colour codes in the SNP distance matrices
indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold and
isolates obtained from the same patient are underlined.
S14 Fig. SNP distance matrices generated with the successful SNP-based subtyping workflows
using the down-sampled dataset and SL1344 as a reference genome. (A) CSI-based workflow,
(B) PHEnix-based workflow, (C) CFSAN-based workflow, (D) adapted CFSAN-based
workflow. Values and colour codes in the SNP distance matrices indicate pairwise SNP distances
between isolates. Outbreak isolates are shown in bold and isolates obtained from the
same patient are underlined.
S15 Fig. SNP distance matrices generated with the successful SNP-based subtyping workflows
using the down-sampled dataset supplemented with replicate data and SL1344 as a reference
genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based
workflow, (D) adapted CFSAN-based workflow. Values and colour codes in the SNP distance
matrices indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold
and isolates obtained from the same patient are underlined.
S1 Table. Performance metrics describing the output of tested SNP-based subtyping workflows
and combinations thereof assessed using LT2 as a reference genome. Performance
metrics of the workflows were measured using the original dataset (OD) and the dataset down-sampled
to 30X coverage (30X), with LT2 as a reference genome. ad. CFSAN-based workflow:
adapted CFSAN-based workflow. PHEnix + CSI, PHEnix + CFSAN, etc.: combinations
of the variant calling rules from the first-mentioned workflow with the SNP matrix construction
rules of the second-mentioned workflow. DP: discriminative power.
S2 Table. Performance metrics describing the output of tested SNP-based subtyping workflows
and combinations thereof assessed using SL1344 as a reference genome. Performance
metrics of the workflows were measured using the original dataset (OD) and the dataset down-sampled
to 30X coverage (30X), with SL1344 as a reference genome. ad. CFSAN-based workflow:
adapted CFSAN-based workflow. PHEnix + CSI, PHEnix + CFSAN, etc.: combinations
of the variant calling rules from the first-mentioned workflow with the SNP matrix construction
rules of the second-mentioned workflow. DP: discriminative power.
S1 File. Perl script used for down-sampling of the sequencing data.