Abstract:
BACKGROUND : Metagenomic approaches have revealed the complexity of environmental microbiomes, and
advances in whole genome sequencing have shown a significant level of genetic heterogeneity at the species
level. It has become apparent that recognizing patterns of superior bioactivity in bacteria applicable in biotechnology,
as well as enhanced virulence in pathogens, often requires distinguishing between closely related species or sub-species.
Current methods for binning of metagenomic reads usually do not allow for identification below the genus level
and generally stop at the family level.
RESULTS : In this work, an attempt was made to improve metagenomic binning resolution by creating
genome-specific barcodes based on the core and accessory genomes. This protocol was implemented in novel software
tools available for use and download from http://bargene.bi.up.ac.za/. The most abundant barcode genes from the
core genomes were found to encode ribosomal proteins, certain central metabolic genes and ABC transporters.
The performance of metabarcode sequences created by this package was evaluated using artificially generated and
publicly available metagenomic datasets. Furthermore, a program (Barcoder 2.0) was developed to align reads
against barcode sequences and thereafter calculate various parameters to score the alignments and the individual
barcodes. Taxonomic units were identified in metagenomic samples by comparison of the calculated barcode
scores to set cut-off values. In this study, it was found that varying the sample size (i.e. the number of reads in a
metagenome) and the metabarcode length had no significant effect on the sensitivity and specificity of the algorithm.
Receiver operating characteristic (ROC) curves were calculated for different taxonomic groups based on the results of
identification of the corresponding genomes in artificial metagenomic datasets. The program distinguished
between species of the same genus or family with nearly perfect reliability.
CONCLUSIONS : The results showed that the novel online tool BarcodeGenerator (http://bargene.bi.up.ac.za/)
offers an efficient approach for generating barcode sequences from a set of complete genomes provided by users.
Another program, Barcoder 2.0, is available from the same resource to enable efficient and practical use of
metabarcodes for visualization of the distribution of organisms of interest in environmental and clinical samples.
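
ILLUSTRATION : The cut-off comparison and ROC evaluation summarized in the results can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the Barcoder implementation: the abstract does not define how barcode scores are computed, so the scores are treated as given inputs, and the function names, the "score >= cutoff" rule, and the example values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def call_taxa(scores: Dict[str, float], cutoff: float) -> Set[str]:
    """Report a taxon as present when its barcode score reaches the cut-off.

    The ">=" rule is an assumption made for illustration only.
    """
    return {taxon for taxon, score in scores.items() if score >= cutoff}


def roc_points(scores: Dict[str, float], truth: Set[str]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per cut-off.

    `truth` is the set of taxa known to be in the artificial metagenome.
    """
    positives = len(truth & scores.keys())
    negatives = len(scores) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one present and one absent taxon")
    points = [(0.0, 0.0)]
    # Lowering the cut-off step by step admits more taxa as "present".
    for cutoff in sorted(set(scores.values()), reverse=True):
        called = call_taxa(scores, cutoff)
        true_pos = len(called & truth)
        false_pos = len(called - truth)
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up scores for four hypothetical taxa; A and B are truly present.
example_scores = {"taxon_A": 0.92, "taxon_B": 0.81, "taxon_C": 0.35, "taxon_D": 0.10}
example_truth = {"taxon_A", "taxon_B"}

if __name__ == "__main__":
    print(sorted(call_taxa(example_scores, cutoff=0.5)))
    print(auc(roc_points(example_scores, example_truth)))
```

In this toy input every present taxon scores higher than every absent one, so the curve reaches full sensitivity before any false positive appears. Real barcode scores would overlap, and the chosen cut-off would then trade sensitivity against specificity.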