Abstract:
Metagenomic approaches have revealed the complexity of environmental microbiomes, and advances in whole-genome sequencing have shown a significant level of genetic heterogeneity at the species level. It has become clear that recognising bacteria with superior bioactivity for biotechnology, or pathogens with enhanced virulence, often requires distinguishing between closely related species or subspecies. Current methods for binning metagenomic reads usually do not allow identification below the genus level and very often stop at the family level.
In this work, an attempt was made to improve the resolution of metagenome binning by creating genome-specific barcodes based on core and accessory gene sequences. This protocol was implemented in novel software tools available for use and download from http://bargene.bi.up.ac.za/. The most abundant barcode genes from the core genomes were found to encode ribosomal proteins, other central metabolic proteins and ABC transporters. The performance of the resulting metabarcode sequences was evaluated using artificially generated and publicly available metagenomic datasets. Furthermore, a program, Barcoder 2.0, was developed to align reads against barcode sequences and to calculate various parameters for scoring the alignment results and individual barcodes. Taxonomic units were identified in metagenomic samples by comparing the calculated barcode scores to set cut-off values. Varying the sample size (the number of reads in a metagenome) and the metabarcode length had no significant effect on the sensitivity and specificity of the algorithm. Receiver operating characteristic (ROC) curves were calculated for different taxonomic groups from the results of identifying the corresponding genomes in artificial metagenomic datasets, and the reliability with which the program distinguished between species of the same genus or family was close to 100%.
The results showed that the novel online tool, BarcodeGenerator (http://bargene.bi.up.ac.za/), provides an efficient way to generate barcode sequences from a set of complete genomes supplied by users. Another program, Barcoder 2.0, was made available from the same resource to enable efficient and practical use of metabarcodes for visualising the distribution of organisms of interest in environmental and clinical samples.
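
The cut-off-based identification and the ROC evaluation summarised above can be illustrated with a minimal sketch. It is not the tool's implementation: the function names, the single score per taxon, and the convention that a higher score means stronger evidence of presence are all assumptions made for illustration, since the abstract does not specify how the barcode scores are computed.

```python
from typing import Dict, List, Set, Tuple


def call_taxa(scores: Dict[str, float], cutoff: float) -> Set[str]:
    """Return the taxa whose (hypothetical) barcode score reaches the cut-off."""
    return {taxon for taxon, score in scores.items() if score >= cutoff}


def sensitivity_specificity(
    scores: Dict[str, float], present: Set[str], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and specificity of the calls at one cut-off.

    `present` is the set of taxa known to be in an artificial metagenome;
    every other scored taxon is treated as truly absent.
    """
    called = call_taxa(scores, cutoff)
    absent = set(scores) - present
    true_pos = len(called & present)
    true_neg = len(absent - called)
    sens = true_pos / len(present) if present else 0.0
    spec = true_neg / len(absent) if absent else 0.0
    return sens, spec


def roc_points(
    scores: Dict[str, float], present: Set[str]
) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, using each observed
    score as a cut-off, from the strictest cut-off to the most permissive."""
    points = []
    for cutoff in sorted(set(scores.values()), reverse=True):
        sens, spec = sensitivity_specificity(scores, present, cutoff)
        points.append((1.0 - spec, sens))
    return points
```

With made-up scores for four placeholder taxa, of which two are truly present, `sensitivity_specificity(scores, present, 0.5)` gives the sensitivity and specificity at a single cut-off, and `roc_points(scores, present)` gives the points from which a ROC curve could be drawn.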