Abstract:
BACKGROUND: Data mining in large DNA sequences is a major challenge in microbial genomics and bioinformatics. Oligonucleotide usage (OU) patterns provide a wealth of information for large-scale sequence analysis and visualization. The purpose of this research was to make OU statistical analysis available as a novel web-based tool for functional genomics and annotation. The tool is also available as a downloadable package.
RESULTS: The SeqWord Genome Browser (SWGB) was developed to visualize the natural compositional variation of DNA sequences. By comparing local and global OU patterns, the applet also identifies divergent genomic regions, both in annotated sequences of bacterial chromosomes, plasmids, phages and viruses, and in raw DNA sequences prior to annotation.
The applet allows fast and reliable identification of clusters of horizontally transferred genomic islands, large multi-domain genes and genes for ribosomal RNA. Within the majority of genomic fragments (also termed the genomic core sequence), regions enriched with housekeeping genes and ribosomal proteins can be distinguished from regions rich in pseudogenes or genetic vestiges.
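The comparison of local and global OU patterns can be illustrated with a minimal sketch. It is not the SWGB implementation: the function names, the word length `k`, the window and step sizes, and the simple sum-of-absolute-differences distance are all assumptions chosen for illustration, and the statistics actually used by SWGB may differ.

```python
from collections import Counter
from itertools import product


def ou_pattern(seq, k=4):
    """Normalized frequencies of all 4**k words over A, C, G, T (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Words containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[w] for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def pattern_distance(p, q):
    """Sum of absolute frequency differences: 0 for identical patterns, at most 2."""
    return sum(abs(p[w] - q[w]) for w in p)


def scan(seq, window=8000, step=2000, k=4):
    """Distance between each sliding-window (local) pattern and the whole-sequence (global) pattern."""
    global_p = ou_pattern(seq, k)
    results = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        local_p = ou_pattern(seq[start:start + window], k)
        results.append((start, pattern_distance(local_p, global_p)))
    return results


if __name__ == "__main__":
    # Toy sequence: a repetitive "host" with a compositionally distinct insert.
    toy = "ACGT" * 100 + "G" * 40 + "ACGT" * 100
    for start, dist in scan(toy, window=40, step=40, k=2):
        print(start, round(dist, 3))
```

Under these assumptions, windows whose distance to the global pattern is unusually large are candidates for divergent regions such as horizontally transferred islands; in the toy sequence the window covering the inserted stretch stands out.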
CONCLUSION: The SWGB applet presents comprehensive OU statistical parameters calculated for a range of bacterial species, plasmids and phages.