Metagenomic Approach towards Bioprospection of Novel Biomolecule(s) and Environmental Bioremediation

Review Article Tripathi et al.; ARRB, 22(2): 1-12, 2018; Article no.ARRB.38385

Microorganisms have developed several physiological adaptations to survive within extreme ecological niches, including environments contaminated with heavy metals, pesticides, polycyclic aromatic hydrocarbons, and nuclear wastes. Microorganisms in extreme habitats are a potential source of "novel biomolecule(s)" such as whole microbial cells, extremozymes, and extremolytes, which are in demand for environmental, industrial, and red (medical/pharmaceutical) biotechnology. These novel biomolecule(s) are valuable resources and may help improve economic development. The scanty information about the factors governing microbial growth within stressed environments is the major constraint on the recovery of novel biomolecule(s) from extreme habitats. Understanding the structure, metabolic capabilities, and physiology of indigenous microorganisms, together with the factors governing their composition and role, is key to the success of any such study. In the recent past, the problems associated with classical cultivation techniques have been addressed by an emerging approach referred to as "metagenomics". Metagenomic studies give insight into the structure and the metabolic and physiological capabilities of indigenous microbial communities. High-throughput sequencing technologies in conjunction with metagenomics have aided the identification and characterization of novel culturable and uncultured microorganisms with unique capabilities. Metagenomic studies have been used for the isolation and characterization of novel biomolecule(s) relevant to white, grey, and red biotechnologies. The major objective of this review is to discuss the applications of the metagenomic approach for bioprospection of novel biomolecule(s) and environmental bioremediation.


INTRODUCTION
The best indicator of the existence of uncultured microorganisms is the "great plate count anomaly", defined as the discrepancy between the size of a microbial population estimated by the serial dilution (plate count) method and that estimated by microscopic observation [1]. Although more than 99% of microorganisms in the environment are unculturable, microbiologists have focused mainly on culturable microorganisms for the discovery of novel biomolecule(s). A substantial portion of the microbial community has therefore remained unexplored because of our inability to grow it in artificial culture media. Lane et al. [2] described a method based on 16S rRNA gene sequence analysis for the phylogenetic study of microbial communities. A method based on direct analysis of universal taxonomic markers, including the 5S and 16S rRNA genes, has been described for identifying the microbial communities within an environment without culturing the microorganisms [3]. Moreover, some microorganisms are not easy to isolate and culture because their physiological requirements are unknown. In addition to supporting evolutionary studies, sequencing of the 16S rRNA gene can provide information about the physiological requirements for culturing such unculturable microorganisms. Recent advances in metagenomics have revolutionized microbiology. The metagenomic approach, whereby the whole DNA is extracted directly from a sample and sequenced on next generation sequencing platforms (Illumina, SOLiD, PacBio, Oxford Nanopore), can be used to circumvent the problems associated with the isolation of unculturable microorganisms [4]. Next generation sequencing technologies produce large amounts of data with high accuracy and are well suited to shotgun metagenomic sequencing projects. The longest reads produced by the Illumina platform are 2 x 250 bp (paired end), with an accuracy of more than 99.9% [5]. In contrast, relatively new and still developing sequencing technologies such as PacBio (read length 10-15 kb) and Oxford Nanopore (read length up to 900 kb) produce much longer reads, but with high error rates of up to 15% [5,6].
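The trade-off between read length and accuracy described above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that sequencing errors are independent and uniformly distributed along a read (a simplification that real platforms do not strictly satisfy), and it uses the figures quoted in the text as upper bounds. The function names are hypothetical and do not come from any sequencing toolkit.

```python
def expected_errors(read_length_bp: int, per_base_error_rate: float) -> float:
    """Expected number of erroneous bases in one read.

    Assumption (not from the source): errors are independent and uniform.
    """
    if read_length_bp < 0 or not 0.0 <= per_base_error_rate <= 1.0:
        raise ValueError("length must be >= 0 and error rate within [0, 1]")
    return read_length_bp * per_base_error_rate


def prob_error_free(read_length_bp: int, per_base_error_rate: float) -> float:
    """Probability that a read contains no error at all (same assumption)."""
    if read_length_bp < 0 or not 0.0 <= per_base_error_rate <= 1.0:
        raise ValueError("length must be >= 0 and error rate within [0, 1]")
    return (1.0 - per_base_error_rate) ** read_length_bp


# Read lengths and error rates as quoted in the text (treated as upper bounds).
platforms = {
    "Illumina (250 bp read)": (250, 0.001),
    "PacBio (15 kb read)": (15_000, 0.15),
    "Oxford Nanopore (900 kb read)": (900_000, 0.15),
}

for name, (length, rate) in platforms.items():
    print(f"{name}: about {expected_errors(length, rate):.2f} expected errors per read")
```

Under these assumptions a long read carries far more errors in absolute terms, which is one reason long-read data are often combined with error correction or with short reads before annotation.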
Next generation sequencing and the application of bioinformatics have enabled researchers to reveal untapped (uncultivable) microbial diversity using environmental DNA (Fig. 1). Insight into the metabolic pathways of the indigenous microbial community, derived from analysis of the metagenome, can further be applied to design special media and conditions for the enrichment and isolation of novel uncultured microorganisms possessing unique capabilities [7,8]. In addition to novel biocatalysts, novel microorganisms can be potential sources of protective organic biomolecule(s) (extremolytes) and other bioproducts (exopolysaccharides, biosurfactants, peptides, and biopolymers) that enable them to survive under harsh environmental conditions. Metagenomics can reveal information about the identity and physiological requirements of microorganisms through sequence based strategies, i.e. "gene centric" or "genome centric" analysis of metagenomic DNA [9]. The gene centric approach examines the metagenomic data for the presence of different taxa and of metabolic and functional genes in the environmental DNA, whereas the genome centric approach reconstructs complete or near complete genomes from the environmental sequence data [9]. Moreover, metagenomics has been used to accelerate the process of in situ bioremediation. In bioremediation, the decontamination rate is accelerated either by the addition of whole cells (bioaugmentation) or of rate-limiting nutrients such as electron acceptors (biostimulation) [11,12]. Novel uncultured microorganisms isolated from contaminated sites can be used for bioaugmentation.
On the other hand, physiological as well as nutritional requirements of indigenous microorganisms derived from metagenomic analysis of contaminated sites can be used for biostimulation of indigenous microbial community.
The development of novel biomolecule(s) is the primary requirement for the sustainable economic growth of white (industrial) biotechnology. In addition, metagenomics can be applied to practical challenges in the medical and agricultural sectors. Besides tapping hidden microbial diversity (the discovery of novel biocatalysts from uncultivable microorganisms), metagenomics also has wide applications in environmental microbiology, such as bioremediation and the metabolism of xenobiotics.

METAGENOMICS-BIOPROSPECTION OF NOVEL BIOMOLECULE(S)
The growth of white (industrial) biotechnology requires the development of novel enzymes and other products that are relevant to mankind. The application of metagenomics promises the production of new molecules with diverse functions. With the advent of metagenomics, we can target microorganisms that are recalcitrant to isolation on common laboratory media because of a lack of nutrients, an inappropriate incubation temperature and/or pressure, or an unsuitable composition of atmospheric gases [13].
Phylogenetic studies based on molecular tools have confirmed that cultured bacteria represent only a small fraction (<1%) of bacterial diversity, because culturing conditions do not imitate natural habitat conditions [14,15]. The great majority of microorganisms that cannot be grown in the laboratory are known as 'unculturables', but they are now often referred to as 'yet to be cultured' or 'yet uncultured', since it is likely that these organisms could be cultured if the right conditions were provided. Consequently, conventional cultivation methods are insufficient for deciphering the substantial reservoir of unseen natural diversity of 'yet to be cultured' microorganisms. To overcome the loss of diversity caused by unknown culture conditions, the collective genomes of all the microorganisms present in a particular habitat (the metagenome) can be analyzed using the metagenomic (cultivation independent) approach. Metagenomic analysis of universal taxonomic markers such as the 16S rRNA gene helps to identify the diversity, metabolic capabilities, and physiological requirements of the indigenous microbial communities of a given habitat [13,16]. Bacterial 16S rRNA genes contain nine hypervariable regions (V1-V9), which show significant sequence variation among different bacteria. Initial phylogenetic studies used Sanger sequencing strategies in which the complete 1.5 kb fragment of the 16S rRNA gene was targeted for sequencing and identification. Nowadays, the hypervariable regions (V1-V9), which are interspersed among conserved regions (C1-C9), are targeted for sequencing and identification (Fig. 2).
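Because each hypervariable region is flanked by conserved sequence, a region can in principle be located by searching for the conserved flanks and keeping what lies between them. The following is a minimal sketch of that idea, not the method used in the studies cited here. The gene fragment and both primers are made-up toy strings (not real 16S primers), and `extract_region` is a hypothetical helper; only the IUPAC degenerate-base codes are standard.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_to_regex(primer: str) -> str:
    """Translate a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def reverse_complement(seq: str) -> str:
    """Reverse complement, preserving IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_region(gene: str, forward_primer: str, reverse_primer: str):
    """Return the sequence between the two primer sites, or None if absent.

    The reverse primer is given 5'->3' on the opposite strand, so its
    reverse complement is searched for downstream of the forward primer.
    """
    gene = gene.upper()
    fwd = re.search(primer_to_regex(forward_primer), gene)
    if fwd is None:
        return None
    downstream = gene[fwd.end():]
    rev = re.search(primer_to_regex(reverse_complement(reverse_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Toy example: conserved flank + variable stretch + conserved flank.
toy_gene = "CC" + "GTGCCAGC" + "ATTTGACCA" + "GGATTAGA" + "TT"
print(extract_region(toy_gene, "GTGCCMGC", "TCTAATCC"))
```

The degenerate base `M` in the toy forward primer matches either `A` or `C`, which mirrors how real primers tolerate minor variation within the conserved regions.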
Sequencing of hypervariable regions has enabled researchers to identify many individual genera that could not have been identified with traditional biochemical methods. Moreover, structural and physiological information about indigenous microbial communities derived by the metagenomic approach is used for designing or selecting special media and conditions for the enrichment and isolation of novel microorganisms that are "yet to be cultured". These uncultivable microorganisms are an untapped source of novel biomolecule(s), including biocatalysts, antibiotics, extremolytes, exopolymeric substances (EPS), and other bioproducts that enable them to survive in harsh environments (Fig. 3). Microbial extremolytes are used primarily in the cosmetic and medical sectors; examples include UV radiation-protective compounds such as melanin, ectoines, and scytonemin [17,18].

Fig. 3. Schematic illustration representing applications of metagenomic approach for bioprospection of novel biomolecule(s) and acceleration of environmental bioremediation
Metagenomic analysis has emerged as a powerful technique for exploiting biomolecules from the hidden diversity present in our environment [13]. The metagenomic approach involves extraction of DNA from the environmental sample, cloning of the DNA into a suitable vector (cosmid, fosmid, or BAC), transformation of the recombinant vector into a suitable host bacterium, and finally screening of the clones for a particular function. Metagenomic clones are screened by "function based" or "sequence based" approaches. In the "function based" approach, recombinant clones are screened for the desired activity on media containing a suitable substrate [13,16,19]. However, function based metagenomics has limitations, such as incompatibility between the host's gene expression machinery and the cloned DNA. Sequence based screening, on the other hand, has been widely adopted. It involves generation of a metagenomic DNA library, sequencing on a next generation sequencing platform, assembly of the raw data into longer contigs, and identification of protein sequences by aligning them against reference databases such as NCBI-NR or NCBI-RefSeq. The taxonomic and functional annotation of the metagenomic data can then guide the next steps: gene synthesis, codon optimization for the intended expression host, and biochemical characterization of the identified enzymes. However, the limited diversity of sequences in the databases, combined with the large number of novel sequences in environmental samples, can make the annotation of metagenomic sequences problematic. Moreover, sequence based screening methods often end with partial genes, which is a major drawback because the functionality of the encoded enzymes cannot then be inferred [20].
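The partial-gene problem mentioned above can be illustrated with a simple open reading frame (ORF) scan of an assembled contig. This is a deliberately simplified sketch rather than a substitute for a dedicated gene predictor: it assumes that ATG is the only start codon and uses the three standard stop codons, it reports coordinates on whichever strand was scanned, and it flags only ORFs that run off the 3' end of the contig (genes truncated at the 5' end have no start codon and are not detected). The function name, the `min_codons` threshold, and the toy contig are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(contig: str, min_codons: int = 3):
    """Scan all six reading frames and classify ORFs.

    Returns a sorted list of (strand, start, end, status) tuples, where
    status is "complete" (ATG ... stop) or "partial" (ATG with no in-frame
    stop before the contig ends). Coordinates are 0-based, end-exclusive,
    and refer to the scanned strand.
    """
    results = []
    for strand, seq in (("+", contig.upper()), ("-", reverse_complement(contig))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a new ORF at the first in-frame ATG
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        results.append((strand, start, i + 3, "complete"))
                    start = None  # close the ORF and keep scanning
            # An ORF still open here has run off the end of the contig.
            if start is not None and (len(seq) - start) // 3 >= min_codons:
                results.append((strand, start, len(seq), "partial"))
    return sorted(results)


# Toy contig: one full-length ORF followed by one ORF cut off by the contig end.
toy_contig = "CCATGAAACCCGGGTAAGGATGCCCAAATTT"
for orf in find_orfs(toy_contig):
    print(orf)
```

Only the ORF flagged as complete could be synthesized and expressed as is; the partial one would need a longer contig or additional sequencing before the function of its product could be tested.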
With the advent of next generation sequencing (NGS) and bioinformatic tools, sequence based screening has facilitated the identification, isolation, and characterization of several novel enzymes of industrial importance, as well as phylogenetic studies [21]. The gene encoding a polyketide synthase involved in polyketide synthesis was cloned from a soil metagenome using the sequence based approach [13]. However, shotgun sequencing is not cost effective when the goal is to discover genes encoding specific enzymes. In contrast, function based screening methods involve the selection of clones expressing desired traits, followed by characterization of the genes and encoded products [21]. The screening of metagenomic libraries on the basis of function has enabled the identification and isolation of several genes encoding antibiotics, genes conferring resistance to antibiotics, Na+(Li+)/H+ transporters, and various degradative enzymes [22]. Genes encoding novel lipases [23], proteases [24], amylases [25], melanin, antibiotics [13,26], and several other biomolecule(s) have been identified and characterized. Using a functional screening method, a novel phosphodiesterase has been identified and characterized from an Indian coal bed [27]. However, the major limitations of functional screening are that the whole cluster of genes required for a particular function must be present and must be expressed within a heterologous host. An appropriate assay is also required to detect the function [28]. To resolve the problems related to screening, alternative strategies have been explored. In addition to E. coli, heterologous expression of secondary metabolites has been optimized in Streptomyces lividans and Pseudomonas putida [29,30]. Using a standard inhibition assay, terragine, an antibiotic active against Mycobacterium, was discovered from a metagenomic clone expressed in Streptomyces lividans [31]. Databases and bioinformatic tools used in metagenomic studies are listed in Table 1. Some of the enzymes discovered through the metagenomic approach are listed in Table 2.

METAGENOMICS-ECOLOGICAL INFERENCES
Certain groups of microorganisms form symbiotic relationships with eukaryotes and compete for nutrients to produce energy [13]. Several medicinally important natural products have been obtained from invertebrate animals, e.g., sponges. Some sponges host large numbers of bacteria (40-60% of total biomass) within their tissues. Many bacterial symbionts do not readily grow on culture media because of their highly specialized and ancient relationships with their hosts [40]. Using metagenomic analysis of 16S rRNA gene sequences, microorganisms have been identified from various environments such as sponges, acid mine drainage, and extreme desert environments [22]. Such studies have been conducted on the Cenarchaeum symbiosum: marine sponge symbiosis [41], the Pseudomonas-like bacterium: Paederus beetle symbiosis [42], the Buchnera aphidicola: aphid symbiosis [43], and the Proteobacterium: Riftia pachyptila symbiosis [44]. Riftia pachyptila is a tube worm that dwells 2600 m below the sea surface, close to thermal vents that are rich in sulfide and have temperatures of about 400 °C. Tube worms lack a mouth and digestive tract and thus rely completely on their symbiotic bacterial partner for food and energy. The worms transport hydrogen sulfide and CO2 from the environment to the bacterial symbiont. The bacterial partner, residing inside the trophosome of the worm, oxidizes hydrogen sulfide to drive CO2 fixation and provides food and energy to the worm. The bacterium has not been isolated in pure culture on synthetic medium, but metagenomic analysis identified it as a member of the γ-Proteobacteria. To understand the physiology of the bacterial symbiont, a metagenomic DNA library was constructed in a fosmid vector [45]. By screening the fosmid library, Robinson et al. [46] identified a gene encoding a RubisCO enzyme that was similar to the RubisCO of Rhodospirillum rubrum. Moreover, metagenomic analysis revealed the presence of a gene encoding flagellin; this gene was heterologously over-expressed in E. coli and was involved in the synthesis of bacterial flagella. This suggested that the endosymbiont may be free-living and may infect every generation of the worm rather than being passed on through maternal transfer [47]. Certain groups of microorganisms compete for resources and survival. Sequence based screening for genes conferring resistance is difficult; however, functional genomics can help to identify genes encoding biologically active compounds that confer resistance on their producer organisms [13]. Intriguingly, metagenomics also provides insight into paleogenomics: the metagenomic approach coupled with high-throughput sequencing is an efficient tool for studying the nuclear genomes of extinct organisms. A phylogenetic study has been performed by comparative analysis of 27 kb of metagenomic DNA sequence from Ursus spelaeus (cave bear) and PCR-amplified orthologous sequences from black, brown, and polar bears [22].

METAGENOMICS-ENVIRONMENTAL BIOREMEDIATION
Environmental pollution is a serious global concern. All biological processes involved in the removal or transformation of pollutants from the environment are referred to as bioremediation. In recent years, microorganisms, mainly bacteria, have been used for the bioremediation of pollutants at different sites because of their ability to resist, transform, or degrade the pollutants. Bioremediation may rely on the indigenous microorganisms already present at the site, whereas engineered bioremediation manipulates process conditions such as water availability, aeration, and nutrient amendments.
In ex situ processes, the contaminated sample is moved to another location before treatment. In bioremediation, the decontamination rate is accelerated either by the addition of whole cells (bioaugmentation) or of rate-limiting nutrients/electron acceptors (biostimulation) [11,12]. The major constraint in bioremediation is the need for information about the parameters that govern the growth and metabolism of indigenous microorganisms residing within contaminated environments [51]. A better understanding of microbial adaptations to the environment is key to the success of any bioremediation study [52]. With conventional cultivation techniques, it was difficult to ascertain the role of an isolated microorganism in in situ bioremediation. Cultivation-independent analyses based on 16S rRNA gene sequences help to understand the structure, metabolic capabilities, physiology, and other factors governing the indigenous microbial community (Fig. 1). However, a study based on the 16S rRNA gene alone cannot reveal the biochemical and genetic basis of bioremediation. Microarray techniques such as GeoChip [53] allow fast and economical profiling of a sample. Using GeoChip, genes required for the anaerobic degradation of alkanes, toluene, and ethylbenzene have been detected in the formation water of the San Juan Basin [54]. Unlike 16S rRNA gene sequence based studies, however, the microarray approach can analyze only targets that match already known gene sequences. Sampling based studies are thus hampered by the repeated recovery of abundant sequences at the expense of identifying atypical species. Therefore, metagenomic approaches are typically employed to identify the functional genes of metabolic pathways [55]. Metagenomic studies can provide the essential information for in situ bioremediation and are centered on reconstructing the genomes of "yet uncultured" microorganisms. The newly reconstructed genomes of "yet uncultured" microorganisms are screened for novel biomolecule(s) involved in bioremediation. A microcosm study employing an enrichment technique (using mercury, ethanol, and diesel), followed by next generation sequencing and genome reconstruction, produced genome bins of the selectively enriched bacteria. Another genome-resolved metagenomic study, which analyzed the microbial communities of thiocyanate-degrading bioreactors (a solid particulate tailings reactor and a solids-free reactor), revealed differences in the bacterial strains and their metabolic capacities. Metagenomics based phylogenetic studies provide the metabolic and physiological information required for the isolation of unculturable microorganisms. This information is used for isolating unculturables or for accelerating bioremediation through bioaugmentation and biostimulation. Nutritional and physiological information derived from cultivation independent analysis of the microbial community associated with an Indian coal bed has been used to accelerate the biotransformation of coal into methane gas [56]. The advantage of bioaugmentation and biostimulation lies in their economical and ecofriendly nature. However, factors that can limit bioaugmentation and biostimulation include strain selection, strain adaptation and survival, differing environmental constraints, and the process of introducing microbial cultures and nutrient supplements into contaminated sites [55].

CONCLUSIONS
Metagenomics is a promising field that has enabled microbiologists to access hidden microbial resources relevant to different biotechnological sectors. Comprehensive study of the composition and function of indigenous microbial communities makes it possible to find novel biomolecule(s). Earlier, the microbiological processes for producing biomolecules were labor and cost intensive; since the advent of metagenomics, technology development for the same purpose has become faster and more cost effective. High-throughput sequencing technology and functional screening of metagenomic libraries facilitate the detection of novel biocatalysts and other bioactive molecules that are indispensable for mankind. The large data sets derived from metagenomic studies are used to construct databases that can be exploited to discover as yet undiscovered biomolecule(s). Future refinement of methods that enrich for genes encoding a particular function would aid the discovery of novel biomolecule(s).
In addition, metagenomics has potential applications in environmental bioremediation. Metagenomics based metabolic and physiological assessment of the indigenous microbial community supports the design of technologies for efficient in situ bioremediation.

Fig. 1. Schematic presentation of the essential steps to explore and exploit the genomic diversity of soil microbial communities by the metagenomic approach. Nucleic acid is extracted from the environmental sample as DNA (metagenome) or as environmental mRNA (metatranscriptome). Downstream screening approaches can be activity-based (through the screening of expression libraries), sequence-dependent, or sequence-independent. The final expression requires a full-length open reading frame (ORF) expressed in a suitable host to generate a functional gene product (Adapted from Cowan et al. [10])

Table 1. List of important bioinformatic tools and databases used in next generation sequencing based metagenomics analysis
Columns: Database/Software; Application; References (website links)
16S rRNA and marker gene based tools
http://www.ncbi.nlm.nih.gov/RefSeq