Abstract:
With the decreasing cost of sequencing and the growing number of sequenced
genomes, comparative genomics is becoming an increasingly attractive complement to experimental
techniques for identifying transcription factor (TF) binding sites. In this study, we
redesigned BLSSpeller, a motif discovery algorithm, to cope with larger sequence datasets.
BLSSpeller was used to identify novel motifs in Zea mays in a comparative genomics setting
with 16 monocot lineages. We discovered 61 motifs, of which 20 matched previously described
motif models in Arabidopsis. In addition, novel, as yet uncharacterized motifs were detected, several
of which are supported by available sequence-based and/or functional data. Instances of the
predicted motifs were enriched around transcription start sites and contained signatures of selection.
Moreover, the enrichment of the predicted motif instances in open chromatin and TF
binding sites suggests that they are functional, a conclusion further supported by the observation
that genes carrying instances of these motifs were often co-expressed and/or enriched in similar GO functions.
Overall, our study unveiled several novel candidate motifs that might advance our understanding of
genotype-to-phenotype associations in crops.