BACKROUND: De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre
crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene
sequences from multiple tissues and organs.
RESULTS: We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcriptderived
contigs, a large proportion of which represent full-length protein coding genes were assembled and annotated. Analysis of assembly quality, length and diversity show that this dataset represent the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions.
CONCLUSIONS: De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://
eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants.