Assembly, annotation and polymorphism analysis of a draft transcriptome sequence for a fast-growing Eucalyptus plantation tree

dc.contributor.advisor Joubert, Fourie en
dc.contributor.advisor Myburg, Alexander Andrew en
dc.contributor.postgraduate Hefer, Charles Amadeus en
dc.date.accessioned 2013-09-07T14:20:05Z
dc.date.available 2011-10-24 en
dc.date.available 2013-09-07T14:20:05Z
dc.date.created 2011-09-09 en
dc.date.issued 2011-10-24 en
dc.date.submitted 2011-10-18 en
dc.description Thesis (PhD)--University of Pretoria, 2011. en
dc.description.abstract Ultra-high throughput DNA sequencing technologies have rapidly changed the face of genomic research projects. Technologies such as mRNA-Seq have the potential to rapidly profile the expressed gene catalog of non-model organisms, albeit with significant bioinformatics-related costs and support requirements. This study developed automated data analysis workflows focused on the quality evaluation of mRNA-Seq reads, de novo transcriptome assembly, transcriptome annotation and digital gene expression profiling, making use of data analysis tools available in the public domain and novel tools developed for this purpose. The workflows were made available in a private instance of the Galaxy workflow management system and were used to perform the de novo assembly of a gene catalog of a Eucalyptus plantation tree. The fast growth and good wood properties of Eucalyptus tree species and their hybrids make them excellent renewable resources of fiber for pulp and paper, and of woody biomass for bioenergy production. We produced an expressed gene catalog of 18 894 de novo assembled contigs from Illumina deep mRNA-Seq of six sampled plant tissues. Using a novel coverage-assisted re-assembly approach, we were able to assemble near full-length, biologically relevant transcripts. The assembly was evaluated in terms of contig quality and contiguity, and functional annotations were assigned. Digital expression profiles (FPKM values) were calculated for each contig across the tissues and used to identify tissue-specific sets of expressed genes. Polymorphism analysis of 13 806 high-confidence contigs revealed a combined exon and untranslated region SNP density of 0.534 SNPs/100 bp, which provides a good opportunity for designing high-density SNP assays in the expressed regions of the Eucalyptus genome.
The assembled and annotated gene catalog was made available for public use through a user-friendly, web-based interface, the Eucspresso database (http://eucspresso.bi.up.ac.za). The database acts as a prelude to a more comprehensive mRNA-Seq whole-transcriptome repository, the Eucalyptus Genome Integrative Explorer (EucGenIE), a resource that will focus on identifying transcriptional networks active during woody biomass development. The results of the study showed that current bioinformatics software tools and approaches can be used to successfully assemble and characterize a large proportion of the transcriptome of a complex eukaryotic organism. This approach can be used to characterize the gene catalog of a wide range of non-model organisms using only data derived from ultra-high throughput sequencing (uHTS) experiments. en
dc.description.availability unrestricted en
dc.description.department Biochemistry en
dc.identifier.citation Hefer, CA 2011, Assembly, annotation and polymorphism analysis of a draft transcriptome sequence for a fast-growing Eucalyptus plantation tree, PhD thesis, University of Pretoria, Pretoria, viewed yymmdd < http://hdl.handle.net/2263/28833 > en
dc.identifier.other D11/9/153/ag en
dc.identifier.upetdurl http://upetd.up.ac.za/thesis/available/etd-10182011-163946/ en
dc.identifier.uri http://hdl.handle.net/2263/28833
dc.language.iso en
dc.publisher University of Pretoria en_ZA
dc.rights © 2011 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. en
dc.subject Genomic research projects en
dc.subject Bioinformatics en
dc.subject DNA sequencing technologies en
dc.subject Eucalyptus tree species en
dc.subject UCTD en_US
dc.title Assembly, annotation and polymorphism analysis of a draft transcriptome sequence for a fast-growing Eucalyptus plantation tree en
dc.type Thesis en

