Combined de novo and genome guided assembly and annotation of the Pinus patula juvenile shoot transcriptome

Authors

Visser, Erik A.
Wegrzyn, Jill L.
Steenkamp, Emma Theodora
Myburg, Alexander Andrew
Naidoo, Sanushka

Publisher

BioMed Central

Abstract

BACKGROUND : Pines are the most important tree species to the international forestry industry, covering 42 % of the global industrial forest plantation area. One of the most pressing threats to cultivation of some pine species is the pitch canker fungus, Fusarium circinatum, which can have devastating effects in both the field and nursery. Investigation of the Pinus-F. circinatum host-pathogen interaction is crucial for development of effective disease management strategies. As with many non-model organisms, investigation of host-pathogen interactions in pine species is hampered by limited genomic resources. This was partially alleviated through release of the 22 Gbp Pinus taeda v1.01 genome sequence (http://pinegenome.org/pinerefseq/) in 2014. Despite the fact that the fragmented state of the genome may hamper comprehensive transcriptome analysis, it is possible to leverage the inherent redundancy resulting from deep RNA sequencing with Illumina short reads to assemble transcripts in the absence of a completed reference sequence. These data can then be integrated with available genomic data to produce a comprehensive transcriptome resource. The aim of this study was to provide a foundation for gene expression analysis of disease response mechanisms in Pinus patula through transcriptome assembly.

RESULTS : Eighteen de novo and two reference based assemblies were produced for P. patula shoot tissue. For this purpose three transcriptome assemblers, Trinity, Velvet/OASES and SOAPdenovo-Trans, were used to maximise diversity and completeness of assembled transcripts. Redundancy in the assembly was reduced using the EvidentialGene pipeline. The resulting 52 Mb P. patula v1.0 shoot transcriptome consists of 52 112 unigenes, 60 % of which could be functionally annotated.

CONCLUSIONS : The assembled transcriptome will serve as a major genomic resource for future investigation of P. patula and represents the largest gene catalogue produced to date for this species. Furthermore, this assembly can help detect gene-based genetic markers for P. patula and the comparative assembly workflow could be applied to generate similar resources for other non-model species.

Description

Additional file 1: Table S1. EvidentialGene tr2aacds pipeline output summary.
Additional file 2: Table S2. Assembly statistics for EvidentialGene tr2aacds pipeline merged assembly compared to average statistics for each assembler.
Additional file 3: Table S3. Predicted species distribution for non-pine origin sequences removed from the Pinus patula v1.0 transcriptome.
Additional file 4: Figure S1. Molecular function gene ontology distribution for the Pinus patula v1.0 transcriptome.
Additional file 5: Table S4. Tribe-MCL gene families and annotations for all 15 species used.
Additional file 6: Table S5. Conditional reciprocal best BLAST alignment results between full-length Sanger sequenced Pinus taeda cDNA and representative Pinus patula transcripts for each cDNA.
Additional file 7: Figure S2. Summary statistics for alignment of Pinus taeda complete CDS sequences to assembled Pinus patula transcripts. Pita = P. taeda. The x-axis represents the query P. taeda cDNA sequence. The solid y-axis (left) illustrates: cDNA query sequence length (pink circle), P. patula subject sequence length (blue square), conditional reciprocal best BLAST alignment length (gold triangle). The dashed y-axis (right) depicts: percentage identity between sequences (black line), percentage coverage of the P. taeda cDNA by the corresponding P. patula transcript (green cross) and vice versa (purple plus).
Additional file 8: Table S6. EBSeq differential expression analysis results comparing expression between inoculated and mock-inoculated data.
Additional file 9: Table S7. Summarized list of differentially expressed genes between inoculated and mock-inoculated data with annotations.

Keywords

Pinus patula, De novo transcriptome assembly, Genome guided transcriptome assembly, RNA-seq

Citation

Visser, EA, Wegrzyn, JL, Steenkamp, ET, Myburg, AA & Naidoo, S 2015, 'Combined de novo and genome guided assembly and annotation of the Pinus patula juvenile shoot transcriptome', BMC Genomics, vol. 16, art. 1057, pp. 1-13.