BACKGROUND : Pines are the most important tree species to the international forestry industry, covering 42 % of the
global industrial forest plantation area. One of the most pressing threats to cultivation of some pine species is the
pitch canker fungus, Fusarium circinatum, which can have devastating effects in both the field and nursery.
Investigation of the Pinus-F. circinatum host-pathogen interaction is crucial for development of effective disease
management strategies. As with many non-model organisms, investigation of host-pathogen interactions in pine
species is hampered by limited genomic resources. This was partially alleviated through release of the 22 Gbp Pinus
taeda v1.01 genome sequence (http://pinegenome.org/pinerefseq/) in 2014. Although the fragmented
state of the genome may hamper comprehensive transcriptome analysis, the inherent redundancy of deep
RNA sequencing with Illumina short reads can be leveraged to assemble transcripts in the absence
of a completed reference sequence. These data can then be integrated with available genomic data to produce a
comprehensive transcriptome resource. The aim of this study was to provide a foundation for gene expression
analysis of disease response mechanisms in Pinus patula through transcriptome assembly.
RESULTS : Eighteen de novo and two reference-based assemblies were produced for P. patula shoot tissue. Three
transcriptome assemblers, Trinity, Velvet/OASES and SOAPdenovo-Trans, were used to maximise
diversity and completeness of assembled transcripts. Redundancy in the assembly was reduced using the
EvidentialGene pipeline. The resulting 52 Mb P. patula v1.0 shoot transcriptome consists of 52 112 unigenes, 60 %
of which could be functionally annotated.
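Summary figures of the kind reported above (number of unigenes, total assembly size) can be computed directly from the assembled sequences. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: the function names `n50` and `assembly_stats` and the toy sequences are hypothetical, and the input is assumed to be already parsed into a mapping of sequence identifier to nucleotide string.

```python
from typing import Dict, Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def assembly_stats(seqs: Dict[str, str]) -> Dict[str, float]:
    """Summarise an assembly given as {sequence_id: nucleotide_string}."""
    lengths = [len(s) for s in seqs.values()]
    total = sum(lengths)
    return {
        "n_sequences": len(lengths),
        "total_bp": total,
        "mean_bp": total / len(lengths) if lengths else 0.0,
        "n50_bp": n50(lengths),
    }


if __name__ == "__main__":
    # Toy data for illustration only (hypothetical, not P. patula sequences).
    toy = {"u1": "ACGT" * 25, "u2": "A" * 300, "u3": "C" * 600}
    print(assembly_stats(toy))
```

Applied to the figures reported here, and assuming 52 Mb denotes roughly 52 million bases, 52 112 unigenes would correspond to a mean unigene length of approximately 1 kb.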
CONCLUSIONS : The assembled transcriptome will serve as a major genomic resource for future investigation of P.
patula and represents the largest gene catalogue produced to date for this species. Furthermore, this assembly can
help detect gene-based genetic markers for P. patula and the comparative assembly workflow could be applied to
generate similar resources for other non-model species.
Additional file 2: Table S2. Assembly statistics for EvidentialGene
tr2aacds pipeline merged assembly compared to average statistics for
Additional file 3: Table S3. Predicted species distribution for non-pine
origin sequences removed from the Pinus patula v1.0 transcriptome.
Additional file 4: Figure S1. Molecular function gene ontology
distribution for the Pinus patula v1.0 transcriptome.
Additional file 5: Table S4. Tribe-MCL gene families and annotations
for all 15 species used.
Additional file 6: Table S5. Conditional reciprocal best BLAST
alignment results between full-length Sanger sequenced Pinus taeda
cDNA and representative Pinus patula transcripts for each cDNA.
Additional file 7: Figure S2. Summary statistics for alignment of Pinus
taeda complete CDS sequences to assembled Pinus patula transcripts.
Pita = P. taeda. The x-axis represents the query P. taeda cDNA sequence.
The solid y-axis (left) shows cDNA query sequence length (pink circle),
P. patula subject sequence length (blue square) and conditional reciprocal
best BLAST alignment length (gold triangle). The dashed y-axis (right)
shows percentage identity between sequences (black line),
percentage coverage of the P. taeda cDNA by the corresponding P.
patula transcript (green cross) and vice versa (purple plus).
Additional file 8: Table S6. EBSeq differential expression analysis
results comparing expression between inoculated and mock-inoculated
Additional file 9: Table S7. Summarized list of differentially expressed
genes between inoculated and mock-inoculated data with annotations.