Eucalyptus grandis is a commercially important hardwood species that is susceptible to a number of pests and pathogens. Determining its mechanisms of defense is therefore a research priority. The published E. grandis genome has aided the identification of one important class of resistance (R) genes, those that incorporate nucleotide binding site and leucine-rich repeat domains (NBS-LRR). Using an iterative search process, we identified NBS-LRR gene models within the E. grandis genome, characterized them, and determined their genomic arrangement. Gene expression patterns were examined in E. grandis clones challenged with a fungal pathogen (Chrysoporthe austroafricana) and an insect pest (Leptocybe invasa). We located 1,215 putative NBS-LRR coding sequences, which aligned into two large classes, Toll/interleukin-1 receptor (TIR) and coiled-coil (CC), based on their NB-ARC domains. NBS-LRR gene-rich regions were identified, with 76% of the genes organized in clusters of three or more. A further 272 putative incomplete resistance genes were also identified. Compared with other woody plant species, E. grandis has a higher ratio of TIR-class to CC-class genes and a smaller percentage of single NBS-LRR genes. Transcriptome profiles indicated expression hotspots within physical clusters, including expression of many incomplete genes. The clustering of putative NBS-LRR genes correlates with differential expression responses in resistant and susceptible plants, indicating that the physical arrangement of this gene family is functionally relevant. This analysis of the repertoire and expression of E. grandis putative NBS-LRR genes provides an important resource for the identification of novel and functional R-genes, a key objective for strategies to enhance resilience.
Full list of Eucalyptus grandis putative NBS-LRR genes sorted by position on the genome. Information per gene includes the chromosomal position, class, physical cluster and phylogeny clade membership, identification method, raw expression data, log2 fold change values and ANOVA results (p-values). S_F_C, susceptible, fungal treatment, control; S_F_I, susceptible, fungal treatment, inoculated; R_F_C, resistant, fungal treatment, control; R_F_I, resistant, fungal treatment, inoculated; S_I_C, susceptible, insect treatment, control; S_I_I, susceptible, insect treatment, infested; R_I_C, resistant, insect treatment, control; R_I_I, resistant, insect treatment, infested.
Table S2. Conserved amino acid sequences for NB-ARC and TIR motifs from MEME analysis of CNL-like and TNL-like gene models in Eucalyptus grandis (Eg) and Arabidopsis thaliana (At; Meyers et al., 2003). The expected amino acid tryptophan (W) in the Kinase 2 subdomain of CNL sequences is underlined.
Neighbor joining tree of 480 Eucalyptus grandis NB-ARC domains from complete NBS-LRR genes. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the p-distance method and are in the units of the number of amino acid differences per site. The analysis involved 495 amino acid sequences (480 E. grandis). All ambiguous positions were removed for each sequence pair.
Neighbor joining tree of 616 Eucalyptus grandis NB-ARC domains from all non-TIR NBS-LRR-like genes. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the p-distance method and are in the units of the number of amino acid differences per site. The analysis involved 631 amino acid sequences (616 E. grandis). All ambiguous positions were removed for each sequence pair.
Neighbor joining tree of 396 Eucalyptus grandis NB-ARC domains from all TIR NBS-LRR-like genes. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the p-distance method and are in the units of the number of amino acid differences per site. The analysis involved 411 amino acid sequences (396 E. grandis). All ambiguous positions were removed for each sequence pair.
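The p-distance described in the tree captions above is the proportion of compared sites at which two aligned sequences differ, with ambiguous positions dropped separately for each sequence pair. The following is a minimal Python sketch of that calculation, not the software used to build the trees. The function name `p_distance` and the set of characters treated as ambiguous (`-`, `?`, `X`, `*`) are assumptions made for illustration.

```python
# Characters treated as gaps or ambiguous residues (an assumption for this
# sketch; the exact set depends on the alignment and software used).
AMBIGUOUS = set("-?X*")


def p_distance(seq_a, seq_b):
    """Return the proportion of differing sites between two aligned sequences.

    Sites where either sequence carries an ambiguous character are skipped
    for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in AMBIGUOUS or b in AMBIGUOUS:
            continue  # drop this site for this pair only
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared
```

For example, `p_distance("GKT-LA", "GKTSLV")` compares five sites (the gapped site is skipped) and finds one difference. Because the deletion is pairwise, the number of compared sites can vary from one sequence pair to the next.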
The definitions of (A) a cluster and (B) a supercluster are illustrated using a region of chromosome 4 (starting at 13 Mb and ending at 18 Mb).
Physical locations on Eucalyptus grandis chromosomes of all complete, partial, and incomplete NBS-LRR gene models that were expressed under challenge by Chrysoporthe austroafricana and Leptocybe invasa (MapChart). Variation in treatment means (ANOVA) was identified based on significance (*p < 0.01, **p < 0.001, ***p < 0.0001; *** are also underlined) and on log2 gene expression ratios greater than 1 or smaller than −1 for resistant and susceptible plants. Color distinguishes between different classes (TNL = pink, CNL = green, NL = red, incomplete NL = black, BLAST homolog non-NL = black). Scale bar = Mb. Cluster and supercluster regions are indicated and E. grandis gene IDs are provided.
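The annotation thresholds in this caption can be written out as a short Python sketch. It is illustrative only and is not the analysis pipeline behind the figure. The function names are hypothetical, the log2 ratio is assumed to be inoculated (or infested) mean over control mean, and requiring both the ANOVA threshold and the log2 threshold together is one reading of the caption.

```python
import math


def significance_stars(p_value):
    """Map an ANOVA p-value to the star notation used in the caption."""
    if p_value < 0.0001:
        return "***"
    if p_value < 0.001:
        return "**"
    if p_value < 0.01:
        return "*"
    return ""


def log2_ratio(treated_mean, control_mean):
    """log2 of treated over control expression (assumed ratio direction)."""
    if treated_mean <= 0 or control_mean <= 0:
        raise ValueError("expression means must be positive")
    return math.log2(treated_mean / control_mean)


def is_flagged(p_value, log2_fc):
    """Assumed rule: p < 0.01 AND log2 ratio above 1 or below -1."""
    return p_value < 0.01 and abs(log2_fc) > 1
```

Under these assumptions, a gene with a treated mean of 8, a control mean of 2, and p = 0.0005 has a log2 ratio of 2 and would be marked `**`.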
NB-ARC-LRR fused domains (A) and TIR-NB-ARC-LRR fused domains (B). Conserved amino acid sequences are indicated with lines (top). The conserved GKT motif (Kinase 1) is recognized as a P-loop structure important in ATP hydrolysis, while the hDD motif (Kinase 2) is also well conserved in NB-ARC domains and is important in co-ordinating Mg2+ as a co-factor (Tameling et al., 2006). These two important sub-domains of NB-ARC are sometimes termed the Walker A and Walker B motifs (Walker et al., 1982) and are labeled A and B, respectively, within the I-Tasser protein structures (bottom) for a representative CNL (Eucgr.L01363) and TNL (Eucgr.C00020) sequence from the Eucalyptus grandis genome.