Abstract:
De novo haplotype-phased genome assemblies based on long-read sequencing technologies have improved the detection and characterization of structural variants (SVs) in plant and animal genomes. As long reads are able to span across haplotypes, they also allow phased (haplo) assemblies of highly heterozygous genomes such as those of forest trees. Knowledge of SV function and the resulting impact on gene expression can be used by breeders to guide tree improvement. Eucalyptus species and hybrids are some of the most widely planted hardwood trees. Hybrids are often preferred as they combine the genetic background of two species to produce more resilient trees that can be deployed across a wider range of environments. For example, E. urophylla x E. grandis hybrids combine the disease resistance of E. urophylla with the fast growth and desirable wood properties of E. grandis. However, applying such a strategy in eucalypt breeding first requires a high-quality reference genome (preferably phased) against which additional de novo assembled genomes can be compared. The aim of this study was to assemble high-quality haplotype-phased genomes for Eucalyptus urophylla and E. grandis. Using Nanopore sequencing data generated for an E. urophylla x E. grandis F1 hybrid and a trio-binning approach, we successfully assembled 544.51 Mb of the E. urophylla haplogenome (contig N50 of 1.93 Mb) and 566.75 Mb of the E. grandis haplogenome (contig N50 of 2.42 Mb) with a BUSCO completion score of 98.8%. Using high-density SNP genetic linkage maps of both parents, more than 88% of the haplogenome contigs could be anchored to one of the eleven chromosomes (scaffold N50 of 42.45 Mb and 43.82 Mb for the E. urophylla and E. grandis haplogenome assemblies, respectively). We also provide the first genome-wide comparison between the E. urophylla and E. grandis haplogenomes, using the Synteny and Rearrangement Identifier (SyRI) to identify 48,729 SVs between them.
This study is the first step towards implementing haplotype-informed molecular breeding of Eucalyptus tree species.
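
The contig and scaffold N50 values quoted above are a standard contiguity metric: the length L such that sequences of length L or longer together make up at least half of the assembly. As a minimal illustration of how such a value is computed, the sketch below uses a hypothetical helper `n50` and made-up toy lengths; it is not the pipeline used in this study and does not reproduce the reported figures.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Hypothetical helper for illustration only; not taken from the study.
    """
    ordered = sorted(lengths, reverse=True)   # longest sequences first
    half_total = sum(ordered) / 2             # half of the total assembly size
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the
        # assembly is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy contig lengths (arbitrary units), chosen only to show the calculation.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

With real data, the lengths would typically be read from the assembly FASTA or its index file, once for contigs and once for the anchored scaffolds.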