Rapid protein evolution, organellar reductions, and invasive intronic elements in the marine aerobic parasite dinoflagellate Amoebophrya spp
Farhat, Sarah; Le, Phuong; Kayal, Ehsan; Noel, Benjamin; Bigeard, Estelle; Corre, Erwan; Maumus, Florian; Florent, Isabelle; Alberti, Adriana; Aury, Jean-Marc; Barbeyron, Tristan; Cai, Ruibo; Da Silva, Corinne; Istace, Benjamin; Labadie, Karine; Marie, Dominique; Mercier, Jonathan; Rukwavu, Tsinda; Szymczak, Jeremy; Tonon, Thierry; Alves-de-Souza, Catharina; Rouze, Pierre; Van de Peer, Yves; Wincker, Patrick; Rombauts, Stephane; Porcel, Betina M.; Guillou, Laure
Date:
2021-01-06
Abstract:
BACKGROUND: Dinoflagellates are aquatic protists that are particularly widespread in oceans worldwide. Some are responsible for toxic blooms, while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~3 to 250 Gb), with gene organization and gene expression patterns very different from those of closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization.
RESULTS: We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technology (ONT) MinION long-read sequencing approaches. We found that a small number of transposable elements, along with short introns and intergenic regions and a limited number of gene families, together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes to be encoded in unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large proportion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid and potential loss of a mitochondrial genome and functions.
CONCLUSION: These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.
Description:
ADDITIONAL FILE 1:
FIGURE S1. Phylogeny of Alveolata. Proteomes from 89 alveolate genomes and transcriptome assemblies from the MMETSP project (https://zenodo.org/record/257026/files/) were used to create orthologous groups using OrthoFinder v2.2 with the DIAMOND similarity search. Single-ortholog alignments were pruned using PhyloTreePruner v.1.0 (minimum of 44 taxa to keep and a support value of 0.9), realigned using MAFFT v7, and filtered with Gblocks v.0.91b (-b5=a -p=n). Filtered alignments were concatenated using seqCat.pl, and a phylogenetic tree was produced under a Maximum Likelihood framework using RAxML v8.2.9 with the PROTGAMMALGF model of sequence evolution and 101 bootstraps. Asterisks represent support values of 95 and above. A detailed method can be found in Kayal et al. 2018 BMC Evol. Biol. (https://doi.org/10.1186/s12862-018-1142-0). The full tree can be found at http://mmo.sb-roscoff.fr/jbrowseAmoebophrya/.
FIGURE S2. SSU rDNA sequence identity (in percent) of A25 and A120 compared with other species.
FIGURE S3. Distribution of k-mers in A25 and A120 genomes.
FIGURE S4. Classification of repeated elements in 3 Amoebophrya genomes (AT5, A25, and A120) using REPET. The x-axis represents the cumulated number of bases of repeated elements in the genome.
FIGURE S5. Conserved motif of the putative spliced leader (SL) in A25 and A120.
FIGURE S6. Alignments of the genes encoding the putative spliced leader (SL) in A25 and A120.
FIGURE S7. Gene orientation change rate in 3 Amoebophrya genomes.
FIGURE S8. Number of orthologous genes shared by selected taxa.
FIGURE S9. Boxplot of the dN/dS ratios of orthologous genes between A25 and A120, calculated using the model average method (MA).
FIGURE S10. Synteny dot-plot obtained by comparison between Amoebophrya A25 and AT5 genomes.
FIGURE S11. Synteny dot-plot obtained by comparison between Amoebophrya A120 and AT5 genomes.
FIGURE S12. Intron length distribution.
FIGURE S13. GC content distribution.
FIGURE S14. Multiple alignments of U2 snRNAs.
FIGURE S15. Multiple alignments of U4 snRNAs.
FIGURE S16. Multiple alignments of U5 snRNAs.
FIGURE S17. Multiple alignments of U6 snRNAs.
FIGURE S18. Secondary structure of Amoebophrya snRNA.
FIGURE S19. Example of introner elements (IEs) in Amoebophrya.
FIGURE S20. Distribution of the direct repeats with size ranging between 3 and 8 nucleotides in A25.
FIGURE S21. Distribution of the direct repeats with size ranging between 3 and 8 nucleotides in A120.
FIGURE S22. Composition of direct repeats in introner elements. The diversity in composition of the three (a, b, c) most abundant direct repeats in introner elements in A25 (top) and A120 (bottom).
FIGURE S23. Terminal inverted repeat locations around the splicing sites in A25 and A120. Position of the inverted repeats relative to the splice sites in A25 and A120. Left, the inverted repeats of A120 are located at the 1–5 nucleotides upstream and downstream of the splice sites. Right, the inverted repeats of A25 are located at the 1–6 nucleotides upstream and downstream of the splice sites.
FIGURE S24. The flowchart for the in silico search of introner elements.
FIGURE S25. Hierarchical clustering analysis (pairwise similarity and OrthoMCL) of all intron families and of the inverted repeats in A25 and A120.
FIGURE S26. Percentage of genes with assigned functions in relation to intron composition.
FIGURE S27. Difference in the proportion of IE-containing genes according to their KEGG assignment in A25 and A120.
FIGURE S28. Distribution of conserved introns.
TABLE S1. RCC number, date, and site of isolation of strains considered in this study.
TABLE S2. Metrics of Nanopore runs for the two Amoebophrya strains.
TABLE S3. Search for pathways involved in plastidial functions that are entirely independent of plastid-encoded gene content.
TABLE S4. Number of the different types of introns identified in A25 and A120 genomes.
TABLE S5. Search for RNA editing in A25 and A120 introns.
TABLE S6. Putative Amoebophrya A25 and A120 snRNP homologs.
TABLE S7. Classification into families of non-canonical introns in A25 and A120.
TABLE S8. RNA-seq read assembly statistics for the Amoebophrya A25 and A120 samples corresponding to the different times of infection and to the free-living stage (dinospores only).
TABLE S9. Total number of contigs belonging to samples from different stages of infection and the proportion of them that were aligned against the genomes of both Amoebophrya A25 and A120. ND corresponds to “not determined” when no measurement was done.
TABLE S10. Metabolic pathways screened in A25 and A120 proteomes.
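The Figure S1 methods mention that filtered single-ortholog alignments were concatenated before tree inference. As a rough illustration of that one step, the following Python sketch builds a concatenated supermatrix from per-gene alignments. It is not the seqCat.pl script named in the record, whose exact behavior is not described here; the function names `read_fasta` and `concatenate_alignments`, the choice to pad taxa missing from a gene with gap characters, and the toy sequences are all assumptions made for this example.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {taxon: aligned sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the taxon name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in records.items()}


def concatenate_alignments(
    alignments: List[Dict[str, str]], missing: str = "-"
) -> Dict[str, str]:
    """Concatenate per-gene alignments into one supermatrix.

    Assumption: a taxon absent from a gene is padded with `missing`
    characters so that every row keeps the same total length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


# Toy data only; these are not sequences from the study.
gene1 = read_fasta(">A25\nMKT\n>A120\nMRT\n")
gene2 = read_fasta(">A25\nLLV-\n>AT5\nLIVA\n")
matrix = concatenate_alignments([gene1, gene2])
for taxon, row in matrix.items():
    print(taxon, row)
```

Padding with gaps keeps taxa that lack some orthologs in the matrix, which is one common convention for phylogenomic supermatrices; whether the original analysis handled missing genes this way is not stated in the record.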