Inferring secondary cell wall-related R2R3-MYB transcription factor gene targets in Eucalyptus grandis using DNA affinity purification sequencing
Publisher
University of Pretoria
Abstract
Secondary cell wall (SCW) development in plants is tightly regulated by a transcriptional regulatory network (TRN) comprising several transcription factor (TF) families, including the conserved R2R3-MYB family. Previous studies in model species such as Arabidopsis and Populus have revealed that MYBs occupy various positions in the multi-tiered SCW TRN, making them attractive candidates for manipulating the regulation of wood formation in the fast-growing and economically important genus Eucalyptus. The MYB-mediated SCW TRN in Eucalyptus is, however, poorly characterised. This study therefore aimed to identify the target genes of, and understand the regulatory relationships between, Eucalyptus grandis MYB (EgrMYB) family TFs whose orthologs have previously been linked to xylem SCW regulation in Arabidopsis. DNA Affinity Purification Sequencing (DAP-seq) is an in vitro assay that allows genome-wide identification of TF binding sites (TFBSs) and, indirectly, of putative target genes. Given prior evidence linking EgrMYB TFs to SCW regulation, it was hypothesised that the target genes of SCW-related MYB TFs, predicted from genome-wide TFBSs generated using DAP-seq and especially when combined with functional genomic data through machine learning (ML), would be consistent with an SCW-related role for these TFs in Eucalyptus.
Genome-wide TFBSs of ten SCW-related EgrMYB-family TFs were generated using DAP-seq. In a trial experiment on six selected EgrMYB-family TFs, TFBS signatures were only weakly enriched for R2R3-MYB motifs, but predicted target genes were statistically enriched for immature xylem accessible chromatin and were associated with co-expression modules linked to SCW-related processes and/or woody biomass-related gene sets in Eucalyptus. In a substantial follow-up study, technical factors influencing DAP-seq performance were interrogated to improve the quality and reproducibility of peaks and target genes of seven SCW-linked EgrMYB TFs. To improve peak-to-gene assignment, an ML algorithm previously developed and trained in another study was applied to the Eucalyptus DAP-seq data.
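As an illustration of what "statistically enriched" can mean for a predicted target gene set, the following minimal sketch computes a one-sided hypergeometric p-value. The abstract does not state which test was used, so the choice of test, the function name `hypergeom_enrichment_p`, and the toy counts are all assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Return P(X >= overlap) under a hypergeometric null.

    population: number of genes in the background (hypothetical)
    annotated:  background genes carrying the annotation of interest,
                e.g. genes near accessible chromatin (hypothetical)
    sample:     number of predicted target genes
    overlap:    predicted target genes that carry the annotation
    """
    max_k = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for overlaps at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Toy counts, not taken from the thesis.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, overlap=2)
```

A small p-value would indicate that the overlap is larger than expected by chance for a randomly drawn gene set of the same size; in practice the result would also need correction for multiple testing.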
Library preparation significantly impacted DAP-seq performance, and PCR cycle number influenced the reproducibility and quality of peaks. An adapted DAP-seq workflow incorporating several alterations to the protocol was developed. After stringent criteria were applied to filter out weaker datasets, five DAP-seq peak sets were retained for target gene inference using ML: those for the SCW master regulator EgrMYB2, the SCW thickness- and vessel density-linked EgrMYB137, the SCW repressor EgrMYB1, and the lesser-known EgrMYB122 and EgrMYB135, both linked to SCW development. The ML approach uses a random forest classifier to assign predicted, biologically relevant target genes to peaks, and reduces the likelihood of incidental associations by incorporating open chromatin, TF perturbation, and conserved noncoding sequence data, among others.
The DAP-seq-ML approach for target gene assignment yielded 1,457 target genes on average per TF candidate when using a probability cut-off of ≥ 0.5. Four of the five TF target gene sets were significantly enriched for SCW-related gene ontology (GO) terms such as lignin biosynthesis, particularly after applying the ML approach. Using ML-assigned targets, a SCW regulatory subnetwork was constructed using 62 bona fide lignin, cellulose, and hemicellulose structural genes and 51 SCW-related TF genes. At least thirteen out of the seventeen bona fide genes involved in the lignin biosynthesis pathway were co-targeted by EgrMYB2 and EgrMYB137, which may imply functional redundancy or overlapping roles in different developmental/environmental contexts. These results strongly suggest that EgrMYB137 regulates lignification, and this is supported by an independent study showing that EgrMYB137 promotes xylem and leaf vein lignification in planta. In this thesis, I present evidence that DAP-seq-ML could identify previously unknown genome-wide target genes for five SCW-associated R2R3-MYB family TFs in Eucalyptus, and demonstrate a possible role for four of them in regulating SCW-related biological processes, particularly lignification.
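The probability cut-off and the co-targeting comparison described above can be sketched as follows. This is a minimal illustration, assuming the classifier output is available as a per-TF mapping from gene to probability; the TF labels, gene identifiers, scores, and the helper `targets_above_cutoff` are hypothetical and are not taken from the thesis.

```python
CUTOFF = 0.5  # probability cut-off stated in the abstract (>= 0.5)


def targets_above_cutoff(scores, cutoff=CUTOFF):
    """Keep genes whose predicted target probability meets the cut-off."""
    return {gene for gene, prob in scores.items() if prob >= cutoff}


# Hypothetical classifier output: TF -> {gene: probability of being a target}.
scores_by_tf = {
    "TF_A": {"gene1": 0.91, "gene2": 0.50, "gene3": 0.12},
    "TF_B": {"gene1": 0.77, "gene2": 0.49, "gene4": 0.66},
}

# Hypothetical set of pathway genes (stand-in for a curated gene list).
pathway_genes = {"gene1", "gene2", "gene4"}

targets = {tf: targets_above_cutoff(s) for tf, s in scores_by_tf.items()}

# Pathway genes assigned to both TFs, i.e. co-targeted genes.
co_targeted = targets["TF_A"] & targets["TF_B"] & pathway_genes
```

Using sets keeps the comparison explicit: the intersection gives the co-targeted genes, and the same structure extends to more than two TFs or to other curated gene lists.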
Description
Thesis (PhD (Genetics))--University of Pretoria, 2023.
Keywords
UCTD, Secondary cell wall, DAP-seq, Transcription factors, Machine learning, Eucalyptus, R2R3-MYB