Abstract:
Malaria parasites cause human disease by completing a complicated life cycle within both human and mosquito hosts. These organisms are also characterized by numerous molecular eccentricities that make them of immediate biological interest. However, the complexity of the parasite life cycle and the composition of its genes and proteins make studying gene regulation in Plasmodium falciparum parasites a multifaceted problem that is challenging to resolve.
This doctoral thesis presents the following approaches to studying gene regulation, using an array of different tools to construct Gene Regulatory Networks (GRNs) for various phases of P. falciparum parasite development. 1) We investigated gene regulation of the intraerythrocytic phases of the parasite life cycle: the asexual proliferative phase, which causes the symptoms of malaria, and the sexual differentiative phase, which forms transmissive gametocytes. Initially, we investigated the two developmental phases in isolation using time course-based experiments and analysed the data with Dynamic Bayesian Network (DBN) tools. We studied asexual gene regulation using a strategic cell cycle arrest and re-entry experiment, in which regulatory candidate genes were inferred from their re-entry expression patterns. Application of DBN time course analysis yielded a calcium signalling cascade along with multiple regulatory elements. This approach was then extended to the sexual development phase, using a transcriptomics dataset that captures the daily maturation of gametocytes, with a focus on the role of transcription factors. Applying DBN analysis to gametocyte microarray data produced insights into the potential regulatory roles of key ApiAP2 transcription factors, which showed cascade-like expression, as well as putative repressor ApiAP2s, which potentially drive the active repression of proliferation-associated transcription.
2) The two developmental phases were also evaluated collectively using RNA-seq datasets sourced from prior research as well as a newly generated gametocyte maturation dataset capturing all stages of gametocyte development. Integration of the data and construction of a co-expression network led to a gametocyte-associated subnetwork that highlighted potentially novel and significant regulators of gametocyte maturation. The co-expression network itself also constitutes a solid set of curated, cross-dataset normalized genes that can be further used to predict the stage-specificity of transcripts in asexual stages of development. Investigation of long non-coding RNAs (lncRNAs) and their role during gametocyte development was also a key focus of the study. Novel lncRNAs were uncovered for gametocyte stages, and co-expression network analysis highlighted many targets of these lncRNAs. Investigation into the role of anti-sense RNA (asRNA) yielded 9 clusters illustrating the potential for numerous genes (n=285) to be silenced or controlled by their own asRNA.
3) The analysis was further expanded through the construction of a large-scale supervised gene regulatory network using advanced ensemble machine learning techniques (GRNBoost2), which evaluated 124 regulatory genes against a total of 5163 target genes. This approach showed great improvements over the previous strategies. The supervised approach was packaged in a user-friendly web application called MALBoost. This application allows users to submit their own transcriptomic data and regulator gene list and to choose between two analyses, GRNBoost2 or GENIE3. It removes the coding element from the analysis and makes this level of GRN-based work available to non-computational biologists.
This thesis presents an in-depth analysis that applies advanced machine learning and statistical methods to tease apart the biological significance of transcriptional data. This work contributes to our understanding of transcriptional regulation in sexual differentiation and promotes the use of machine learning algorithms for better understanding P. falciparum transcriptomes.