Clustering time-course data using P-splines and mixed effects mixture models

dc.contributor.advisor: Kanfer, F.H.J. (Frans)
dc.contributor.coadvisor: Millard, Sollie M.
dc.contributor.email: U04639864@tuks.co.za [en_ZA]
dc.contributor.postgraduate: Bredenkamp, Deidre
dc.date.accessioned: 2022-01-25T07:43:09Z
dc.date.available: 2022-01-25T07:43:09Z
dc.date.created: 2022-08
dc.date.issued: 2022
dc.description: Mini Dissertation (MCom (Advanced Data Analytics))--University of Pretoria 2022. [en_ZA]
dc.description.abstract: In biology, gene expression is measured over time to study complex biological processes and genetic regulatory networks. Because the underlying process is continuous, time-course gene expression data can be represented by a continuous function. This mini dissertation addresses cluster analysis of time-course data in a mixture model framework. To account for the time dependency of such data, as well as the degree of error present in many datasets, a mixed effects model with penalized B-splines (P-splines) is considered. The performance of this mixed effects model is studied with regard to clustering time-course gene expression data within a mixture model. The EM algorithm is implemented to fit the mixture model in a mixed effects model structure. For each subject, the best linear unbiased smooth estimate of its time-course trajectory is calculated, and subjects with similar mean curves are assigned to the same cluster. Model validation statistics such as the model accuracy and the coefficient of determination (R²) indicate that the model can effectively cluster simulated data into clusters that differ in either the form of the curves or the timing of the curves' peaks. The proposed technique is further demonstrated by clustering time-course gene expression data consisting of microarray samples, taken at 14 time points, from lung tissue of mice exposed to different influenza strains. [en_ZA]
dc.description.availability: Unrestricted [en_ZA]
dc.description.degree: MCom (Advanced Data Analytics) [en_ZA]
dc.description.department: Statistics [en_ZA]
dc.description.sponsorship: National Research Foundation, South Africa (Research chair: Computational and Methodological Statistics, Grant number 71199) (SARChI). [en_ZA]
dc.identifier.citation: Bredenkamp, DM 2022, Clustering time-course data using P-splines and mixed effects mixture models, MSc Mini Dissertation, University of Pretoria, Pretoria, viewed yymmdd http://hdl.handle.net/2263/83444 [en_ZA]
dc.identifier.other: A2022 [en_ZA]
dc.identifier.uri: http://hdl.handle.net/2263/83444
dc.language.iso: en [en_ZA]
dc.publisher: University of Pretoria
dc.rights: © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject: UCTD [en_ZA]
dc.subject: Statistics [en_ZA]
dc.title: Clustering time-course data using P-splines and mixed effects mixture models [en_ZA]
dc.type: Mini Dissertation [en_ZA]

Files

Original bundle

Name: Bredenkamp_Clustering_2022.pdf
Size: 8.9 MB
Format: Adobe Portable Document Format
Description: Mini Dissertation

License bundle

Name: license.txt
Size: 1.75 KB
Format: Item-specific license agreed upon to submission
Description: