Abstract:
In biology, gene expression is measured over time to study complex biological processes and
gene regulatory networks. Because the underlying process is continuous, time-course gene-expression data may be
represented by a continuous function.
This mini dissertation addresses cluster analysis of time-course data in a mixture model framework. To
take into account the time dependency of such time-course data, as well as the degree of error present in
many datasets, a mixed-effects model with penalized B-splines is considered. In this mini dissertation the
performance of such a mixed-effects model has been studied with regard to the clustering of time-course gene
expression data in a mixture model framework. The EM algorithm has been implemented to fit the mixture model
within a mixed-effects model structure. For each subject, the best linear unbiased smooth estimate of its time-course
trajectory has been calculated, and subjects with similar mean curves have been assigned to the same cluster.
Model validation statistics such as the model accuracy and the coefficient of determination (R²) indicate
that the model can cluster simulated data effectively into clusters that differ in either the form of the curves
or the timing of the curves’ peaks. The proposed technique is further demonstrated by clustering time-course
gene expression data consisting of microarray samples from lung tissue of mice exposed to different influenza
strains, collected at 14 time points.
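
As a rough illustration of the clustering idea only, the sketch below runs EM on a mixture of curves observed at common time points. It is not the model studied here: it assumes one free mean per time point and a single shared noise variance, and it leaves out the penalized B-spline basis and the subject-level random effects entirely. The names `em_cluster_curves` and `bump`, and the simulated profiles with early and late peaks, are hypothetical and chosen for the example.

```python
import math
import random


def em_cluster_curves(curves, k, n_iter=30):
    """Cluster equal-length curves with EM for a spherical Gaussian mixture."""
    n, t = len(curves), len(curves[0])

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Farthest-first initialisation: deterministic, spreads the starting means.
    means = [list(curves[0])]
    while len(means) < k:
        far = max(curves, key=lambda c: min(sqdist(c, m) for m in means))
        means.append(list(far))
    weights = [1.0 / k] * k
    # Start the shared noise variance at the pooled variance around the grand mean.
    grand = [sum(c[i] for c in curves) / n for i in range(t)]
    var = max(sum(sqdist(c, grand) for c in curves) / (n * t), 1e-8)

    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities on the log scale. The Gaussian
        # normalising constant is shared by all clusters, so it cancels.
        resp = []
        for y in curves:
            logp = [math.log(weights[j]) - sqdist(y, means[j]) / (2 * var)
                    for j in range(k)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: update mixing weights, mean curves, and the shared variance.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = [sum(r[j] * y[i] for r, y in zip(resp, curves)) / nj
                        for i in range(t)]
        var = max(sum(r[j] * sqdist(y, means[j])
                      for r, y in zip(resp, curves) for j in range(k)) / (n * t),
                  1e-8)
    # Hard assignment: each curve goes to its most probable cluster.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means


# Simulated example (assumed data): five early-peaking and five late-peaking curves.
rng = random.Random(1)
times = range(10)


def bump(peak):
    # Hypothetical expression profile: a Gaussian bump plus small noise.
    return [math.exp(-0.5 * (s - peak) ** 2) + rng.gauss(0, 0.05) for s in times]


curves = [bump(2) for _ in range(5)] + [bump(7) for _ in range(5)]
labels, means = em_cluster_curves(curves, k=2)
print(labels)
```

In this toy setting the two groups differ only in the timing of their peaks, which is one of the two kinds of cluster difference mentioned above. Replacing the per-time-point means with spline-smoothed curves would bring the sketch closer to the approach described, at the cost of a more involved M-step.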