Penalized feature selection in model-based clustering

Potgieter, Luandrie

dc.contributor.advisor	Millard, Sollie M.
dc.contributor.coadvisor	Kanfer, F.H.J. (Frans)
dc.contributor.postgraduate	Potgieter, Luandrie
dc.date.accessioned	2023-06-06T13:00:21Z
dc.date.available	2023-06-06T13:00:21Z
dc.date.created	2023-09-01
dc.date.issued	2022
dc.description	Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2022.	en_US
dc.description.abstract	Cluster analysis is a popular unsupervised statistical method for grouping observations into clusters. Identifying latent segments and groupings in data aids the understanding of natural phenomena. In today's data-driven society, high-dimensional data are ubiquitous, so noise variables are unavoidable. Model-based clustering methods have had to adapt to identify these non-informative variables, because such variables unduly increase a model's complexity. This mini-dissertation reviews how effectively different penalized likelihood approaches identify and remove uninformative variables. An expectation-maximization (EM) algorithm is used to fit a penalized Gaussian mixture model to the data by maximizing the penalized log-likelihood. If a variable's parameter estimates are shrunk to the same value across all clusters, the variable is deemed uninformative and removed from the model. Penalizing the mean parameters was found to successfully identify and remove uninformative variables.	en_US
dc.description.availability	Unrestricted	en_US
dc.description.degree	MSc (Advanced Data Analytics)	en_US
dc.description.department	Statistics	en_US
dc.description.sponsorship	CSIR	en_US
dc.identifier.citation	*	en_US
dc.identifier.doi	10.25403/UPresearchdata.23219531	en_US
dc.identifier.other	S2023
dc.identifier.uri	http://hdl.handle.net/2263/91035
dc.language.iso	en	en_US
dc.publisher	University of Pretoria
dc.rights	© 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject	UCTD	en_US
dc.subject	Variable selection	en_US
dc.subject	Clustering	en_US
dc.subject	Expectation Maximisation
dc.subject	Penalized log-likelihood
dc.subject	Penalized feature selection
dc.title	Penalized feature selection in model-based clustering	en_US
dc.type	Mini Dissertation	en_US
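
The abstract's core mechanism works as follows. EM maximizes a penalized log-likelihood, and a variable is dropped once its cluster means are shrunk to a common value. The Python/NumPy sketch below shows one standard way to implement this. It places an L1 (lasso) penalty lam * sum_k sum_j |mu_kj| on the cluster means of standardized data, with all clusters sharing one diagonal covariance. Under that penalty, the M-step mean update becomes soft-thresholding: mu_kj = sign(m_kj) * max(|m_kj| - lam * sigma_j^2 / n_k, 0), where m_kj is the unpenalized weighted mean. This is not the dissertation's code. The penalty form, the function name `penalized_gmm_em`, the value of `lam` and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def penalized_gmm_em(X, init_labels, lam, n_iter=100):
    """Hypothetical sketch: EM for a Gaussian mixture with an L1 penalty on the means.

    Assumed settings (not from the dissertation): columns are standardized, all
    clusters share one diagonal covariance, penalty = lam * sum_k sum_j |mu_kj|.
    """
    # Standardize so the common value of an uninformative variable's means is 0.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    n, p = X.shape
    K = int(init_labels.max()) + 1
    # Starting values from a preliminary hard partition (e.g. k-means labels).
    mu = np.array([X[init_labels == k].mean(axis=0) for k in range(K)])
    pi = np.bincount(init_labels, minlength=K) / n
    sigma2 = np.ones(p)
    for _ in range(n_iter):
        # E-step: responsibilities tau[i, k] = P(cluster k | x_i).
        diff = X[:, None, :] - mu[None, :, :]                  # shape (n, K, p)
        log_dens = -0.5 * np.sum(diff**2 / sigma2 + np.log(2 * np.pi * sigma2), axis=2)
        log_w = np.log(pi) + log_dens
        log_w -= log_w.max(axis=1, keepdims=True)              # avoid overflow in exp
        tau = np.exp(log_w)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: mixing weights, then soft-thresholded (penalized) means.
        nk = tau.sum(axis=0)                                   # effective cluster sizes
        pi = nk / n
        mu_tilde = (tau.T @ X) / nk[:, None]                   # unpenalized means, (K, p)
        thresh = lam * sigma2[None, :] / nk[:, None]           # shrinkage for each (k, j)
        mu = np.sign(mu_tilde) * np.maximum(np.abs(mu_tilde) - thresh, 0.0)
        # Shared diagonal variances given the penalized means.
        diff = X[:, None, :] - mu[None, :, :]
        sigma2 = np.einsum("ik,ikj->j", tau, diff**2) / n
    # Variable j is flagged uninformative if its mean is 0 in every cluster.
    informative = np.any(mu != 0.0, axis=0)
    return mu, sigma2, informative


# Simulated example: two clusters that differ only in the first two columns;
# the remaining four columns are pure noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 6))
A[:, :2] += 3.0
B = rng.normal(size=(100, 6))
B[:, :2] -= 3.0
X = np.vstack([A, B])
labels = np.repeat([0, 1], 100)
mu, sigma2, informative = penalized_gmm_em(X, labels, lam=50.0)
print(informative)
```

The example starts from the known partition so that it stays deterministic. In practice, the initial labels would come from k-means or several random starts, and `lam` would typically be chosen by a model-selection criterion such as BIC. With these settings, the two shifted columns keep nonzero means, while the four noise columns are shrunk to 0 in both clusters and would be removed.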