Penalized feature selection in model-based clustering

dc.contributor.advisor: Millard, Sollie M.
dc.contributor.coadvisor: Kanfer, F.H.J. (Frans)
dc.contributor.email: luan3potgieter@gmail.com [en_US]
dc.contributor.postgraduate: Potgieter, Luandrie
dc.date.accessioned: 2023-06-06T13:00:21Z
dc.date.available: 2023-06-06T13:00:21Z
dc.date.created: 2023-09-01
dc.date.issued: 2022
dc.description: Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2022. [en_US]
dc.description.abstract: Cluster analysis is a popular unsupervised statistical method used to group observations into clusters. Identifying latent segments and groupings in the data aids in the understanding of natural phenomena. In today's data-driven society, high-dimensional data are ubiquitous, and noise variables are therefore unavoidable. Model-based clustering methods have had to adapt in order to identify these non-informative variables, since they unduly increase a model’s complexity. This mini dissertation reviews the effectiveness of different penalized likelihood approaches and how they aid in identifying and removing uninformative variables. An EM algorithm is used to fit a penalized Gaussian mixture model to the data. The penalized log-likelihood is maximized, and if a variable’s parameter estimates are reduced to the same value across all clusters, the variable is deemed uninformative and removed from the model. It was found that, by penalizing the mean, uninformative variables were successfully identified and removed. [en_US]
dc.description.availability: Unrestricted [en_US]
dc.description.degree: MSc (Advanced Data Analytics) [en_US]
dc.description.department: Statistics [en_US]
dc.description.sponsorship: CSIR [en_US]
dc.identifier.citation: * [en_US]
dc.identifier.doi: 10.25403/UPresearchdata.23219531 [en_US]
dc.identifier.other: S2023
dc.identifier.uri: http://hdl.handle.net/2263/91035
dc.language.iso: en [en_US]
dc.publisher: University of Pretoria
dc.rights: © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject: UCTD [en_US]
dc.subject: Variable selection [en_US]
dc.subject: Clustering [en_US]
dc.subject: Expectation Maximisation
dc.subject: Penalized log-likelihood
dc.subject: Penalized feature selection
dc.title: Penalized feature selection in model-based clustering [en_US]
dc.type: Mini Dissertation [en_US]
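
Illustrative sketch (not part of the deposited work). The abstract describes fitting a penalized Gaussian mixture model with an EM algorithm and removing a variable when its estimates coincide across all clusters. The abstract does not state the exact penalty or covariance structure, so the code below is only a minimal sketch under stated assumptions: an L1 penalty on the cluster means, standardized columns, and one diagonal covariance shared by all clusters. Under those assumptions the mean update becomes a soft-threshold, and a variable whose mean is zero in every cluster is flagged as uninformative. The names `soft_threshold`, `penalized_gmm_em`, `init_means`, and `lam` are hypothetical and chosen for illustration.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def penalized_gmm_em(X, init_means, lam, n_iter=50):
    """EM for a Gaussian mixture with an L1 penalty on the cluster means.

    Assumptions of this sketch (not taken from the dissertation):
    columns of X are standardized, and all clusters share one diagonal
    covariance matrix. `init_means` has shape (K, p); `lam` is the
    penalty weight.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = np.array(init_means, dtype=float)
    k = mu.shape[0]
    var = np.ones(p)              # shared per-variable variances
    pi = np.full(k, 1.0 / k)      # mixing proportions

    for _ in range(n_iter):
        # E-step: responsibilities tau[i, c]. Terms common to all
        # clusters (shared covariance) cancel in the normalization.
        sq = (X[:, None, :] - mu[None, :, :]) ** 2
        log_w = np.log(pi) - 0.5 * (sq / var).sum(axis=2)
        log_w -= log_w.max(axis=1, keepdims=True)
        tau = np.exp(log_w)
        tau /= tau.sum(axis=1, keepdims=True)

        # M-step: weighted means, then soft-thresholding from the L1 penalty.
        nk = tau.sum(axis=0) + 1e-12
        pi = nk / n
        mu_tilde = (tau.T @ X) / nk[:, None]
        mu = soft_threshold(mu_tilde, lam * var[None, :] / nk[:, None])

        # Shared diagonal variances, floored for numerical stability.
        sq = (X[:, None, :] - mu[None, :, :]) ** 2
        var = np.maximum((tau[:, :, None] * sq).sum(axis=(0, 1)) / n, 1e-6)

    # With standardized data, a variable whose mean is zero in every
    # cluster has the same estimate across clusters: treat it as noise.
    informative = np.any(mu != 0.0, axis=0)
    return mu, var, pi, informative
```

A call such as `penalized_gmm_em(X, init_means, lam=40.0)` returns the shrunken means together with a boolean mask over the columns of `X`. Larger values of `lam` shrink more means to zero and so discard more variables; how the dissertation itself selects the penalty weight is not stated in the abstract.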

Files

Original bundle

Name: Potgieter_Penalized_2022.pdf
Size: 3.66 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission