Determining the number of clusters using penalised k-means clustering

dc.contributor.advisor: Millard, Sollie M.
dc.contributor.coadvisor: Kanfer, F.H.J. (Frans)
dc.contributor.email: robert.w.greyling@gmail.com [en_US]
dc.contributor.postgraduate: Greyling, Robert William
dc.date.accessioned: 2025-02-10T07:15:08Z
dc.date.available: 2025-02-10T07:15:08Z
dc.date.created: 2025-04
dc.date.issued: 2024-11
dc.description: Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2024. [en_US]
dc.description.abstract: Clustering is an important part of statistics. However, the need to specify the number of clusters in advance remains a persistent issue. In this minor dissertation we consider a procedure that removes the pre-initialisation of the number of clusters in the k-means algorithm, so that the value of k is determined automatically, which reduces manual effort in clustering tasks. Following the approach of Sinaga and Yang, we modify the traditional k-means objective function by adding two entropy terms as penalty terms. An additional step was added to the algorithm to ensure that the initial clusters are not empty. A simulation study was conducted using multiple datasets with varying true cluster counts k, data dimensionalities D, and sample sizes n. Results indicate that the proposed algorithm performs well in identifying distinct clusters, particularly in lower-dimensional data. [en_US]
dc.description.availability: Unrestricted [en_US]
dc.description.degree: MSc (Advanced Data Analytics) [en_US]
dc.description.department: Statistics [en_US]
dc.description.faculty: Faculty of Natural and Agricultural Sciences [en_US]
dc.description.sdg: None [en_US]
dc.identifier.citation: * [en_US]
dc.identifier.doi: https://doi.org/10.25403/UPresearchdata.28380005 [en_US]
dc.identifier.other: A2025 [en_US]
dc.identifier.uri: http://hdl.handle.net/2263/100627
dc.language.iso: en [en_US]
dc.publisher: University of Pretoria
dc.rights: © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject: UCTD [en_US]
dc.subject: K-means [en_US]
dc.subject: Unsupervised k-means [en_US]
dc.subject: Entropy [en_US]
dc.subject: Pre-initialisation [en_US]
dc.subject: Number of clusters [en_US]
dc.title: Determining the number of clusters using penalised k-means clustering [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle

Name: Greyling_Determining_2024.pdf
Size: 5.55 MB
Format: Adobe Portable Document Format
Description: Dissertation

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission