Determining the number of clusters using penalised k-means clustering

doi:https://doi.org/10.25403/UPresearchdata.28380005

dc.contributor.advisor	Millard, Sollie M.
dc.contributor.coadvisor	Kanfer, F.H.J. (Frans)
dc.contributor.email	robert.w.greyling@gmail.com	en_US
dc.contributor.postgraduate	Greyling, Robert William
dc.date.accessioned	2025-02-10T07:15:08Z
dc.date.available	2025-02-10T07:15:08Z
dc.date.created	2025-04
dc.date.issued	2024-11
dc.description	Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2024.	en_US
dc.description.abstract	Clustering is an important part of statistics. However, the need to specify the number of clusters in advance remains a persistent issue. In this minor dissertation we consider a procedure that eliminates the pre-initialisation of the number of clusters in the k-means algorithm, reducing manual effort in clustering tasks. The procedure aims to determine the value of k automatically. Following the approach of Sinaga and Yang, we modify the traditional k-means objective function by adding two entropy terms as penalty terms. An additional step was added to the algorithm to ensure that the initial clusters are not empty. A simulation study was conducted using multiple datasets with varying true cluster counts k, data dimensionalities D, and sample sizes n. Results indicate that the proposed algorithm performs well in identifying distinct clusters, particularly in lower-dimensional data.	en_US
dc.description.availability	Unrestricted	en_US
dc.description.degree	MSc (Advanced Data Analytics)	en_US
dc.description.department	Statistics	en_US
dc.description.faculty	Faculty of Natural and Agricultural Sciences	en_US
dc.description.sdg	None	en_US
dc.identifier.citation	*	en_US
dc.identifier.doi	https://doi.org/10.25403/UPresearchdata.28380005	en_US
dc.identifier.other	A2025	en_US
dc.identifier.uri	http://hdl.handle.net/2263/100627
dc.language.iso	en	en_US
dc.publisher	University of Pretoria
dc.rights	© 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject	UCTD	en_US
dc.subject	K-means	en_US
dc.subject	Unsupervised k-means	en_US
dc.subject	Entropy	en_US
dc.subject	Pre-initialisation	en_US
dc.subject	Number of clusters	en_US
dc.title	Determining the number of clusters using penalised k-means clustering	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: Greyling_Determining_2024.pdf
Size: 5.55 MB
Format: Adobe Portable Document Format
Description: Dissertation

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Theses and Dissertations (University of Pretoria)
Theses and Dissertations (Statistics)
