Determining the number of clusters using penalised k-means clustering

doi:https://doi.org/10.25403/UPresearchdata.28380005

Files

Greyling_Determining_2024.pdf (5.55 MB)

Date

2024-11

Authors

Greyling, Robert William

Publisher

University of Pretoria

Abstract

Clustering is an important part of statistics. However, the need to specify the number of clusters in advance remains a persistent problem. In this minor dissertation we consider a procedure that removes the need to pre-initialise the number of clusters in the k-means algorithm, so that a suitable value of k is determined automatically and manual effort in clustering tasks is reduced. Following the approach of Sinaga and Yang, we modify the traditional k-means objective function by adding two entropy terms as penalty terms. An additional step was added to the algorithm to ensure that the initial clusters are not empty. A simulation study was conducted using multiple datasets with varying true cluster counts k, data dimensionalities D, and sample sizes n. Results indicate that the proposed algorithm performs well in identifying distinct clusters, particularly in lower-dimensional data.
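
To make the idea of an entropy-penalised k-means objective concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes an objective of the form J = sum_i ||x_i - a_z(i)||^2 - beta * n * sum_k alpha_k ln(alpha_k) - gamma * sum_i ln(alpha_z(i)), which is how the Sinaga and Yang formulation is commonly written; the exact form used in the dissertation may differ. The function names, the fixed values of beta and gamma, and the toy data are all hypothetical and chosen for illustration only.

```python
import math


def squared_distance(x, c):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def assign(points, centres, alpha, gamma):
    """Assign each point to the cluster minimising d^2 - gamma * ln(alpha_k).

    Assumed rule (illustrative): clusters with a small mixing proportion
    alpha_k pay a larger penalty, so they attract fewer points.
    """
    labels = []
    for x in points:
        costs = [
            squared_distance(x, c) - gamma * math.log(a)
            for c, a in zip(centres, alpha)
        ]
        labels.append(costs.index(min(costs)))
    return labels


def penalised_objective(points, centres, labels, alpha, beta, gamma):
    """k-means distortion plus two entropy-based penalty terms (assumed form)."""
    n = len(points)
    distortion = sum(
        squared_distance(points[i], centres[labels[i]]) for i in range(n)
    )
    # Entropy of the mixing proportions: -sum_k alpha_k ln(alpha_k).
    proportion_entropy = -sum(a * math.log(a) for a in alpha if a > 0)
    # Per-point penalty: -sum_i ln(alpha of the cluster point i belongs to).
    assignment_penalty = -sum(math.log(alpha[labels[i]]) for i in range(n))
    return distortion + beta * n * proportion_entropy + gamma * assignment_penalty


# Toy data: two well-separated groups in two dimensions (hypothetical).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centres = [(0.0, 0.5), (10.0, 10.5)]
alpha = [0.5, 0.5]  # mixing proportions, assumed equal here

labels = assign(points, centres, alpha, gamma=1.0)
objective = penalised_objective(points, centres, labels, alpha, beta=1.0, gamma=1.0)
```

In a full algorithm of this kind, the proportions alpha_k would be updated at each iteration and clusters whose proportion becomes negligible would be discarded, which is how the number of clusters can shrink from a large initial value. That update step, and any schedule for beta and gamma, is omitted from this sketch.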

Description

Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2024.

Keywords

UCTD, K-means, Unsupervised k-means, Entropy, Pre-initialisation, Number of clusters

Sustainable Development Goals

None

URI

http://hdl.handle.net/2263/100627

Collections

Theses and Dissertations (University of Pretoria)
Theses and Dissertations (Statistics)
