An investigation of K-means clustering to high and multi-dimensional biological data

Baridam, Barilee B.; Ali, M. Montaz

Files

Baridam_Investigation(2013).pdf (474.14 KB)

Date

2013

Authors

Baridam, Barilee B.

Ali, M. Montaz

Publisher

Emerald

Abstract

PURPOSE – The K-means clustering algorithm has been intensively researched owing to its simplicity of implementation and its usefulness in the clustering task. However, there have also been criticisms of its performance, in particular for demanding the value of K before the actual clustering task. It is evident from previous research that providing the number of clusters a priori does not in any way assist in the production of good quality clusters. The authors' investigations in this paper also confirm this finding. The purpose of this paper is to investigate further the usefulness of K-means clustering for high and multi-dimensional data by applying it to biological sequence data.

DESIGN/METHODOLOGY/APPROACH – The authors suggest a scheme which maps the high-dimensional data into low dimensions, then show that the K-means algorithm with a pre-processor produces good quality, compact and well-separated clusters of the biological data mapped in low dimensions. For the purpose of clustering, a character-to-numeric conversion was conducted to transform the nucleic/amino acid symbols to numeric values.
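The pipeline outlined in the abstract (character-to-numeric conversion, mapping to low dimensions, K-means, cluster validation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the encoding or the mapping scheme, so the table `NUCLEOTIDE_CODE`, the mean and standard deviation summary in `to_low_dim`, the farthest-point initialisation, and the toy sequences are all assumptions made for illustration. The silhouette index is used for validation because it appears among the record's keywords.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical character-to-numeric table; the paper's actual values are not given here.
NUCLEOTIDE_CODE = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(sequence):
    """Convert each nucleotide symbol to a number (one dimension per position)."""
    return [NUCLEOTIDE_CODE[symbol] for symbol in sequence.upper()]


def to_low_dim(values):
    """Assumed low-dimensional mapping: summarise a sequence as (mean, std dev)."""
    return (mean(values), pstdev(values))


def kmeans(points, k, iterations=100):
    """Plain Lloyd's K-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest already-chosen centroid.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            updated.append(tuple(mean(axis) for axis in zip(*members)) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette width; assumes at least two clusters are present."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            mean(dist(p, q) for q, label in zip(points, labels) if label == other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Toy data (invented): three AT-rich and three GC-rich sequences.
sequences = ["AATTAATATA", "ATATTAAATT", "TTAATATAAT",
             "GCGCCGGCGG", "CCGGCGCGGC", "GGCCGCGCCG"]
points = [to_low_dim(encode(s)) for s in sequences]
labels, centroids = kmeans(points, k=2)
print(labels, round(silhouette(points, labels), 3))
```

In this sketch K is still supplied by hand; one could run `kmeans` for several values of K and compare the resulting silhouette scores, which is a common way to sidestep the a priori choice criticised in the abstract.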

Description

1 pdf file.

Keywords

Cluster analysis, Programming and algorithm theory, Data management, Clustering, Dimensionality, Categorical data, Silhouette validity index, Bioinformatics, Computational intelligence

Citation

Baridam, BB & Ali, MM 2013, 'An investigation of K-means clustering to high and multi-dimensional biological data', Kybernetes, vol. 42, no. 4, pp. 614-627.

URI

http://hdl.handle.net/2263/32205

Collections

Research Articles (Computer Science)
Research Articles (University of Pretoria)
