An investigation of K-means clustering to high and multi-dimensional biological data


dc.contributor.author Baridam, Barilee B.
dc.contributor.author Ali, M. Montaz
dc.date.accessioned 2013-10-29T13:01:20Z
dc.date.available 2013-10-29T13:01:20Z
dc.date.issued 2013
dc.description 1 pdf file.
dc.description.abstract PURPOSE – The K-means clustering algorithm has been intensively researched owing to its simplicity of implementation and usefulness in the clustering task. However, there have also been criticisms of its performance, in particular for requiring the value of K before the actual clustering task. It is evident from previous research that providing the number of clusters a priori does not in any way assist in the production of good-quality clusters. The authors' investigations in this paper also confirm this finding. The purpose of this paper is to investigate further the usefulness of K-means clustering for high and multi-dimensional data by applying it to biological sequence data. DESIGN/METHODOLOGY/APPROACH – The authors suggest a scheme which maps the high-dimensional data into low dimensions, then show that the K-means algorithm with a pre-processor produces good-quality, compact and well-separated clusters of the biological data mapped in low dimensions. For the purpose of clustering, a character-to-numeric conversion was conducted to transform the nucleic/amino acid symbols to numeric values. en_US
dc.description.librarian hb2013 en_US
dc.description.statementofresponsibility Barileé B. Baridam, M. M. Ali en_US
dc.description.uri http://www.emeraldsight.com/journalshtm?/issn=0368-492X en_US
dc.format.extent 14 p.
dc.identifier.citation Baridam, BB & Ali, MM 2013, 'An investigation of K-means clustering to high and multi-dimensional biological data', Kybernetes, vol. 42, no. 4, pp. 614-627. en_US
dc.identifier.issn 0368-492X
dc.identifier.other 10.1108/K-02-2013-0028
dc.identifier.uri http://hdl.handle.net/2263/32205
dc.language.iso en en_US
dc.publisher Emerald en_US
dc.relation.requires Adobe Acrobat Reader, version 6.0 en_US
dc.rights Emerald Group Publishing en_US
dc.subject Cluster analysis en_US
dc.subject Programming and algorithm theory en_US
dc.subject Data management en_US
dc.subject Clustering en_US
dc.subject Dimensionality en_US
dc.subject Categorical data en_US
dc.subject Silhouette validity index en_US
dc.subject Bioinformatics en_US
dc.subject Computational intelligence en_US
dc.subject.ddc 004.35
dc.subject.lcsh Cluster analysis -- Computer programs en_US
dc.title An investigation of K-means clustering to high and multi-dimensional biological data en_US
dc.type Postprint Article en_US

