An investigation of K-means clustering to high and multi-dimensional biological data

dc.contributor.author: Baridam, Barilee B.
dc.contributor.author: Ali, M. Montaz
dc.date.accessioned: 2013-10-29T13:01:20Z
dc.date.available: 2013-10-29T13:01:20Z
dc.date.issued: 2013
dc.description: 1 pdf file.
dc.description.abstract: PURPOSE – The K-means clustering algorithm has been researched intensively owing to its simplicity of implementation and its usefulness in clustering tasks. However, its performance has also been criticised, in particular because it requires the value of K to be supplied before the actual clustering task. Previous research indicates that providing the number of clusters a priori does not in any way assist in the production of good-quality clusters. The authors' investigations in this paper confirm this finding. The purpose of this paper is to investigate further the usefulness of K-means clustering for high and multi-dimensional data by applying it to biological sequence data. DESIGN/METHODOLOGY/APPROACH – The authors suggest a scheme which maps the high-dimensional data into low dimensions, then show that the K-means algorithm with a pre-processor produces good-quality, compact and well-separated clusters of the biological data mapped in low dimensions. For the purpose of clustering, a character-to-numeric conversion was conducted to transform the nucleic/amino acid symbols into numeric values. [en_US]
dc.description.librarian: hb2013 [en_US]
dc.description.statementofresponsibility: Barileé B. Baridam, M. M. Ali [en_US]
dc.description.uri: http://www.emeraldsight.com/journalshtm?/issn=0368-492X [en_US]
dc.format.extent: 14 p.
dc.identifier.citation: Baridam, BB & Ali, MM 2013, 'An investigation of K-means clustering to high and multi-dimensional biological data', Kybernetes, vol. 42, no. 4, pp. 614-627. [en_US]
dc.identifier.issn: 0368-492X
dc.identifier.other: 10.1108/K-02-2013-0028
dc.identifier.uri: http://hdl.handle.net/2263/32205
dc.language.iso: en [en_US]
dc.publisher: Emerald [en_US]
dc.relation.requires: Adobe Acrobat Reader, version 6.0 [en_US]
dc.rights: Emerald Group Publishing [en_US]
dc.subject: Cluster analysis [en_US]
dc.subject: Programming and algorithm theory [en_US]
dc.subject: Data management [en_US]
dc.subject: Clustering [en_US]
dc.subject: Dimensionality [en_US]
dc.subject: Categorical data [en_US]
dc.subject: Silhouette validity index [en_US]
dc.subject: Bioinformatics [en_US]
dc.subject: Computational intelligence [en_US]
dc.subject.ddc: 004.35
dc.subject.lcsh: Cluster analysis -- Computer programs [en_US]
dc.title: An investigation of K-means clustering to high and multi-dimensional biological data [en_US]
dc.type: Postprint Article [en_US]
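
The pipeline outlined in the abstract (character-to-numeric conversion, a pre-processing step that maps sequences into low dimensions, K-means, and validation with the silhouette index listed among the subjects) might look roughly like the sketch below. This record does not give the authors' actual symbol codes or mapping scheme, so the `CODES` table, the `to_low_dim` summary (mean and variance of the codes), and the toy sequences are all hypothetical choices made only for illustration; this is not the paper's method.

```python
import random
from math import dist

# Hypothetical symbol-to-number codes; the paper's real conversion is not given here.
CODES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def encode(seq):
    """Convert a nucleotide string into a list of numeric codes."""
    return [CODES[ch] for ch in seq.upper()]

def to_low_dim(seq):
    """Assumed pre-processor: summarise a sequence as (mean, variance) of its codes."""
    v = encode(seq)
    m = sum(v) / len(v)
    var = sum((x - m) ** 2 for x in v) / len(v)
    return (m, var)

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-means; K must be supplied up front."""
    rng = random.Random(seed)
    # Start from k distinct points so that no two initial centroids coincide.
    centroids = rng.sample(sorted(set(points)), k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette width: near 1 means compact, well-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data: three AT-rich and three GC-rich sequences (made up for this example).
seqs = ["AAAATAAA", "AATAAAAT", "ATAAATAA", "GGCGGCCG", "CGGCGGCC", "GCCGGCGC"]
points = [to_low_dim(s) for s in seqs]
labels, centroids = kmeans(points, k=2)
score = silhouette(points, labels)
print(labels, round(score, 3))
```

On this toy input the two groups of sequences end up in separate clusters, and a silhouette value close to 1 would indicate the kind of compact, well-separated clustering the abstract describes. Note that `k` still has to be chosen by the caller, which is the limitation the abstract raises.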

Files

Original bundle

Name: Baridam_Investigation(2013).pdf
Size: 474.14 KB
Format: Adobe Portable Document Format
Description: Postprint Article

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission