Data measures that characterise classification problems

Van der Walt, Christiaan Maarten

dc.contributor.advisor	Barnard, E.	en
dc.contributor.postgraduate	Van der Walt, Christiaan Maarten	en
dc.date.accessioned	2013-09-07T11:52:19Z
dc.date.available	2008-09-09	en
dc.date.available	2013-09-07T11:52:19Z
dc.date.created	2008-04-09	en
dc.date.issued	2008-09-09	en
dc.date.submitted	2008-08-29	en
dc.description	Dissertation (MEng)--University of Pretoria, 2008.	en
dc.description.abstract	We have a wide range of classifiers today that are employed in numerous applications, from credit scoring to speech processing, with great technical and commercial success. No classifier, however, exists that will outperform all other classifiers on all classification tasks, and the process of classifier selection is still mainly one of trial and error. The optimal classifier for a classification task is determined by the characteristics of the data set employed; understanding the relationship between data characteristics and the performance of classifiers is therefore crucial to the process of classifier selection. Empirical and theoretical approaches have been employed in the literature to define this relationship. None of these approaches has, however, been very successful in accurately predicting or explaining classifier performance on real-world data. We use theoretical properties of classifiers to identify data characteristics that influence classifier performance; these data properties guide us in the development of measures that describe the relationship between data characteristics and classifier performance. We employ these data measures on real-world and artificial data to construct a meta-classification system. The purpose of this meta-classifier is two-fold: (1) to predict the classification performance of real-world classification tasks, and (2) to explain these predictions in order to gain insight into the properties of real-world data.
We show that these data measures can be employed successfully to predict the classification performance of real-world data sets; these predictions are accurate in some instances but there is still unpredictable behaviour in other instances. We illustrate that these data measures can give valuable insight into the properties and data structures of real-world data; these insights are extremely valuable for high-dimensional classification problems.	en
dc.description.availability	unrestricted	en
dc.description.department	Electrical, Electronic and Computer Engineering	en
dc.identifier.citation	a 2008	en
dc.identifier.other	E1080/gm	en
dc.identifier.upetdurl	http://upetd.up.ac.za/thesis/available/etd-08292008-162648/	en
dc.identifier.uri	http://hdl.handle.net/2263/27624
dc.language.iso		en
dc.publisher	University of Pretoria	en_ZA
dc.rights	© University of Pretoria 2008 E1080/	en
dc.subject	Classifier selection	en
dc.subject	Data measures	en
dc.subject	Data characteristics	en
dc.subject	Artificial data	en
dc.subject	Data analysis	en
dc.subject	Classification	en
dc.subject	Supervised learning	en
dc.subject	Pattern recognition	en
dc.subject	Meta-classification	en
dc.subject	Classification prediction	en
dc.subject	UCTD	en_US
dc.title	Data measures that characterise classification problems	en
dc.type	Dissertation	en