A comparative study of sample selection methods for classification

dc.contributor.author Lutu, P.E.N. (Patricia Elizabeth Nalwoga)
dc.contributor.author Engelbrecht, Andries P.
dc.date.accessioned 2008-04-08T12:45:29Z
dc.date.available 2008-04-08T12:45:29Z
dc.date.issued 2006-06
dc.description.abstract Sampling of large datasets for data mining is important for at least two reasons. The processing of large amounts of data results in increased computational complexity. The cost of this additional complexity may not be justifiable. On the other hand, the use of small samples results in fast and efficient computation for data mining algorithms. Statistical methods for obtaining sufficient samples from datasets for classification problems are discussed in this paper. Results are presented for an empirical study based on the use of sequential random sampling and sample evaluation using univariate hypothesis testing and an information theoretic measure. Comparisons are made between theoretical and empirical estimates. en
dc.format.extent 342371 bytes
dc.format.mimetype application/pdf
dc.identifier.citation Lutu, PEN & Engelbrecht, AP 2006, 'A comparative study of sample selection methods for classification', South African Computer Journal, issue 36, pp. 69-85, [http://www.journals.co.za/ej/ejour_comp.html] en
dc.identifier.issn 1015-7999
dc.identifier.uri http://hdl.handle.net/2263/4904
dc.language.iso en en
dc.publisher Computer Society of South Africa en
dc.rights Computer Society of South Africa en
dc.subject Dataset sampling en
dc.subject Data analysis en
dc.subject Machine learning en
dc.subject Classification en
dc.subject Information measures en
dc.subject.lcsh Sampling
dc.subject.lcsh Information measurement
dc.subject.lcsh Machine learning
dc.title A comparative study of sample selection methods for classification en
dc.type Article en

