A comparative study of sample selection methods for classification

Lutu, Patricia Elizabeth Nalwoga; Engelbrecht, Andries P.

Files

Lutu_Comparative(2006).pdf (334.35 KB)

Date

2006-06

Authors

Lutu, Patricia Elizabeth Nalwoga

Engelbrecht, Andries P.

Publisher

Computer Society of South Africa

Abstract

Sampling of large datasets for data mining is important for at least two reasons. First, processing large amounts of data increases computational complexity, and the cost of this additional complexity may not be justifiable. Second, small samples allow data mining algorithms to compute quickly and efficiently. This paper discusses statistical methods for obtaining sufficient samples from datasets for classification problems. Results are presented for an empirical study based on sequential random sampling, with each sample evaluated using univariate hypothesis testing and an information theoretic measure. Comparisons are made between theoretical and empirical estimates.
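The abstract names sequential random sampling with sample evaluation by a univariate hypothesis test and an information theoretic measure. It does not specify the tests, thresholds or step sizes, so the Python sketch below is only one possible reading under stated assumptions, not the authors' method. Each record is assumed to be a hypothetical (numeric attribute, class label) pair. The univariate test is assumed to be a two-sided z-test of the sample mean against the full-dataset mean, with a finite population correction. The information measure is assumed to be the Kullback-Leibler divergence between the full-dataset and sample class distributions. The function names and the parameters `start`, `step`, `alpha` and `max_kl` (including the 0.01-bit default) are illustrative choices.

```python
import math
import random
from collections import Counter
from statistics import NormalDist, fmean, pstdev


def kl_divergence(p_counts, q_counts):
    """Kullback-Leibler divergence D(P || Q), in bits, between two class-count tables.

    Returns infinity when Q gives zero probability to a class that occurs in P.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    divergence = 0.0
    for label, count in p_counts.items():
        p = count / p_total
        q = q_counts.get(label, 0) / q_total
        if q == 0.0:
            return math.inf
        divergence += p * math.log2(p / q)
    return divergence


def mean_z_statistic(values, pop_mean, pop_sd, pop_size):
    """z statistic for a sample mean drawn without replacement from a finite population."""
    n = len(values)
    # Finite population correction: the standard error shrinks as n approaches pop_size.
    fpc = math.sqrt((pop_size - n) / (pop_size - 1))
    std_error = pop_sd / math.sqrt(n) * fpc
    if std_error == 0.0:
        return 0.0  # whole population sampled, or a constant attribute
    return (fmean(values) - pop_mean) / std_error


def sequential_sample(records, start=100, step=100, alpha=0.05, max_kl=0.01, seed=0):
    """Grow a random sample until it passes both checks (hypothetical criteria).

    records: list of (numeric_attribute, class_label) pairs.
    Returns (sample, z, kl) for the first sample size that passes,
    or the full dataset if no smaller sample does.
    """
    xs = [x for x, _ in records]
    pop_mean, pop_sd = fmean(xs), pstdev(xs)
    pop_classes = Counter(label for _, label in records)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value

    # Shuffle once: every prefix of this order is a simple random sample
    # without replacement, and each larger sample extends the previous one.
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)

    n = start
    while n < len(records):
        sample = [records[i] for i in order[:n]]
        z = mean_z_statistic([x for x, _ in sample], pop_mean, pop_sd, len(records))
        kl = kl_divergence(pop_classes, Counter(label for _, label in sample))
        # Accept when the mean test does not reject H0 and the class
        # distributions are close in the information-theoretic sense.
        if abs(z) <= z_crit and kl <= max_kl:
            return sample, z, kl
        n += step
    return list(records), 0.0, 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(50.0, 10.0), rng.choice("ABC")) for _ in range(10_000)]
    sample, z, kl = sequential_sample(data)
    print(f"sample size={len(sample)}, z={z:.3f}, KL={kl:.5f} bits")
```

Shuffling the indices once and taking growing prefixes keeps the procedure sequential: each candidate sample is a simple random sample without replacement that contains the previous, smaller one. A different univariate test (for example a chi-square goodness-of-fit test on a categorical attribute) or a different divergence could be substituted without changing the loop.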

Keywords

Dataset sampling, Data analysis, Machine learning, Classification, Information measures

Citation

Lutu, PEN & Engelbrecht, AP 2006, 'A comparative study of sample selection methods for classification', South African Computer Journal, issue 36, pp. 69-85, [http://www.journals.co.za/ej/ejour_comp.html]

URI

http://hdl.handle.net/2263/4904

Collections

Research Articles (Informatics)
Research Articles (University of Pretoria)
