Determining the correct sample size is of central importance in study design. Large samples yield classifiers or parameter estimates with greater precision; conversely, samples that are too small yield unreliable results. Fixed sample size methods, in which the sample size is determined by a specified level of error between the estimated parameter and the population value, or by a confidence level associated with the estimate, have been developed and are widely available. These methods are useful when gathering the data involves little or no cost, whether financial or in time. Sequential sampling procedures, by contrast, were developed specifically to obtain a classifier or parameter estimate that is as accurate as the researcher deems necessary, while sampling the fewest observations required to reach that level of accuracy.
This dissertation discusses a sequential procedure, derived using Martingale Limit Theory, that was developed to train a classifier with the minimum number of observations needed to ensure, with sufficiently high probability, that the next observation sampled has a sufficiently low probability of being misclassified. Several classification methods are examined under multiple combinations of parameter settings, and the sequential procedure is also applied to microarray data. The advantages and shortcomings of the procedure are identified and discussed.
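To give a flavour of the idea, the following is a minimal sketch of a generic sequential stopping rule, not the dissertation's martingale-based procedure: observations are sampled one at a time, a simple nearest-centroid classifier is updated online, and sampling stops once a Hoeffding confidence bound guarantees, with probability at least 1 - delta, that the misclassification rate is below a target eps. The two-Gaussian data generator, the classifier, and the values of eps and delta are all illustrative assumptions.

```python
import random
import math

random.seed(0)

def draw():
    # Hypothetical data source: two 1-D Gaussian classes (illustration only)
    label = random.random() < 0.5
    x = random.gauss(2.0 if label else -2.0, 1.0)
    return x, label

# Online nearest-centroid classifier state
n_pos = n_neg = 0
sum_pos = sum_neg = 0.0
errors = 0   # prequential (predict-then-update) error count
n = 0
eps = 0.1    # target misclassification rate (assumed)
delta = 0.05 # allowed probability of exceeding eps (assumed)

while True:
    x, y = draw()
    n += 1
    # Predict with the current centroids before updating them;
    # skipped until both classes have been observed at least once.
    if n_pos and n_neg:
        pred = abs(x - sum_pos / n_pos) < abs(x - sum_neg / n_neg)
        errors += (pred != y)
    if y:
        n_pos += 1
        sum_pos += x
    else:
        n_neg += 1
        sum_neg += x
    # Hoeffding upper confidence bound on the true error rate
    if n >= 10:
        bound = errors / n + math.sqrt(math.log(1 / delta) / (2 * n))
        if bound <= eps:
            break

print(f"stopped after n = {n} observations, observed error rate = {errors / n:.3f}")
```

The stopping rule trades sample size against accuracy directly: tightening eps or delta forces the loop to run longer, which mirrors the way a sequential procedure lets the researcher dictate the accuracy of the trained classifier.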
This dissertation also proposes a new sequential procedure that trains the classifier until the Bayes error can be estimated accurately with high probability. The new procedure retains all of the advantages of the earlier method while addressing its most serious shortcoming. Ultimately, it enables the researcher to specify how accurate the classifier should be, providing greater control over the trained classifier.
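For reference, the Bayes error targeted here is the minimum achievable misclassification probability over all classifiers. For a binary problem it can be written in the standard form (notation assumed, not taken from the dissertation):

```latex
L^{*} \;=\; \mathbb{E}\bigl[\min\{\eta(X),\, 1 - \eta(X)\}\bigr],
\qquad \eta(x) \;=\; P(Y = 1 \mid X = x),
```

so a procedure that estimates \(L^{*}\) accurately gives the researcher a benchmark for how close the trained classifier is to the best possible performance.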