Abstract:
This dissertation investigates the performance of two-class classification on credit scoring
data sets with low default ratios. The standard two-class parametric Gaussian and naive Bayes
(NB), as well as the non-parametric Parzen classifiers are extended, using Bayes' rule, to
include either a class imbalance or a Bernoulli prior. This is done with the aim of addressing
the low default probability problem. Furthermore, the performance of Parzen classification
with Silverman and Minimum Leave-one-out Entropy (MLE) Gaussian kernel bandwidth
estimation is also investigated. It is shown that the non-parametric Parzen classifiers yield
superior classification power.
However, it is desirable for these non-parametric classifiers to possess predictive power
such as that exhibited by the odds ratio found in logistic regression (LR). The dissertation
therefore dedicates a section to, amongst other things, a study of the paper entitled
"Model-Free Objective Bayesian Prediction" (Bernardo 1999). Since this approach to Bayesian
kernel density estimation is only developed for the univariate and the uncorrelated multivariate cases, the
section develops a theoretical multivariate approach to Bayesian kernel density estimation.
This approach is theoretically capable of handling both correlated and uncorrelated
features in data. This is done through the assumption of a multivariate Gaussian kernel
function and the use of an inverse Wishart prior.
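
As a rough illustration of the prior-weighted Parzen classification summarised above, the following minimal sketch combines per-class Gaussian kernel density estimates with a class prior through Bayes' rule. It is not the dissertation's implementation: the function names, the synthetic data, the per-feature (product-kernel) form of Silverman's rule of thumb, and the use of the empirical default fraction as the class imbalance prior are all assumptions made for illustration.

```python
import numpy as np

def silverman_bandwidth(x):
    """Per-feature Silverman rule-of-thumb bandwidth (assumed product Gaussian kernel)."""
    n, d = x.shape
    sigma = x.std(axis=0, ddof=1)
    return sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))

def parzen_log_density(x_train, x_query, h):
    """Log Parzen density estimate at each query point, computed stably in log space."""
    d = x_train.shape[1]
    # Scaled differences between every query point and every training point: (m, n, d)
    z = (x_query[:, None, :] - x_train[None, :, :]) / h
    log_k = (-0.5 * np.sum(z ** 2, axis=2)
             - np.sum(np.log(h))
             - 0.5 * d * np.log(2.0 * np.pi))
    # Log of the mean kernel value, subtracting the row peak to avoid underflow
    peak = log_k.max(axis=1, keepdims=True)
    return (peak + np.log(np.mean(np.exp(log_k - peak), axis=1, keepdims=True))).ravel()

def posterior_default(x_good, x_bad, x_query, prior_bad):
    """Posterior probability of default via Bayes' rule with a class prior."""
    log_bad = np.log(prior_bad) + parzen_log_density(
        x_bad, x_query, silverman_bandwidth(x_bad))
    log_good = np.log1p(-prior_bad) + parzen_log_density(
        x_good, x_query, silverman_bandwidth(x_good))
    return np.exp(log_bad - np.logaddexp(log_bad, log_good))

# Hypothetical synthetic data with a 5% default ratio (not from the dissertation)
rng = np.random.default_rng(0)
x_good = rng.normal(0.0, 1.0, size=(950, 2))
x_bad = rng.normal(3.0, 1.0, size=(50, 2))
prior_bad = len(x_bad) / (len(x_bad) + len(x_good))  # assumed class imbalance prior

x_query = np.array([[0.0, 0.0], [3.0, 3.0]])
post = posterior_default(x_good, x_bad, x_query, prior_bad)
print(post)
```

Working in log space keeps the posterior well defined even when one class density is extremely small at a query point, which is common for the minority (default) class.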