Abstract:
This mini-dissertation seeks to provide the reader with an understanding of one
of the most popular boosting methods in use today, Adaboost, and of its first extension,
Adaboost.M1. Boosting, as the name suggests, is an ensemble machine learning
method created to improve or "boost" prediction accuracy via repeated Monte-Carlo-type
simulations. Because the method is flexible enough to be applied over any learning
algorithm, in this dissertation we use decision trees, or more
specifically classification trees constructed by the CART method, as the base predictor.
The reasons for boosting classification trees include the learning algorithm's lack of accuracy
when applied on a stand-alone basis in many settings, its wide practical real-world
application, and its ability to perform natural internal feature
selection. The core topics covered include the origins of the Adaboost method,
how and why it works, possible issues with the method, and examples using classification
trees as the base predictor to demonstrate and assess the method's performance.
Although no formal mathematical derivation of the method was provided at the time
of its creation, a statistical justification was put forward several years later
which explained Adaboost in terms of well-known additive modelling minimizing
a specific exponential loss function or criterion. This justification is provided along with real and simulated examples demonstrating Adaboost's performance using two
types of classification trees, namely stumps (classification trees with two terminal nodes)
and optimized or pruned full trees.
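For reference, the criterion in that justification is commonly written, using the standard notation with class labels $y \in \{-1, +1\}$ and an additive classifier score $f(x)$ (notation assumed here, as it is not defined elsewhere in this abstract), as the exponential loss
$$L\big(y, f(x)\big) = e^{-y\, f(x)},$$
which Adaboost can be viewed as minimizing in a stagewise additive fashion.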
What is shown empirically is that when boosting
tree stumps, the performance enhancements achieved by Adaboost in many cases
meet or exceed those of single or boosted larger tree structures. This finding has benefits
such as simpler model structures and lower computational time. Lastly, we provide
a cursory review of new developments within the field of boosting, such as margin theory,
which seeks to explain the method's seemingly mysterious test
and training error performance; optimized tree boosting procedures such as gradient
boosted methods; and ensemble methods that combine bagging and boosting.