Statistical properties of forward selection regression estimators
Publisher
University of Pretoria
Abstract
In practice, when many candidate explanatory variables are available for a multiple regression, there is always the possibility that variables that are important determinants of the response variable are omitted from the model, while unimportant variables are included. Both types of error matter, and this dissertation attempts to quantify their probabilities by means of a simulation study. Different numbers of variables, i.e. p = 4 to 20, are assumed, along with different sample sizes, i.e. n = 0.5p, p, 2p and 4p. For each p the underlying model assumes that roughly half of the independent variables are actually correlated with the dependent variable and the other half are not. The noise is ε ~ N(0, σ²), where σ² is held fixed. The data were simulated 10000 times for each combination of n and p, using known underlying models and ε drawn randomly from a normal distribution. The full model and forward selection regression are compared. The mean squared error of the estimated coefficients β(p) is determined relative to the true β for each combination of n and p. A full discussion, as well as graphs, is presented.
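The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's code: the abstract does not give the coefficient values, the predictor distribution, or the stopping rule, so the sketch assumes true coefficients of 1 for the first half of the variables and 0 for the rest, independent standard normal predictors, and a partial F-to-enter threshold of 4.0. The function names `forward_select` and `simulate` are hypothetical, and only the case n > p + 1 is handled.

```python
import numpy as np


def forward_select(X, y, f_enter=4.0):
    """Greedy forward selection; returns the selected column indices in entry order.

    Assumption: a variable enters only if its partial F statistic is at
    least f_enter (4.0 is an illustrative value, not taken from the source).
    """
    n, p = X.shape
    selected = []
    rss = float(np.sum((y - y.mean()) ** 2))  # intercept-only residual sum of squares
    while len(selected) < min(p, n - 2):
        best_j, best_rss = None, rss
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef = np.linalg.lstsq(A, y, rcond=None)[0]
            r = float(np.sum((y - A @ coef) ** 2))
            if r < best_rss:
                best_j, best_rss = j, r
        if best_j is None:
            break
        df = n - len(selected) - 2  # residual degrees of freedom after adding the candidate
        f_stat = (rss - best_rss) / (best_rss / df)
        if f_stat < f_enter:
            break
        selected.append(best_j)
        rss = best_rss
    return selected


def simulate(p=6, n=24, reps=200, sigma=1.0, seed=0):
    """Compare full-model OLS with forward selection over repeated samples."""
    rng = np.random.default_rng(seed)
    half = p // 2
    beta = np.zeros(p)
    beta[:half] = 1.0  # assumption: first half of the variables are truly active
    mse_full = mse_fs = 0.0
    omitted = included = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))  # assumption: independent N(0, 1) predictors
        y = X @ beta + rng.normal(0.0, sigma, n)
        # Full model: all p variables plus an intercept.
        A = np.column_stack([np.ones(n), X])
        b_full = np.linalg.lstsq(A, y, rcond=None)[0][1:]
        # Forward selection: refit on the chosen subset, zero elsewhere.
        sel = forward_select(X, y)
        b_fs = np.zeros(p)
        if sel:
            A_sel = np.column_stack([np.ones(n), X[:, sel]])
            b_fs[sel] = np.linalg.lstsq(A_sel, y, rcond=None)[0][1:]
        mse_full += float(np.mean((b_full - beta) ** 2))
        mse_fs += float(np.mean((b_fs - beta) ** 2))
        chosen = np.array([j in sel for j in range(p)])
        omitted += int(np.sum(~chosen[:half]))   # important variables left out
        included += int(np.sum(chosen[half:]))   # unimportant variables let in
    return {
        "mse_full": mse_full / reps,
        "mse_forward": mse_fs / reps,
        "omission_rate": omitted / (reps * half),
        "inclusion_rate": included / (reps * (p - half)),
    }


if __name__ == "__main__":
    print(simulate(p=6, n=24, reps=200))
```

Calling `simulate` over a grid of p and n values, with a larger `reps`, would give the kind of per-combination comparison the abstract describes; the omission and inclusion rates correspond to the two types of error discussed there.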
Description
Dissertation (MSc)--University of Pretoria, 2011.
Keywords
Regression, Multiple regression, Full model, Bhat, Mean-squared error, Sample size, Forward selection, Number of variables, UCTD
Citation
Thiebaut, NM 2011, Statistical properties of forward selection regression estimators, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/27014>