The vast number of study variables or population characteristics, and of domains of interest, in a survey makes it impractical and almost impossible to calculate and publish a standard error for each estimated value of a population variable or characteristic and for each domain individually. Since estimated values are subject to statistical variation (resulting from the probability sampling), standard errors may not be omitted from the survey report. Estimated values can be evaluated only if their precision is known. The purpose of this research project is to study the feasibility of mathematical modeling to estimate the standard errors of estimated values of population parameters or characteristics in survey data sets, and to investigate effective and user-friendly methods of presenting these models in reports.
The following data sets were used in the investigation:
• October Household Survey (OHS) 1995 - Workers and Household data set
• OHS 1996 - Workers and Household data set
• OHS 1997 - Workers and Household data set
• Victims of Crime Survey (VOC) 1998
The basic methodology consists of estimating the standard errors of the statistics considered in the survey for a variety of domains (such as the whole country, provinces, urban/rural areas, population groups, gender and age groups, as well as combinations of these). This is done by means of a computer program that takes into account the complexity of the different sample designs. The directly calculated standard errors were obtained in this way. Different models are then fitted to the data by means of regression modeling in the search for a suitable standard error model. A function of the directly calculated standard error served as the dependent variable, and a function of the size of the statistic served as the independent variable.
A linear model, equating the natural logarithm of the coefficient of relative variation of a statistic to a linear function of the natural logarithm of the size of the statistic, gave an adequate fit in most cases. Well-known tests for outliers were applied in the model fitting procedure. For each observation flagged as an outlier, it was established whether the observation could legitimately be deleted (e.g. when the domain sample size was too small, or the estimate was biased). The fitting procedure was then repeated. The Australian Bureau of Statistics also uses this model in similar surveys. They derived it specifically for variables that count people in a given category. It was found that the model performs equally well when the variable of interest counts households or incidents, as in the case of the VOC. The set of domains considered in the fitting procedure included segregated classes, mixed classes and cross-classes, so the model can be used irrespective of the type of subclass domain. This result makes it possible to approximate standard errors for any type of domain with the same model. The fitted model, as a mathematical formula, is not a user-friendly way of presenting the precision of estimates. Consequently, user-friendly and effective methods of presenting standard errors are summarized in this report. The suitability of a specific presentation method, however, depends on the extent of the survey and the number of study variables involved.
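As an illustration of the log-linear model described above, the following minimal sketch fits ln(CV) = a + b ln(x), where x is the size of the estimate and CV is its coefficient of relative variation (standard error divided by the estimate), and then uses the fitted line to approximate a standard error. Several things here are assumptions rather than details from the dissertation: the fit uses ordinary least squares, the function names are hypothetical, and the input values are made up and noise-free purely to show the mechanics. Outlier screening and the complex sample design are not handled.

```python
import math


def fit_log_cv_model(estimates, std_errors):
    """Fit ln(CV) = a + b * ln(estimate) by ordinary least squares.

    CV is the coefficient of relative variation, std_error / estimate.
    Returns the intercept a and slope b.
    """
    xs = [math.log(est) for est in estimates]
    ys = [math.log(se / est) for est, se in zip(estimates, std_errors)]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def approx_std_error(estimate, a, b):
    """Standard error implied by the fitted model for a given estimate."""
    cv = math.exp(a + b * math.log(estimate))
    return estimate * cv


# Made-up illustrative inputs (NOT survey results): estimated counts for five
# hypothetical domains, with standard errors generated from known coefficients.
a_true, b_true = 1.5, -0.45
sizes = [5e3, 2e4, 1e5, 5e5, 2e6]
direct_ses = [x * math.exp(a_true + b_true * math.log(x)) for x in sizes]

a_hat, b_hat = fit_log_cv_model(sizes, direct_ses)
se_for_new_estimate = approx_std_error(50_000, a_hat, b_hat)
```

In practice the inputs would be the directly calculated standard errors for many domains, and a user of the report would only need the two fitted coefficients to approximate the standard error of any estimate.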
Dissertation (MSc (Mathematical Statistics))--University of Pretoria, 2007.