Abstract:
Literature on estimating the standard errors of estimators of variable values for complex samples is less readily available than literature on simple random sampling with independent observations. A sample is complex when its design involves stratification, clustering, or unequal selection probabilities. Taking this complexity into account when estimating variances is important, and doing so is the main purpose of this study. Apart from the standard errors of estimators, two other measures are worth using when analysing data from a complex sample: the design effect (deff) and the intraclass correlation coefficient (roh, 'rate of homogeneity'). The design effect is by definition the ratio of the variance of an estimator of a population parameter under the complex sample design to the variance of the same estimator when the complexity of the sample is ignored. The intraclass correlation coefficient, roh, measures the variation in a variable within the primary or first-stage sampling units (PSUs) of the sample relative to the variation of that variable in the complete sample. The concept of portability is discussed and illustrated in practice. Portability refers to the possibility of carrying conclusions about the sampling error, deff value, or roh value over from one subclass to another, from one variable to another, or from one survey to another. Two important types of subclass are cross classes and segregated classes. Cross classes cut across the whole sample and contain a part of each PSU, for example classes defined by sex or age. Segregated classes do not cut through any PSU but consist of a few complete PSUs, for example geographical regions and race groups.
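The definitions of deff and roh above can be made concrete with a small numerical sketch. It is an illustration only, not the procedure used in the study: it assumes a single stratum, equal selection probabilities, PSUs treated as sampled with replacement, and the commonly used relation deff = 1 + roh(b - 1), where b is the average number of sampled elements per PSU. That relation is not stated in this abstract, and the function name `design_effect_and_roh` is hypothetical.

```python
from statistics import variance


def design_effect_and_roh(clusters):
    """Estimate deff and roh for a sample mean.

    clusters: list of PSUs, each a list of element values.
    Assumptions (illustrative only): one stratum, equal selection
    probabilities, PSUs treated as sampled with replacement.
    """
    a = len(clusters)                      # number of PSUs
    n = sum(len(c) for c in clusters)      # number of elements
    if a < 2 or n == a:
        raise ValueError("need at least two PSUs and more than one element per PSU on average")

    totals = [sum(c) for c in clusters]
    sizes = [len(c) for c in clusters]
    mean = sum(totals) / n                 # ratio mean: sum(y) / sum(size)

    # Variance that respects the clustering (between-PSU formula for a ratio mean).
    z = [t - mean * m for t, m in zip(totals, sizes)]
    var_complex = a / (a - 1) * sum(v * v for v in z) / n ** 2

    # Variance that ignores the clustering (simple random sampling formula).
    all_values = [v for c in clusters for v in c]
    var_srs = variance(all_values) / n
    if var_srs == 0:
        raise ValueError("variable is constant; deff is undefined")

    deff = var_complex / var_srs
    b_bar = n / a                          # average PSU sample size
    roh = (deff - 1) / (b_bar - 1)         # from deff = 1 + roh * (b_bar - 1)
    return deff, roh


# PSUs that are internally homogeneous but differ from one another
# give deff > 1 and a positive roh.
deff, roh = design_effect_and_roh([[1, 1], [2, 2], [3, 3], [4, 4]])
print(deff, roh)
```

Because roh is obtained by removing the average PSU size from deff, the sketch also shows why roh depends less on PSU size than deff does. With only a few PSUs both estimates are noisy, so the printed values are indicative at best.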
As a measure of portability, the deff value is more widely applicable in practice than the standard error. It is especially suitable for segregated classes, since the size of the PSUs and the sampling design, both of which can have a drastic effect on the value of the design effect, do not change drastically. Of the three possible measures, the roh value is the most widely applicable in practice and is especially suitable for cross classes, since the size of the PSUs, which may differ drastically, has little if any effect on the roh value. The programme CLUSTERS (developed by the World Fertility Survey) is used to investigate portability with HSRC sampling data. CLUSTERS uses the Taylor linearization method for variance estimation, which has a major shortcoming: it is restricted to the variance estimation of linear estimators of variable values (for example means or proportions). Two repeated replication methods of variance estimation, developed and discussed in the literature, overcome this shortcoming: Balanced Repeated Replication (BRR) and Jack-knife Repeated Replication (JRR). Both methods, as well as the Taylor linearization method, are discussed. Using the HSRC sample survey data, the variance estimates for linear estimators of variable values are compared across all three methods with the aid of CLUSTERS and a BRRJRR programme developed by the author.
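The comparison of the three variance estimation methods can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of CLUSTERS or of the author's BRRJRR programme: the data layout (a list of strata, each a list of `(y, x)` PSU totals), all function names, and the example figures are hypothetical. PSUs are treated as sampled with replacement, and the BRR part assumes exactly two PSUs per stratum.

```python
def ratio(psus):
    """Ratio estimate r = sum(y) / sum(x) over (y, x) PSU totals."""
    return sum(y for y, _ in psus) / sum(x for _, x in psus)


def flatten(strata):
    return [psu for stratum in strata for psu in stratum]


def taylor_variance(strata):
    """Taylor linearization variance of r, using z = y - r * x per PSU."""
    all_psus = flatten(strata)
    r = ratio(all_psus)
    x_total = sum(x for _, x in all_psus)
    total = 0.0
    for stratum in strata:
        a = len(stratum)
        z = [y - r * x for y, x in stratum]
        z_bar = sum(z) / a
        total += a / (a - 1) * sum((v - z_bar) ** 2 for v in z)
    return total / x_total ** 2


def jrr_variance(strata):
    """Jackknife: drop one PSU at a time and reweight the rest of its stratum."""
    r = ratio(flatten(strata))
    total = 0.0
    for h, stratum in enumerate(strata):
        a = len(stratum)
        scale = a / (a - 1)
        others = [p for g, s in enumerate(strata) if g != h for p in s]
        for i in range(a):
            kept = [(scale * y, scale * x)
                    for j, (y, x) in enumerate(stratum) if j != i]
            total += (a - 1) / a * (ratio(others + kept) - r) ** 2
    return total


def hadamard(order):
    """Sylvester construction of a +1/-1 Hadamard matrix; order is a power of two."""
    rows = [[1]]
    while len(rows) < order:
        rows = ([row + row for row in rows]
                + [row + [-v for v in row] for row in rows])
    return rows


def brr_variance(strata):
    """BRR: balanced half-samples, one PSU per stratum with doubled weight."""
    if any(len(s) != 2 for s in strata):
        raise ValueError("this BRR sketch needs exactly two PSUs per stratum")
    r = ratio(flatten(strata))
    order = 1
    while order < len(strata) + 1:   # column 0 (all +1) is skipped below
        order *= 2
    total = 0.0
    for row in hadamard(order):
        half = []
        for h, stratum in enumerate(strata):
            y, x = stratum[0] if row[h + 1] == 1 else stratum[1]
            half.append((2 * y, 2 * x))
        total += (ratio(half) - r) ** 2
    return total / order


# Hypothetical data: three strata, two PSUs each, equal PSU sizes (x = 5).
example = [[(10, 5), (14, 5)], [(8, 5), (9, 5)], [(20, 5), (12, 5)]]
for estimate in (taylor_variance, jrr_variance, brr_variance):
    print(estimate.__name__, estimate(example))
```

With equal `x` in every PSU the estimator is linear in the `y` totals, and the three functions should agree up to floating-point rounding. When `x` varies between PSUs the estimator is a ratio, and the three results would generally differ slightly.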