Abstract:
BACKGROUND: Multilevel logistic regression models are widely used in health sciences research to account for
clustering in multilevel data when estimating the effects of individual-level and cluster-level covariates on
subjects' binary outcomes. Several measures for quantifying between-cluster heterogeneity have been proposed. This
study compared the performance of two between-cluster variance-based heterogeneity measures (the Intra-class
Correlation Coefficient (ICC) and the Median Odds Ratio (MOR)) and two cluster-level covariate-based heterogeneity
measures (the 80% Interval Odds Ratio (IOR-80) and the Sorting Out Index (SOI)).
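For orientation, the three measures that have closed forms are usually written as follows for a random-intercept logistic model, where σ_u² is the between-cluster variance, β is the log odds ratio of a cluster-level covariate, π²/3 is the variance of the standard logistic distribution and Φ⁻¹ is the standard normal quantile function. These are the conventional definitions from the multilevel-modelling literature, assumed here rather than taken from this study; the SOI is not shown because the abstract does not define it.

```latex
\begin{aligned}
\mathrm{ICC} &= \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3},\\
\mathrm{MOR} &= \exp\!\left(\sqrt{2\sigma_u^2}\,\Phi^{-1}(0.75)\right),\\
\mathrm{IOR\text{-}80} &= \left[\exp\!\left(\beta + \sqrt{2\sigma_u^2}\,\Phi^{-1}(0.10)\right),\ 
\exp\!\left(\beta + \sqrt{2\sigma_u^2}\,\Phi^{-1}(0.90)\right)\right]
\end{aligned}
```

The ICC and MOR depend only on σ_u², whereas the IOR-80 also involves β, which matches the split between variance-based and covariate-based measures.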
METHODS: We used several simulated datasets from a two-level logistic regression model to assess the performance of
the four clustering measures. We also compared the four measures empirically in an analysis of childhood anemia in
Malawi, investigating the importance of unexplained heterogeneity between communities and of the effect of community
geographic type (rural vs urban).
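The simulation design is only summarised here. As a minimal sketch, assuming a random-intercept logistic model with equal cluster sizes and one binary cluster-level covariate, data of this kind can be generated as below. The function name, the intercept and the alternating assignment of the covariate to clusters are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np


def simulate_two_level_logistic(n_clusters, cluster_size, sigma2_u, beta0, beta1, seed=None):
    """Simulate binary outcomes from a random-intercept logistic model.

    Hypothetical design: equal cluster sizes and one binary cluster-level
    covariate (e.g. rural = 1, urban = 0) assigned to alternate clusters.
    """
    rng = np.random.default_rng(seed)
    # Cluster index for every subject: 0,0,...,1,1,...
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    # Binary cluster-level covariate, alternating across clusters
    x_cluster = (np.arange(n_clusters) % 2).astype(float)
    # Cluster random intercepts u_j ~ N(0, sigma2_u)
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=n_clusters)
    # Linear predictor on the logit scale, then Bernoulli outcomes
    eta = beta0 + beta1 * x_cluster[cluster] + u[cluster]
    p = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, p)
    return cluster, x_cluster[cluster], y


if __name__ == "__main__":
    cluster, x, y = simulate_two_level_logistic(
        n_clusters=50, cluster_size=20, sigma2_u=0.455,
        beta0=-0.5, beta1=np.log(1.21), seed=1,
    )
    print(y.shape, y.mean())
```

Each simulated dataset would then typically be fitted with a multilevel logistic model, and the estimated ICC, MOR, IOR-80 and SOI compared with the values implied by the true σ_u² and β, varying the number of clusters, the cluster size and σ_u².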
RESULTS: Our findings showed that the estimates of SOI and ICC were generally unbiased with at least 10 clusters and
a cluster size of at least 20. In contrast, estimates of MOR and IOR-80 were less accurate with 50 or fewer
clusters, regardless of cluster size. The performance of all four clustering measures improved as the number of
clusters and the cluster size increased, at every level of between-cluster variance. In the analysis of childhood
anemia, the estimated between-community variance was 0.455, and community geographic type (rural vs urban) had an
odds ratio (OR) of 1.21 (95% CI: 0.97, 1.52). The resulting estimates were: ICC = 0.122 (indicating low homogeneity of
childhood anemia within the same community); MOR = 1.898 (indicating large unexplained heterogeneity); IOR-80 = 0.345
to 3.978; and SOI = 56.7% (implying that between-community heterogeneity was more important than the estimated effect
of community geographic type (rural vs urban) in explaining the variation in childhood anemia).
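As a rough check, the conventional closed-form expressions for the ICC, MOR and IOR-80 can be evaluated at the reported estimates. This is a sketch assuming the usual latent-variable ICC and the usual MOR and IOR definitions, which the abstract does not spell out; the function names are hypothetical, and the SOI is omitted because its formula is not given here. With σ_u² = 0.455 the code gives an ICC of about 0.1215 and an MOR of about 1.90, close to the reported 0.122 and 1.898; the small differences presumably reflect rounding of the published variance. The IOR-80 also depends on the fixed effect: the rounded OR of 1.21 gives roughly 0.36 to 4.11 rather than the published 0.345 to 3.978, although the ratio of the upper to the lower bound (about 11.5), which depends only on the variance, agrees.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal; Z.inv_cdf is the quantile function


def icc_latent(sigma2_u):
    """Latent-variable ICC; pi^2/3 is the variance of the standard logistic distribution."""
    return sigma2_u / (sigma2_u + pi ** 2 / 3)


def median_odds_ratio(sigma2_u):
    """Median OR comparing identical subjects in a higher-risk vs a lower-risk random cluster."""
    return exp(sqrt(2 * sigma2_u) * Z.inv_cdf(0.75))


def interval_odds_ratio(beta, sigma2_u, coverage=0.80):
    """Central interval of ORs for a cluster-level covariate with log-OR beta (IOR-80 by default)."""
    half_width = sqrt(2 * sigma2_u) * Z.inv_cdf(0.5 + coverage / 2)
    return exp(beta - half_width), exp(beta + half_width)


sigma2_u = 0.455   # reported between-community variance
beta = log(1.21)   # log of the reported rural-vs-urban OR
print(f"ICC    = {icc_latent(sigma2_u):.4f}")
print(f"MOR    = {median_odds_ratio(sigma2_u):.3f}")
lo, hi = interval_odds_ratio(beta, sigma2_u)
print(f"IOR-80 = {lo:.3f} to {hi:.3f}")
```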
CONCLUSION: At least 300 clusters, each of size at least 50, would be adequate to estimate the strength of clustering
in multilevel logistic regression with negligible bias. We recommend the SOI for assessing unexplained heterogeneity
between clusters when the effect of cluster-level covariates is also of interest; otherwise, the usual ICC would
suffice in multilevel logistic regression analyses.