Exploring machine learning classification for community based health insurance enrollment in Ethiopia

Abstract

BACKGROUND: Community-based health insurance (CBHI) is a vital tool for achieving universal health coverage (UHC), a key global health priority outlined in the Sustainable Development Goals (SDGs). Sub-Saharan Africa continues to face challenges in achieving UHC and protecting individuals from the financial burden of disease. As a result, CBHI has become popular in low- and middle-income countries, including Ethiopia. This study aimed to identify the machine learning (ML) algorithm with the best predictive accuracy for CBHI enrollment and to determine the most influential predictors in the dataset.

METHODS: Data from the 2019 Ethiopian Mini Demographic and Health Survey (EMDHS) were used. CBHI enrollment was predicted using seven machine learning models, including linear discriminant analysis (LDA), support vector machine with radial basis function (SVM), k-nearest neighbors (KNN), classification and regression tree (CART), and random forest (RF). Receiver operating characteristic (ROC) curves and other metrics were used to evaluate each model’s accuracy.

RESULTS: The RF algorithm was the best-performing machine learning model across the performance assessments used. The results indicate that age, wealth index, household members, and land usage all significantly affect CBHI enrollment in Ethiopia.

CONCLUSION: This study found that RF models can classify CBHI enrollment in Ethiopia with high accuracy. Feature importance identified age, wealth index, household members, and land utilization as some of the most significant variables associated with CBHI enrollment. The results of the study can help health professionals and policymakers create focused strategies to improve CBHI enrollment in Ethiopia.
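As an illustration of the ROC-based model comparison mentioned in the Methods, the following is a minimal sketch in plain Python, not the authors' code. It computes the area under the ROC curve (AUC) using the pairwise formulation, along with simple accuracy, and then selects the model with the highest AUC. The labels, the scores, and the names `model_a` and `model_b` are hypothetical toy values and are not drawn from the EMDHS data.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    AUC equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of cases classified correctly when scores are cut at `threshold`."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = enrolled in CBHI, 0 = not enrolled.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predicted enrollment probabilities from two unnamed models.
model_scores: Dict[str, Sequence[float]] = {
    "model_a": [0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.6, 0.2, 0.5, 0.7],
}

# Evaluate every model on the same labels, then pick the highest AUC.
results: Dict[str, Tuple[float, float]] = {
    name: (roc_auc(y_true, s), accuracy(y_true, s))
    for name, s in model_scores.items()
}
best = max(results, key=lambda name: results[name][0])

for name, (auc, acc) in results.items():
    print(f"{name}: AUC={auc:.3f}, accuracy={acc:.3f}")
print("best by AUC:", best)
```

In practice a comparison like this would be run on held-out or cross-validated predictions, and the pairwise loop would usually be replaced by a library routine for larger samples.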

Description

DATA AVAILABILITY STATEMENT: The datasets presented in this study can be found in online repositories.

Keywords

Machine learning, Health insurance, Random forest, Accuracy, Ethiopia, Community-based health insurance (CBHI), Universal health coverage (UHC), Sub-Saharan Africa (SSA), Low- and middle-income countries (LMICs), Linear discriminant analysis (LDA), Support vector machine (SVM), K-nearest neighbors (KNN), Classification and regression tree (CART)

Sustainable Development Goals

SDG-01: No poverty
SDG-09: Industry, innovation and infrastructure

Citation

Yilema, S.A., Shiferaw, Y.A., Moyehodie, Y.A., Fenta, S.M., Belay, D.B., Fenta, H.M., Nigussie, T.Z. & Chen, D.-G. (2025) Exploring machine learning classification for community based health insurance enrollment in Ethiopia. Frontiers in Public Health 13:1549210. doi: 10.3389/fpubh.2025.1549210.