Abstract:
HIV treatment programs face challenges in identifying patients at risk for loss-to-follow-up and
uncontrolled viremia. We applied predictive machine learning algorithms to anonymised, patient-level HIV programmatic data from two districts in South Africa, 2016–2018. We developed patient
risk scores for two outcomes: (1) visit attendance within 28 days of the next scheduled clinic visit and
(2) suppression of the next HIV viral load (VL). Demographic, clinical, behavioral and laboratory
data were investigated in multiple models as predictor variables of attending the next scheduled
visit and VL results at the next test. Three classification algorithms (logistic regression, random
forest and AdaBoost) were evaluated for building predictive models. Data were randomly sampled
on a 70/30 split into a training and test set. The training set included a balanced set of positive and
negative examples from which the classification algorithm could learn. The predictor variable data
from the unseen test set were given to the model, and each predicted outcome was scored against
known outcomes. Finally, we estimated performance metrics for each model in terms of sensitivity,
specificity, positive and negative predictive value and area under the curve (AUC). In total, 445,636
patients were included in the retention model and 363,977 in the VL model. The predictive metric
(AUC) ranged from 0.69 for attendance at the next scheduled visit to 0.76 for VL suppression,
suggesting that the model correctly classified whether a scheduled visit would be attended in 2 of
3 patients and whether the VL result at the next test would be suppressed in approximately 3 of
4 patients. Variables that were important predictors of both outcomes included prior late visits,
number of prior VL tests, time since their last visit, number of visits on their current regimen, age, and
treatment duration. For retention, the number of visits at the current facility and the details of the
next appointment date were also predictors, while for VL suppression, other predictors included the
range of the previous VL value. Machine learning can identify HIV patients at risk for disengagement
and unsuppressed VL. Predictive modeling can improve the targeting of interventions through
differentiated models of care before patients disengage from treatment programmes, increasing cost-effectiveness and improving patient outcomes.
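
The evaluation steps described above (a random 70/30 split, a balanced training set, and scoring predictions against known outcomes) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the `(features, label)` record layout, and the use of majority-class undersampling to balance the training set are all assumptions made for illustration; the abstract does not state how balancing was done. AUC is computed here as the fraction of positive-negative pairs that the score ranks correctly, with ties counted as half.

```python
import random


def split_70_30(records, seed=0):
    """Randomly split records into a 70% training set and a 30% test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def balance_by_undersampling(train, seed=0):
    """Return equal numbers of positive and negative examples.

    Assumption: balancing is done by undersampling the majority class.
    Each record is a hypothetical (features, label) pair with label 0 or 1.
    """
    pos = [r for r in train if r[1] == 1]
    neg = [r for r in train if r[1] == 0]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)
    return rng.sample(pos, n) + rng.sample(neg, n)


def _ratio(numerator, denominator):
    """Safe division; returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return _ratio(wins, len(pos) * len(neg))
```

In such a workflow, only the training portion would be balanced; the held-out 30% keeps its original class mix so that predictive values reflect the outcome prevalence in the test data.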