Abstract:
OBJECTIVE : To systematically evaluate the development of Machine Learning (ML) models and compare their diagnostic
accuracy for the classification of Middle Ear Disorders (MED) using Tympanic Membrane (TM) images.
METHODS : PubMed, EMBASE, CINAHL, and CENTRAL were searched up until November 30, 2021. Studies on the development
of ML approaches for diagnosing MED using TM images were selected according to the inclusion criteria. PRISMA guidelines
were followed, and study design, analysis method, and outcomes were extracted. Sensitivity, specificity, and area under the
curve (AUC) were used to summarize diagnostic performance in the meta-analysis. Risk of bias was assessed using the Quality
Assessment of Diagnostic Accuracy Studies-2 tool in combination with the Prediction Model Risk of Bias Assessment Tool.
RESULTS : Sixteen studies were included, encompassing 20,254 TM images (7,025 normal TM and 13,229 MED). The sample
size ranged from 45 to 6,066 per study. The accuracy of the 25 included ML approaches ranged from 76.00% to 98.26%.
Eleven studies (68.8%) were rated as having a low risk of bias, with the reference standard as the major domain of high risk
of bias (37.5%). Sensitivity and specificity were 93% (95% CI, 90%–95%) and 85% (95% CI, 82%–88%), respectively. The
AUC across all TM images was 94% (95% CI, 91%–96%). A greater AUC was found with otoendoscopic images than with otoscopic
images.
CONCLUSIONS : ML approaches perform robustly in distinguishing normal ears from MED; however, a standardized TM image
acquisition and annotation protocol should be developed.