Abstract:
In recent years, machine learning (ML) has become a pivotal tool for predicting and
diagnosing thyroid disease. While many studies have explored the use of individual ML models for thyroid
disease detection, the accuracy and robustness of these single-model approaches are often constrained
by data imbalance and inherent model biases. This study introduces a framework that combines filter-based
feature selection with a stacking-based ensemble of ML models, tailored specifically to thyroid disease detection. This framework
capitalizes on the collective strengths of multiple base models by aggregating their predictions, aiming
to surpass the predictive performance of individual models. Because only a few clinical attributes are used
for diagnosis, the approach can also reduce screening time and cost. In extensive experiments
on a clinical thyroid disease dataset, the combination of filter-based feature selection and ensemble
learning demonstrated superior discriminative ability, reflected by an area under the receiver operating
characteristic curve (ROC-AUC) of 99.9%. The proposed framework sheds light
on the complementary strengths of different base models, fostering a deeper understanding of their joint
predictive performance. Our findings underscore the potential of ensemble strategies to significantly improve
the efficacy of ML-based detection of thyroid diseases, marking a shift from reliance on single models to
more robust, collective approaches.
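
As a rough illustration of the two ingredients named above, the following minimal sketch chains a filter-based feature selector with a stacking ensemble using scikit-learn. It is not the configuration used in this study: the synthetic imbalanced dataset, the ANOVA F-test filter, the number of retained features (k=5), and the choice of base and meta learners are all assumptions made only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for a clinical dataset (assumption, not the study's data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Filter step: score each feature independently of any classifier
# (ANOVA F-test here) and keep the k highest-scoring ones. k=5 is illustrative.
selector = SelectKBest(score_func=f_classif, k=5)

# Stacking step: base models produce out-of-fold predictions (cv=5),
# and a meta-learner is trained on those predictions.
# The base and meta learners below are hypothetical choices.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Putting the selector inside the pipeline means it is fit on training data only.
model = make_pipeline(selector, stack)
model.fit(X_train, y_train)

# ROC-AUC is computed from the predicted probability of the positive class.
scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(f"ROC-AUC on held-out data: {auc:.3f}")
```

Placing the selector inside the pipeline, rather than applying it to the full dataset beforehand, keeps the held-out ROC-AUC free of information leaked from the test split.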