Abstract:
Efficacy data from diverse chemical libraries,
screened against the various stages of the malaria parasite
Plasmodium falciparum, including asexual blood stage (ABS)
parasites and transmissible gametocytes, serve as a valuable
reservoir of information on the chemical space of compounds
that are active or inactive against the parasite. We postulated
that these data can be mined to define chemical features associated
with ABS activity alone and/or with additional life cycle activity
profiles such as gametocytocidal activity. Additionally,
this information could provide chemical features associated with
inactive compounds, which could eliminate any future unnecessary screening of similar chemical analogs. Therefore, we aimed to use
machine learning to identify the chemical space associated with stage-specific antimalarial activity. We collected data from various
chemical libraries that were screened against the asexual (126 374 compounds) and sexual (gametocyte) stages of the parasite
(93 941 compounds), calculated the compounds’ molecular fingerprints, and trained machine learning models to recognize stage-specific
active and inactive compounds. We built several models that predict compound activity against ABS and dual activity
against ABS and gametocytes, with support vector machines (SVMs) performing best, combining high recall (90% and 66%, respectively) with
low false-positive predictions (15% and 1%, respectively). This allowed the identification of chemical features enriched in active and inactive
populations, an important outcome that could be mined for essential chemical features to streamline hit-to-lead optimization
strategies of antimalarial candidates. The predictive capabilities of the models held true in diverse chemical spaces, indicating that the
machine learning (ML) models are robust and can serve as a prioritization tool to guide phenotypic screening and medicinal
chemistry programs.
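The two headline metrics can be made concrete with a minimal sketch. It assumes that "recall" means TP / (TP + FN) and that "false-positive predictions" refers to the false-positive rate FP / (FP + TN); the abstract does not state the exact definition used, and the function name and toy labels below are hypothetical, not taken from the authors' scripts.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false-positive rate) for binary labels, 1 = active.

    Assumed definitions (not confirmed by the source):
      recall = TP / (TP + FN)
      false-positive rate = FP / (FP + TN)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Toy data (hypothetical): 4 active and 6 inactive compounds.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
recall, fpr = recall_and_fpr(y_true, y_pred)
print(f"recall={recall:.2f}, false-positive rate={fpr:.2f}")
```

Reporting both values matters for screening data, where inactive compounds usually far outnumber active ones and accuracy alone can look high for a model that finds few actives.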
Description:
DATA AVAILABILITY STATEMENT: Code Availability Statement: All Python scripts for clustering, undersampling, model building, and evaluation can be
obtained from GitHub: http://github.com/M2PL/Machines-Against-Malaria.
To facilitate model usage, we have also incorporated the models in the
Ersilia Model Hub (https://www.ersilia.io/model-hub; identifier eos80ch).
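As an illustration of the undersampling step named above, the following is a minimal sketch of plain random undersampling of the majority class. It is an assumption for illustration only: the repository's scripts may select compounds differently (for example, guided by clustering), and the function name, record layout, and seed are hypothetical.

```python
import random


def undersample(records, label_of, seed=0):
    """Randomly drop majority-class records until both classes are equal in size.

    records  : list of arbitrary items (here, (compound_id, label) tuples)
    label_of : function returning 1 (active) or 0 (inactive) for a record
    seed     : fixed seed so the selection is reproducible
    """
    rng = random.Random(seed)
    actives = [r for r in records if label_of(r) == 1]
    inactives = [r for r in records if label_of(r) == 0]
    # Identify which class is smaller; it is kept in full.
    minority, majority = sorted((actives, inactives), key=len)
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 3 actives among 20 compounds.
records = [(f"cpd{i}", 1 if i < 3 else 0) for i in range(20)]
balanced = undersample(records, label_of=lambda r: r[1])
print(len(balanced), sum(label for _, label in balanced))
```

Undersampling discards information from the majority class, so it is typically applied only to the training split, with evaluation carried out on data that keeps the original class ratio.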
SUPPORTING INFORMATION: Information on hyperparameters used for model building and additional performance metrics of model predictions in representative and novel chemical spaces (PDF). Chemical information on compounds within the databases used for ML; performance metrics of models trained on imbalanced, oversampled, or undersampled data using either ECFP or MACCS molecular fingerprints; and information on enriched ECFP features for activity or inactivity against ABS and/or gametocytes. Simplified Molecular Input Line Entry System (SMILES) strings of compounds used for machine learning.