Posted on 2022-12-06, 14:08. Authored by Teng-Zhi Long, Shao-Hua Shi, Shao Liu, Ai-Ping Lu, Zhao-Qian Liu, Min Li, Ting-Jun Hou, Dong-Sheng Cao
Hematotoxicity has become
a serious but overlooked toxicity
in drug discovery. However, only a few in silico models
have been reported for the prediction of hematotoxicity. In this study,
we constructed a high-quality dataset comprising 759 hematotoxic compounds
and 1623 nonhematotoxic compounds and then established a series of
classification models based on a combination of seven machine learning
(ML) algorithms and nine molecular representations. The results based
on two data partitioning strategies and applicability domain (AD)
analysis show that the best prediction model, based on Attentive
FP, yielded a balanced accuracy (BA) of 72.6% and an area under the receiver
operating characteristic curve (AUC) of 76.8% on the validation
set, and a BA of 69.2% and an AUC of 75.9% on the test set. Compared
with existing filtering rules and models, our model also achieved
the highest BA, 67.5%, on the external validation set. Additionally,
the Shapley additive explanations (SHAP) and atom heatmap approaches
were used to identify the important features and structural fragments
related to hematotoxicity, which could help flag potentially
hematotoxic compounds. Furthermore, matched molecular pair
analysis (MMPA) and a representative substructure derivation technique
were employed to further characterize and investigate the transformation
principles and distinctive structural features of hematotoxic chemicals.
We believe that the novel graph-based deep learning algorithms and
insightful interpretation presented in this study can be used as a
trustworthy and effective tool to assess hematotoxicity in the development
of new drugs.
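
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how BA and AUC can be computed, written in plain Python. It is not the authors' evaluation code; the function names and the toy scores are illustrative assumptions. Only the class counts (759 hematotoxic, 1623 nonhematotoxic) come from the abstract. The sketch also shows why BA is informative on an imbalanced dataset like this one: a trivial model that labels every compound nonhematotoxic reaches about 68% plain accuracy but only 50% BA.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Class counts taken from the abstract: 1 = hematotoxic, 0 = nonhematotoxic.
n_pos, n_neg = 759, 1623
y_true = [1] * n_pos + [0] * n_neg

# Hypothetical baseline: predict "nonhematotoxic" for every compound.
always_negative = [0] * len(y_true)
plain_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
majority_ba = balanced_accuracy(y_true, always_negative)

# Toy scores (made up for illustration) to exercise the AUC function.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

print(f"plain accuracy of baseline: {plain_accuracy:.3f}")
print(f"balanced accuracy of baseline: {majority_ba:.3f}")
print(f"toy AUC: {toy_auc:.2f}")
```

The pairwise AUC loop is quadratic in the number of compounds, which is fine at this dataset size; a rank-based formulation would be preferable for much larger sets.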