American Chemical Society
ci8b00551_si_001.pdf (2.28 MB)

In Silico Prediction of Endocrine Disrupting Chemicals Using Single-Label and Multilabel Models

journal contribution
posted on 2019-02-26, 00:00 authored by Lixia Sun, Hongbin Yang, Yingchun Cai, Weihua Li, Guixia Liu, Yun Tang
Endocrine disruption (ED) has become a serious public health issue and also poses a significant threat to ecosystems. Because the mechanisms of ED are complex, traditional in silico models that focus on a single mechanism are insufficient for detecting endocrine disrupting chemicals (EDCs), let alone for giving an overview of the possible mechanisms of action of a known EDC. To address these limitations, this study constructed both single-label and multilabel models across six ED targets: AR (androgen receptor), ER (estrogen receptor alpha), TR (thyroid receptor), GR (glucocorticoid receptor), PPARg (peroxisome proliferator-activated receptor gamma), and aromatase. Two machine learning methods were used to build the single-label models, with multiple rounds of random under-sampling combined with voting classification to overcome data imbalance. Four methods were explored to construct the multilabel models, which predict the interaction of one EDC with multiple targets simultaneously. The single-label models for all six targets achieved reasonable performance, with balanced accuracy (BA) values from 0.742 to 0.816. The top single-label models for the individual targets were then combined to predict the multilabel test set, giving BA values from 0.586 to 0.711. The multilabel models offered a significant boost over these single-label baselines, with BA values on the multilabel test set from 0.659 to 0.832. We therefore conclude that single-label models can be employed to identify potential EDCs, while multilabel models are preferable for predicting the possible mechanisms of known EDCs.
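The imbalance-handling idea in the abstract, repeated random under-sampling followed by voting, together with the balanced accuracy metric, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the two machine learning methods, the number of under-sampling rounds, or the sampling ratio, so the nearest-centroid base learner, the 1:1 ratio, the default of five rounds, and all function names below are assumptions made purely for illustration. BA is taken here in its usual binary form, the mean of sensitivity and specificity.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]   # (features, label); 1 = active, 0 = inactive
Model = Callable[[Sequence[float]], int]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity; assumes both classes are present."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def undersample(data: List[Sample], rng: random.Random) -> List[Sample]:
    """Keep every minority sample plus an equal-sized random draw of the majority (assumed 1:1)."""
    pos = [s for s in data if s[1] == 1]
    neg = [s for s in data if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


def train_centroid(data: List[Sample]) -> Model:
    """Hypothetical stand-in base learner: assign the class of the nearest centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in data if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: 1 if dist2(x, c1) < dist2(x, c0) else 0


def train_voting_ensemble(data: List[Sample], n_models: int = 5, seed: int = 0) -> Model:
    """Train one base model per balanced subsample and predict by majority vote."""
    rng = random.Random(seed)
    models = [train_centroid(undersample(data, rng)) for _ in range(n_models)]

    def predict(x: Sequence[float]) -> int:
        votes = sum(m(x) for m in models)
        return 1 if 2 * votes > len(models) else 0  # odd n_models avoids ties

    return predict


# Toy imbalanced data set: 2 actives, 10 inactives, one made-up descriptor each.
toy = [([1.0], 1), ([1.2], 1)] + [([0.04 * i - 0.2], 0) for i in range(10)]
ensemble = train_voting_ensemble(toy)
predictions = [ensemble(x) for x, _ in toy]
print(balanced_accuracy([y for _, y in toy], predictions))
```

Each base model sees all of the minority class but a different random slice of the majority class, so the vote uses more of the majority data than a single balanced subsample would. In practice the toy learner would be replaced by a real classifier trained on molecular descriptors or fingerprints.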