Development and Validation of Decision Forest Model for Estrogen Receptor Binding Prediction of Chemicals Using Large Data Sets
journal contributionposted on 21.12.2015, 00:00 by Hui Wen Ng, Stephen W. Doughty, Heng Luo, Hao Ye, Weigong Ge, Weida Tong, Huixiao Hong
Some chemicals in the environment possess the potential to interact with the endocrine system in the human body. Multiple receptors are involved in the endocrine system; estrogen receptor α (ERα) plays very important roles in endocrine activity and is the most studied receptor. Understanding and predicting estrogenic activity of chemicals facilitates the evaluation of their endocrine activity. Hence, we have developed a decision forest classification model to predict chemical binding to ERα using a large training data set of 3308 chemicals obtained from the U.S. Food and Drug Administration’s Estrogenic Activity Database. We tested the model using cross validations and external data sets of 1641 chemicals obtained from the U.S. Environmental Protection Agency’s ToxCast project. The model showed good performance in both internal (92% accuracy) and external validations (∼70–89% relative balanced accuracies), where the latter involved the validations of the model across different ER pathway-related assays in ToxCast. The important features that contribute to the prediction ability of the model were identified through informative descriptor analysis and were related to current knowledge of ER binding. Prediction confidence analysis revealed that the model had both high prediction confidence and accuracy for most predicted chemicals. The results demonstrated that the model constructed based on the large training data set is more accurate and robust for predicting ER binding of chemicals than the published models that have been developed using much smaller data sets. The model could be useful for the evaluation of ERα-mediated endocrine activity potential of environmental chemicals.