Machine-Learning-Assisted Approach for Discovering Novel Inhibitors Targeting Bromodomain-Containing Protein 4
datasetposted on 21.06.2017 by Jing Xing, Wenchao Lu, Rongfeng Liu, Yulan Wang, Yiqian Xie, Hao Zhang, Zhe Shi, Hao Jiang, Yu-Chih Liu, Kaixian Chen, Hualiang Jiang, Cheng Luo, Mingyue Zheng
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
Bromodomain-containing protein 4 (BRD4) is implicated in the pathogenesis of a number of different cancers, inflammatory diseases and heart failure. Much effort has been dedicated toward discovering novel scaffold BRD4 inhibitors (BRD4is) with different selectivity profiles and potential antiresistance properties. Structure-based drug design (SBDD) and virtual screening (VS) are the most frequently used approaches. Here, we demonstrate a novel, structure-based VS approach that uses machine-learning algorithms trained on the priori structure and activity knowledge to predict the likelihood that a compound is a BRD4i based on its binding pattern with BRD4. In addition to positive experimental data, such as X-ray structures of BRD4–ligand complexes and BRD4 inhibitory potencies, negative data such as false positives (FPs) identified from our earlier ligand screening results were incorporated into our knowledge base. We used the resulting data to train a machine-learning model named BRD4LGR to predict the BRD4i-likeness of a compound. BRD4LGR achieved a 20–30% higher AUC-ROC than that of Glide using the same test set. When conducting in vitro experiments against a library of previously untested, commercially available organic compounds, the second round of VS using BRD4LGR generated 15 new BRD4is. Moreover, inverting the machine-learning model provided easy access to structure–activity relationship (SAR) interpretation for hit-to-lead optimization.