Structural Analysis and Identification of False Positive Hits in Luciferase-Based Assays
journal contribution
posted on 2020-03-28, 12:29 authored by Zi-Yi Yang, Jie Dong, Zhi-Jiang Yang, Ai-Ping Lu, Ting-Jun Hou, Dong-Sheng Cao

Luciferase-based
bioluminescence detection techniques are widely used in
high-throughput screening (HTS), and firefly luciferase
(FLuc) is the most commonly used variant. However, FLuc inhibitors
can interfere with luciferase activity and thereby produce
false-positive signals in HTS assays. To avoid the unnecessary cost
in time and money of following up such artifacts, an in silico prediction model
for FLuc inhibitors is highly desirable. In this study, we built an
extensive data set consisting of 20 888 FLuc inhibitors and
198 608 noninhibitors and then developed a series of classification
models from combinations of three machine learning (ML) algorithms
and four types of molecular representations. The best model, which
combines XGBoost with ECFP4 fingerprints and MOE2d descriptors, yielded a balanced
accuracy (BA) of 0.878 and an area under the receiver operating characteristic
curve (AUC) of 0.958 on the validation set, and a BA of 0.886
and an AUC of 0.947 on the test set. Three external validation sets,
namely set 1 (3231 FLuc inhibitors and 69 783 noninhibitors),
set 2 (695 FLuc inhibitors and 75 913 noninhibitors), and set
3 (1138 FLuc inhibitors and 8155 noninhibitors), were used to verify
the predictive ability of our models. On these three sets, the best
model achieved BA values of 0.864, 0.845,
and 0.791, respectively. In addition, the Shapley additive
explanations (SHAP) method was used to identify the important features
and structural fragments associated with FLuc inhibitors and to quantify
their influence on predictions, which may provide valuable clues for
detecting undesirable luciferase inhibitors. Based on these important,
explanatory features, 16 rules were proposed for detecting FLuc inhibitors;
these rules correctly identify 70% of FLuc inhibitors. Furthermore, a comparison
with existing prediction rules and models for FLuc inhibitors used
in virtual screening confirmed the high reliability of the models and
rules proposed in this study. We also used the model to screen three
curated chemical databases; almost 10% of the molecules in these
databases were predicted to be inhibitors, highlighting the
potential risk of false positives in luciferase-based assays. Finally,
a public web server, ChemFLuc (http://admet.scbdd.com/chemfluc/index/), was developed
to provide a freely available service for predicting potential FLuc inhibitors.
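Because the data set is strongly imbalanced (roughly 9.5 noninhibitors per inhibitor), the performance figures above are reported as balanced accuracy (BA) and AUC rather than plain accuracy. The Python below is a minimal sketch, assuming the standard definitions BA = (sensitivity + specificity) / 2 and AUC = probability that a randomly chosen inhibitor is scored higher than a randomly chosen noninhibitor. The helper names `balanced_accuracy`, `plain_accuracy`, and `roc_auc` are hypothetical, and all counts and scores are made-up illustrations, not results from this study.

```python
from typing import Sequence


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on inhibitors) and specificity (recall on noninhibitors)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def plain_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of all compounds classified correctly, regardless of class."""
    return (tp + tn) / (tp + fn + tn + fp)


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    This pairwise loop is quadratic and meant only for illustration.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical test set: 100 inhibitors and 950 noninhibitors (about 1:9.5).
    # A trivial model that labels every compound as a noninhibitor:
    print(plain_accuracy(tp=0, fn=100, tn=950, fp=0))     # about 0.905
    print(balanced_accuracy(tp=0, fn=100, tn=950, fp=0))  # 0.5
    # A hypothetical useful model: sensitivity 0.8, specificity 0.9.
    print(balanced_accuracy(tp=80, fn=20, tn=855, fp=95))  # about 0.85
    # Hypothetical predicted scores for three inhibitors and three noninhibitors.
    print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4]))  # about 0.833
```

The first two lines of output show why plain accuracy is misleading at this class ratio: a model that never predicts an inhibitor still scores about 0.90, while its BA is only 0.5. For data sets of the size described above, a library routine such as scikit-learn's `balanced_accuracy_score` or `roc_auc_score` would normally replace these hand-written helpers.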