Posted on 2023-05-12, 13:35, authored by Xu Qian, Xiaowen Dai, Lin Luo, Mingde Lin, Yuan Xu, Yang Zhao, Dingfang Huang, Haodi Qiu, Li Liang, Haichun Liu, Yingbo Liu, Lingxi Gu, Tao Lu, Yadong Chen, Yanmin Zhang
Cyclin-dependent protein kinases (CDKs) are protein serine/threonine
kinases that play crucial roles in regulating the cell cycle and transcription.
CDK overexpression is considered a hallmark of cancer because it can
lead to dysregulated cell proliferation. However, most CDK inhibitors
developed to date lack sufficient selectivity, which has
hindered their therapeutic use. In this study, we
propose BiLAT, a multitask deep learning framework based on
SMILES representations, to predict the inhibitory activity
of molecules against eight CDK subtypes (CDK1, 2, 4–9). The framework
consists mainly of an improved bidirectional long short-term memory
(BiLSTM) module and the encoder layer of the Transformer. Additionally,
SMILES enumeration is applied as a data augmentation method to improve
model performance. Compared with baseline predictive models
based on three conventional machine learning methods and two multitask
deep learning algorithms, BiLAT achieves the best performance, with
the highest average AUC, ACC, F1-score, and MCC values (0.938, 0.894,
0.911, and 0.715, respectively) on the test set. Moreover, we constructed a targeted
external dataset, CDK-Dec, for the CDK family, which mainly contains
decoys selected by 3D similarity to active compounds, and used it
to further evaluate our model. It
is worth mentioning that the BiLAT model is interpretable and can
be used by chemists to design and synthesize compounds with improved
activity. To further verify the generalization ability of the multitask
BiLAT model, we also evaluated it on three public
datasets (Tox21, ClinTox, and SIDER). Compared with several currently
popular models, BiLAT performs best on two of the three datasets.
These results indicate that BiLAT is an effective tool for accelerating
drug discovery.
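The four reported metrics (AUC, ACC, F1-score, MCC) are computed per task and then averaged over the CDK subtypes. The following is a minimal Python sketch of that evaluation under stated assumptions, not the authors' code: labels are binary (active/inactive); a molecule may lack a measurement for some subtypes, encoded as `None` and skipped; predicted probabilities are binarized at a hypothetical threshold of 0.5; and "average" is taken to mean an unweighted mean over tasks. All function names (`roc_auc`, `threshold_metrics`, `multitask_average`) are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    pairs = sorted(zip(y_score, y_true))  # ascending by score
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1  # extend over a run of tied scores
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """ACC, F1 and MCC after binarizing scores at a (hypothetical) threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    acc = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
    return {"ACC": acc, "F1": f1, "MCC": mcc}


def multitask_average(labels: List[List[Optional[int]]],
                      scores: List[List[float]],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Unweighted mean of per-task metrics; None marks an unmeasured label."""
    n_tasks = len(labels[0])
    totals = {"AUC": 0.0, "ACC": 0.0, "F1": 0.0, "MCC": 0.0}
    for t in range(n_tasks):
        # Keep only molecules that actually have a label for task t.
        kept = [(row_y[t], row_s[t]) for row_y, row_s in zip(labels, scores)
                if row_y[t] is not None]
        ys = [y for y, _ in kept]
        ss = [s for _, s in kept]
        task = threshold_metrics(ys, ss, threshold)
        task["AUC"] = roc_auc(ys, ss)
        for name in totals:
            totals[name] += task[name]
    return {name: total / n_tasks for name, total in totals.items()}


if __name__ == "__main__":
    # Toy data: 4 molecules x 2 tasks; the second molecule has no label for task 2.
    labels = [[1, 0], [0, None], [1, 1], [0, 0]]
    scores = [[0.9, 0.2], [0.3, 0.7], [0.6, 0.8], [0.4, 0.6]]
    print(multitask_average(labels, scores))
```

On the toy data this gives an average AUC of 1.0, ACC and F1 of about 0.833, and MCC of 0.75. Macro averaging weights every CDK subtype equally regardless of how many labelled compounds it has; whether the averages reported for BiLAT were computed this way, and at which decision threshold, is an assumption of this sketch.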