Classification of Cyclooxygenase‑2 Inhibitors Using Support Vector Machine and Random Forest Methods
datasetposted on 14.02.2019, 00:00 by Zijian Qin, Yao Xi, Shengde Zhang, Guiping Tu, Aixia Yan
This work reports the classification study conducted on the biggest COX-2 inhibitor data set so far. Using 2925 diverse COX-2 inhibitors collected from 168 pieces of literature, we applied machine learning methods, support vector machine (SVM) and random forest (RF), to develop 12 classification models. The best SVM and RF models resulted in MCC values of 0.73 and 0.72, respectively. The 2925 COX-2 inhibitors were reduced to a data set of 1630 molecules by removing intermediately active inhibitors, and 12 new classification models were constructed, yielding MCC values above 0.72. The best MCC value of the external test set was predicted to be 0.68 by the RF model using ECFP_4 fingerprints. Moreover, the 2925 COX-2 inhibitors were clustered into eight subsets, and the structural features of each subset were investigated. We identified substructures important for activity including halogen, carboxyl, sulfonamide, and methanesulfonyl groups, as well as the aromatic nitrogen atoms. The models developed in this study could serve as useful tools for compound screening prior to lab tests.