Posted on 2021-07-25, 17:43; authored by Farid Nasiri, Fereshteh Fallah Atanaki, Saman Behrouzi, Kaveh Kavousi, Mojtaba Bagheri
Cell-penetrating
anticancer peptides (Cp-ACPs) are considered promising
candidates in solid tumor and hematologic cancer therapies. Current
approaches for the design and discovery of Cp-ACPs rely on expensive
high-throughput screenings that often give rise to multiple obstacles,
including instrumentation adaptation and experimental handling. The
application of machine learning (ML) tools developed for peptide activity
prediction is of growing interest. In this study, we applied
the random forest (RF)-, support vector machine (SVM)-, and eXtreme
gradient boosting (XGBoost)-based algorithms to predict the active
Cp-ACPs using an experimentally validated data set. The model, CpACpP,
was developed on the basis of two independent cell-penetrating peptide
(CPP) and anticancer peptide (ACP) subpredictors. Various compositional
and physicochemical features were combined or selected using
the multilayered recursive feature elimination (RFE) method for both
data sets. Our results showed that the ACP subclassifiers achieved a
mean accuracy (ACC) of 0.98 with an area under the curve (AUC)
≈ 0.98, whereas the CPP predictors achieved corresponding
values of ∼0.94 and ∼0.95 via the hybrid-based features
and independent data sets, respectively. In addition, evaluating the prediction
of Cp-ACPs gave accuracies of ∼0.79 and 0.89 on a series of
independent sequences by applying our CPP and ACP classifiers, respectively,
so our predictors perform better than the previously
reported ACPred, mACPpred, MLCPP, and CPPred-RF. The described consensus-based
fusion method additionally reached an AUC of 0.94 for the prediction
of Cp-ACP (http://cbb1.ut.ac.ir/CpACpP/Index).
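
As a rough illustration of two ideas mentioned above, a compositional feature and a consensus between the CPP and ACP subpredictors, the following minimal sketch may help. It is not the CpACpP implementation: the function names, the 0.5 threshold, and the fusion rule (both subpredictors must agree, with the mean probability as the fused score) are assumptions made here for illustration, since the abstract does not specify them.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a peptide.

    This is one common compositional feature; the features actually
    used by CpACpP are not listed in the abstract.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def consensus_call(p_cpp, p_acp, threshold=0.5):
    """Hypothetical fusion rule (an assumption, not the published method).

    A peptide is called Cp-ACP only when both subpredictor probabilities
    reach the threshold; the fused score is their arithmetic mean.
    """
    for p in (p_cpp, p_acp):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    fused = (p_cpp + p_acp) / 2
    is_cp_acp = p_cpp >= threshold and p_acp >= threshold
    return is_cp_acp, fused
```

In practice, the composition vector would be one of several feature groups passed to the RF, SVM, or XGBoost classifiers, and the two probabilities would come from the trained CPP and ACP subpredictors.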