ci300146h_si_001.pdf (1.21 MB)
GA(M)E-QSAR: A Novel, Fully Automatic Genetic-Algorithm-(Meta)-Ensembles Approach for Binary Classification in Ligand-Based Drug Design
journal contribution
posted on 2016-02-20, 10:52 authored by Yunierkis Pérez-Castillo, Cosmin Lazar, Jonatan Taminau, Mathy Froeyen, Miguel Ángel Cabrera-Pérez, Ann Nowé
Computer-aided drug design has become an important component of
the drug discovery process. Despite the advances in this field, no
single modeling approach can be successfully applied
to solve the whole range of problems faced during QSAR modeling. Feature
selection and ensemble modeling are active areas of research in ligand-based
drug design. Here we introduce the GA(M)E-QSAR algorithm that combines
the search and optimization capabilities of Genetic Algorithms with
the simplicity of the Adaboost ensemble-based classification algorithm
to solve binary classification problems. We also explore the usefulness
of Meta-Ensembles trained with Adaboost and Voting schemes to further
improve the accuracy, generalization, and robustness of the optimal
Adaboost Single Ensemble derived from the Genetic Algorithm optimization.
We evaluated the performance of our algorithm using five data sets
from the literature and found that it yields classification results
similar to or better than those reported for these
data sets, with a higher enrichment of active compounds relative to
the whole actives subset when only the most active chemicals are considered.
More importantly, we compared our methodology with state-of-the-art
feature selection and classification approaches and found that it
can provide highly accurate, robust, and generalizable models. In
the case of the Adaboost Ensembles derived from the Genetic Algorithm
search, the final models are quite simple since they consist of a
weighted sum of the output of single feature classifiers. Furthermore,
the Adaboost scores can be used as a ranking criterion to prioritize
chemicals for synthesis and biological evaluation after virtual screening
experiments.
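
The "weighted sum of the output of single feature classifiers" and its use as a ranking score can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes textbook discrete AdaBoost with threshold stumps as the single-feature classifiers, labels coded as +1 (active) and -1 (inactive), and it leaves out the Genetic Algorithm search and the Meta-Ensembles entirely. The function names and the toy descriptor values are hypothetical and serve only as illustration.

```python
import math

def stump_predict(stump, row):
    """Single-feature classifier: returns `sign` if row[j] > t, else -sign."""
    j, t, sign = stump
    return sign if row[j] > t else -sign

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best_err, best_stump = None, None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for sign in (1, -1):
                stump = (j, t, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    if best_stump is None:
        raise ValueError("every feature is constant; no stump can be fitted")
    return best_err, best_stump

def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        err, stump = fit_stump(X, y, w)
        perfect = err <= 1e-10
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        if perfect:
            break  # one stump already separates the training set
        # Up-weight misclassified samples, down-weight the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_score(model, row):
    """Weighted sum of stump outputs; the sign is the predicted class."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in model)

# Toy data (invented): two descriptors per compound, +1 = active, -1 = inactive.
X = [[1, 5], [2, 1], [3, 4], [4, 2], [5, 6]]
y = [1, 1, -1, -1, 1]

model = adaboost_fit(X, y, n_rounds=3)
scores = [adaboost_score(model, row) for row in X]
# Rank compounds from highest to lowest score, e.g. to prioritize for testing.
ranking = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

In this sketch the final model is just the list of `(alpha, stump)` pairs, so each prediction can be read off as a handful of threshold tests on individual descriptors. In the approach described above, the Genetic Algorithm would decide which descriptors are available to the ensemble; here all descriptors are searched exhaustively.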