Application of Predictive QSAR Models to Database Mining: Identification and Experimental Validation of Novel Anticonvulsant Compounds
Journal contribution posted on 2004-04-22, authored by Min Shen, Cécile Béguin, Alexander Golbraikh, James P. Stables, Harold Kohn, and Alexander Tropsha.

We have developed a drug discovery strategy that employs variable-selection quantitative
structure-activity relationship (QSAR) models for chemical database mining. The approach starts with the development of rigorously validated QSAR models obtained with the variable-selection k-nearest-neighbor (kNN) method (or, in principle, with any other robust model-building technique). Model validation is based on several statistical criteria, including randomization of the target property (Y-randomization), independent assessment of the predictive power of training-set models on external test sets, and definition of each model's applicability domain. All successful models are used concurrently in database mining; in each case, only the variables selected during model building (termed the descriptor pharmacophore) are used in chemical similarity searches that compare active training-set compounds (queries) with compounds in chemical databases. For external database entries that fall within a predefined similarity threshold of the training-set molecules, the specific biological activity characteristic of the training set is predicted with the validated QSAR models, subject to the applicability domain criteria. Compounds predicted to be highly active by all or most of the models are considered consensus hits. We report the first application of this computational strategy to the discovery of anticonvulsant agents in the Maybridge and National Cancer Institute (NCI) databases, which together contain approximately 250 000 compounds. Forty-eight anticonvulsant agents of the functionalized amino acid (FAA) series were used to build kNN variable-selection QSAR models. The 10 best models were used to mine the chemical databases, and 22 compounds were selected as consensus hits. Nine compounds were synthesized and tested at the NIH Epilepsy Branch, Rockville, MD, using the same biological test that was used to assess the anticonvulsant activity of the training-set compounds; of these nine, four were exact database hits and five were derived from hits by minor chemical modifications. Seven of the nine compounds were confirmed to be active, an exceptionally high hit rate (7 of 9, about 78%). The approach described in this report can be used as a general tool for rational drug discovery.
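The mining workflow summarized in the abstract (a descriptor-pharmacophore similarity search against active training compounds, kNN prediction restricted to the applicability domain, and consensus across several models) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dict-based data layout, the unweighted kNN average, the applicability-domain cutoff (mean plus z times the standard deviation of training-set k-nearest-neighbor distances, with an assumed z = 0.5), the thresholds `active_min` and `sim_max`, and the toy data are all hypothetical. Variable selection itself (choosing each model's descriptors) is not shown.

```python
import math
from statistics import mean, pstdev

# Minimal sketch of the mining workflow summarized above. Function names, the
# dict-based data layout, the unweighted kNN average, the applicability-domain
# (AD) convention and all thresholds are illustrative assumptions; this is not
# the authors' implementation, and variable selection itself is not shown.


def distance(a, b, descriptors):
    """Euclidean distance over a model's selected descriptors only
    (the model's 'descriptor pharmacophore')."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in descriptors))


def nearest(query, train_X, train_y, descriptors, k):
    """Return the k nearest training compounds as (distance, activity) pairs."""
    pairs = [(distance(query, x, descriptors), y) for x, y in zip(train_X, train_y)]
    return sorted(pairs, key=lambda p: p[0])[:k]


def ad_cutoff(train_X, descriptors, k, z=0.5):
    """Assumed AD cutoff: mean + z * std of the distances from each training
    compound to its k nearest training neighbours (z = 0.5 is an assumption)."""
    dists = []
    for i, xi in enumerate(train_X):
        others = sorted(
            distance(xi, xj, descriptors) for j, xj in enumerate(train_X) if j != i
        )
        dists.extend(others[:k])
    return mean(dists) + z * pstdev(dists)


def predict(query, model, train_X, train_y):
    """Plain kNN average plus an AD flag: the prediction counts as reliable only
    if all k neighbours lie within the model's cutoff."""
    nn = nearest(query, train_X, train_y, model["descriptors"], model["k"])
    in_domain = all(d <= model["cutoff"] for d, _ in nn)
    return mean(y for _, y in nn), in_domain


def mine(database, models, train_X, train_y, active_min, sim_max):
    """Return names of database entries that are similar to an active training
    compound and predicted active, inside the AD, by a strict majority of models."""
    actives = [x for x, y in zip(train_X, train_y) if y >= active_min]
    hits = []
    for name, compound in database.items():
        votes = 0
        for model in models:
            desc = model["descriptors"]
            # Similarity search: skip entries far from every active query.
            if min(distance(compound, a, desc) for a in actives) > sim_max:
                continue
            value, in_domain = predict(compound, model, train_X, train_y)
            if in_domain and value >= active_min:
                votes += 1
        if votes > len(models) / 2:
            hits.append(name)
    return hits


# Toy usage with two hypothetical descriptors and made-up activity values.
train_X = [{"d1": 0.0, "d2": 0.0}, {"d1": 1.0, "d2": 0.0}, {"d1": 0.0, "d2": 1.0},
           {"d1": 5.0, "d2": 5.0}, {"d1": 6.0, "d2": 5.0}]
train_y = [8.0, 7.5, 7.0, 2.0, 2.5]
models = [
    {"descriptors": desc, "k": 2, "cutoff": ad_cutoff(train_X, desc, 2)}
    for desc in (["d1", "d2"], ["d1"], ["d2"])
]
database = {"cmpdA": {"d1": 0.5, "d2": 0.5},
            "cmpdB": {"d1": 5.5, "d2": 5.0},
            "cmpdC": {"d1": 20.0, "d2": 20.0}}
print(mine(database, models, train_X, train_y, active_min=6.0, sim_max=2.0))
# prints ['cmpdA']
```

Counting a compound as a consensus hit when a strict majority of models agree corresponds to the "majority" reading of the abstract; requiring every model to agree would be the stricter "all models" variant.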