Similarity Searching for Potent Compounds Using Feature Selection

Martin Vogt, Jürgen Bajorath
In similarity searching, compound potency is usually not taken into account. Given a set of active reference compounds, similarity to database molecules is calculated using different metrics without considering compound potency as a search parameter. Herein, we introduce a feature selection method for fingerprint similarity searching to maximize compound recall and preferentially detect potent compounds. On the basis of training examples, fingerprint features are selected that identify potent compounds and produce high recall. Using the reduced fingerprint representations, potent hits are preferentially detected, even if reference compounds have only moderate or low potency. Small sets of simple chemical features are found to yield high search performance.
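To make the idea of potency-directed feature selection concrete, the following minimal sketch shows one plausible way such a selection could be set up. It is not the procedure used in this work: the abstract does not specify the selection algorithm, so greedy forward selection is swapped in here as a simple stand-in. Fingerprints are modeled as sets of integer feature indices, database compounds are ranked by their maximum Tanimoto similarity to any reference compound, and features are kept only if they increase the recall of potent training compounds among the top-ranked molecules. All function names, the `top_k` and `max_features` parameters, and the toy data are hypothetical and chosen purely for illustration.

```python
from typing import FrozenSet, List, Sequence, Set

Fingerprint = FrozenSet[int]  # a fingerprint as the set of its "on" feature indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient of two feature sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(references: Sequence[Fingerprint],
                       database: Sequence[Fingerprint],
                       features: Fingerprint) -> List[int]:
    """Rank database indices by maximum Tanimoto similarity to any reference,
    computed on fingerprints reduced to the given features.
    Ties are broken by database index to keep the ranking deterministic."""
    reduced_refs = [fp & features for fp in references]

    def score(i: int) -> float:
        fp = database[i] & features
        return max((tanimoto(fp, r) for r in reduced_refs), default=0.0)

    return sorted(range(len(database)), key=lambda i: (-score(i), i))


def potent_recall(ranking: Sequence[int], potent: Set[int], top_k: int) -> float:
    """Fraction of potent compounds found among the top_k ranked positions."""
    if not potent:
        return 0.0
    return sum(1 for i in ranking[:top_k] if i in potent) / len(potent)


def greedy_select(references: Sequence[Fingerprint],
                  training: Sequence[Fingerprint],
                  potent: Set[int],
                  top_k: int,
                  max_features: int) -> Fingerprint:
    """Greedy forward selection (an assumption, not the published method):
    repeatedly add the feature that most improves recall of potent training
    compounds, and stop when no feature improves it further."""
    candidates = set().union(*references, *training)
    selected: Set[int] = set()
    best = 0.0
    while len(selected) < max_features:
        best_feature = None
        for f in sorted(candidates - selected):
            trial = frozenset(selected | {f})
            ranking = rank_by_similarity(references, training, trial)
            recall = potent_recall(ranking, potent, top_k)
            if recall > best:
                best, best_feature = recall, f
        if best_feature is None:
            break
        selected.add(best_feature)
    return frozenset(selected)


# Toy data, invented for illustration: one weakly potent reference compound
# and five training compounds, of which indices 1 and 2 are the potent ones.
references = [frozenset({1, 2, 3})]
training = [frozenset({2, 3, 4}), frozenset({1, 5}), frozenset({1, 6}),
            frozenset({2, 3}), frozenset({7})]
potent = {1, 2}
all_features = frozenset(range(1, 8))

selected = greedy_select(references, training, potent, top_k=2, max_features=3)
full_recall = potent_recall(
    rank_by_similarity(references, training, all_features), potent, 2)
reduced_recall = potent_recall(
    rank_by_similarity(references, training, selected), potent, 2)
print(sorted(selected), full_recall, reduced_recall)
```

In this toy setting, the full fingerprint ranks the two potent compounds below structurally closer but less interesting molecules, whereas the reduced representation places them at the top. The sketch evaluates selection on the same compounds it was trained on; a realistic protocol would assess the selected features on held-out database compounds.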