Ligand Prediction for Orphan Targets Using Support Vector Machines and Various Target-Ligand Kernels Is Dominated by Nearest Neighbor Effects
Journal contribution posted on 26.10.2009, 00:00 by Anne Mai Wassermann, Hanna Geppert, Jürgen Bajorath
Support vector machine (SVM) calculations combining protein and small molecule information have been applied to identify ligands for simulated orphan targets (i.e., targets for which no ligands were available). The combination of protein and ligand information was facilitated through the design of target-ligand kernel functions that account for pairwise ligand and target similarity. The design and biological information content of such kernel functions were expected to play a major role in target-directed ligand prediction. Therefore, a variety of target-ligand kernels were implemented to capture different types of target information, including sequence, secondary structure, tertiary structure, biophysical properties, ontologies, or structural taxonomy. These kernels were tested in ligand predictions for simulated orphan targets in two target protein systems characterized by the presence of different intertarget relationships. Surprisingly, although there were target- and set-specific differences in prediction rates for alternative target-ligand kernels, the performance of these kernels was similar overall and also similar to that of SVM linear combinations. Test calculations designed to better understand possible reasons for these observations revealed that ligand information provided by nearest neighbors of orphan targets significantly influenced SVM performance, much more so than the inclusion of protein information. As long as ligands of closely related neighbors of orphan targets were available for SVM learning, orphan target ligands could be well predicted, regardless of the type and sophistication of the kernel function that was used. These findings suggest simplified strategies for SVM-based ligand prediction for orphan targets.
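
The abstract does not state the functional form of the target-ligand kernels. As an illustration only, the sketch below assumes one common construction, in which the kernel value for two (target, ligand) pairs is the product of a target similarity and a ligand similarity. The Tanimoto ligand kernel, the lookup-table target kernel, the function names, and all target names and similarity values are hypothetical and are not taken from the study.

```python
from typing import Dict, FrozenSet, Tuple

# Assumption: a ligand is a binary fingerprint, stored as the set of its on-bits.
Fingerprint = FrozenSet[int]
Pair = Tuple[str, Fingerprint]  # (target name, ligand fingerprint)


def tanimoto_kernel(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def target_kernel(t1: str, t2: str, sim: Dict[Tuple[str, str], float]) -> float:
    """Look up a precomputed symmetric target similarity (1.0 for identical targets)."""
    if t1 == t2:
        return 1.0
    return sim.get((t1, t2), sim.get((t2, t1), 0.0))


def target_ligand_kernel(p1: Pair, p2: Pair, sim: Dict[Tuple[str, str], float]) -> float:
    """Assumed product form: K((t, l), (t', l')) = K_target(t, t') * K_ligand(l, l')."""
    (t1, l1), (t2, l2) = p1, p2
    return target_kernel(t1, t2, sim) * tanimoto_kernel(l1, l2)


# Hypothetical toy data: an orphan target with one close and one distant neighbor.
target_sim = {("T_orphan", "T_neighbor"): 0.8, ("T_orphan", "T_distant"): 0.1}
ligand_a = frozenset({1, 2, 3, 4})
ligand_b = frozenset({2, 3, 4, 5})

# The same ligand pair contributes more when it comes from a closely related target.
k_near = target_ligand_kernel(("T_orphan", ligand_a), ("T_neighbor", ligand_b), target_sim)
k_far = target_ligand_kernel(("T_orphan", ligand_a), ("T_distant", ligand_b), target_sim)
```

Under this assumed product form, ligands of a closely related neighbor receive larger kernel values against candidate orphan-target pairs than ligands of a distant target, which is consistent with the nearest neighbor effect the abstract reports. A matrix of such values could be passed to any SVM implementation that accepts a precomputed kernel.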