Interaction Model Based on Local Protein Substructures Generalizes to the Entire Structural Enzyme-Ligand Space
Dataset posted on 24.11.2008, 00:00 by Helena Strömbergsson, Pawel Daniluk, Andriy Kryshtafovych, Krzysztof Fidelis, Jarl E. S. Wikberg, Gerard J. Kleywegt, Torgeir R. Hvidsten
Chemogenomics is a new strategy in in silico drug discovery, where the ultimate goal is to understand molecular recognition for all molecules interacting with all proteins in the proteome. To study such cross interactions, methods that can generalize over proteins that vary greatly in sequence, structure, and function are needed. We present a general quantitative approach to protein-ligand binding affinity prediction that spans the entire structural enzyme-ligand space. The model was trained on a data set composed of all available enzymes cocrystallized with druglike ligands, taken from four publicly available interaction databases, for which a crystal structure is available. Each enzyme was characterized by a set of local descriptors of protein structure that describe the binding site of the cocrystallized ligand. The ligands in the training set were described by traditional QSAR descriptors. To evaluate the model, a comprehensive test set consisting of enzyme structures and ligands was manually curated. The test set contained enzyme-ligand complexes for which no crystal structures were available, and thus the binding modes were unknown. The test set enzymes were therefore characterized by matching their entire structures to the local descriptor library constructed from the training set. Both the training and the test set contained enzyme-ligand complexes from all major enzyme classes, and the enzymes spanned a large range of sequences and folds. The experimental binding affinities (pKi) ranged from 0.5 to 11.9 (0.7 to 11.0 in the test set). The induced model predicted the binding affinities of the external test set enzyme-ligand complexes with an r² of 0.53 and an RMSEP of 1.5. This demonstrates that the use of local descriptors makes it possible to create rough predictive models that can generalize over a wide range of protein targets.
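For readers unfamiliar with the two evaluation figures quoted above, the following is a minimal sketch of how an r² and an RMSEP could be computed for an external test set. It rests on assumptions: the abstract does not say whether r² is the squared Pearson correlation or the coefficient of determination, so the squared Pearson correlation is used here, and RMSEP is taken as the root mean squared error between observed and predicted values. The function names and the pKi values are hypothetical and are not taken from the study.

```python
import math


def rmsep(observed, predicted):
    """Root mean squared error of prediction over paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def squared_pearson(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: this is one common reading of "r2" for an external test
    set; the source does not state which definition was used.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_o * var_p)


# Made-up pKi values for illustration only; not data from the study.
observed = [4.0, 6.0, 8.0, 10.0]
predicted = [5.0, 5.0, 9.0, 9.0]

print(f"RMSEP = {rmsep(observed, predicted):.2f}")
print(f"r2    = {squared_pearson(observed, predicted):.2f}")
```

Because pKi is a logarithmic scale, an RMSEP of 1.5 corresponds to a typical error of roughly 1.5 orders of magnitude in Ki, which is consistent with the abstract's description of the models as rough.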