Public Data Set of Protein–Ligand Dissociation Kinetic Constants for Quantitative Structure–Kinetics Relationship Studies
Dataset posted on 2022-05-26, 14:34, authored by Huisi Liu, Minyi Su, Hai-Xia Lin, Renxiao Wang, Yan Li
Protein–ligand binding affinity reflects the equilibrium thermodynamics of the protein–ligand binding process; binding/unbinding kinetics is the other side of the coin. Computational models of the quantitative structure–kinetics relationship (QSKR) aim to predict protein–ligand binding/unbinding kinetics from the protein structure, the ligand structure, or the structure of their complex, which in principle can provide a more rational basis for structure-based drug design. Thus far, most of the public data sets used for deriving such QSKR models have been rather limited in sample size and structural diversity. To tackle this problem, we have compiled a set of 680 protein–ligand complexes with experimental dissociation rate constants (koff), curated mainly from the references accumulated for updating our PDBbind database. The three-dimensional structure of each protein–ligand complex in this data set was either retrieved from the Protein Data Bank or carefully modeled based on a proper template. The entire data set covers 155 types of protein, with koff values spanning nearly 10 orders of magnitude. To the best of our knowledge, this data set is the largest of its kind reported publicly. Using this data set, we derived a random forest (RF) model based on protein–ligand atom pair descriptors for predicting koff values. We also demonstrated that using modeled structures as additional training samples benefits model performance. The RF model trained on mixed structures can serve as a baseline for testing other, more sophisticated QSKR models. The whole data set, named PDBbind-koff-2020, is available for free download at our PDBbind-CN web site (http://www.pdbbind.org.cn/download.php).
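Because koff values in a set like this span many orders of magnitude, they are usually handled on a logarithmic scale, and the residence time follows from the standard relation τ = 1/koff. The short Python sketch below illustrates both conversions. It assumes koff is expressed in s^-1; the function names and the example values are hypothetical and are not taken from PDBbind-koff-2020.

```python
import math


def log_koff(koff_per_s: float) -> float:
    """Return log10 of a dissociation rate constant (assumed unit: s^-1)."""
    if koff_per_s <= 0:
        raise ValueError("koff must be positive")
    return math.log10(koff_per_s)


def residence_time(koff_per_s: float) -> float:
    """Return the residence time in seconds, using tau = 1 / koff."""
    if koff_per_s <= 0:
        raise ValueError("koff must be positive")
    return 1.0 / koff_per_s


def orders_of_magnitude(koff_values) -> float:
    """Return the log10 range covered by a collection of koff values."""
    logs = [log_koff(k) for k in koff_values]
    return max(logs) - min(logs)


# Made-up values for illustration only; not entries from the data set.
example_koff = [1e-6, 3.2e-4, 0.05, 1.0, 2.5e3]
span = orders_of_magnitude(example_koff)
print(f"log10 range: {span:.2f}")
print(f"residence time for koff = 0.05 s^-1: {residence_time(0.05):.1f} s")
```

Working with log10(koff) as the regression target keeps very fast and very slow dissociating complexes on a comparable footing, which is one common choice when fitting models such as a random forest to kinetic data.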