ci400120b_si_001.pdf (22.44 kB)
SFCscoreRF: A Random Forest-Based Scoring Function for Improved Affinity Prediction of Protein–Ligand Complexes
journal contribution
posted on 2016-02-19, 00:01 authored by David Zilian, Christoph A. Sotriffer
A major
shortcoming of empirical scoring functions for protein–ligand
complexes is the low degree of correlation between predicted and experimental
binding affinities, as frequently observed not only for large and
diverse data sets but also for SAR series of individual targets. Improvements
can be envisaged by developing new descriptors, employing larger training
sets of higher quality, and resorting to more sophisticated regression
methods. Herein, we describe the use of SFCscore descriptors to develop
an improved scoring function, using a PDBbind training set of
1005 complexes in combination with random forest regression. This
provided SFCscoreRF as a new scoring function
with significantly improved performance on the PDBbind and CSAR–NRC
HiQ benchmarks in comparison to previously developed SFCscore functions.
A leave-cluster-out cross-validation and performance in the CSAR 2012
scoring exercise reveal remaining limitations but also point to directions
for further improvements of SFCscoreRF and empirical scoring functions in general.
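
As an illustration of the leave-cluster-out idea mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that every complex carries a cluster label (for example, a protein family); how the clusters are actually defined is not stated here. The function names leave_cluster_out_splits and pearson_r and the example labels are hypothetical. Each split holds out all complexes of one cluster for testing and trains on the rest, and the Pearson coefficient is one common way to quantify the correlation between predicted and experimental affinities.

```python
import math
from collections import defaultdict


def leave_cluster_out_splits(cluster_labels):
    """Yield (held_out_label, train_indices, test_indices), one split per cluster.

    cluster_labels is a list with one (hypothetical) cluster label per complex.
    """
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)
    for label in sorted(by_cluster):
        test_idx = by_cluster[label]
        train_idx = [i for i, lab in enumerate(cluster_labels) if lab != label]
        yield label, train_idx, test_idx


def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov / math.sqrt(var_p * var_e)


if __name__ == "__main__":
    # Made-up cluster labels for five complexes, for illustration only.
    labels = ["kinase", "protease", "kinase", "hsp90", "protease"]
    for held_out, train_idx, test_idx in leave_cluster_out_splits(labels):
        print(held_out, train_idx, test_idx)
```

In such a scheme, a regression model (a random forest in the work described above) would be fitted on the training indices of each split and evaluated on the held-out cluster, so that no member of the test cluster is seen during training.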