Improving Docking Results via Reranking of Ensembles of Ligand Poses in Multiple X‑ray Protein Conformations with MM-GBSA

There is a tendency in the literature to be critical of scoring functions when docking programs perform poorly. The assumption is that existing scoring functions need to be enhanced or new ones developed in order to improve the performance of docking programs for tasks such as pose prediction and virtual screening. However, failures can result from either sampling or scoring (or a combination of the two), although less emphasis tends to be given to the former. In this work, we use the programs GOLD and Glide on a high-quality data set to explore whether failures in pose prediction and binding affinity estimation are attributable more to sampling or to scoring. We show that identification of the correct pose (docking power) can be improved by incorporating ligand strain into the scoring function or by rescoring an ensemble of diverse docking poses with MM-GBSA in a postprocessing step. We explore the use of nondefault docking settings and find that enhancing ligand sampling also improves docking power, again suggesting that sampling is more limiting than scoring for the docking programs investigated in this work. In cross-docking calculations (docking a ligand to a noncognate receptor structure) we observe a significant reduction in the accuracy of pose ranking, as expected and as reported by others; however, we demonstrate that these alternative poses may in fact offer better complementarity between the ligand and the rigid receptor conformation, emphasizing that treating the receptor rigidly is an artificial constraint on the docking problem. We simulate protein flexibility by using multiple crystallographic conformations of a protein and demonstrate that docking results can be improved with this level of protein sampling. This work indicates the need for better sampling in docking programs, especially for the receptor. This study also highlights the variable descriptive value of RMSD as the sole arbiter of pose replication quality.
It is shown that ligand poses within 2 Å of the crystallographic one can show dramatic differences in calculated relative protein–ligand energies. MM-GBSA rescoring of distinct poses overcomes some of the sensitivities of pose ranking experienced by the docking scoring functions due to protein preparation and binding site definition.
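
The reranking idea summarized above can be illustrated with a minimal sketch. It is not the GOLD, Glide, or MM-GBSA workflow used in this work: the function names (`pose_rmsd`, `rerank`, `top_pose_is_correct`), the coordinates, and the score values are all hypothetical, and the RMSD is computed in place (no superposition) under the assumption that atom ordering is identical and symmetry-equivalent atoms are ignored. The sketch only shows how an ensemble of poses might be reordered by a second energy estimate and how the top-ranked pose would then be judged against the common 2 Å criterion.

```python
import numpy as np


def pose_rmsd(pose, reference):
    """In-place RMSD (in Å) between two poses with identical atom ordering.

    Assumption: no superposition and no symmetry correction are applied.
    """
    pose = np.asarray(pose, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if pose.shape != reference.shape:
        raise ValueError("pose and reference must have the same shape")
    squared_dist = np.sum((pose - reference) ** 2, axis=1)
    return float(np.sqrt(np.mean(squared_dist)))


def rerank(energies):
    """Return pose indices ordered from lowest (most favorable) energy upward."""
    return [int(i) for i in np.argsort(energies, kind="stable")]


def top_pose_is_correct(poses, energies, reference, cutoff=2.0):
    """True if the lowest-energy pose lies within `cutoff` Å of the reference."""
    best = rerank(energies)[0]
    return pose_rmsd(poses[best], reference) <= cutoff


# Hypothetical two-atom "ligand" and two docked poses.
crystal = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pose_a = crystal + np.array([0.5, 0.0, 0.0])  # near-native pose
pose_b = crystal + np.array([0.0, 3.0, 0.0])  # displaced pose
poses = [pose_a, pose_b]

# Hypothetical values: the docking score prefers pose_b,
# the rescoring energy prefers pose_a.
docking_scores = [-7.0, -8.5]
rescored_energies = [-42.0, -35.0]

print(top_pose_is_correct(poses, docking_scores, crystal))
print(top_pose_is_correct(poses, rescored_energies, crystal))
```

In this toy case the near-native pose is present in the ensemble under both rankings, so the failure of the first ranking is a scoring failure rather than a sampling one; a sampling failure would correspond to no pose in `poses` falling within the cutoff at all.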