Posted on 2020-07-06, 18:12; authored by Jun Pei, Lin Frank Song, Kenneth M. Merz
Atom pairwise potential functions make up an essential part of
many scoring functions for protein decoy detection. With the development
of machine learning (ML) tools, there are multiple ways to combine
potential functions to create novel ML models and methods. Potential
function parameters can be easily extracted; however, it is usually
hard to directly obtain the calculated atom pairwise energies from
scoring functions. Amber, as one of the most popular suites of modeling
programs, has an extensive history and library of force field potential
functions. In this work, we directly used the force field parameters
in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise
energies for different interactions. Two sets of structures (a single
amino acid set and a dipeptide set) were used to evaluate the performance
of our encoded Amber potentials. Comparing the energy terms obtained
from our encoding with those from Amber, we find energy differences
within ±0.06 kcal/mol for all tested structures. Previously
we have shown that the Random Forest (RF) model can help to emphasize
more important atom pairwise interactions and ignore insignificant
ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919−1929]. Here, as an example of combining ML methods
with traditional potential functions, we followed the same workflow
to combine the RF models with force field potential functions from
Amber. To determine the performance of our RF models with force field
potential functions, 224 different protein native-decoy systems were
used as our training and testing sets. We find that the RF models with
ff94 and ff14SB force field parameters outperformed all other scoring
functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP)
considered in this work for native structure detection, and they performed
similarly in detecting the best decoy. Through inclusion of best decoy
to decoy comparisons in building our RF models, we were able to generate
models that outperformed the scoring functions tested herein on both
accuracy and best decoy detection, again showing the performance and
flexibility of our RF models to tackle this problem. Finally, the
importance of the RF algorithm and of the force field parameters was
also tested, and the comparison results suggest that both are important:
the ML scoring function achieves its best performance only when the RF
algorithm and the force field potentials are combined. All
code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
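
As an illustration of what encoding a force field potential for a single atom pair can look like, the following is a minimal sketch, not the authors' FFENCODER implementation. It assumes the standard Amber non-bonded functional form (a 12-6 Lennard-Jones term with Lorentz-Berthelot combining rules plus a Coulomb term) and ignores bonded terms, 1-4 scaling, and cutoffs. The function names and the numeric parameters in the usage note are hypothetical and chosen only for illustration.

```python
import math

# Coulomb conversion factor in kcal*Angstrom/(mol*e^2); Amber uses 18.2223**2.
COULOMB_CONSTANT = 332.0522


def lj_energy(r, rmin_half_i, eps_i, rmin_half_j, eps_j):
    """12-6 Lennard-Jones energy (kcal/mol) for one atom pair at distance r (Angstrom).

    rmin_half_* are per-atom radii (R*) and eps_* are well depths.
    Lorentz-Berthelot combining rules are assumed:
    Rmin_ij = R*_i + R*_j and eps_ij = sqrt(eps_i * eps_j).
    """
    rmin_ij = rmin_half_i + rmin_half_j
    eps_ij = math.sqrt(eps_i * eps_j)
    ratio6 = (rmin_ij / r) ** 6
    return eps_ij * (ratio6 * ratio6 - 2.0 * ratio6)


def coulomb_energy(r, q_i, q_j):
    """Coulomb energy (kcal/mol) for partial charges q_i, q_j in electron units."""
    return COULOMB_CONSTANT * q_i * q_j / r


def pair_energy(r, atom_i, atom_j):
    """Sum of the two non-bonded terms for a pair of atoms.

    Each atom is a dict with hypothetical keys 'rmin_half', 'eps', 'charge'.
    """
    e_vdw = lj_energy(r, atom_i["rmin_half"], atom_i["eps"],
                      atom_j["rmin_half"], atom_j["eps"])
    e_elec = coulomb_energy(r, atom_i["charge"], atom_j["charge"])
    return e_vdw + e_elec
```

For example, calling `pair_energy(3.8, {"rmin_half": 1.9, "eps": 0.1, "charge": -0.2}, {"rmin_half": 1.5, "eps": 0.02, "charge": 0.1})` with these made-up parameters returns one pairwise energy in kcal/mol. Per-pair values of this kind, computed for every relevant atom pair in a structure, are the sort of quantity that could then be passed to an ML model as features.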