Posted on 2020-05-19, 16:33. Authored by Dragos Horvath, Gilles Marcou, Alexandre Varnek
The Generalized Born
(GB) solvent model offers the best accuracy/computing
effort ratio, yet requires drastic simplifications to estimate the
Effective Born Radii (EBR), bypassing an overly expensive volume integration
step. EBRs are a measure of the degree of burial of an atom and are not
very sensitive to small changes of geometry: in molecular dynamics,
the costly EBR update procedure is not mandatory at every step. This
work, however, aims at implementing a GB model in the Sampler for
Multiple Protein–Ligand Entities (S4MPLE) evolutionary algorithm,
where each step can trigger arbitrarily large geometric changes,
making EBR updates mandatory at every step. Therefore, a quantitative structure–property
relationship has been developed in order to express the EBRs as a
linear function of both the topological neighborhood and geometric
occupancy of the space around atoms. A training set of 810 molecular
systems, ranging from fragment-like and drug-like compounds to proteins,
host–guest systems, and ligand–protein complexes, has
been compiled. For each species, S4MPLE generated several hundred
random conformers. For each atom in each geometry of each species,
its “standard” EBR was calculated by numerical integration
and associated with topological and geometric descriptors of the atom
neighborhood. This training set of (EBR, atom descriptors) pairs, comprising
>5 million entries, was subjected to a bootstrapping multilinear regression
process with descriptor selection. In parallel, the strategy was repurposed
to also learn atomic solvent-accessible areas (SA) based on the same
descriptors. The resulting linear equations were challenged to predict
EBR and SA values for a similarly compiled external set of >2000 new
molecular systems. Solvation energies calculated with estimated EBR
and SA match “standard” energies within the typical
error of a force-field-based approach (a few kilocalories per mole).
Given the extreme diversity of molecular systems covered by the model,
this simple EBR/SA estimator has a vast applicability domain.
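
The regression step summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the synthetic descriptors, and the selection rule (a descriptor is kept when the magnitude of its mean bootstrap coefficient exceeds `t_min` bootstrap standard deviations) are assumptions made purely for illustration. The second function evaluates a polarization energy from per-atom radii using the widely used Still-type pairwise GB expression, which may differ in detail from the variant implemented in S4MPLE.

```python
import numpy as np

COULOMB = 332.06  # approximate Coulomb constant, kcal*Angstrom/(mol*e^2)


def bootstrap_select_and_fit(X, y, n_boot=200, t_min=5.0, seed=0):
    """Bootstrap multilinear regression with a stability-based selection.

    X: (n_samples, n_descriptors) descriptor matrix (hypothetical descriptors).
    y: (n_samples,) target values, e.g. reference EBRs.
    Returns (selected_indices, intercept, coefficients_of_selected).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])  # intercept column + descriptors
    coefs = np.empty((n_boot, p + 1))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        coefs[b] = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
    mean = coefs.mean(axis=0)
    std = coefs.std(axis=0, ddof=1)
    # Assumed selection rule: keep descriptors with a stable, non-zero weight.
    t = np.abs(mean[1:]) / (std[1:] + 1e-12)
    selected = np.flatnonzero(t >= t_min)
    # Refit on the full data set using only the selected descriptors.
    A_sel = np.column_stack([np.ones(n), X[:, selected]])
    w = np.linalg.lstsq(A_sel, y, rcond=None)[0]
    return selected, w[0], w[1:]


def gb_polar_energy(coords, charges, radii, eps_in=1.0, eps_out=80.0):
    """Still-type GB polarization energy in kcal/mol.

    coords: (n_atoms, 3) in Angstrom; charges in e; radii: EBRs in Angstrom.
    The double sum includes the i == j self terms, hence the factor 1/2.
    """
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    rr = radii[:, None] * radii[None, :]
    f_gb = np.sqrt(d2 + rr * np.exp(-d2 / (4.0 * rr)))
    qq = charges[:, None] * charges[None, :]
    prefactor = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out)
    return prefactor * np.sum(qq / f_gb)


if __name__ == "__main__":
    # Synthetic demo: only descriptors 0 and 3 actually influence the target.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 6))
    y = 1.5 + 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.05, size=2000)
    selected, intercept, weights = bootstrap_select_and_fit(X, y)
    print("selected descriptors:", selected)
    print("intercept and weights:", intercept, weights)
```

For a single charged atom the pairwise expression reduces to the Born formula, which gives a quick sanity check on radii predicted by any linear estimator of this kind.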