ct9b00627_si_001.pdf (523.31 kB)
Gaussian Process-Based Refinement of Dispersion Corrections
journal contribution
posted on 2019-10-11, 18:05 authored by Jonny Proppe, Stefan Gugler, Markus Reiher
We employ Gaussian
process (GP) regression to adjust for systematic
errors in D3-type dispersion corrections. We refer to the associated,
statistically improved model as D3-GP. It is trained on differences
between interaction energies obtained from PBE-D3(BJ)/ma-def2-QZVPP
and DLPNO-CCSD(T)/CBS calculations. We generated a data set containing
interaction energies for 1248 molecular dimers, which resemble the
dispersion-dominated systems contained in the S66 data set. Our systems
represent not only equilibrium structures but also dimers with various
relative orientations and conformations at both shorter and longer
distances. A reparametrization of the D3(BJ) model based on 66 of
these dimers suggests that two of its three empirical parameters, a1 and s8, are zero,
whereas a2 = 5.6841 bohr. For the remaining
1182 dimers, we find that this new set of parameters is superior to
all previously published D3(BJ) parameter sets. To train our D3-GP
model, we engineered two different vectorial representations of (supra-)molecular
systems, both derived from the matrix of atom-pairwise D3(BJ) interaction
terms: (a) a distance-resolved interaction energy histogram, histD3(BJ),
and (b) eigenvalues of the interaction matrix ordered according to
their decreasing absolute value, eigD3(BJ). Hence, the GP learns a
mapping from D3(BJ) information only, which renders D3-GP-type dispersion
corrections comparable to those obtained with the original D3 approach.
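
As a minimal sketch of the two representations just described, the snippet below builds an eigenvalue vector and a distance-resolved histogram from a symmetric matrix of atom-pairwise terms. The function names, the zero-padding to a fixed length, the bin edges, and the toy numbers (arbitrary units) are assumptions made for illustration; the abstract does not specify these details, so this is not necessarily the authors' implementation.

```python
import numpy as np

def eig_representation(pair_matrix, length):
    """Eigenvalues of a symmetric pairwise matrix, ordered by decreasing
    absolute value and zero-padded (or truncated) to a fixed length.
    The padding scheme is an assumption, not taken from the source."""
    eigvals = np.linalg.eigvalsh(pair_matrix)
    ordered = eigvals[np.argsort(-np.abs(eigvals))]
    out = np.zeros(length)
    n = min(length, ordered.size)
    out[:n] = ordered[:n]
    return out

def hist_representation(pair_matrix, distances, bin_edges):
    """Sum the pairwise terms of each unique atom pair into distance bins.
    The choice of bin edges is an assumption, not taken from the source."""
    upper = np.triu_indices_from(pair_matrix, k=1)  # each pair counted once
    hist, _ = np.histogram(distances[upper], bins=bin_edges,
                           weights=pair_matrix[upper])
    return hist

# Toy three-atom system with made-up pairwise terms and distances.
pair_energy = np.array([[0.0, -0.3, -0.1],
                        [-0.3, 0.0, -0.05],
                        [-0.1, -0.05, 0.0]])
distance = np.array([[0.0, 1.0, 2.5],
                     [1.0, 0.0, 3.5],
                     [2.5, 3.5, 0.0]])

eig_vec = eig_representation(pair_energy, length=5)
hist_vec = hist_representation(pair_energy, distance,
                               bin_edges=np.array([0.0, 2.0, 4.0]))
```

Both vectors have a fixed length regardless of the number of atoms, which is what allows systems of different size to be compared by a single kernel.
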
They improve systematically if the underlying training set is selected
carefully. Here, we harness the prediction variance obtained from
GP regression to select optimal training sets in an automated fashion.
The larger the variance, the more information the corresponding data
point may add to the training set. For a given set of molecular systems,
variance-based sampling can approximately determine the smallest subset
that must be subjected to reference calculations such that all dispersion
corrections for the remaining systems fall below a predefined accuracy
threshold. To render the entire D3-GP workflow as efficient as possible,
we present an improvement over our variance-based, sequential active-learning
scheme [J. Chem. Theory
Comput. 2018, 14, 5238]. Our refined learning algorithm selects multiple (instead
of single) systems that can be subjected to reference calculations
simultaneously. We refer to the underlying selection strategy as batchwise
variance-based sampling (BVS). BVS-guided active learning is an essential
component of our D3-GP workflow, which is implemented in a black-box
fashion. Once provided with reference data for new molecular systems,
the underlying GP model automatically learns to adapt to these and
similar systems. This approach leads overall to a self-improving model
(D3-GP) that predicts system-focused and GP-refined D3-type dispersion
corrections for any given system of reference data.
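
The batchwise selection idea can be sketched as follows. This is one plausible greedy scheme, not necessarily the authors' BVS algorithm: it relies on the fact that, for fixed hyperparameters, the GP predictive variance depends only on the inputs, so a selected candidate can be added to the training inputs before its reference value is known. The RBF kernel, the noise level, the length scale, and all function names are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def predictive_variance(x_train, x_query, noise=1e-6, length_scale=1.0):
    """GP posterior variance at x_query; independent of training targets."""
    k_tt = rbf_kernel(x_train, x_train, length_scale)
    k_tt = k_tt + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query, length_scale)
    solved = np.linalg.solve(k_tt, k_tq)
    # For this kernel the prior variance k(x, x) equals 1.
    return 1.0 - np.sum(k_tq * solved, axis=0)

def select_batch(x_train, x_pool, batch_size, threshold,
                 noise=1e-6, length_scale=1.0):
    """Greedily pick up to batch_size pool indices with the largest variance,
    updating the variance after each pick; stop once it is <= threshold."""
    chosen = []
    taken = np.zeros(len(x_pool), dtype=bool)
    x_aug = np.array(x_train, dtype=float)
    for _ in range(batch_size):
        var = predictive_variance(x_aug, x_pool, noise, length_scale)
        var[taken] = -np.inf  # never pick the same candidate twice
        best = int(np.argmax(var))
        if var[best] <= threshold:
            break
        chosen.append(best)
        taken[best] = True
        x_aug = np.vstack([x_aug, x_pool[best]])
    return chosen

# Toy one-dimensional example with made-up feature values.
x_train = np.array([[0.0]])
x_pool = np.array([[0.1], [3.0], [3.1], [-2.0]])
batch = select_batch(x_train, x_pool, batch_size=4, threshold=0.05)
```

Updating the variance after each pick keeps near-duplicate candidates (here 3.0 and 3.1) out of the same batch, so the selected systems can be sent to reference calculations simultaneously without redundant effort.
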
Keywords
predefined accuracy threshold; GP-refined D3-type dispersion corrections; reference data; dimers; D3 approach; batchwise variance-based sampling; BVS; D3-GP dispersion corrections; PBE-D; reference calculations; D3-GP model; D3-GP workflow; sequential active-learning scheme; D3-GP; distance-resolved interaction energy histogram; D3-type dispersion corrections; Gaussian Process-Based Refinement; S66 data; interaction energies; DLPNO-CCSD