Posted on 2020-02-17, 05:31. Authored by Troy D. Loeffler, Tarak K. Patra, Henry Chan, Mathew Cherukara, Subramanian K. R. S. Sankaranarayanan
Molecular dynamics with predefined functional forms is a popular
technique for understanding dynamical evolution of systems. The predefined
functional forms impose limits on the physics that can be captured.
Artificial neural network (ANN) models have emerged as an attractive
flexible alternative to the expensive quantum calculations (e.g.,
density functional theory) in the area of molecular force-fields.
Ideally, if one is able to train an ANN to accurately predict the correct
DFT energy and forces for any given structure, one gains the ability
to perform molecular dynamics with high accuracy while simultaneously
reducing the computational cost dramatically. While this goal
is very appealing, neural networks are interpolative, and therefore
it is not always clear how one should go about training a neural network
to exhaustively fit the entire phase space of a given system. Currently,
ANNs are trained by generating large quantities (on the order of 10^4 or greater) of training data in hopes that the ANN has adequately
sampled the energy landscape both near and far from equilibrium. This
can, however, be prohibitively expensive at more accurate
levels of quantum theory. As such, it is desirable to train a model
using the smallest possible data set, especially when the costs
of high-fidelity calculations such as CCSD and QMC are high. Here,
we present an active learning approach that iteratively trains an
ANN model to faithfully replicate the coarse-grained energy surface
of water clusters using only 426 total structures in its training
data. Our active learning workflow starts with a sparse training data
set which is continually updated via a Nested Ensemble Monte Carlo
scheme that sparsely queries the energy landscape and tests the network
performance. Next, the network is retrained with an updated training
set that includes the failed configurations and energies from the previous
iteration, and the cycle repeats until convergence is attained. Once the network is trained, we generate an extensive
test set of 100,000 configurations sampled across clusters
ranging from 1 to 200 molecules and demonstrate that the trained network
adequately reproduces the energies (within a mean absolute error (MAE)
of 2 meV/molecule) and forces (MAE of 40 meV/Å) compared to the
reference model. More importantly, the trained ANN model also accurately
captures both the structure and the free energy as a function
of cluster size. Overall, this study reports a new active
learning scheme that offers a promising strategy for developing accurate force-fields
for molecular simulations using extremely sparse training data sets.
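
The iterative train, query, test, and retrain cycle described above can be illustrated with a minimal toy sketch. It is not the authors' implementation, and every name in it is hypothetical: `reference_energy` stands in for the expensive reference model on a one-dimensional coordinate rather than on water-cluster configurations, a Chebyshev polynomial fit replaces the ANN, uniform random sampling replaces the Nested Ensemble Monte Carlo scheme, and the tolerance and batch sizes are arbitrary illustrative choices.

```python
import numpy as np

LOW, HIGH = -2.0, 2.0  # assumed range of the toy 1D "configuration" coordinate


def reference_energy(x):
    # Hypothetical stand-in for the expensive reference model.
    return np.sin(3.0 * x) + 0.5 * x**2


def fit_surrogate(x_train, e_train, max_degree=11):
    # Chebyshev least-squares fit used here in place of an ANN (assumption).
    degree = min(max_degree, len(x_train) - 1)
    return np.polynomial.Chebyshev.fit(x_train, e_train, degree, domain=[LOW, HIGH])


def active_learning(n_init=5, n_query=50, max_add=5, tol=0.02, max_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a sparse training set.
    x_train = rng.uniform(LOW, HIGH, n_init)
    e_train = reference_energy(x_train)
    for _ in range(max_iter):
        model = fit_surrogate(x_train, e_train)
        # Sparsely query the landscape (uniform sampling, not nested ensemble MC).
        x_query = rng.uniform(LOW, HIGH, n_query)
        e_query = reference_energy(x_query)
        errors = np.abs(model(x_query) - e_query)
        # Keep at most max_add of the worst configurations that exceed the tolerance.
        worst = np.argsort(errors)[::-1][:max_add]
        failed = worst[errors[worst] > tol]
        if failed.size == 0:
            return model, x_train, True  # every queried point passed the test
        # Add the failed configurations and their energies, then retrain.
        x_train = np.concatenate([x_train, x_query[failed]])
        e_train = np.concatenate([e_train, e_query[failed]])
    return fit_surrogate(x_train, e_train), x_train, False


if __name__ == "__main__":
    model, x_train, converged = active_learning()
    x_test = np.linspace(LOW, HIGH, 1001)
    mae = np.mean(np.abs(model(x_test) - reference_energy(x_test)))
    print(f"converged={converged}, training points={len(x_train)}, test MAE={mae:.4f}")
```

Adding only the failed configurations at each iteration is what keeps the training set small in this sketch: points the surrogate already predicts within tolerance are never labelled again or stored.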