Quantum-Chemically Informed Machine Learning: Prediction of Energies of Organic Molecules with 10 to 14 Non-hydrogen Atoms
journal contribution
posted on 2020-07-02, 19:04 authored by Naveen Dandu, Logan Ward, Rajeev S. Assary, Paul C. Redfern, Badri Narayanan, Ian T. Foster, Larry A. Curtiss
High-fidelity quantum-chemical calculations can provide accurate
predictions of molecular energies, but their high computational costs
limit their utility, especially for larger molecules. We have shown
in previous work that machine learning models trained on high-level
quantum-chemical calculations (G4MP2) for organic molecules with one
to nine non-hydrogen atoms can provide accurate predictions for other
molecules of comparable size at much lower costs. Here we demonstrate
that such models can also be used to effectively predict energies
of molecules larger than those in the training set. To implement this
strategy, we first established a set of 191 molecules with 10–14
non-hydrogen atoms having reliable experimental enthalpies of formation.
We then assessed the accuracy of computed G4MP2 enthalpies of formation
for these 191 molecules. The error in the G4MP2 results was somewhat
larger than that for smaller molecules, and the reason for this increase
is discussed. Two density functional methods, B3LYP and ωB97X-D,
were also used on this set of molecules, with ωB97X-D found
to perform better than B3LYP at predicting energies. The G4MP2 energies
for the 191 molecules were then predicted using these two functionals
together with two machine learning methods, the FCHL-Δ and SchNet-Δ
models, trained on calculated energies of molecules with one to
nine non-hydrogen atoms. The better-performing model, FCHL-Δ,
gave atomization energies of the 191 organic molecules with 10–14
non-hydrogen atoms within 0.4 kcal/mol of their G4MP2 energies. Thus,
this work demonstrates that quantum-chemically informed machine learning
can be used to successfully predict the energies of organic
molecules larger than those in the training set.
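
The Δ-learning idea behind the FCHL-Δ and SchNet-Δ models can be illustrated with a minimal sketch: a model learns the difference between a high-level energy (G4MP2 in the abstract) and a cheaper one (B3LYP or ωB97X-D), and that learned correction is added to the cheap energy of a new molecule. Everything below is an assumption made for illustration and is not the authors' implementation: the descriptors are random vectors rather than FCHL or SchNet representations, the energies are synthetic, the regressor is a plain Gaussian kernel ridge regression, and the function names `gaussian_kernel`, `fit_delta_model`, and `predict_high` are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_delta_model(x_train, e_low, e_high, sigma=1.0, reg=1e-6):
    """Fit kernel ridge regression to the correction e_high - e_low."""
    delta = e_high - e_low
    k = gaussian_kernel(x_train, x_train, sigma)
    # Solve (K + reg * I) alpha = delta for the regression weights.
    return np.linalg.solve(k + reg * np.eye(len(x_train)), delta)


def predict_high(x_new, e_low_new, x_train, alpha, sigma=1.0):
    """Estimate the high-level energy as low-level energy plus learned correction."""
    k = gaussian_kernel(x_new, x_train, sigma)
    return e_low_new + k @ alpha


# Synthetic stand-ins (assumption): 3-component descriptors, a "low-level"
# energy, and a "high-level" energy that differs by a smooth correction.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 3))
e_low = 10.0 * x.sum(axis=1)
e_high = e_low + 0.5 * np.sin(x[:, 0])

x_train, x_test = x[:150], x[150:]
alpha = fit_delta_model(x_train, e_low[:150], e_high[:150])
pred = predict_high(x_test, e_low[150:], x_train, alpha)

# Mean absolute errors against the synthetic high-level energies.
mae_low = np.mean(np.abs(e_low[150:] - e_high[150:]))
mae_delta = np.mean(np.abs(pred - e_high[150:]))
print(f"low-level MAE: {mae_low:.4f}, delta-corrected MAE: {mae_delta:.4f}")
```

The sketch learns the correction rather than the total energy because the correction is typically a smaller and smoother target. Note that this toy split draws training and test points from the same distribution, so it does not reproduce the extrapolation to larger molecules that the study itself examines.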