jp0c01777_si_002.xls (207 kB)

Quantum-Chemically Informed Machine Learning: Prediction of Energies of Organic Molecules with 10 to 14 Non-hydrogen Atoms

Dataset posted on 02.07.2020, 19:04 by Naveen Dandu, Logan Ward, Rajeev S. Assary, Paul C. Redfern, Badri Narayanan, Ian T. Foster, Larry A. Curtiss
High-fidelity quantum-chemical calculations can provide accurate predictions of molecular energies, but their high computational cost limits their utility, especially for larger molecules. We have shown in previous work that machine learning models trained on high-level quantum-chemical calculations (G4MP2) for organic molecules with one to nine non-hydrogen atoms can provide accurate predictions for other molecules of comparable size at much lower cost. Here we demonstrate that such models can also be used to effectively predict energies of molecules larger than those in the training set. To implement this strategy, we first established a set of 191 molecules with 10–14 non-hydrogen atoms having reliable experimental enthalpies of formation. We then assessed the accuracy of computed G4MP2 enthalpies of formation for these 191 molecules. The error in the G4MP2 results was somewhat larger than that for smaller molecules, and the reason for this increase is discussed. Two density functional methods, B3LYP and ωB97X-D, were also applied to this set of molecules, with ωB97X-D found to predict energies more accurately than B3LYP. The G4MP2 energies of the 191 molecules were then predicted from these two functionals with two machine learning methods, the FCHL-Δ and SchNet-Δ models, trained on calculated energies of the one to nine non-hydrogen atom molecules. The better-performing model, FCHL-Δ, gave atomization energies of the 191 organic molecules with 10–14 non-hydrogen atoms within 0.4 kcal/mol of their G4MP2 energies. Thus, this work demonstrates that quantum-chemically informed machine learning can successfully predict the energies of large organic molecules whose size is beyond that of the training set.
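The Δ-learning idea behind the FCHL-Δ and SchNet-Δ models — train a regressor on the difference between a cheap method's energy and the high-level target, then add the learned correction to new cheap-method energies — can be illustrated with a minimal sketch. Everything below is a hypothetical toy: the descriptors, energies, and the kernel ridge regressor are placeholders standing in for the real FCHL representation and actual DFT/G4MP2 data, which this dataset page does not include.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 3-dimensional "molecular descriptors", a cheap
# baseline energy (playing the role of a DFT result), and a high-level
# target energy (playing the role of G4MP2). Arbitrary units.
X_train = rng.uniform(-1.0, 1.0, size=(50, 3))
e_low_train = X_train.sum(axis=1)                       # cheap-method energies
e_high_train = e_low_train + 0.1 * np.sin(3.0 * X_train[:, 0])  # "high-level"

def gaussian_kernel(A, B, gamma=1.0):
    """Pairwise Gaussian (RBF) kernel between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# Kernel ridge regression fitted to the *difference* e_high - e_low,
# not to the absolute energy: the correction is a much smoother, smaller
# quantity, which is the point of the Δ-learning construction.
lam = 1e-6
K = gaussian_kernel(X_train, X_train)
alpha = np.linalg.solve(K + lam * np.eye(len(X_train)),
                        e_high_train - e_low_train)

def predict_high(X_new, e_low_new):
    """Cheap-method energy plus the learned Δ-correction."""
    return e_low_new + gaussian_kernel(X_new, X_train) @ alpha

# Held-out synthetic "molecules": the Δ-model should track the
# high-level energies more closely than the cheap method alone.
X_test = rng.uniform(-1.0, 1.0, size=(20, 3))
e_low_test = X_test.sum(axis=1)
e_high_test = e_low_test + 0.1 * np.sin(3.0 * X_test[:, 0])

mae_delta = np.abs(predict_high(X_test, e_low_test) - e_high_test).mean()
mae_cheap = np.abs(e_low_test - e_high_test).mean()
print(mae_delta, mae_cheap)
```

In the paper's setting, the baseline energies come from B3LYP or ωB97X-D, the targets from G4MP2 on the one to nine non-hydrogen atom molecules, and the Δ-model is then evaluated on the larger 10–14 heavy-atom set; the sketch only mirrors that structure, not the actual representations or accuracy.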
