# Many-Body Descriptors for Predicting Molecular Properties with Machine Learning: Analysis of Pairwise and Three-Body Interactions in Molecules

journal contribution

posted on 11.05.2018, 00:00 by Wiktor Pronobis, Alexandre Tkatchenko, Klaus-Robert Müller

Machine
learning (ML) based prediction of molecular properties across chemical
compound space is an important alternative approach for efficiently
estimating the solutions of highly complex many-electron problems in
chemistry and physics. Statistical methods represent molecules as
descriptors that should encode molecular symmetries and interactions
between atoms. Many such descriptors have been proposed; all of them
have advantages and limitations. Here, we propose a set of general
two-body and three-body interaction descriptors which are invariant
to translation, rotation, and atomic indexing. By adapting the successfully
used kernel ridge regression methods of machine learning, we evaluate
our descriptors on predicting several properties of small organic
molecules calculated using density-functional theory. We use two data
sets. The GDB-7 set contains 6868 molecules with up to 7 heavy atoms
of type CNO. The GDB-9 set is composed of 131722 molecules with up
to 9 heavy atoms containing CNO. When trained on 5000 random molecules,
our best model achieves an accuracy of 0.8 kcal/mol (on the remaining
1868 molecules of GDB-7) and 1.5 kcal/mol (on the remaining 126722
molecules of GDB-9), respectively. A linear regression model applied
to our novel many-body descriptors performs almost as well as a nonlinear
kernelized model. Linear models are readily interpretable: a feature
importance ranking measure helps to obtain qualitative and quantitative
insights on the importance of two- and three-body molecular interactions
for predicting molecular properties computed with quantum-mechanical
methods.
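
To make the invariance requirement concrete, the following is a minimal sketch of a two-body descriptor, not the descriptor defined in the paper. It assumes a simple feature (the sum of inverse interatomic distances for each unordered pair of element types); the function name `two_body_descriptor`, the `pair_types` argument, and the toy geometry are hypothetical and chosen only for illustration. Because the feature depends only on interatomic distances and on element labels, it is unchanged by translation, rotation, and reordering of the atoms.

```python
import math
from itertools import combinations


def two_body_descriptor(elements, positions, pair_types):
    """Illustrative two-body feature vector (an assumption, not the paper's form).

    For every unordered pair of element types in `pair_types`, sum 1/r over
    all atom pairs of that type. Only distances and element labels enter,
    so the result does not depend on the molecule's position, orientation,
    or atom ordering.
    """
    features = {pair: 0.0 for pair in pair_types}
    for i, j in combinations(range(len(elements)), 2):
        # Sort the two labels so that (O, C) and (C, O) map to the same key.
        pair = tuple(sorted((elements[i], elements[j])))
        if pair in features:
            features[pair] += 1.0 / math.dist(positions[i], positions[j])
    return [features[pair] for pair in pair_types]


# Toy four-atom geometry with made-up coordinates (not taken from GDB-7 or GDB-9).
elements = ["C", "O", "H", "H"]
positions = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-0.6, 0.9, 0.0), (-0.6, -0.9, 0.0)]
pair_types = [("C", "H"), ("C", "O"), ("H", "H"), ("H", "O")]

descriptor = two_body_descriptor(elements, positions, pair_types)
print(descriptor)
```

A fixed-length vector of this kind could be passed to a linear or kernel ridge regression model; a three-body analogue would aggregate functions of atom triples (for example, angles and distances) in the same order-independent way.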

## Keywords

data sets, quantum-mechanical methods, novel many-body descriptors, many-electron problems, 131722 molecules, Linear models, 1868 molecules, Many-Body Descriptors, chemical compound space, regression model, interaction descriptors, ML, Statistical methods, Molecules Machine, density-functional theory, kernel ridge regression methods, Molecular Properties, feature importance, 126722 molecules, type CNO, alternative approach, Machine Learning, 6868 molecules, nonlinear kernelized model, GDB-9, GDB-7, Three-Body Interactions
