Posted on 2023-02-03, 16:40. Authored by Kaihang Shi, Zhao Li, Dylan M. Anstine, Dai Tang, Coray M. Colina, David S. Sholl, J. Ilja Siepmann, Randall Q. Snurr
A major
obstacle for machine learning (ML) in chemical science
is the lack of physically informed feature representations that provide
both accurate prediction and easy interpretability of the ML model.
In this work, we describe adsorption systems using novel two-dimensional
energy histogram (2D-EH) features, which are obtained from the probe-adsorbent
energies and energy gradients at grid points located throughout the
adsorbent. The 2D-EH features encode both energetic and structural
information of the material and lead to highly accurate ML models
(coefficient of determination R² ∼
0.94–0.99) for predicting single-component adsorption capacity
in metal–organic frameworks (MOFs). We consider the adsorption
of spherical molecules (Kr and Xe), linear alkanes with a wide range
of aspect ratios (ethane, propane, n-butane, and n-hexane), and a branched alkane (2,2-dimethylbutane) over
a wide range of temperatures and pressures. The interpretable 2D-EH
features enable the ML model to learn the basic physics of adsorption
in pores from the training data. We show that these MOF-data-trained
ML models are transferable to different families of amorphous nanoporous
materials. We also identify several adsorption systems in which capillary
condensation occurs and ML predictions are more challenging. Nevertheless,
our 2D-EH features still outperform structural features, including
those derived from persistent homology. The novel 2D-EH features may
help accelerate the discovery and design of advanced nanoporous materials
using ML for gas storage and separation in the future.
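
To make the feature construction concrete, the following is a minimal sketch of how a 2D energy histogram could be assembled from per-grid-point energies and energy-gradient magnitudes. It is an illustration under stated assumptions, not the authors' implementation: the function name `energy_histogram_2d`, the bin edges, the use of the gradient magnitude, the clipping of out-of-range values, and the normalization by the number of grid points are all hypothetical choices, and the input values are synthetic rather than computed from a real adsorbent.

```python
import numpy as np


def energy_histogram_2d(energies, grad_norms, energy_edges, grad_edges):
    """Flattened, normalized 2D histogram of (energy, gradient magnitude) pairs.

    Hypothetical helper: the binning and normalization are assumptions,
    not details taken from the abstract.
    """
    energies = np.asarray(energies, dtype=float)
    grad_norms = np.asarray(grad_norms, dtype=float)
    # Clip so that out-of-range grid points are counted in the outermost bins.
    e = np.clip(energies, energy_edges[0], energy_edges[-1])
    g = np.clip(grad_norms, grad_edges[0], grad_edges[-1])
    counts, _, _ = np.histogram2d(e, g, bins=[energy_edges, grad_edges])
    # Normalize by the number of grid points so the features sum to 1.
    return (counts / energies.size).ravel()


# Synthetic stand-ins for probe-adsorbent energies and gradient magnitudes
# at 1000 grid points (units and distributions are illustrative only).
rng = np.random.default_rng(0)
energies = rng.normal(-5.0, 3.0, size=1000)
grad_norms = np.abs(rng.normal(0.0, 2.0, size=1000))

energy_edges = np.linspace(-20.0, 0.0, 11)  # 10 energy bins (assumed)
grad_edges = np.linspace(0.0, 6.0, 7)       # 6 gradient bins (assumed)

features = energy_histogram_2d(energies, grad_norms, energy_edges, grad_edges)
print(features.shape)  # (60,)
```

Each material would contribute one such feature vector, which could then be combined with the temperature and pressure and passed to any regression model to predict adsorption capacity. The choice of regressor is left open here because the abstract does not name one.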