Posted on 2023-07-24, 23:29. Authored by Sayandeep Biswas, Yunsie Chung, Josephine Ramirez, Haoyang Wu, William H. Green
Knowledge of critical properties, such as critical temperature,
pressure, and density, as well as the acentric factor, is essential to calculate
thermo-physical properties of chemical compounds. Experiments to determine
critical properties and acentric factors are expensive and time-intensive;
therefore, we developed a machine learning (ML) model that can predict
these molecular properties given the SMILES representation of a chemical
species. We explored the directed message passing neural network (D-MPNN)
and the graph attention network as ML architecture choices. Additionally,
we investigated featurization with additional atomic and molecular
features, multitask training, and pretraining using estimated data
to optimize model performance. Our final model utilizes a D-MPNN layer
to learn the molecular representation and is supplemented by Abraham
parameters. A multitask training scheme was used to train a single
model to predict all the critical properties and acentric factors
along with boiling point, melting point, enthalpy of vaporization,
and enthalpy of fusion. The model was evaluated on both random and
scaffold splits, where it shows state-of-the-art accuracy. The extensive
data set of critical properties and acentric factors contains 1144
chemical compounds and is made available in the public domain together
with the source code that can be used for further exploration.
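
To illustrate why critical properties and the acentric factor matter for downstream property calculations, the following is a minimal sketch, not code from the authors' release. It uses the standard definition of the acentric factor, omega = -log10(Psat/Pc) - 1 at a reduced temperature of 0.7, together with a simple two-point corresponding-states correlation, log10(Psat/Pc) = (7/3)(1 + omega)(1 - Tc/T), to estimate a vapor pressure. The function names and the numerical inputs (Tc = 500 K, Pc = 30 bar, omega = 0.3) are hypothetical and chosen only for illustration; they are not taken from the data set.

```python
import math


def reduced_vapor_pressure(t_reduced: float, omega: float) -> float:
    """Estimate Psat/Pc from the reduced temperature T/Tc and acentric factor.

    Uses the two-point correlation log10(Pr) = (7/3)(1 + omega)(1 - 1/Tr),
    which reproduces Pr = 1 at Tr = 1 and the acentric-factor definition
    at Tr = 0.7. It is a rough estimate, not a fitted vapor-pressure model.
    """
    if t_reduced <= 0.0:
        raise ValueError("reduced temperature must be positive")
    exponent = (7.0 / 3.0) * (1.0 + omega) * (1.0 - 1.0 / t_reduced)
    return 10.0 ** exponent


def vapor_pressure(temperature_k: float, tc_k: float, pc_bar: float,
                   omega: float) -> float:
    """Estimate the saturation pressure in bar at temperature_k."""
    return pc_bar * reduced_vapor_pressure(temperature_k / tc_k, omega)


def acentric_factor(psat_at_tr07_bar: float, pc_bar: float) -> float:
    """Acentric factor from the saturation pressure measured at T = 0.7 Tc."""
    return -math.log10(psat_at_tr07_bar / pc_bar) - 1.0


if __name__ == "__main__":
    # Hypothetical inputs, standing in for model-predicted Tc, Pc, and omega.
    tc_k, pc_bar, omega = 500.0, 30.0, 0.3
    psat = vapor_pressure(0.7 * tc_k, tc_k, pc_bar, omega)
    print(f"Estimated Psat at 0.7 Tc: {psat:.3f} bar")
    print(f"Recovered omega: {acentric_factor(psat, pc_bar):.3f}")
```

In practice, predicted critical properties and acentric factors would more often be passed to a cubic equation of state or a more detailed correlation; the sketch only shows how the three quantities combine in the simplest case.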