Posted on 2025-06-24, 08:29; authored by José Ferraz-Caetano, Filipe Teixeira, M. Natália D. S. Cordeiro
Catalytic epoxidations are key chemical processes serving as essential
steps in the synthesis of commercially valuable compounds. This study
presents an innovative supervised machine learning (ML) model to predict
the reaction yield of the vanadium-catalyzed epoxidation of small
alcohols and alkenes. Our framework uncovers relevant chemical characteristics
for structure design, offering a pathway for automated optimization
of epoxidation reactions. The study also uses data augmentation,
handling experimental variability by generating
synthetic reactions to densify under-represented data segments. Trained
on a curated data set of 273 experimental epoxidation reactions with
vanadyl catalyst groups, the model achieved a predictive R² test score of 90%, with a mean absolute yield prediction
error of 4.7%. The ML model offers a high degree of explainability,
as descriptor analysis identified key experimental and chemical descriptors
that influence catalytic reaction predictions. This work represents a significant
development in catalytic epoxidation studies, highlighting the critical
role of data science in reaction research and catalyst optimization.
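
To make the two reported metrics and the augmentation idea concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not describe how synthetic reactions were generated, so the `densify` function below is a hypothetical stand-in that oversamples sparse yield bins by adding small Gaussian noise to existing yields; its name, the bin width, and the noise level are all assumptions chosen for illustration. The `r2_score` and `mean_absolute_error` functions follow the standard textbook definitions of the coefficient of determination and the mean absolute error.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted yields."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def densify(yields, bin_width=20.0, noise_sd=1.0, seed=0):
    """Hypothetical augmentation: top up sparse yield bins with jittered copies.

    Assumption: yields are percentages in [0, 100]. This toy version only
    perturbs the yield value; a real scheme would also have to produce
    consistent descriptors for each synthetic reaction.
    """
    rng = random.Random(seed)
    bins = {}
    for y in yields:
        # Clamp so that a yield of exactly 100 falls in the last bin.
        bins.setdefault(int(min(y, 99.999) // bin_width), []).append(y)
    # Bring every populated bin up to the size of the largest one.
    target = max(len(members) for members in bins.values())
    synthetic = []
    for members in bins.values():
        for _ in range(target - len(members)):
            base = rng.choice(members)
            jittered = base + rng.gauss(0.0, noise_sd)
            synthetic.append(min(100.0, max(0.0, jittered)))
    return synthetic


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not data from the study.
    measured = [5.0, 10.0, 15.0, 90.0]
    predicted = [7.0, 9.0, 18.0, 86.0]
    print("R2 :", r2_score(measured, predicted))
    print("MAE:", mean_absolute_error(measured, predicted))
    print("synthetic yields:", densify(measured))
```

In this sketch the high-yield bin holds a single reaction, so `densify` adds two jittered copies to match the three reactions in the low-yield bin. Whether the study balanced bins this way, or used another strategy, is not stated in the abstract.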