Posted on 2025-05-07, 19:03, authored by Seokwoo Kim, Minhi Han, Jinyong Park, Kiwoong Lee, Sungnam Park
Coumarin derivatives have been widely developed and utilized as
chromophores and fluorophores in various research fields. In this
study, we constructed an experimental database of the optical properties (specifically,
absorption and emission wavelengths measured in solutions) and
developed a machine learning (ML) model based on Gaussian-weighted
graph convolution (GWGC) and subgraph modular input (SMI) to predict
these properties. The GWGC was introduced as a novel molecular representation
that accounts for interatomic effects among neighboring atoms when
predicting the optical properties of coumarin derivatives. The
SMI was introduced to represent coumarin derivatives as subgraphs
composed of a coumarin core and six substituents, thereby modularizing
the molecular vector into a core vector and substituent vectors. This
approach separately encodes the chemical information of the core and
of the substituents, along with the positional information of the substituents,
facilitating an understanding of how each substituent influences the
optical properties of the coumarin core. ML models leveraging GWGC
and SMI outperformed those based on RDKit descriptors and count-based
Morgan fingerprints. ML models with GWGC and SMI can be applied more generally
to predict the properties of molecules composed of a core structure
and various substituents.
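The abstract does not spell out the GWGC equations, so the following is only a minimal sketch under an assumed formulation, not the authors' implementation. It assumes that each atom aggregates the feature vectors of nearby atoms with weights w_ij = exp(-d_ij^2 / (2 sigma^2)), where d_ij is the topological (bond-count) distance, so that neighboring atoms beyond the directly bonded ones still contribute with decaying weight. The function names, `sigma`, and `cutoff` are hypothetical, and the learned weight matrix and nonlinearity of a real graph-convolution layer are omitted.

```python
import math
from collections import deque


def shortest_path_lengths(adj, src):
    """BFS topological distances (number of bonds) from atom `src`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def gaussian_weighted_conv(features, adj, sigma=1.0, cutoff=3):
    """Hypothetical Gaussian-weighted aggregation step (assumed form).

    features: list of per-atom feature vectors
    adj:      adjacency list, adj[i] = indices of atoms bonded to atom i
    Returns h_i' = sum_j w_ij h_j / sum_j w_ij over atoms j within `cutoff`
    bonds of i, with w_ij = exp(-d_ij^2 / (2 sigma^2)).
    """
    dim = len(features[0])
    out = []
    for i in range(len(features)):
        acc = [0.0] * dim
        total = 0.0  # always >= 1 because the atom itself has d = 0, w = 1
        for j, d in shortest_path_lengths(adj, i).items():
            if d > cutoff:
                continue  # ignore atoms farther than the cutoff
            w = math.exp(-d * d / (2.0 * sigma * sigma))
            total += w
            for k in range(dim):
                acc[k] += w * features[j][k]
        out.append([a / total for a in acc])  # row-normalized weighted mean
    return out
```

On a four-atom chain `0-1-2-3` with a feature placed only on atom 0, the signal spreads to every atom but decays with bond distance, which is the "interatomic effects among neighboring atoms" idea in its simplest assumed form. Normalizing by the total weight keeps the output on the same scale as the input features.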
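The subgraph modular input can be illustrated with a similarly hedged sketch. It assumes that per-atom vectors (for example, outputs of a graph convolution) are sum-pooled within the coumarin-core subgraph and within each substituent subgraph, and that the resulting vectors are concatenated in a fixed positional order, with a zero vector for an unsubstituted (hydrogen) position. Mapping the six substituent slots to coumarin ring positions 3, 4, 5, 6, 7, and 8, the use of sum pooling, and the names `COUMARIN_POSITIONS` and `modular_input` are illustrative assumptions, not details taken from the paper.

```python
# Assumed slot order: the six substitutable ring positions of coumarin.
COUMARIN_POSITIONS = (3, 4, 5, 6, 7, 8)


def pool(vectors, dim):
    """Sum-pool atom vectors; an empty list (hydrogen) yields a zero vector."""
    out = [0.0] * dim
    for v in vectors:
        for k in range(dim):
            out[k] += v[k]
    return out


def modular_input(atom_vecs, core_atoms, substituents, dim):
    """Hypothetical SMI-style vector: [core | pos3 | pos4 | ... | pos8].

    atom_vecs:    per-atom feature vectors of the whole molecule
    core_atoms:   indices of the atoms in the coumarin-core subgraph
    substituents: dict mapping ring position -> atom indices of its substituent
    Returns a flat vector of length dim * (1 + len(COUMARIN_POSITIONS)).
    """
    unknown = set(substituents) - set(COUMARIN_POSITIONS)
    if unknown:
        raise ValueError(f"unknown substituent positions: {sorted(unknown)}")
    parts = [pool([atom_vecs[i] for i in core_atoms], dim)]
    for pos in COUMARIN_POSITIONS:
        # A fixed slot per position encodes where each substituent sits.
        parts.append(pool([atom_vecs[i] for i in substituents.get(pos, [])], dim))
    return [x for part in parts for x in part]


# Toy usage: three "core" atoms, one substituent atom at position 7,
# another at position 3 (feature values are made up for illustration).
atom_vecs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
vec = modular_input(atom_vecs, [0, 1, 2], {7: [3], 3: [4]}, dim=2)
```

Because each position owns its own slot, moving the same substituent from position 6 to position 7 changes the input vector, whereas a single pooled molecular vector would not distinguish the two isomers.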