Posted on 2024-02-17, 04:03. Authored by Sung Wook Moon, Seung Kyu Min
Molecular discovery is central to the field of chemical informatics.
Although optimization approaches have been developed that target specific
molecular properties in combination with machine learning techniques,
optimization using databases of limited size is challenging for efficient
molecular design. We present a molecular design method that combines a
Gaussian process regression model with a graph-based genetic algorithm
(GB-GA) and starts from a data set comprising a small number of compounds.
Introducing mutation probability control in the genetic algorithm enhances
the optimization capability and speeds up convergence to the optimal
solution. In addition, we propose reducing the number of parameters
in the conventional GB-GA, focusing on efficient molecular design from
a small database. We generated a target-specific database by combining
active learning and iterative design in the evolutionary methodologies
and chose Gaussian process regression as the prediction model for
molecular properties. Using goal-directed benchmarks with several drug-like
molecules, we show that the proposed scheme optimizes toward the target
properties more efficiently than the conventional GB-GA method. Finally,
we demonstrate the approach by designing, from a small data set, D-luciferin
analogues with near-infrared fluorescence for bioimaging, a property that
is desirable for effective in vivo light sources.
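
The idea of mutation probability control in a genetic algorithm can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the control rule, so the stagnation-based doubling of the mutation probability used here is an assumption. Bit strings stand in for molecular graphs, and the toy `fitness` function stands in for a Gaussian process prediction of a molecular property. All function and parameter names (`run_ga`, `p_min`, `p_max`, `patience`) are hypothetical.

```python
import random


def fitness(bits):
    # Toy stand-in for a predicted molecular property: number of 1-bits.
    return sum(bits)


def mutate(bits, p_mut, rng):
    # Flip each bit independently with probability p_mut.
    return [b ^ 1 if rng.random() < p_mut else b for b in bits]


def crossover(a, b, rng):
    # Single-point crossover between two parents of equal length.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def run_ga(n_bits=30, pop_size=20, generations=60,
           p_min=0.01, p_max=0.2, patience=3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    p_mut = p_min
    best = max(fitness(ind) for ind in pop)
    stalled = 0
    history = [best]
    for _ in range(generations):
        # Keep the top half unchanged (elitism), refill with mutated offspring.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), p_mut, rng))
        pop = parents + children
        new_best = max(fitness(ind) for ind in pop)
        if new_best > best:
            # Progress: reset to the low (exploiting) mutation probability.
            best, stalled, p_mut = new_best, 0, p_min
        else:
            # Assumed control rule: after `patience` stalled generations,
            # double the mutation probability, capped at p_max.
            stalled += 1
            if stalled >= patience:
                p_mut = min(p_max, p_mut * 2)
        history.append(best)
    return best, history


if __name__ == "__main__":
    best, history = run_ga()
    print("best fitness:", best)
```

In a molecular setting, the bit-flip mutation would be replaced by graph edits and the fitness by the regression model's prediction; the control loop around `p_mut` is the only part this sketch is meant to show.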