Posted on 2024-02-09, 17:09. Authored by Benson Chen, Mohammad M. Sultan, Theofanis Karaletsos
DNA-encoded library (DEL) technology has proven to be a powerful tool that
utilizes combinatorially constructed small molecules to facilitate
highly efficient screening experiments. These selection experiments,
involving multiple stages of washing, elution, and identification
of potent binders via unique DNA barcodes, often generate complex
data. This complexity can mask the underlying signals,
necessitating the application of computational tools, such as machine
learning, to uncover valuable insights. We introduce a compositional
deep probabilistic model of DEL data, DEL-Compose, which
decomposes molecular representations into their monosynthon, disynthon,
and trisynthon building blocks and capitalizes on the inherent hierarchical
structure of these molecules by modeling latent reactions between
embedded synthons. Additionally, we investigate methods to improve
the observation models for DEL count data, such as integrating covariate
factors to more effectively account for data noise. Across two popular
public benchmark data sets (CA-IX and HRP), our model demonstrates
strong performance compared to count baselines, enriches the correct
pharmacophores, and offers valuable insights via its intrinsic interpretable
structure, thereby providing a robust tool for the analysis of DEL
data.
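
To make the compositional idea concrete, the following is a minimal sketch, not the authors' implementation. It enumerates the monosynthon, disynthon, and trisynthon subsets of a three-cycle molecule and scores an observed count under a simple count likelihood with a covariate offset. The synthon IDs, the function names, the choice of a plain Poisson distribution, and the log-offset form of the covariate are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations
import math


def synthon_subsets(synthons):
    """Enumerate every non-empty subset of a molecule's synthons.

    Keys are tuples of cycle positions, values are the synthon IDs at those
    positions. For a three-cycle molecule this yields 3 monosynthons,
    3 disynthons, and 1 trisynthon.
    """
    positions = range(len(synthons))
    subsets = {}
    for size in range(1, len(synthons) + 1):
        for idx in combinations(positions, size):
            subsets[idx] = tuple(synthons[i] for i in idx)
    return subsets


def poisson_log_likelihood(count, log_enrichment, log_covariate=0.0):
    """Log-probability of `count` under a Poisson observation model.

    The rate is exp(log_enrichment + log_covariate). The covariate term is
    a hypothetical log-offset standing in for a noise factor such as a
    per-molecule baseline; it is an assumption of this sketch.
    """
    log_rate = log_enrichment + log_covariate
    return count * log_rate - math.exp(log_rate) - math.lgamma(count + 1)


# Hypothetical synthon IDs for one trisynthon molecule.
molecule = ("A12", "B07", "C33")
subsets = synthon_subsets(molecule)

for idx, members in subsets.items():
    print(len(idx), idx, members)

# Score an observed count of 3 with and without a covariate offset.
print(poisson_log_likelihood(3, log_enrichment=1.0))
print(poisson_log_likelihood(3, log_enrichment=1.0, log_covariate=0.5))
```

In a learned model, each subset would map to an embedding and the enrichment term would be predicted from those embeddings; here the enrichment is passed in directly to keep the sketch self-contained.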