Posted on 2024-12-30, 12:42. Authored by Joseph DeCorte, Benjamin Brown, Rathmell Jeffrey, Jens Meiler
Machine learning (ML) models now play a crucial role in predicting
properties essential to drug development, such as a drug’s
log-scale acid-dissociation constant (pKa). Despite recent architectural advances, these models often generalize
poorly to novel compounds due to a scarcity of ground-truth data.
Further, these models lack interpretability. With deliberately chosen
molecular embeddings, however, atomic-resolution information about a chemical
structure becomes accessible by observing the model’s response to atomic perturbations
of an input molecule. Here, we present BCL-XpKa, a deep neural network
(DNN)-based multitask classifier for pKa prediction that encodes local atomic environments through Mol2D
descriptors. BCL-XpKa outputs a discrete distribution for each molecule,
which encodes both the pKa prediction and the
model’s uncertainty for that molecule. BCL-XpKa generalizes
well to novel small molecules: it performs competitively with
modern ML pKa predictors, outperforms
several models in generalization tasks, and accurately models the
effects of common molecular modifications on a molecule’s ionizability.
We then leverage BCL-XpKa’s granular descriptor set and distribution-centered
output through atomic sensitivity analysis (ASA), which decomposes
a molecule’s predicted pKa value
into its respective atomic contributions without model retraining.
ASA reveals that BCL-XpKa has implicitly learned high-resolution information
about molecular substructures. We further demonstrate ASA’s
utility in structure preparation for protein–ligand docking
by identifying ionization sites in 93.2% and 87.8% of complex
small-molecule acids and bases, respectively. Finally, we apply ASA with BCL-XpKa to identify
and optimize the physicochemical liabilities of a recently published
KRAS-degrading PROTAC.
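
The two ideas the abstract relies on, a discrete output distribution that carries both a prediction and an uncertainty, and a perturbation-based atomic sensitivity analysis, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the BCL-XpKa implementation: the bin layout (`BIN_CENTERS`), the stand-in `toy_model`, the choice of zeroing an atom's feature row as the perturbation, and the use of Shannon entropy as the uncertainty measure are all hypothetical. The abstract does not specify these details.

```python
import numpy as np

# Hypothetical bin layout: one class per pKa unit from 0 to 14 (assumption).
BIN_CENTERS = np.arange(0.0, 15.0, 1.0)


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def summarize(probs, centers=BIN_CENTERS):
    """Return (expected pKa, Shannon entropy in nats) of a discrete distribution."""
    probs = np.asarray(probs, dtype=float)
    mean = float(np.dot(probs, centers))
    nonzero = probs[probs > 0]
    entropy = float(-(nonzero * np.log(nonzero)).sum())
    return mean, entropy


def toy_model(atom_features, weights):
    """Stand-in for a trained classifier (NOT the real model): sums per-atom
    feature rows into one molecule vector and maps it to bin probabilities."""
    mol_vec = atom_features.sum(axis=0)
    return softmax(mol_vec @ weights)


def atomic_sensitivity(atom_features, weights):
    """Per-atom change in predicted pKa when that atom's features are zeroed.

    The model is only re-evaluated, never retrained. The per-atom values
    are sensitivities and need not sum exactly to the baseline prediction.
    """
    base, _ = summarize(toy_model(atom_features, weights))
    deltas = np.zeros(len(atom_features))
    for i in range(len(atom_features)):
        perturbed = atom_features.copy()
        perturbed[i] = 0.0  # assumed perturbation: mask atom i
        pka_i, _ = summarize(toy_model(perturbed, weights))
        deltas[i] = base - pka_i
    return base, deltas


# Random placeholders for a 5-atom molecule with 4 features per atom.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))
weights = rng.normal(size=(4, BIN_CENTERS.size))
base_pka, contributions = atomic_sensitivity(features, weights)
```

In this sketch, a sharply peaked distribution yields low entropy (a confident prediction), while a flat one yields high entropy. Atoms with the largest absolute entries in `contributions` are the ones whose masking shifts the predicted pKa most, which is the kind of signal one could use to flag candidate ionization sites.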