Posted on 2023-01-03, 20:33. Authored by Raya Stoyanova, Paul Maximilian Katzberger, Leonid Komissarov, Aous Khadhraoui, Lisa Sach-Peltason, Katrin Groebke Zbinden, Torsten Schindler, Nenad Manevski
Although computational predictions of pharmacokinetics (PK) are
desirable at the drug design stage, existing approaches are often
limited by prediction accuracy and human interpretability. Using a
discovery data set of mouse and rat PK studies at Roche (9,685 unique
compounds), we performed a proof-of-concept study to predict key PK
properties from chemical structure alone, including plasma clearance
(CLp), volume of distribution at steady-state (Vss), and oral bioavailability
(F). Ten machine learning (ML) models were evaluated, including single-task,
multitask, and transfer learning approaches (i.e., pretraining with in vitro data). In addition to prediction accuracy, we emphasized
human interpretability of outcomes, especially the quantification
of uncertainty, applicability domains, and explanations of predictions
in terms of molecular features. Results show that intravenous (IV)
PK properties (CLp and Vss) can be predicted with good precision (average
absolute fold error, AAFE, of 1.96–2.84 depending on data split)
and low bias (average fold error, AFE, of 0.98–1.36), with AutoGluon,
Gaussian Process Regressor (GP), and ChemProp displaying the best
performance. Owing to the higher complexity of oral PK studies, predictions
of F were more challenging, with the best AAFE values of 2.35–2.60
and higher overprediction bias (AFE of 1.45–1.62). Multitask
approaches and pretraining of ChemProp neural networks with in vitro data showed similar precision to single-task models
but helped reduce the bias and increase correlations between observations
and predictions. A combination of GP-computed prediction variance,
molecular clustering, and dimensionality reduction provided valuable
quantitative insights into prediction uncertainty and applicability
domains. SHapley Additive exPlanations (SHAP) highlighted molecular
features contributing to prediction outcomes of Vss, providing explanations
that could aid drug design. Combined results show that computational
predictions of PK are feasible at the drug design stage, with several
ML technologies converging to successfully leverage historical PK
data sets. Further studies are needed to unlock the full potential
of this approach, especially with respect to data set sizes and quality,
transfer learning between in vitro and in
vivo data sets, model-independent quantification of uncertainty,
and explainability of predictions.
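
The abstract reports precision as AAFE and bias as AFE without defining them. The sketch below assumes the definitions commonly used in PK work, namely AFE = 10^mean(log10(predicted/observed)) and AAFE = 10^mean(|log10(predicted/observed)|); whether the study uses exactly these forms is an assumption. The function name `fold_errors` and the example values are hypothetical and are not taken from the study.

```python
import math


def fold_errors(observed, predicted):
    """Return (AFE, AAFE) for paired, strictly positive values.

    Assumed definitions (not confirmed by the abstract):
      AFE  = 10 ** mean(log10(predicted / observed))       -> bias
      AAFE = 10 ** mean(abs(log10(predicted / observed)))  -> precision
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    if any(v <= 0 for v in observed) or any(v <= 0 for v in predicted):
        raise ValueError("fold errors require strictly positive values")

    # Log10 ratio of prediction to observation for each compound.
    log_ratios = [math.log10(p / o) for o, p in zip(observed, predicted)]
    n = len(log_ratios)

    afe = 10 ** (sum(log_ratios) / n)                    # signed: >1 means overprediction
    aafe = 10 ** (sum(abs(r) for r in log_ratios) / n)   # unsigned: always >= 1
    return afe, aafe


# Hypothetical values for illustration only (not data from the study).
observed = [10.0, 20.0, 40.0]
predicted = [20.0, 10.0, 40.0]
afe, aafe = fold_errors(observed, predicted)
print(f"AFE = {afe:.2f}, AAFE = {aafe:.2f}")
```

Under these definitions, an AFE near 1 indicates little systematic over- or underprediction, while an AAFE of 2 means predictions are off by a factor of two on average, in either direction. In the example, a two-fold overprediction and a two-fold underprediction cancel in the AFE but not in the AAFE.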