ci6b00291_si_002.pdf (373.55 kB)
Informing the Human Plasma Protein Binding of Environmental Chemicals by Machine Learning in the Pharmaceutical Space: Applicability Domain and Limits of Predictability
journal contribution
posted on 2016-09-29, 00:00 authored by Brandall L. Ingle, Brandon C. Veber, John W. Nichols, Rogelio Tornero-Velez
The free fraction of a xenobiotic in plasma (Fub) is an important determinant of chemical absorption,
distribution, metabolism, elimination, and toxicity, yet experimental
plasma protein binding data are scarce for environmentally relevant
chemicals. This work explores the merit of using available
pharmaceutical data to predict Fub for
environmentally relevant chemicals via machine learning techniques.
Quantitative structure–activity relationship (QSAR) models
were constructed with k-nearest neighbors (kNN),
support vector machines (SVM), and random forest (RF) machine learning
algorithms from a training set of 1045 pharmaceuticals. The models
were then evaluated with independent test sets of pharmaceuticals
(200 compounds) and environmentally relevant ToxCast chemicals (406
total, in two groups of 238 and 168 compounds). The selection of a
minimal feature set of 10–15 2D molecular descriptors allowed
for both informative feature interpretation and practical applicability
domain assessment via a bounded box of descriptor ranges and principal
component analysis. The diverse pharmaceutical and environmental chemical
sets exhibit similarities in terms of chemical space (99–82%
overlap), as well as comparable bias and variance in constructed learning
curves. All the models exhibit significant predictability with mean
absolute errors (MAE) in the range of 0.10–0.18 Fub. The models performed best for highly bound chemicals
(MAE 0.07–0.12), neutrals (MAE 0.11–0.14), and acids
(MAE 0.14–0.17). A consensus model had the highest accuracy
across both pharmaceuticals (MAE 0.151–0.155) and environmentally
relevant chemicals (MAE 0.110–0.131). The inclusion of the
majority of the ToxCast test sets within the AD of the consensus model,
coupled with high prediction accuracy for these chemicals, indicates
the model provides a QSAR for Fub that
is broadly applicable to both pharmaceuticals and environmentally
relevant chemicals.
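
The bounded-box applicability domain (AD), consensus prediction, and MAE described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the consensus is formed, so the sketch assumes a simple average of the kNN, SVM, and RF predictions clipped to the 0–1 range of Fub. All function names, descriptor values, and Fub values are hypothetical toy data, not results from the paper.

```python
from statistics import mean


def descriptor_ranges(training_rows):
    """Return the (min, max) of each descriptor column in the training set."""
    return [(min(col), max(col)) for col in zip(*training_rows)]


def in_bounding_box(row, ranges):
    """True if every descriptor of `row` lies inside its training range."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(row, ranges))


def coverage(rows, ranges):
    """Fraction of chemicals that fall inside the bounding-box AD."""
    return sum(in_bounding_box(r, ranges) for r in rows) / len(rows)


def consensus(predictions_per_model):
    """Average per-chemical predictions across models (assumed scheme),
    clipped to [0, 1] because Fub is a fraction."""
    return [min(1.0, max(0.0, mean(p))) for p in zip(*predictions_per_model)]


def mean_absolute_error(observed, predicted):
    """Mean absolute difference between observed and predicted Fub."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


# Hypothetical toy data: two descriptors per chemical.
train = [[1.0, 200.0], [3.0, 350.0], [2.0, 500.0]]
test = [[2.5, 300.0], [4.0, 250.0]]

ranges = descriptor_ranges(train)
inside = [in_bounding_box(row, ranges) for row in test]
test_coverage = coverage(test, ranges)

# Hypothetical per-model Fub predictions for the two test chemicals.
knn_pred = [0.10, 0.40]
svm_pred = [0.20, 0.50]
rf_pred = [0.15, 0.60]
consensus_pred = consensus([knn_pred, svm_pred, rf_pred])

observed_fub = [0.05, 0.55]
mae = mean_absolute_error(observed_fub, consensus_pred)
```

In this toy case the second test chemical falls outside the box because its first descriptor (4.0) exceeds the training maximum (3.0), so its prediction would be flagged as an extrapolation. The principal component analysis check mentioned in the abstract is not covered by this sketch.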