%0 Journal Article
%A Clark, Alex M.
%A Dole, Krishna
%A Coulon-Spektor, Anna
%A McNutt, Andrew
%A Grass, George
%A Freundlich, Joel S.
%A Reynolds, Robert C.
%A Ekins, Sean
%D 2015
%T Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets
%U https://acs.figshare.com/articles/journal_contribution/Open_Source_Bayesian_Models_1_Application_to_ADME_Tox_and_Drug_Discovery_Datasets/2051400
%R 10.1021/acs.jcim.5b00143.s001
%2 https://acs.figshare.com/ndownloader/files/3622749
%K Drug Discovery Datasets
%K drug discovery
%K Bayesian models
%K FCFP 6 descriptors
%K Open Source Bayesian Models
%K CDK
%K CDD Vault
%K ADME
%K Chemistry Development Kit
%X On the order of hundreds of absorption, distribution, metabolism,
excretion, and toxicity (ADME/Tox) models have been described in the
literature in the past decade which are more often than not inaccessible
to anyone but their authors. Public accessibility is also an issue
with computational models for bioactivity, and the ability to share
such models still remains a major challenge limiting drug discovery.
We describe the creation of a reference implementation of a Bayesian
model-building software module, which we have released as an open
source component that is now included in the Chemistry Development
Kit (CDK) project, as well as implemented in the CDD Vault and
in several mobile apps. We use this implementation to build an array
of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties.
We show that these models possess cross-validation receiver operating
characteristic (ROC) curve values comparable to those reported in prior publications
using alternative tools. We have now described how the implementation
of Bayesian models with FCFP6 descriptors generated in the CDD Vault
enables the rapid production of robust machine learning models from
public data or the user’s own datasets. The current study sets
the stage for generating models in proprietary software (such as CDD)
and exporting these models in a format that could be run in open source
software using CDK components. This work also demonstrates that we
can enable biocomputation across distributed private or public datasets
to enhance drug discovery.
%I ACS Publications