ci0c00601_si_001.xlsx (634.02 kB)
Reason Vectors: Abstract Representation of Chemistry–Biology Interaction Outcomes, for Reasoning and Prediction
dataset
posted on 2020-10-02, 07:14 authored by Suman K. Chakravarti

Many traditional
quantitative structure–activity relationship
(QSAR) models are based on correlation with high-dimensional, highly
variable molecular features in their raw form, limiting their generalizing
capabilities despite the use of large training sets. They also lack
elements of causality and reasoning. With these issues in mind, we
developed a method for learning higher-level abstract representations
of the effects of the interactions between molecular features and
biology. We named these representations reason vectors. They are composed of a series of computed activities of substructures
obtained from stepwise reconstruction of the molecule. This representation
is very different from fingerprints, which are composed of molecular
features directly. These vectors capture the reasons for the bioactivity of
chemicals (or absence thereof) in an abstract form, uncover causality
in interactions between chemical features, and generalize beyond specific
chemical classes or bioactivity. Reason vectors contain only a few
key attributes and are much smaller than molecular fingerprints. They
allow vague and conceptual similarity searches that are less susceptible to
failure on novel combinations of query molecule features and more
likely to identify reasons for activity in chemical classes that are
absent from the training data. Reason vectors can be compared with each
other, and their activity can be computed by matching them with vectors
from molecules with known bioactivity. A single molecule produces
as many reason vectors as it has heavy atoms, and a simple count of
these vectors in a series of activity ranges is all that is needed
to predict its bioactivity. Thus, the prediction method is devoid
of gradient optimization or statistical fitting.
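
The matching and counting steps described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the vector layout, the Euclidean nearest-neighbour matching, the activity ranges in `ACTIVITY_BINS`, and the rule of predicting the range with the highest count are all assumptions made for illustration, as are the names `nearest_activity`, `bin_label`, and `predict`. The example vectors and activities are invented toy values.

```python
from collections import Counter
from math import dist

# Hypothetical activity ranges (low inclusive, high exclusive); not from the source.
ACTIVITY_BINS = [
    (0.0, 5.0, "inactive"),
    (5.0, 7.0, "moderate"),
    (7.0, float("inf"), "active"),
]


def nearest_activity(vector, reference):
    """Return the known activity of the closest reference vector.

    Assumption: vectors are matched by Euclidean distance.
    """
    _, activity = min(reference, key=lambda item: dist(vector, item[0]))
    return activity


def bin_label(activity):
    """Map a numeric activity onto one of the assumed activity ranges."""
    for low, high, label in ACTIVITY_BINS:
        if low <= activity < high:
            return label
    raise ValueError(f"activity {activity} falls outside all ranges")


def predict(molecule_vectors, reference):
    """Count a molecule's vectors per activity range and return the top range.

    No fitting or gradient step is involved: only matching and counting.
    """
    counts = Counter(
        bin_label(nearest_activity(v, reference)) for v in molecule_vectors
    )
    label, _ = counts.most_common(1)[0]
    return label, counts


# Toy reference set: (reason vector, known activity). Values are invented.
reference = [
    ((0.1, 0.2, 0.0), 4.0),
    ((0.9, 0.8, 0.7), 7.5),
    ((0.5, 0.5, 0.4), 6.0),
]

# One vector per heavy atom of a hypothetical three-atom query molecule.
query_vectors = [(0.85, 0.8, 0.75), (0.9, 0.7, 0.6), (0.2, 0.1, 0.0)]

label, counts = predict(query_vectors, reference)
```

In this toy case two of the three query vectors match a reference vector in the "active" range and one matches the "inactive" range, so the count-based prediction is "active". A real workflow would need the actual reason-vector definition and matching criterion from the paper.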