American Chemical Society
ci0c00601_si_001.xlsx (634.02 kB)

Reason Vectors: Abstract Representation of Chemistry–Biology Interaction Outcomes, for Reasoning and Prediction

posted on 2020-10-02, 07:14 authored by Suman K. Chakravarti
Many traditional quantitative structure–activity relationship (QSAR) models are based on correlation with high-dimensional, highly variable molecular features in their raw form, which limits their ability to generalize despite the use of large training sets. They also lack elements of causality and reasoning. With these issues in mind, we developed a method for learning higher-level abstract representations of the effects of interactions between molecular features and biology. We named these representations reason vectors. Each is composed of a series of computed activities of substructures obtained from stepwise reconstruction of the molecule. This representation is very different from fingerprints, which are composed of molecular features directly. Reason vectors capture the reasons for the bioactivity of chemicals (or its absence) in an abstract form, uncover causality in interactions between chemical features, and generalize beyond specific chemical classes or bioactivities. They contain only a few key attributes and are much smaller than molecular fingerprints. They allow vague and conceptual similarity searches, less susceptible to failure on novel combinations of query molecule features and more likely to identify reasons for activity in chemical classes that are absent from the training data. Reason vectors can be compared with each other, and their activity can be computed by matching them with vectors from molecules with known bioactivity. A single molecule produces as many reason vectors as it has heavy atoms, and a simple count of these vectors in a series of activity ranges is all that is needed to predict its bioactivity. Thus, the prediction method is devoid of gradient optimization or statistical fitting.
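
The counting step described at the end of the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the nearest-neighbor matching by Euclidean distance, the three activity ranges, the majority-count decision rule, and all names and numbers (`ACTIVITY_BINS`, `nearest_activity`, `predict`, the toy vectors) are assumptions made only for illustration. The abstract states only that vectors are matched against vectors with known bioactivity and then counted per activity range.

```python
from collections import Counter
from math import dist

# Hypothetical activity ranges as (low, high, label); the source does not specify them.
ACTIVITY_BINS = [(0.0, 0.33, "low"), (0.33, 0.66, "medium"), (0.66, 1.01, "high")]


def nearest_activity(query_vec, reference):
    """Return the activity of the reference vector closest to query_vec.

    Euclidean distance is an assumption; the source does not name a metric.
    """
    _, activity = min(reference, key=lambda item: dist(query_vec, item[0]))
    return activity


def bin_label(activity):
    """Map a numeric activity onto one of the assumed activity ranges."""
    for low, high, label in ACTIVITY_BINS:
        if low <= activity < high:
            return label
    raise ValueError(f"activity {activity} is outside all ranges")


def predict(query_vectors, reference):
    """Count matched vectors per activity range and return the most populated range.

    No gradient optimization or statistical fitting is involved, only matching and counting.
    """
    counts = Counter(bin_label(nearest_activity(v, reference)) for v in query_vectors)
    label, _ = counts.most_common(1)[0]
    return label, counts


# Toy reference library: (reason vector, known activity). Values are invented.
reference = [
    ((0.1, 0.2, 0.1), 0.1),
    ((0.5, 0.4, 0.5), 0.5),
    ((0.8, 0.9, 0.7), 0.9),
]

# One vector per heavy atom of a hypothetical three-atom query molecule.
query_vectors = [(0.75, 0.85, 0.8), (0.9, 0.9, 0.6), (0.45, 0.5, 0.5)]

label, counts = predict(query_vectors, reference)
print(label, dict(counts))
```

In this toy case two of the three query vectors match a reference vector in the assumed "high" range and one matches the "medium" range, so the count-based rule returns "high". How the actual method weighs the counts across ranges is not stated in the abstract.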