Distributed Representation of Chemical Fragments
Posted on 2018-03-08 - 19:16
This article describes an unsupervised machine learning method
for computing distributed vector representations of molecular fragments.
These vectors encode fragment features in a continuous high-dimensional
space and enable similarity computation between individual fragments,
even for small fragments with only two heavy atoms. The method is
based on a word embedding algorithm borrowed from the field of natural
language processing, and approximately 6 million unlabeled PubChem
chemicals were used for training. In contrast to the traditional sparse
“one-hot” fragment representation, the resulting dense fragment vectors
capture rich relational structure in the fragment space. The vectors of
small linear fragments were averaged to yield distributed vectors of
larger fragments and molecules, which were used for tasks such as
clustering, ligand recall, and quantitative structure–activity
relationship (QSAR) modeling. The distributed vectors were found to be
better than standard binary fingerprints at clustering ring systems and
at recalling kinase ligands. This work demonstrates unsupervised learning
of fragment chemistry from large sets of unlabeled chemical structures
and its subsequent application to supervised training on relatively
small data sets of labeled chemicals.
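The averaging step described above can be sketched as follows. This is a minimal illustration, not the author's implementation: the fragment labels ("C-C", "C=O", and so on) and their 3-dimensional vectors are made up, whereas the real vectors are learned from PubChem with a word embedding algorithm and are much higher-dimensional. Cosine similarity is an assumed choice of metric, since the abstract only speaks of "similarity computation".

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(fragments: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of the fragments found in `embeddings`.

    Fragments without a vector are skipped (an assumption; the paper may
    handle unknown fragments differently).
    """
    known = [embeddings[f] for f in fragments if f in embeddings]
    if not known:
        raise ValueError("none of the fragments has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy embeddings for hypothetical two-heavy-atom fragment labels.
# These numbers are invented for illustration only.
toy_embeddings: Dict[str, Vector] = {
    "C-C": [1.0, 0.0, 0.0],
    "C=O": [0.0, 1.0, 0.0],
    "C-O": [0.2, 0.9, 0.0],
    "C-N": [0.0, 0.0, 1.0],
}

# Each "molecule" is represented by the fragments it contains; its vector is the mean.
mol_a = mean_vector(["C-C", "C=O"], toy_embeddings)
mol_b = mean_vector(["C-C", "C-O"], toy_embeddings)
mol_c = mean_vector(["C-C", "C-N"], toy_embeddings)

print(round(cosine(mol_a, mol_b), 3))  # → 0.99
print(round(cosine(mol_a, mol_c), 3))  # → 0.5
```

Because "C=O" and "C-O" were given nearby toy vectors, the two oxygen-containing molecules come out more similar than the oxygen/nitrogen pair; with one-hot fragment features, the non-shared fragments would contribute no similarity at all.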
CITE THIS COLLECTION
Chakravarti, Suman K. (2018). Distributed Representation of Chemical Fragments. ACS Publications. Collection. https://doi.org/10.1021/acsomega.7b02045
AUTHORS (1)
Suman K. Chakravarti
KEYWORDS
high-dimensional space; chemical structures; fragment space; vector representation; data sets; vectors encode fragment features; kinase ligands; PubChem chemicals; similarity computation; fragment chemistry; language processing field; word embedding algorithm; Chemical Fragments; ring systems; fragment vectors; unsupervised machine