Posted on 2021-11-29, 13:06, authored by Zhuo Yang, Jianfei Song, Minjian Yang, Lin Yao, Jiahua Zhang, Hui Shi, Xiangyang Ji, Yafeng Deng, Xiaojian Wang
Library matching using carbon-13 nuclear magnetic resonance (¹³C NMR) spectra is a popular method in compound identification systems. However, the usability of existing approaches is restricted because expanding a library that pairs each chemical structure with its spectrum is costly and time-consuming. We therefore propose a fundamentally different approach that matches ¹³C NMR spectra directly against a molecular structure library.
Using deep contrastive learning, we developed a cross-modal retrieval between spectrum and structure (CReSS) system, which searches a molecular structure library using the ¹³C NMR spectrum of a compound. When 41,494 ¹³C NMR spectra were searched against a reference structure library of 10.4 million compounds, CReSS reached a recall@10 of 91.64% at a processing speed of 0.114 s per query spectrum. Adding a molecular weight filter with a tolerance of 5 Da further raised recall@10 to 98.39%. CReSS also shows potential for detecting the scaffolds of novel structures and performs well on structural revision. CReSS is designed to bridge the gap between ¹³C NMR spectra and structures and could be broadly applicable to compound identification.
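The abstract does not spell out how contrastive cross-modal retrieval, the 5 Da molecular weight filter, or recall@10 fit together, so the following is only a rough, hypothetical sketch under stated assumptions, not the CReSS implementation. It assumes trained spectrum and structure encoders already map each input to a fixed-length vector (the encoders are not shown). It also assumes training used a CLIP-style symmetric InfoNCE loss, which is one common form of contrastive learning, although the paper's actual objective may differ. Retrieval ranks library structures by cosine similarity, and the filter is modeled as a window of plus or minus 5 Da around a known query molecular weight. All function names (`info_nce`, `search`, `recall_at_k`) and the toy data are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(spec_embs: List[Vector], struct_embs: List[Vector],
             temperature: float = 0.1) -> float:
    """Assumed CLIP-style symmetric InfoNCE loss: pair i (spectrum i, structure i) is the positive."""
    n = len(spec_embs)
    s = [l2_normalize(v) for v in spec_embs]
    t = [l2_normalize(v) for v in struct_embs]
    logits = [[dot(s[i], t[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy(rows: List[List[float]]) -> float:
        # Mean of -log softmax(row)[i], using a max-shifted log-sum-exp for stability.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / n

    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    # Average the spectrum->structure and structure->spectrum directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def search(query_emb: Vector, library: Dict[str, Vector], k: int = 10,
           mol_weights: Optional[Dict[str, float]] = None,
           query_mw: Optional[float] = None, tol_da: float = 5.0) -> List[str]:
    """Return the IDs of the k library structures most similar to a query spectrum embedding."""
    q = l2_normalize(query_emb)
    scored: List[Tuple[float, str]] = []
    for struct_id, emb in library.items():
        # Optional pre-filter: skip candidates outside query_mw +/- tol_da.
        if mol_weights is not None and query_mw is not None:
            if abs(mol_weights[struct_id] - query_mw) > tol_da:
                continue
        scored.append((dot(q, l2_normalize(emb)), struct_id))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [struct_id for _, struct_id in scored[:k]]


def recall_at_k(ranked: List[List[str]], truth: List[str], k: int = 10) -> float:
    """Fraction of queries whose correct structure appears among the top k results."""
    hits = sum(1 for results, answer in zip(ranked, truth) if answer in results[:k])
    return hits / len(truth)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings and molecular weights (Da) for three structures.
    library = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.7, 0.7]}
    mw = {"A": 180.2, "B": 194.2, "C": 181.0}
    top = search([0.9, 0.1], library, k=2)
    filtered = search([0.9, 0.1], library, k=2, mol_weights=mw, query_mw=194.0)
    print(top)       # ['A', 'C']
    print(filtered)  # ['B']
    print(recall_at_k([top, filtered], ["C", "A"], k=2))  # 0.5
```

In this sketch the molecular weight filter simply removes candidates before ranking, which is one plausible way a mass constraint could raise recall@10. A search over 10.4 million structures would realistically use vectorized or approximate nearest-neighbor search instead of this linear Python loop.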