American Chemical Society
ac0c02082_si_001.pdf (5.17 MB)

Deep Learning Driven GC-MS Library Search and Its Application for Metabolomics

journal contribution
posted on 2020-08-12, 18:38 authored by Dmitriy D. Matyushin, Anastasia Yu. Sholokhova, Aleksey K. Buryak
Preliminary compound identification and peak annotation in gas chromatography–mass spectrometry are usually performed using mass spectral databases. Several algorithms exist for searching a large mass spectral library for a query spectrum. In many cases, a library search returns a wrong answer even when the correct compound is present in the library. In this work, we present a deep learning driven approach to library search that reduces the probability of such cases. Machine learning ranking (learning to rank) is a class of machine learning and deep learning algorithms that compare (rank) objects. This work introduces the use of deep learning ranking for small-molecule identification by low-resolution electron ionization mass spectrometry. Instead of a simple similarity measure between two spectra, such as the dot product or the Euclidean distance between the vectors that represent them, a deep convolutional neural network is used. The deep learning ranking model outperforms other approaches and reduces the fraction of wrong answers at rank-1 by 9–23%, depending on the data set. Spectra from the Golm Metabolome Database, Human Metabolome Database, and FiehnLib were used to test the model.
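
For orientation, the conventional library search that the abstract contrasts with the deep learning model can be sketched as follows. This is a minimal illustration of the simple similarity baseline (a normalized dot product between binned spectra), not the authors' ranking model. The function names, the binning range, and the toy spectra are all hypothetical and chosen only for the example.

```python
import numpy as np

def to_vector(peaks, max_mz=500):
    """Bin a low-resolution spectrum, given as (integer m/z, intensity) pairs,
    into a fixed-length intensity vector. max_mz is an assumed upper bound."""
    vec = np.zeros(max_mz + 1)
    for mz, intensity in peaks:
        if 0 <= mz <= max_mz:
            vec[mz] += intensity
    return vec

def cosine_similarity(a, b):
    """Normalized dot product of two spectrum vectors (0.0 if either is empty)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_library(query_peaks, library):
    """Return library compound names sorted from most to least similar to the query."""
    q = to_vector(query_peaks)
    scores = {name: cosine_similarity(q, to_vector(peaks))
              for name, peaks in library.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy spectra invented for illustration; these are not real compounds.
library = {
    "compound_A": [(43, 100), (58, 40), (71, 10)],
    "compound_B": [(43, 20), (91, 100), (92, 60)],
    "compound_C": [(57, 100), (71, 60), (85, 30)],
}
query = [(43, 95), (58, 45), (71, 5)]

ranking = rank_library(query, library)
print(ranking[0])  # the rank-1 candidate
```

A search counts as a wrong answer at rank-1 when `ranking[0]` is not the true compound even though that compound is in the library. The approach described above keeps this ranking step but replaces the fixed similarity function with a learned one.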