Posted on 2020-08-12, 18:38. Authored by Dmitriy D. Matyushin, Anastasia Yu. Sholokhova, Aleksey K. Buryak
Preliminary compound identification and peak annotation in gas
chromatography–mass spectrometry are usually performed using mass
spectral databases. Several algorithms exist for searching
a spectrum against a large mass spectral library. In many cases,
a library search returns a wrong answer even when the correct
compound is contained in the library. In this work, we present a
deep-learning-driven approach to library search that reduces the
probability of such cases. Machine learning ranking (learning to rank)
is a class of machine learning and deep learning algorithms that perform
a comparison (ranking) of objects. This work introduces the use
of deep learning ranking for small-molecule identification using
low-resolution electron ionization mass spectrometry. Instead of simple
similarity measures for two spectra, such as the dot product or the
Euclidean distance between vectors that represent spectra, a deep
convolutional neural network is used. The deep learning ranking model
outperforms other approaches and reduces the fraction of wrong
answers (at rank-1) by 9–23%, depending on the data set used.
Spectra from the Golm Metabolome Database, Human Metabolome Database,
and FiehnLib were used for testing the model.
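
The simple similarity measures mentioned above (the dot product and the Euclidean distance between vectors that represent spectra) can be illustrated with a minimal sketch. This is not the authors' implementation and not the deep learning ranking model itself; it only shows the kind of baseline library search that such a model is meant to replace. The function names, the unit-m/z binning up to `max_mz`, the use of a length-normalized dot product (cosine similarity), and the toy spectra are all assumptions made for illustration.

```python
import math


def to_vector(peaks, max_mz=500):
    """Bin a low-resolution spectrum {integer m/z: intensity} into a fixed-length vector.

    Assumption: unit-mass bins from 0 to max_mz; peaks outside the range are ignored.
    """
    vec = [0.0] * (max_mz + 1)
    for mz, intensity in peaks.items():
        if 0 <= mz <= max_mz:
            vec[mz] += float(intensity)
    return vec


def cosine_similarity(a, b):
    """Dot product of two spectrum vectors, normalized by their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance between two spectrum vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_library(query_peaks, library, max_mz=500):
    """Return (name, score) pairs sorted from best to worst cosine similarity."""
    query = to_vector(query_peaks, max_mz)
    scored = [
        (name, cosine_similarity(query, to_vector(peaks, max_mz)))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, made-up spectra (hypothetical entries, not taken from any database).
toy_library = {
    "A": {43: 100, 58: 25},
    "B": {51: 40, 77: 100},
}
toy_query = {43: 100, 58: 30}

ranking = rank_library(toy_query, toy_library)
best_name, best_score = ranking[0]  # the rank-1 answer of this baseline search
```

In this sketch the rank-1 answer is simply the library entry with the highest similarity score. A learning-to-rank approach would replace the fixed `cosine_similarity` scoring with a trained model that compares the query spectrum with each candidate.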