Posted on 2023-01-04, 08:58. Authored by Hailiang Zhang, Zhenbo Xu, Xiaqiong Fan, Yue Wang, Qiong Yang, Jinyu Sun, Ming Wen, Xiao Kang, Zhimin Zhang, Hongmei Lu.
Region of interest (ROI) extraction is a fundamental
step in analyzing
metabolomic datasets acquired by liquid chromatography–mass
spectrometry (LC–MS). However, noise and background signals in LC–MS
data often degrade the quality of extracted ROIs. Therefore, developing
effective ROI evaluation algorithms is necessary to eliminate false
positives while keeping the false-negative rate as low as possible.
In this study, a deep fused filter of ROIs (dffROI) was proposed to
improve the accuracy of ROI extraction by combining the handcrafted
evaluation metrics with convolutional neural network (CNN)-learned
representations. To evaluate its performance, dffROI was
compared with peakonly (CNN-learned representations only) and five handcrafted
metrics on three LC–MS datasets and a gas chromatography–mass
spectrometry (GC–MS) dataset. The results show that dffROI achieves
higher accuracy, a higher true-positive rate, and a lower false-positive
rate than these baselines. On the test set, its accuracy, true-positive
rate, and false-positive rate are 0.9841, 0.9869, and 0.0186, respectively. The classification
error rate of dffROI (1.59%) is significantly lower than that of
peakonly (2.73%). Model-agnostic feature importance analysis demonstrates
the necessity of fusing handcrafted evaluation metrics with the convolutional
neural network representations. dffROI is an automatic, robust, and
universal method for ROI filtering by virtue of information fusion
and end-to-end learning. It is implemented in Python
and open-sourced at https://github.com/zhanghailiangcsu/dffROI under the BSD License. Furthermore, it has been integrated into the
KPIC2 framework previously proposed by our group to facilitate the
analysis of real metabolomic LC–MS datasets.
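
The reported accuracy, true-positive rate, false-positive rate, and classification error rate all follow from the confusion counts of a binary ROI filter. The sketch below shows one way to compute them. It is illustrative only and is not taken from the dffROI repository: the function name `roi_filter_metrics` and the label convention (1 = true ROI, 0 = false ROI) are assumptions made for this example.

```python
def roi_filter_metrics(y_true, y_pred):
    """Compute binary classification metrics for an ROI filter.

    Assumed (hypothetical) label convention: 1 = true ROI, 0 = false ROI.
    Both inputs are equal-length sequences of 0/1 labels.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        # Fraction of true ROIs that the filter keeps.
        "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of false ROIs that the filter wrongly keeps.
        "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
        # Classification error rate is the complement of accuracy.
        "error_rate": 1.0 - accuracy,
    }


# Toy labels, not data from the paper.
metrics = roi_filter_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                             [1, 1, 1, 0, 0, 0, 0, 1])
print(metrics)
```

Because the error rate is the complement of accuracy, the reported accuracy of 0.9841 corresponds to the reported error rate of 1.59%.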