Posted on 2020-03-24, 11:04. Authored by Zhiwei Rong, Qilong Tan, Lei Cao, Liuchao Zhang, Kui Deng, Yue Huang, Zheng-Jiang Zhu, Zhenzi Li, Kang Li
Untargeted metabolomics based on liquid chromatography–mass
spectrometry is affected by nonlinear batch effects, which obscure
biological effects, cause nonreproducibility, and are difficult
to calibrate. In this study, we propose a novel deep learning model,
called Normalization Autoencoder (NormAE), which is based on nonlinear
autoencoders (AEs) and adversarial learning. An additional classifier
and ranker are trained to provide adversarial regularization during
the training of the AE model. Latent representations are extracted
by the encoder, and the decoder then reconstructs the data without
batch effects. The NormAE method was tested on two real metabolomics
data sets. After calibration by NormAE, the quality control samples
(QCs) for both data sets clustered most tightly in a PCA score plot
(average distances decreased from 56.550 and 52.476 to 7.383 and 14.075,
respectively) and obtained the highest average correlation coefficients
(from 0.873 and 0.907 to 0.997 for both). Additionally, NormAE significantly
improved biomarker discovery (median number of differential peaks
increased from 322 and 466 to 1140 and 1622, respectively). NormAE
was compared with four commonly used batch effect removal methods
and produced the best calibration results among them.
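
The two QC-based quality measures quoted above (average distance between QCs in a PCA score plot, and average correlation coefficient between QCs) can be illustrated with a minimal sketch. The abstract does not state the exact definitions, so this sketch assumes Euclidean distance between PCA scores and Pearson correlation between intensity profiles, each averaged over all unordered pairs of QC samples. The function names and the toy numbers are hypothetical and are not taken from the study; the PCA scores are assumed to have been computed elsewhere.

```python
from itertools import combinations
from math import dist, sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise(samples, metric):
    """Average a symmetric metric over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: three QC injections, four peak intensities each.
qc_intensities = [
    [10.0, 20.0, 30.0, 40.0],
    [11.0, 19.0, 31.0, 41.0],
    [9.0, 21.0, 29.0, 39.0],
]
# Hypothetical 2-D PCA scores for the same three QC injections.
qc_scores = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# Assumed definitions: mean pairwise Pearson r and mean pairwise
# Euclidean distance; the study may define these measures differently.
mean_corr = mean_pairwise(qc_intensities, pearson)
mean_dist = mean_pairwise(qc_scores, dist)
```

Under these assumed definitions, a successful batch-effect correction would lower `mean_dist` and raise `mean_corr` for the QC samples, since QCs are replicate injections of the same material and should differ only by technical variation.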