Evaluation of Multivariate Classification Models for Analyzing NMR Metabolomics Data
Posted on 2019-08-22 - 16:35
Analytical techniques such as NMR and mass spectrometry can generate
large metabolomics data sets containing thousands of spectral features
derived from numerous biological observations. Multivariate data analysis
is routinely used to uncover the underlying biological information
contained within these large metabolomics data sets. This is typically
accomplished by classifying the observations into groups (e.g., control
versus treated) and by identifying associated discriminating features.
There are a variety of classification models to select from, which
include some well-established techniques (e.g., principal component
analysis [PCA], orthogonal projection to latent structure [OPLS],
or partial least-squares projection to latent structures [PLS]) and
newly emerging machine learning algorithms (e.g., support vector machines
or random forests). However, it is unclear which classification model,
if any, is an optimal choice for the analysis of metabolomics data.
Herein, we present a comprehensive evaluation of five common classification
models that are routinely employed in the metabolomics field and are
currently available in our MVAPACK metabolomics software package.
Simulated and experimental NMR data sets with various levels of group
separation were used to evaluate each model. Model performance was
assessed by classification accuracy rate, by the area under a receiver
operating characteristic (AUROC) curve, and by the identification
of true discriminating features. Our findings suggest that the five
classification models perform equally well with robust data sets.
Only when the models are stressed with subtle data set differences
does OPLS emerge as the best-performing model. OPLS maintained a high prediction
accuracy rate and a large area under the ROC curve while yielding
loadings closest to the true loadings, even with limited group separation.
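
As an illustration of two of the performance measures named above, the following is a minimal sketch of how a classification accuracy rate and an AUROC could be computed for a two-group problem (e.g., control versus treated). It is not taken from MVAPACK or from the publication: the function names, the 0.5 decision threshold, and the labels and scores are all hypothetical, and the AUROC is computed with the standard pairwise-comparison definition (the probability that a randomly chosen positive observation scores higher than a randomly chosen negative one, with ties counted as one half).

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of observations whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of class scores.

    Counts, over all (positive, negative) pairs, how often the positive
    observation receives the higher score; ties contribute 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 0 = control, 1 = treated; scores are made-up
# model outputs, not results from the study.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.6, 0.5, 0.8, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("AUROC:", auroc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason the two measures can disagree when group separation is subtle.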
CITE THIS COLLECTION
Vu, Thao; Siemek, Parker; Bhinderwala, Fatema; Xu, Yuhang; Powers, Robert (2019). Evaluation of Multivariate Classification Models for Analyzing NMR Metabolomics Data. ACS Publications. Collection. https://doi.org/10.1021/acs.jproteome.9b00227
AUTHORS (5)
Thao Vu
Parker Siemek
Fatema Bhinderwala
Yuhang Xu
Robert Powers