Statistical Variable Selection: An Alternative Prioritization
Strategy during the Nontarget Analysis of LC-HR-MS Data
Saer Samanipour
Malcolm J. Reid
Kevin V. Thomas
10.1021/acs.analchem.7b00743.s001
https://acs.figshare.com/articles/journal_contribution/Statistical_Variable_Selection_An_Alternative_Prioritization_Strategy_during_the_Nontarget_Analysis_of_LC-HR-MS_Data/4978004
Liquid chromatography coupled to high resolution mass spectrometry
(LC-HR-MS) has been one of the main analytical tools for the analysis
of small polar organic pollutants in the environment. LC-HR-MS typically
produces a large amount of data for a single chromatogram. The analyst
is therefore required to perform prioritization prior to nontarget
structural elucidation. In the present study, we have combined the
F-ratio statistical variable selection and the apex detection algorithms
in order to perform prioritization in data sets produced via LC-HR-MS.
The approach was validated through the use of semisynthetic data,
which was a combination of real environmental data and the artificially
added signal of 31 alkanes in that sample. We evaluated the performance
of this method as a function of four false detection probabilities,
namely: 0.01, 0.02, 0.05, and 0.1%. We generated 100 different semisynthetic
data sets for each F-ratio and evaluated each data set using this
method. This design of experiment created a population of 30 000
true positives and 32 000 true negatives for each F-ratio,
which was considered sufficiently large to fully validate
this method for analysis of LC-HR-MS data. The effect of both the
F-ratio and signal-to-noise ratio (S/N) on the performance of the suggested approach was evaluated through
normalized statistical tests. We also compared this method to the
pixel-by-pixel as well as peak list approaches. More than 92% of features
present in the final feature list via the F-ratio method were also
present in the conventional peak list generated by MZmine. However,
this method was the only approach successful in the classification
of samples, and thus prioritization, when compared to the other evaluated
approaches. The application potential and limitations of the suggested
method are discussed.
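To illustrate the idea of F-ratio variable selection described above, the following is a minimal sketch, not the authors' implementation. It assumes two sample classes with replicate measurements, computes a one-way ANOVA F-ratio (between-class over within-class mean square) for each variable, and keeps the variables whose F-ratio exceeds a threshold. The function names `f_ratio` and `select_variables`, the toy intensities, and the threshold value are all hypothetical; in practice the threshold would be derived from the F distribution at the chosen false detection probability, and the apex detection step is not shown.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F-ratio for a single variable.

    groups: list of lists, one list of replicate intensities per class.
    """
    k = len(groups)                      # number of classes
    n = sum(len(g) for g in groups)      # total number of measurements
    grand = sum(sum(g) for g in groups) / n
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        # No replicate scatter: treat any class difference as unbounded.
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def select_variables(class_a, class_b, f_crit):
    """Return indices of variables whose F-ratio exceeds f_crit.

    class_a, class_b: lists of replicate rows (one intensity per variable).
    f_crit: assumed, user-supplied critical F value.
    """
    n_vars = len(class_a[0])
    selected = []
    for j in range(n_vars):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        if f_ratio([col_a, col_b]) > f_crit:
            selected.append(j)
    return selected


# Toy data (hypothetical): three replicates per class, three variables.
# Only the middle variable differs between the two classes.
class_a = [[1.0, 5.0, 2.0], [1.1, 5.2, 2.1], [0.9, 4.8, 1.9]]
class_b = [[1.0, 9.0, 2.0], [1.2, 9.1, 2.2], [0.8, 8.9, 1.8]]

print(select_variables(class_a, class_b, f_crit=10.0))
```

In this sketch each variable stands for one point in the retention time and m/z space; lowering `f_crit` admits more variables at the cost of more false detections.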
2017-04-24 00:00:00
F-ratio method
data sets
semisynthetic data
semisynthetic data sets
prioritization
Nontarget Analysis
peak list
peak list approaches
30 000
feature list
LC-HR-MS Data Liquid chromatography
32 000
Alternative Prioritization Strategy
Statistical Variable Selection
apex detection algorithms
31 alkanes
resolution mass spectrometry
LC-HR-MS data
detection probabilities