10.1021/acs.analchem.8b05592.s001
Sili Fan
Tobias Kind
Tomas Cajka
Stanley L. Hazen
W. H. Wilson Tang
Rima Kaddurah-Daouk
Marguerite R. Irvin
Donna K. Arnett
Dinesh K. Barupal
Oliver Fiehn
Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data
American Chemical Society
2019
analysis time
data acquisition processes
sample sets
instrument-to-instrument variation
Large-scale untargeted lipidomics experiments
Random Forest
data sets
SERRF
quality control pool samples
novel normalization approach
2696 samples
Systematic Error Removal
Such data sets
Technical data variance
error removal
lipidomics data sets
QC
normalization methods
batch differences
2019-02-13 00:00:00
Journal contribution
https://acs.figshare.com/articles/journal_contribution/Systematic_Error_Removal_Using_Random_Forest_for_Normalizing_Large-Scale_Untargeted_Lipidomics_Data/7733000
Large-scale untargeted lipidomics experiments involve the measurement
of hundreds to thousands of samples. Such data sets are usually acquired
on one instrument over days or weeks of analysis time. Such extensive
data acquisition processes introduce a variety of systematic errors,
including batch differences, longitudinal drifts, or even instrument-to-instrument
variation. Technical data variance can obscure the true biological
signal and hinder biological discoveries. To combat this issue, we
present a novel normalization approach based on quality control
pool (QC) samples. The method, systematic error removal
using random forest (SERRF), eliminates unwanted systematic
variation in large sample sets. We compared SERRF with 15 other commonly
used normalization methods using six lipidomics data sets from three
large cohort studies (832, 1162, and 2696 samples). SERRF reduced
the average technical errors for these data sets to 5% relative standard
deviation. We conclude that SERRF outperforms other existing methods
and can significantly reduce the unwanted systematic variation, revealing
biological variance of interest.
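
The abstract reports technical error as relative standard deviation (RSD), that is, the standard deviation divided by the mean, expressed as a percentage. The sketch below shows one way such a figure could be computed from QC pool injections and averaged over lipids. It is an illustration only and not the SERRF implementation: the function names, the use of the sample standard deviation, and the intensity values are all assumptions made for this example.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """Return the RSD (sample standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


def average_qc_rsd(qc_intensities):
    """Average the per-lipid RSD over all lipids.

    qc_intensities maps a lipid name to its intensities across QC injections.
    """
    return mean(relative_standard_deviation(v) for v in qc_intensities.values())


# Hypothetical QC intensities, invented for illustration only.
qc_example = {
    "lipid_A": [90.0, 100.0, 110.0],
    "lipid_B": [190.0, 200.0, 210.0],
}
print(round(average_qc_rsd(qc_example), 2))
```

Computing this value on the QC samples before and after normalization gives a simple measure of how much technical variation a method removes; a lower average RSD indicates less residual technical error.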