ac8b05592_si_001.pdf (848.96 kB)
Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data
journal contribution
posted on 2019-02-13, 00:00 authored by Sili Fan, Tobias Kind, Tomas Cajka, Stanley L. Hazen, W. H. Wilson Tang, Rima Kaddurah-Daouk, Marguerite R. Irvin, Donna K. Arnett, Dinesh K. Barupal, Oliver Fiehn

Large-scale untargeted lipidomics experiments involve the measurement of hundreds to thousands of samples. Such data sets are usually acquired on one instrument over days or weeks of analysis time. These extensive data acquisition processes introduce a variety of systematic errors, including batch differences, longitudinal drifts, or even instrument-to-instrument variation. Technical data variance can obscure the true biological signal and hinder biological discoveries. To combat this issue, we present a novel normalization approach based on quality control pool (QC) samples: systematic error removal using random forest (SERRF), which eliminates unwanted systematic variation in large sample sets. We compared SERRF with 15 other commonly used normalization methods on six lipidomics data sets from three large cohort studies (832, 1162, and 2696 samples). SERRF reduced the average technical error for these data sets to 5% relative standard deviation. We conclude that SERRF outperforms the other existing methods and can significantly reduce unwanted systematic variation, revealing the biological variance of interest.
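The abstract describes QC-pool-based normalization with a random forest. Below is a minimal sketch of that general idea in Python, assuming scikit-learn's RandomForestRegressor. It is not the authors' released SERRF implementation; the function name, the choice of injection order and batch as predictors, and the rescaling to the QC median are illustrative assumptions only.

```python
"""Sketch of QC-based random forest normalization for one compound.

Not the published SERRF code; it only illustrates the general idea in
the abstract. All names and predictor choices are hypothetical.
"""
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def qc_rf_normalize(intensities, injection_order, batch, is_qc):
    """Normalize one compound's intensities across an acquisition run.

    intensities     : (n_samples,) raw peak intensities for one compound
    injection_order : (n_samples,) acquisition order of each injection
    batch           : (n_samples,) integer batch label per injection
    is_qc           : (n_samples,) boolean mask marking QC pool samples
    """
    X = np.column_stack([injection_order, batch])

    # Fit the forest on QC injections only, so it learns the systematic
    # (batch + drift) component rather than biological differences.
    rf = RandomForestRegressor(n_estimators=500, random_state=0)
    rf.fit(X[is_qc], intensities[is_qc])

    # The prediction at each injection approximates the systematic
    # error; divide it out and rescale to the QC median intensity.
    predicted = rf.predict(X)
    qc_median = np.median(intensities[is_qc])
    return intensities / predicted * qc_median
```

Applied compound by compound across a peak table, each sample's intensity is divided by the modeled systematic component, leaving the QC pool samples centered on their median.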
Keywords
SERRF, systematic error removal, random forest, normalization methods, quality control pool samples (QC), batch differences, instrument-to-instrument variation, technical data variance, large-scale untargeted lipidomics data sets