10.1021/acs.analchem.8b05592.s001
Sili Fan, Tobias Kind, Tomas Cajka, Stanley L. Hazen, W. H. Wilson Tang, Rima Kaddurah-Daouk, Marguerite R. Irvin, Donna K. Arnett, Dinesh K. Barupal, Oliver Fiehn
Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data
American Chemical Society, 2019
Journal contribution, 2019-02-13
https://acs.figshare.com/articles/journal_contribution/Systematic_Error_Removal_Using_Random_Forest_for_Normalizing_Large-Scale_Untargeted_Lipidomics_Data/7733000

Large-scale untargeted lipidomics experiments involve the measurement of hundreds to thousands of samples. Such data sets are usually acquired on one instrument over days or weeks of analysis time. These extensive data acquisition processes introduce a variety of systematic errors, including batch differences, longitudinal drifts, or even instrument-to-instrument variation. Technical data variance can obscure the true biological signal and hinder biological discoveries. To combat this issue, we present a novel normalization approach based on quality control (QC) pool samples. This method, called systematic error removal using random forest (SERRF), eliminates unwanted systematic variation in large sample sets.
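The general idea behind QC-based normalization can be sketched in a few lines: model the systematic error of each compound from the repeatedly injected QC pool samples, then divide every measurement by that modelled error. SERRF models this error with a random forest trained on the QC samples. To keep the sketch dependency-free, the version below substitutes a much simpler model, linear interpolation of a compound's QC intensity between neighbouring QC injections in run order. It is a swapped-in technique, not SERRF itself, and the function names `qc_normalize` and `interpolate_qc_trend` are hypothetical.

```python
from statistics import median


def interpolate_qc_trend(order, qc_order, qc_values):
    """Estimate the systematic signal at injection position `order` by linear
    interpolation between the two nearest QC injections (held flat beyond the
    first and last QC). This stands in for SERRF's random forest model."""
    pairs = sorted(zip(qc_order, qc_values))
    if order <= pairs[0][0]:
        return pairs[0][1]
    if order >= pairs[-1][0]:
        return pairs[-1][1]
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if x0 <= order <= x1:
            if x1 == x0:  # duplicate QC positions: no slope to interpolate
                return y0
            t = (order - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("injection order not covered by QC injections")


def qc_normalize(intensities, orders, is_qc):
    """Normalize one compound's intensities across an injection sequence.

    intensities: raw peak intensities, one per injection
    orders:      injection (run) order of each measurement
    is_qc:       True where the injection is a QC pool sample
    Each value is divided by the QC-estimated trend at its position and then
    multiplied by the median QC intensity, so results stay on the raw scale.
    Assumes the estimated trend is never zero.
    """
    qc_order = [o for o, q in zip(orders, is_qc) if q]
    qc_values = [v for v, q in zip(intensities, is_qc) if q]
    if len(qc_values) < 2:
        raise ValueError("need at least two QC injections")
    qc_median = median(qc_values)
    corrected = []
    for value, order in zip(intensities, orders):
        trend = interpolate_qc_trend(order, qc_order, qc_values)
        corrected.append(value / trend * qc_median)
    return corrected


# Hypothetical run: the signal drifts down by 20% every 4 injections.
corrected = qc_normalize([100, 45, 80, 35, 60], [0, 2, 4, 6, 8],
                         [True, False, True, False, True])
print(corrected)  # [80.0, 40.0, 80.0, 40.0, 80.0]
```

After correction the three QC injections agree exactly and the two study samples, which sat at different points on the drift, become directly comparable. A random forest, as used by SERRF, can additionally exploit correlations between compounds and nonlinear batch effects that simple interpolation cannot capture.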
We compared SERRF with 15 other commonly used normalization methods on six lipidomics data sets from three large cohort studies (832, 1162, and 2696 samples). SERRF reduced the average technical error in these data sets to 5% relative standard deviation (RSD). We conclude that SERRF outperforms the other methods compared and can significantly reduce unwanted systematic variation, revealing the biological variance of interest.
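The technical error figure is expressed as a relative standard deviation, i.e. the standard deviation of repeated QC measurements divided by their mean. A minimal sketch of that metric follows, assuming RSD is computed per compound across QC injections and then averaged with the arithmetic mean; the exact aggregation used in the study may differ, and `rsd_percent` and `average_qc_rsd` are hypothetical names.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent: sample SD / mean * 100."""
    return stdev(values) * 100 / mean(values)


def average_qc_rsd(qc_table):
    """Mean RSD (%) across compounds.

    qc_table maps a compound name to its intensities in the QC injections
    (at least two values per compound, as stdev requires).
    """
    return mean(rsd_percent(v) for v in qc_table.values())


# Hypothetical normalized QC intensities for two compounds.
print(average_qc_rsd({"A": [95, 100, 105], "B": [190, 200, 210]}))  # 5.0
```

Because RSD is scale-free, compounds with very different absolute intensities contribute equally to the average, which makes it a convenient single number for comparing normalization methods.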