pr8b00709_si_004.pdf (57.36 kB)
A Case Study and Methodology for OpenSWATH Parameter Optimization Using the ProCan90 Data Set and 45 810 Computational Analysis Runs
journal contribution
posted on 2019-01-17, 00:00 authored by Sean Peters, Peter G. Hains, Natasha Lucas, Phillip J. Robinson, Brett Tully

In the current study, we show how ProCan90, a curated data set
of HEK293 technical replicates, can be used to optimize the configuration
options for algorithms in the OpenSWATH pipeline. Furthermore, we
use this case study as a proof of concept for horizontal scaling of
such a pipeline to allow 45 810 computational analysis runs
of OpenSWATH to be completed within four and a half days on a budget
of US $10 000. Through the use of Amazon Web Services (AWS),
we have successfully processed each of the ProCan90 files with 506
combinations of input parameters. In total, the project consumed more
than 340 000 core hours of compute and generated in excess
of 26 TB of data. Using the resulting data and a set of quantitative
metrics, we show an analysis pathway that allows the calculation of
two optimal parameter sets, one for a compute rich environment (where
run time is not a constraint), and another for a compute poor environment
(where run time is optimized). For the same input files and the compute
rich parameter set, we show a 29.8% improvement in the number of quality
protein (>2 peptide) identifications found compared to the current
OpenSWATH defaults, with negligible adverse effects on quantification
reproducibility or drop in identification confidence, and a median
run time of 75 min (103% increase). For the compute poor parameter
set, we find a 55% improvement in the run time from the default parameter
set, at the expense of a 3.4% decrease in the number of quality protein
identifications, and an intensity CV decrease from 14.0% to 13.7%.
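
The headline figures above imply some per-run averages that help put the scale in context. The following is a minimal sketch of that arithmetic, not part of the authors' pipeline: the function and constant names are hypothetical, the budget is treated as an upper bound on spend, "more than 340 000 core hours" and "in excess of 26 TB" are treated as lower bounds, terabytes are assumed to be decimal, and the "103% increase" is assumed to be measured relative to the median run time with the default parameters.

```python
# Figures quoted in the abstract. Names are illustrative only.
TOTAL_RUNS = 45_810        # OpenSWATH analysis runs
BUDGET_USD = 10_000        # budget cap, so an upper bound on spend
CORE_HOURS = 340_000       # "more than", so a lower bound
DATA_TB = 26               # "in excess of", so a lower bound
WALL_CLOCK_DAYS = 4.5      # "four and a half days"

RICH_MEDIAN_MIN = 75.0     # median run time, compute rich parameter set
RICH_INCREASE = 1.03       # "103% increase", assumed relative to defaults


def per_run_averages():
    """Return rough per-run and cluster-wide averages from the totals."""
    return {
        # Upper bound: the full budget divided by the number of runs.
        "usd_per_run": BUDGET_USD / TOTAL_RUNS,
        # Lower bound: quoted core hours divided by the number of runs.
        "core_hours_per_run": CORE_HOURS / TOTAL_RUNS,
        # Assumes decimal units (1 TB = 1000 GB).
        "gb_per_run": DATA_TB * 1000 / TOTAL_RUNS,
        # Average number of cores busy over the whole wall-clock window.
        "mean_concurrent_cores": CORE_HOURS / (WALL_CLOCK_DAYS * 24),
    }


def implied_default_runtime_min():
    """Median default run time implied by 75 min being a 103% increase."""
    return RICH_MEDIAN_MIN / (1 + RICH_INCREASE)


if __name__ == "__main__":
    for name, value in per_run_averages().items():
        print(f"{name}: {value:.3f}")
    print(f"implied_default_runtime_min: {implied_default_runtime_min():.1f}")
```

Under these assumptions, each run averages roughly 7.4 core hours and at most about US $0.22, an average of about 3 100 cores was busy over the four and a half days, and the default parameter set corresponds to a median run time of about 37 min.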