Posted on 2016-08-01, 00:00. Authored by Matthew R. Lewis, Jake T. M. Pearce, Konstantina Spagou, Martin Green,
Anthony C. Dona, Ada H. Y. Yuen, Mark David, David J. Berry, Katie Chappell, Verena Horneffer-van der Sluis,
Rachel Shaw, Simon Lovestone, Paul Elliott, John Shockcor, John C. Lindon, Olivier Cloarec, Zoltan Takats,
Elaine Holmes, Jeremy K. Nicholson
To better understand the molecular
mechanisms underpinning physiological
variation in human populations, metabolic phenotyping approaches are
increasingly being applied to studies involving hundreds and thousands
of biofluid samples. Hyphenated ultra-performance liquid chromatography–mass
spectrometry (UPLC-MS) has become a fundamental tool for this purpose.
However, the seemingly inevitable need to analyze large studies in
multiple analytical batches for UPLC-MS analysis poses a challenge
to data quality which has been recognized in the field. Herein, we
describe in detail a fit-for-purpose UPLC-MS platform, method set,
and sample analysis workflow, capable of sustained analysis on an
industrial scale and allowing batch-free operation for large studies.
Using complementary reversed-phase chromatography (RPC) and hydrophilic
interaction liquid chromatography (HILIC) together with high resolution
orthogonal acceleration time-of-flight mass spectrometry (oaTOF-MS),
exceptional measurement precision is exemplified with independent
epidemiological sample sets of approximately 650 and 1000 participant
samples. Evaluation of molecular reference targets in repeated injections
of pooled quality control (QC) samples distributed throughout each
experiment demonstrates a mean retention time relative standard deviation
(RSD) of <0.3% across all assays in both studies and a mean peak
area RSD of <15% in the raw data. To more globally assess the quality
of the profiling data, untargeted feature extraction was performed
followed by data filtration according to feature intensity response
to QC sample dilution. Analysis of the remaining features within the
repeated QC sample measurements demonstrated median peak area RSD
values of <20% for the RPC assays and <25% for the HILIC assays.
These values represent the quality of the raw data, as no normalization
or feature-specific intensity correction was applied. Although the data
in each experiment were acquired in a single continuous batch, instances
of minor time-dependent intensity drift were observed, showing that
data correction techniques remain useful even when the dependency
on them for generating high-quality data is reduced. These results demonstrate
that the platform and methodology presented herein are fit for use
in large-scale metabolic phenotyping studies, challenging the assertion
that such screening is inherently limited by batch effects. Details
of the pipeline used to generate high quality raw data and mitigate
the need for batch correction are provided.
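
The two quality measures described above, the relative standard deviation (RSD) across repeated QC injections and the filtering of features by their response to QC dilution, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the use of a Pearson correlation as the dilution-response criterion, the `min_r` threshold, and all numeric values are assumptions chosen only for illustration.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation: sample SD divided by the mean, in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("RSD is undefined for a zero mean")
    return 100.0 * stdev(values) / abs(m)


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def filter_by_dilution(features, dilution_factors, min_r=0.9):
    """Keep features whose intensity tracks the QC dilution series.

    `features` maps a feature name to its intensities measured at each
    dilution factor. The correlation criterion and the `min_r` threshold
    are assumptions, not values taken from the source.
    """
    return {
        name: intensities
        for name, intensities in features.items()
        if pearson_r(dilution_factors, intensities) >= min_r
    }


if __name__ == "__main__":
    # Hypothetical peak areas for one feature across repeated QC injections.
    qc_peak_areas = [90.0, 100.0, 110.0]
    print(f"peak area RSD: {rsd_percent(qc_peak_areas):.1f}%")

    # Hypothetical dilution series: fraction of the undiluted QC sample.
    dilutions = [0.25, 0.5, 1.0]
    features = {
        "feature_a": [10.0, 20.0, 40.0],  # scales with dilution
        "feature_b": [30.0, 10.0, 20.0],  # does not track dilution
    }
    print("retained:", sorted(filter_by_dilution(features, dilutions)))
```

In this sketch a feature is retained only if its intensity rises with the amount of QC sample injected, which is one plausible reading of "feature intensity response to QC sample dilution"; RSD is then computed on the retained features across the repeated QC measurements.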