Posted on 2012-08-03, 00:00. Authored by Jan Krumsiek, Karsten Suhre, Thomas Illig, Jerzy Adamski, Fabian J. Theis
Interpreting the complex interplay of metabolites in heterogeneous
biosamples remains a challenging task. In this study, we propose
independent component analysis (ICA) as a multivariate analysis tool
for the interpretation of large-scale metabolomics data. In particular,
we employ a Bayesian ICA method based on a mean-field approach, which
allows us to statistically infer the number of independent components
to be reconstructed. The advantage of ICA over correlation-based methods
like principal component analysis (PCA) is the use of higher-order
statistical dependencies, which not only yield additional information
but also allow a more meaningful representation of the data with fewer
components. We performed the described ICA approach on a large-scale
metabolomics data set of human serum samples, comprising a total of
1764 study probands with 218 measured metabolites. Inspecting the source matrix of statistically independent metabolite profiles
using a weighted enrichment algorithm, we observe strong enrichment
of specific metabolic pathways in all components. This includes signatures
from amino acid metabolism, energy-related processes, carbohydrate
metabolism, and lipid metabolism. Our results imply that the human
blood metabolome is composed of a distinct set of overlaying, statistically
independent signals. ICA furthermore produces a mixing matrix, describing the strength of each independent component for each
of the study probands. Correlating these values with plasma high-density
lipoprotein (HDL) levels, we establish a novel association between
HDL plasma levels and the branched-chain amino acid pathway. We conclude
that the Bayesian ICA methodology has the power and flexibility to
replace many of the PCA and clustering-based analyses that are currently
common in the research field.
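
The decomposition described above, a data matrix factored into a mixing matrix (one row per proband) and a source matrix (one row per independent metabolite profile), can be illustrated with a minimal sketch. It is not the Bayesian mean-field ICA used in the study: it substitutes a plain FastICA iteration written in NumPy, takes the number of components k as given rather than inferring it, and runs on synthetic data. The names `fastica`, `A_true`, `S_true`, and `hdl` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fastica(X, k, n_iter=500, tol=1e-8, seed=0):
    """Factor centered X (n_samples x n_features) as A @ S with k components.

    Rows of S are the independent profiles over features (metabolites);
    columns of A give the strength of each component in each sample.
    Plain symmetric FastICA with a tanh nonlinearity, NOT the Bayesian
    mean-field ICA of the study; k is fixed by the caller.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    # Center each sample's row across features.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Whiten via a truncated SVD: Xc ~ B @ Z with Z @ Z.T / m = I.
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = np.sqrt(m) * Vt[:k]
    B = U[:, :k] * d[:k] / np.sqrt(m)
    # Random orthogonal starting rotation.
    W = np.linalg.qr(rng.standard_normal((k, k)))[0]
    for _ in range(n_iter):
        Y = np.tanh(W @ Z)
        # Fixed-point update: E[g(y) z^T] - E[g'(y)] W, with g = tanh.
        W_new = Y @ Z.T / m - np.diag((1.0 - Y**2).mean(axis=1)) @ W
        # Symmetric decorrelation keeps W orthogonal.
        u, _, vt = np.linalg.svd(W_new)
        W_new = u @ vt
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    S = W @ Z        # source matrix, k x n_features
    A = B @ W.T      # mixing matrix, n_samples x k
    return A, S

# Synthetic stand-in for the metabolomics matrix (assumed sizes, not the study's).
rng = np.random.default_rng(42)
n_samples, n_metabolites, k = 200, 2000, 3
S_true = rng.laplace(size=(k, n_metabolites))      # non-Gaussian sources
A_true = rng.standard_normal((n_samples, k))       # per-sample strengths
X = A_true @ S_true

A_hat, S_hat = fastica(X, k)

# Hypothetical phenotype driven by the first true component plus noise,
# playing the role that HDL levels play in the study.
hdl = A_true[:, 0] + 0.5 * rng.standard_normal(n_samples)

# Correlate each column of the mixing matrix with the phenotype.
r = np.array([np.corrcoef(A_hat[:, j], hdl)[0, 1] for j in range(k)])
print(np.round(r, 2))
```

ICA recovers components only up to order, sign, and scale, so the column of `A_hat` that tracks `hdl` may be any of the k columns and its correlation may be negative; the magnitude is what matters here.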