Multivariate Statistical Approaches for the Characterization of Dissolved Organic Matter Analyzed by Ultrahigh Resolution Mass Spectrometry

We apply multivariate statistics to explore the large data sets obtained from Fourier transform ion cyclotron resonance mass spectra of dissolved organic matter (DOM). Molecular formulas assigned to the individual constituents of DOM are examined by hierarchical cluster analysis (HCA) and principal component analysis (PCA) to measure the relationships among numerous DOM samples. We compare two approaches: (1) using averages of elemental ratios and double bond equivalents calculated from the formulas, and (2) employing the individual formulas, coded by either their presence/absence or their relative magnitude in each sample. With approach 2, PCA identifies which of the thousands of formulas are significant to particular samples, and a van Krevelen diagram of those formulas then shows which types of compounds are molecular signatures of the samples. This dual approach, especially approach 2, makes complex data sets easier to interpret and aids in characterizing DOM from various sources. Applying this methodology delineates clear trends that are not apparent with currently employed methods. Terrestrial DOM contains various lignin-derived compounds, tannins, and condensed aromatics, whereas marine DOM contains aliphatic compounds with heteroatom functionalities as well as lignin-like molecules.
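The two approaches compared above can be illustrated with a minimal Python sketch on hypothetical toy data. It is not the authors' pipeline and makes several assumptions. DBE uses the standard definition 1 + C - H/2 + N/2 for CcHhNnOoSs, where O and S do not contribute. Approach 1 uses unweighted means, although intensity-weighted means are a common alternative. Approach 2 uses presence/absence coding only. Only the first principal component is computed, by power iteration in the standard library, in place of a full PCA package, and HCA is omitted. The sample names, formulas, and helper names (`approach1`, `presence_matrix`, `first_pc`, `signature_formulas`) are invented for illustration. `signature_formulas` is a hypothetical stand-in for the step where significant formulas are read off the PCA loadings before being placed on a van Krevelen diagram.

```python
import re
from statistics import mean


def parse_formula(formula):
    """Split a neutral formula string such as 'C12H21NO5' into element counts."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(n or 1)
    return counts


def dbe(counts):
    """Double bond equivalent for CcHhNnOoSs; O and S do not contribute."""
    return 1 + counts["C"] - counts.get("H", 0) / 2 + counts.get("N", 0) / 2


def approach1(formulas):
    """Approach 1: unweighted mean H/C, O/C and DBE of one sample's formulas."""
    parsed = [parse_formula(f) for f in formulas]
    return {
        "H/C": mean(c.get("H", 0) / c["C"] for c in parsed),
        "O/C": mean(c.get("O", 0) / c["C"] for c in parsed),
        "DBE": mean(dbe(c) for c in parsed),
    }


def presence_matrix(samples):
    """Approach 2: samples x formulas matrix, 1.0 where a formula was assigned.

    Replacing 1.0 with a relative magnitude gives the other coding (assumption).
    """
    names = sorted(samples)
    formulas = sorted(set().union(*samples.values()))
    X = [[1.0 if f in samples[s] else 0.0 for f in formulas] for s in names]
    return names, formulas, X


def first_pc(X, iters=200):
    """PC1 of the column-centred matrix by power iteration on Xc^T Xc.

    Returns (loadings, scores, fraction of total variance explained by PC1).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    v = [float(j + 1) for j in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        u = [sum(r[j] * v[j] for j in range(p)) for r in Xc]            # Xc v
        w = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(p)]  # Xc^T u
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    total = sum(x * x for r in Xc for x in r)
    explained = sum(s * s for s in scores) / total if total else 0.0
    return v, scores, explained


def signature_formulas(formulas, loadings, score, k=2):
    """Hypothetical helper: formulas whose PC1 loading shares the sample's score sign."""
    aligned = [(abs(l), f) for f, l in zip(formulas, loadings) if l * score > 0]
    return [f for _, f in sorted(aligned, reverse=True)[:k]]


samples = {  # hypothetical toy assignments, not data from the study
    "marine": {"C10H12O5", "C12H21NO5", "C14H24O6S"},
    "terrestrial_A": {"C10H12O5", "C15H14O6", "C20H12O8"},
    "terrestrial_B": {"C10H12O5", "C15H14O6", "C18H10O7"},
}
summary = {s: approach1(fs) for s, fs in samples.items()}
names, formulas, X = presence_matrix(samples)
loadings, scores, explained = first_pc(X)
print(summary)
print(round(explained, 3))
print(signature_formulas(formulas, loadings, scores[names.index("marine")]))
```

In this toy example, approach 1 reduces each sample to three averages. Approach 2 keeps every formula. PC1 places the marine sample on the opposite side from the two terrestrial samples, and its aligned loadings point to the N- and S-containing aliphatic formulas. The formula shared by all three samples gets a zero loading, because presence/absence coding gives it no variance.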