posted on 2020-05-21, 15:58 authored by Dimitri Panagopoulos Abrahamsson, June-Soo Park, Randolph R. Singh, Marina Sirota, Tracey J. Woodruff
Non-targeted
analysis provides a comprehensive approach to analyze
environmental and biological samples for nearly all chemicals present.
One of the main shortcomings of current analytical methods and workflows is
that they are unable to provide any quantitative information,
which constitutes an important obstacle to understanding environmental
fate and human exposure. Herein, we present an in silico quantification
method using machine learning for chemicals analyzed using electrospray
ionization (ESI). We considered three data sets from different instrumental
setups: (i) capillary electrophoresis electrospray ionization-mass
spectrometry (CE-MS) in positive ionization mode (ESI+), (ii) liquid
chromatography quadrupole time-of-flight mass spectrometry (LC-QTOF/MS)
in ESI+ and (iii) LC-QTOF/MS in negative ionization mode (ESI−).
We developed and applied two different machine-learning algorithms,
a random forest (RF) and an artificial neural network (ANN), to predict
the relative response factors (RRFs) of different chemicals based
on their physicochemical properties. Chemical concentrations can then
be calculated by dividing the measured abundance of a chemical, as
peak area or peak height, by its corresponding RRF. We evaluated our
models and tested their predictive power using 5-fold cross-validation
(CV) and y-randomization. Both the RF and the
ANN models showed great promise in predicting RRFs. However, the accuracy
of the predictions was dependent on the data set composition and the
experimental setup. For the CE-MS ESI+ data set, the best model predicted
measured RRFs with a mean absolute error (MAE) of 0.19 log units and
a cross-validation coefficient of determination (Q²) of 0.84 for the testing set. For the LC-QTOF/MS ESI+
data set, the best model predicted measured RRFs with an MAE of 0.32
and a Q² of 0.40. For the LC-QTOF/MS ESI−
data set, the best model predicted measured RRFs with an MAE of 0.50
and a Q² of 0.20. Our findings suggest
that machine-learning algorithms can be used for predicting concentrations
of non-targeted chemicals with reasonable uncertainties, especially
in ESI+, while application to ESI− remains a more challenging
problem.
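
The quantification step and the two evaluation metrics reported above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the model output is log10(RRF), which is suggested by the MAE being reported in log units, and that Q² follows the common definition 1 − PRESS/TSS computed on held-out cross-validation predictions; the definition used in the study may differ. All function names are hypothetical.

```python
def concentration_from_abundance(abundance, log10_rrf):
    """Estimate a concentration as abundance / RRF.

    Assumption: the model predicts log10(RRF), so it is converted
    back to a linear RRF before dividing.
    """
    rrf = 10.0 ** log10_rrf
    return abundance / rrf


def mean_absolute_error(measured, predicted):
    """Mean absolute error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def q2(measured, predicted_cv):
    """Cross-validated coefficient of determination.

    Assumption: Q2 = 1 - PRESS / TSS, where predicted_cv holds the
    prediction made for each sample while it was in a held-out fold.
    """
    if len(measured) != len(predicted_cv) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(measured) / len(measured)
    # Sum of squared errors of the held-out predictions.
    press = sum((m - p) ** 2 for m, p in zip(measured, predicted_cv))
    # Total sum of squares around the mean of the measured values.
    tss = sum((m - mean_y) ** 2 for m in measured)
    if tss == 0:
        raise ValueError("measured values have zero variance")
    return 1.0 - press / tss


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    peak_area = 2000.0
    predicted_log10_rrf = 1.0
    print(concentration_from_abundance(peak_area, predicted_log10_rrf))
```

Because the prediction error is expressed in log units, an error of 0.19 in log10(RRF) corresponds to a factor of about 10^0.19 ≈ 1.5 in the estimated concentration, and an error of 0.50 to a factor of about 3.2, again under the log10 assumption.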