Posted on 2021-12-15, 15:34. Authored by Shifa Zhong, Yanping Zhang, Huichun Zhang
To develop predictive models for
the reactivity of organic contaminants
toward four oxidants (SO4•–, HClO, O3, and ClO2), all with small
sample sizes, we proposed two approaches: combining small data sets
and transferring knowledge between them. We first merged these data
sets and developed a unified model using machine learning (ML), which
showed better predictive performance than the individual models for
HClO (test RMSE: 2.10 to 2.04), O3 (2.06 to 1.94),
ClO2 (1.77 to 1.49), and SO4•– (0.75 to 0.70) because the model “corrected” the wrongly
learned effects of several atom groups. We further developed knowledge
transfer models for three pairs of the data sets and observed different
predictive performances: improved for O3 (test RMSE: 2.06 to 2.01)/HClO (2.10 to 1.98), mixed for O3 (2.06
to 2.01)/ClO2 (1.77 to 1.95), and unchanged for ClO2 (1.77 to 1.77)/HClO (2.10 to 2.10). The effectiveness of the
latter approach depended on whether there was consistent knowledge
shared between the data sets and on the performance of the individual
models. We also compared our approaches with multitask learning and
image-based transfer learning and found that our approaches consistently
improved the predictive performance for all data sets while the other
two did not. This study demonstrated the effectiveness of combining
small, similar data sets and transferring knowledge between them to
improve ML model performance.
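
To put the unified-model results on a common scale, the test RMSE values quoted above can be converted to relative changes. The snippet below is only an illustrative calculation on the four reported pairs; the names `unified`, `relative_change`, and `changes` are hypothetical and are not taken from the authors' code.

```python
# Test RMSE values quoted in the abstract: (individual model, unified model).
# "SO4_radical" stands for the sulfate radical, SO4•–, in the abstract.
unified = {
    "HClO": (2.10, 2.04),
    "O3": (2.06, 1.94),
    "ClO2": (1.77, 1.49),
    "SO4_radical": (0.75, 0.70),
}


def relative_change(before: float, after: float) -> float:
    """Return the percent change in RMSE; a negative value means lower error."""
    return 100.0 * (after - before) / before


# Percent change in test RMSE for each oxidant when moving to the unified model.
changes = {name: relative_change(b, a) for name, (b, a) in unified.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

On these figures every oxidant shows a lower test RMSE under the unified model, with the largest relative reduction for ClO2.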