es0c01791_si_001.pdf (1 MB)
An Ensemble Learning Approach for Estimating High Spatiotemporal Resolution of Ground-Level Ozone in the Contiguous United States
journal contribution
posted on 2020-09-01, 20:29 authored by Weeberb J. Requia, Qian Di, Rachel Silvern, James T. Kelly, Petros Koutrakis, Loretta J. Mickley, Melissa P. Sulprizio, Heresh Amini, Liuhua Shi, Joel Schwartz
In this paper, we integrated multiple types of predictor variables
and three types of machine learners (neural network, random forest,
and gradient boosting) into a geographically weighted ensemble model
to estimate the daily maximum 8 h O₃ with high resolution
over both space (at 1 km × 1 km grid cells covering the contiguous
United States) and time (daily estimates between 2000 and 2016). We
further quantified monthly model uncertainty for our 1 km × 1 km
gridded domain. The results demonstrate high overall model performance,
with an average cross-validated R² (coefficient
of determination) against observations of 0.90 overall and 0.86 for annual
averages. The three machine learning algorithms performed similarly
overall, and the ensemble model outperformed each single algorithm. The East
North Central region of the United States had the highest R², 0.93, and performance was weakest for the
western mountainous regions (R² of 0.86)
and New England (R² of 0.87). For the
cross-validation by season, our model performed best during
summer, with an R² of 0.88. This study
can be useful for the environmental health community to more accurately
estimate the health impacts of O₃ over space and time,
especially in health studies at an intra-urban scale.
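
The idea of blending several learners with location-dependent weights, and scoring the result with R², can be illustrated with a minimal sketch. This is not the authors' implementation: the weighting rule used here (inverse mean squared error per region) is a simplified stand-in chosen for illustration, and the function names, region labels, and all numbers are hypothetical toy values, not data from the study.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def local_weights(observed, base_preds, region_ids):
    """Per-region learner weights, inversely proportional to each learner's MSE.

    Assumption: a simple stand-in for a geographically weighted blend.
    """
    weights = {}
    for region in np.unique(region_ids):
        mask = region_ids == region
        mse = np.mean((base_preds[mask] - observed[mask, None]) ** 2, axis=0)
        inv = 1.0 / (mse + 1e-12)  # small constant avoids division by zero
        weights[region] = inv / inv.sum()
    return weights

def ensemble_predict(base_preds, region_ids, weights):
    """Blend the base learners row by row using each row's regional weights."""
    w = np.stack([weights[r] for r in region_ids])
    return np.sum(w * base_preds, axis=1)

# Toy data (hypothetical): six monitor-days in two regions, three base learners.
observed = np.array([40.0, 50.0, 60.0, 30.0, 45.0, 55.0])
region_ids = np.array([0, 0, 0, 1, 1, 1])
base_preds = np.column_stack([
    observed + np.array([1, -1, 1, 5, -5, 5]),  # learner A: better in region 0
    observed + np.array([5, -5, 5, 1, -1, 1]),  # learner B: better in region 1
    observed + np.array([3, -3, 3, 3, -3, 3]),  # learner C: uniform error
])

weights = local_weights(observed, base_preds, region_ids)
blended = ensemble_predict(base_preds, region_ids, weights)

single_r2 = [r_squared(observed, base_preds[:, j]) for j in range(3)]
blended_r2 = r_squared(observed, blended)
print("single-learner R2:", np.round(single_r2, 3))
print("blended R2:", round(blended_r2, 3))
```

In this toy setup the weights are fitted and scored on the same points, which flatters the blend; a cross-validated evaluation would fit the weights on held-out folds.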