An Ensemble Learning Approach for Estimating High Spatiotemporal Resolution of Ground-Level Ozone in the Contiguous United States

Requia, Weeberb J.; Di, Qian; Silvern, Rachel; Kelly, James T.; Koutrakis, Petros; Mickley, Loretta J.; Sulprizio, Melissa P.; Amini, Heresh; Shi, Liuhua; Schwartz, Joel

doi:10.1021/acs.est.0c01791.s001




posted on 2020-09-01, 20:29 authored by Weeberb J. Requia, Qian Di, Rachel Silvern, James T. Kelly, Petros Koutrakis, Loretta J. Mickley, Melissa P. Sulprizio, Heresh Amini, Liuhua Shi, Joel Schwartz

In this paper, we integrated multiple types of predictor variables and three machine learners (a neural network, a random forest, and gradient boosting) into a geographically weighted ensemble model to estimate the maximum daily 8 h average O₃ concentration at high resolution in both space (1 km × 1 km grid cells covering the contiguous United States) and time (daily estimates from 2000 to 2016). We further quantified monthly model uncertainty across our 1 km × 1 km gridded domain. The model performed well overall, with an average cross-validated R² (coefficient of determination) against observations of 0.90 overall and 0.86 for annual averages. The three machine learning algorithms performed quite similarly to one another, and the ensemble model outperformed each individual algorithm. Performance was highest in the East North Central region of the United States (R² of 0.93) and weakest in the western mountainous regions (R² of 0.86) and New England (R² of 0.87). In cross-validation by season, the model performed best in summer (R² of 0.88). These estimates can help the environmental health community more accurately assess the health impacts of O₃ over space and time, especially in health studies at the intra-urban scale.
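The abstract does not spell out how the geographically weighted ensemble combines the three learners, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that, at each target location, the learners' predictions are combined with coefficients fitted by kernel-weighted least squares (no intercept, no sum-to-one constraint) against monitor observations. Nearby monitors receive more weight through a Gaussian distance kernel. The bandwidth, the function names (`gaussian_kernel`, `local_ensemble_weights`, `ensemble_predict`, `r_squared`), and the toy data are all hypothetical. The R² helper shows the coefficient of determination used to report cross-validated performance.

```python
import math


def gaussian_kernel(d, bandwidth):
    """Spatial weight that decays with distance d (same units as bandwidth)."""
    return math.exp(-0.5 * (d / bandwidth) ** 2)


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def local_ensemble_weights(target_xy, sites_xy, preds, obs, bandwidth):
    """Hypothetical kernel-weighted least-squares coefficients for each learner at target_xy.

    preds[i][k] is learner k's (ideally out-of-fold) prediction at monitor i;
    obs[i] is the observed O3 value at monitor i. No intercept is fitted and
    the coefficients are not constrained to sum to one (both are assumptions).
    """
    k = len(preds[0])
    xtwx = [[0.0] * k for _ in range(k)]  # X^T W X
    xtwy = [0.0] * k                      # X^T W y
    for (x, y), p, o in zip(sites_xy, preds, obs):
        w = gaussian_kernel(math.dist((x, y), target_xy), bandwidth)
        for a in range(k):
            xtwy[a] += w * p[a] * o
            for b in range(k):
                xtwx[a][b] += w * p[a] * p[b]
    return solve_linear(xtwx, xtwy)


def ensemble_predict(weights, learner_preds):
    """Combine the base learners' predictions at one grid cell."""
    return sum(w * p for w, p in zip(weights, learner_preds))


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

For example, with toy monitors whose observations are an exact 0.5/0.3/0.2 blend of three learners' predictions, `local_ensemble_weights((0.5, 0.5), sites, preds, obs, bandwidth=1.0)` recovers those coefficients. With real data, the coefficients would change from one grid cell to the next because the kernel weights depend on the distance to each monitor. That spatial variation is what lets such an ensemble favor whichever learner performs best locally.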