Version 2 2024-08-19, 18:53
Version 1 2024-05-02, 07:05
dataset
Posted on 2024-08-19, 18:53. Authored by Jacob Kvasnicka, Nicolò Aurisano, Kerstin von Borries, En-Hsuan Lu, Peter Fantke, Olivier Jolliet, Fred A. Wright, Weihsueh A. Chiu
Chemical points of
departure (PODs) for critical health effects
are crucial for evaluating and managing human health risks and impacts
from exposure. However, PODs are unavailable for most chemicals in
commerce due to a lack of in vivo toxicity data.
We therefore developed a two-stage machine learning (ML) framework
to predict human-equivalent PODs for oral exposure to organic chemicals
based on chemical structure. In Stage 1, we used OPERA 2.9 to generate ML-based predictions of structural, physical, chemical, and toxicological
properties, which served as features. In Stage 2, we trained random
forest regression models on human-equivalent PODs derived
from in vivo data sets for general noncancer effects
(n = 1,791) and reproductive/developmental effects
(n = 2,228), using robust cross-validation for feature
selection and for estimating generalization errors. These two-stage
models accurately predicted PODs for both effect categories with cross-validation-based
root-mean-squared errors less than an order of magnitude. We then
applied one or both models to 34,046 chemicals expected to be in the
environment, revealing several thousand chemicals of moderate concern and several hundred chemicals of high concern
for health effects at estimated median population exposure levels.
Further application can expand by orders of magnitude the coverage
of organic chemicals that can be evaluated for their human health
risks and impacts.
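
The statement that root-mean-squared errors were "less than an order of magnitude" can be illustrated with a small calculation. The sketch below assumes that errors are measured on the log10 scale of the POD, so that an RMSE below 1 corresponds to a typical error factor below 10. The function name `log10_rmse` and the POD values are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
import math


def log10_rmse(observed, predicted):
    """Root-mean-squared error between two POD lists on the log10 scale.

    Assumes all values are positive and share the same units
    (for example mg/kg-d); this is an illustrative helper, not the
    authors' code.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log10(p) - math.log10(o)) ** 2
        for o, p in zip(observed, predicted)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical PODs, for illustration only.
observed = [1.0, 10.0, 100.0]
predicted = [2.0, 5.0, 300.0]

rmse = log10_rmse(observed, predicted)

# Convert the log10-scale RMSE back to a multiplicative (fold) error.
fold_error = 10 ** rmse

# An RMSE below 1 log10 unit means the typical error is under a factor of 10.
within_one_order = rmse < 1.0
```

Under this reading, an RMSE of 1 would mean predictions are typically off by a factor of 10, and smaller values mean proportionally tighter agreement.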