Identification of Potential PBT/POP-Like Chemicals by a Deep Learning Approach Based on 2D Structural Features
Journal contribution posted on 16.06.2020, 17:03 by Xiangfei Sun, Xianming Zhang, Derek C.G. Muir, Eddy Y. Zeng
Identifying potential persistent organic pollutants (POPs) and persistent, bioaccumulative, and toxic (PBT) substances in industrial chemical inventories is essential for chemical risk assessment, management, and pollution control. Building on the connection between chemical structure and chemical properties, a deep convolutional neural network (DCNN) model was developed to screen for potential PBT/POP-like chemicals. For each chemical, a two-dimensional representation matrix built from 2424 molecular descriptors was used as the model input. The DCNN model was trained by supervised learning on 1306 PBT/POP-like chemicals and 9990 chemicals currently known as non-POPs/PBTs. On external data sets, the model achieved an average prediction accuracy of 95.3 ± 0.6% and an F-measurement of 79.3 ± 2.5% for PBT/POP-like chemicals (positive samples only). The DCNN model was further evaluated with 52 chemicals from the REACH PBT assessment list whose PBT status has been determined experimentally, and it correctly classified 47 of them as PBT or non-PBT. Applied to 58 079 chemicals merged from five published industrial chemical lists, the DCNN model flagged a total of 4011 suspected PBT/POP-like chemicals. The proportions of PBT/POP-like substances in the chemical inventories were 6.9–7.8%, higher than a previous estimate of 3–5%. Although additional PBT/POP chemicals were identified, no new family of PBT/POP-like chemicals was observed.
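The abstract reports both an overall accuracy and an F-measurement for the positive class, and the two can diverge when positives are rare, as they are here (1306 positives against 9990 negatives). The Python sketch below first reproduces the proportions that follow directly from the counts stated in the abstract, then uses a purely hypothetical confusion matrix to show how the two metrics are computed. The confusion counts and function names are assumptions made for illustration; they are not the authors' results or code.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Counts stated in the abstract.
n_pos, n_neg = 1306, 9990
positive_share = n_pos / (n_pos + n_neg)   # share of PBT/POP-like training chemicals
reach_agreement = 47 / 52                  # REACH list chemicals classified correctly
flagged_share = 4011 / 58079               # flagged chemicals in the merged inventory

# A trivial model that labels everything "non-PBT" would already be right
# for every negative, so accuracy alone says little about the positives.
all_negative_accuracy = n_neg / (n_pos + n_neg)

# HYPOTHETICAL confusion counts for a 1000-chemical test set (illustration only).
tp, fp, fn, tn = 100, 25, 30, 845
example_accuracy = accuracy(tp, fp, fn, tn)
example_f = f_measure(tp, fp, fn)

if __name__ == "__main__":
    print(f"positive share in training data: {positive_share:.1%}")
    print(f"REACH agreement:                 {reach_agreement:.1%}")
    print(f"flagged share of inventory:      {flagged_share:.1%}")
    print(f"all-negative baseline accuracy:  {all_negative_accuracy:.1%}")
    print(f"hypothetical accuracy:           {example_accuracy:.1%}")
    print(f"hypothetical F-measure:          {example_f:.1%}")
```

Because the all-negative baseline already scores about 88% accuracy on data with this class balance, the F-measurement for the positive class is the more informative of the two reported figures when judging how well PBT/POP-like chemicals are recognized.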