“pySiRC”: Machine Learning Combined with Molecular Fingerprints to Predict the Reaction Rate Constant of the Radical-Based Oxidation Processes of Aqueous Organic Contaminants
Dataset posted on 2021-09-02, 20:04, authored by Flávio Olimpio Sanches-Neto, Jefferson Richard Dias-Silva, Luiz Henrique Keng Queiroz Junior, Valter Henrique Carvalho-Silva
We developed the pySiRC platform, a web application built on machine learning and molecular fingerprint algorithms that automatically calculates the reaction rate constant for the oxidation of organic pollutants by •OH and SO4•– radicals in the aqueous phase. The model development followed the OECD principles: internal and external validation, applicability domain, and mechanistic interpretation. Three machine learning algorithms combined with molecular fingerprints were evaluated. All the models showed high goodness-of-fit for the training set, with R² > 0.931 for the •OH radical and R² > 0.916 for the SO4•– radical, and good predictive capacity for the test set, with R²ext = Q²ext values in the range of 0.639–0.823 and 0.767–0.824 for the •OH and SO4•– radicals, respectively. The model was interpreted using the SHAP (SHapley Additive exPlanations) method: the results showed that its predictions rest on a reasonable understanding of how electron-withdrawing and electron-donating groups influence the reactivity of the •OH and SO4•– radicals. We hope that our models and web interface can stimulate and expand the application and interpretation of kinetic research on contaminants in water treatment units based on advanced oxidative technologies.
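To illustrate the kind of attribution that the SHAP method produces, the following is a minimal sketch of exact Shapley values computed by brute force for a toy model. It is not the pySiRC model or the SHAP library: the function `toy_model`, the three fingerprint bit names, and all weights are hypothetical values invented for this example, and real SHAP implementations use faster, model-specific algorithms.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary fingerprint bits (names invented for illustration).
FEATURES = ["donating_group", "withdrawing_group", "aromatic_ring"]


def toy_model(bits):
    """Return a made-up log10(k) for a dict of 0/1 fingerprint bits.

    All coefficients are assumptions chosen for the example, not fitted values.
    """
    value = 9.0                                  # prediction with no bits set
    value += 0.5 * bits["donating_group"]        # donating group raises log k
    value -= 0.7 * bits["withdrawing_group"]     # withdrawing group lowers it
    value += 0.2 * bits["aromatic_ring"]
    # Interaction term: a donating group on an aromatic ring adds extra reactivity.
    value += 0.4 * bits["donating_group"] * bits["aromatic_ring"]
    return value


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this only suits toy cases.
    """
    n = len(FEATURES)
    values = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = {
                    f: (instance[f] if f in subset else baseline[f])
                    for f in FEATURES
                }
                with_feature = dict(without)
                with_feature[feature] = instance[feature]
                # Marginal contribution of `feature` to this coalition.
                total += weight * (model(with_feature) - model(without))
        values[feature] = total
    return values


instance = {"donating_group": 1, "withdrawing_group": 1, "aromatic_ring": 1}
baseline = {"donating_group": 0, "withdrawing_group": 0, "aromatic_ring": 0}

phi = shapley_values(toy_model, instance, baseline)
for name, contribution in phi.items():
    print(name, round(contribution, 3))
```

In this sketch the interaction term is split evenly between the two features that share it, and the contributions sum to the difference between the prediction for the molecule and the prediction for the baseline. That additivity is what allows a per-group reading of a predicted rate constant.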