Meta-Analysis of Nanoparticle Cytotoxicity via Data-Mining the Literature

Developing predictive modeling frameworks of potential cytotoxicity of engineered nanoparticles is critical for environmental and health risk analysis. The complexity and the heterogeneity of available data on potential risks of nanoparticles, in addition to interdependency of relevant influential attributes, makes it challenging to develop a generalization of nanoparticle toxicity behavior. Lack of systematic approaches to investigate these risks further adds uncertainties and variability to the body of literature and limits generalizability of existing studies. Here, we developed a rigorous approach for assembling published evidence on cytotoxicity of several organic and inorganic nanoparticles and unraveled hidden relationships that were not targeted in the original publications. We used a machine learning approach that employs decision trees together with feature selection algorithms (e.g., Gain ratio) to analyze a set of published nanoparticle cytotoxicity sample data (2896 samples). The specific studies were selected because they specified nanoparticle-, cell-, and screening method-related attributes. The resultant decision-tree classifiers are sufficiently simple, accurate, and with high prediction power and should be widely applicable to a spectrum of nanoparticle cytotoxicity settings. Among several influential attributes, we show that the cytotoxicity of nanoparticles is primarily predicted from the nanoparticle material chemistry, followed by nanoparticle concentration and size, cell type, and cytotoxicity screening indicator. Overall, our study indicates that following rigorous and transparent methodological experimental approaches, in parallel to continuous addition to this data set developed using our approach, will offer higher predictive power and accuracy and uncover hidden relationships. Results obtained in this study help focus future studies to develop nanoparticles that are safe by design.