Shared Consensus Machine Learning Models for Predicting Blood Stage Malaria Inhibition
Dataset posted on 03.03.2017, 00:00 by Andreas Verras, Chris L. Waller, Peter Gedeck, Darren V. S. Green, Thierry Kogej, Anandkumar Raichurkar, Manoranjan Panda, Anang Shelat, Julie Clark, Kip Guy, George Papadatos, Jeremy Burrows
The development of new antimalarial therapies is essential, and lowering the barrier to entry for the screening and discovery of new lead compound classes can spur drug development at organizations that may not have large compound screening libraries or the resources to conduct high-throughput screens. It has long been established that machine learning models are more robust and have a larger domain of applicability when trained on larger data sets. Screens across multiple data sets for compounds with potential malaria blood stage inhibitory activity have been used to generate multiple Bayesian models. Here we describe a method by which Bayesian quantitative structure–activity relationship models, which contain information on thousands to millions of proprietary compounds, can be shared between collaborators at both for-profit and not-for-profit institutions. This model-sharing paradigm allows for the development of consensus models that have greater predictive power than any single model, yet does not reveal the identity of any compounds in the training sets.
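The model-sharing idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each site trains a simple naive Bayes model on binary fingerprint bits (represented here as sets of integer bit indices), shares only the resulting per-bit log-odds weights, and that the consensus is a plain average of the per-model scores. The function names, the add-one smoothing, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Set

Weights = Dict[int, float]  # fingerprint bit index -> log-odds weight


def train_weights(fingerprints: List[Set[int]], labels: List[int]) -> Weights:
    """Fit per-bit log-odds weights (assumed naive Bayes with add-one smoothing).

    Only these aggregate weights would leave the site; the training
    compounds themselves are never part of the returned object.
    """
    n_active = sum(labels)
    n_inactive = len(labels) - n_active
    active_counts: Dict[int, int] = {}
    inactive_counts: Dict[int, int] = {}
    for fp, label in zip(fingerprints, labels):
        counts = active_counts if label == 1 else inactive_counts
        for bit in fp:
            counts[bit] = counts.get(bit, 0) + 1

    weights: Weights = {}
    for bit in set(active_counts) | set(inactive_counts):
        p_active = (active_counts.get(bit, 0) + 1) / (n_active + 2)
        p_inactive = (inactive_counts.get(bit, 0) + 1) / (n_inactive + 2)
        weights[bit] = math.log(p_active / p_inactive)
    return weights


def score(weights: Weights, fingerprint: Set[int]) -> float:
    """Sum the weights of the bits present; bits unseen in training add 0."""
    return sum(weights.get(bit, 0.0) for bit in fingerprint)


def consensus_score(models: List[Weights], fingerprint: Set[int]) -> float:
    """Average the scores of all shared models (one assumed consensus rule)."""
    return sum(score(w, fingerprint) for w in models) / len(models)


# Hypothetical toy data: two sites with private compounds (1 = active).
site_a = train_weights([{1, 2}, {1, 3}, {4, 5}, {4, 6}], [1, 1, 0, 0])
site_b = train_weights([{1, 7}, {2, 7}, {5, 8}, {6, 8}], [1, 1, 0, 0])
models = [site_a, site_b]

likely_active = consensus_score(models, {1, 2})
likely_inactive = consensus_score(models, {4, 5})
print(likely_active > likely_inactive)
```

In this sketch a positive consensus score favors activity and a negative one favors inactivity. Models trained on very different numbers of compounds produce scores on different scales, so a real consensus would probably need some normalization or rank-based combination before averaging.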