Application of Bioactivity Profile-Based Fingerprints for Building Machine Learning Models
Journal contribution, posted on 09.11.2018 by Noé Sturm, Jiangming Sun, Yves Vandriessche, Andreas Mayr, Günter Klambauer, Lars Carlsson, Ola Engkvist, Hongming Chen
The volume of high-throughput screening (HTS) data has increased considerably since the beginning of the era of automated biochemical and cell-based assays. This information-rich data source offers substantial opportunities for repurposing through data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput screening fingerprints (HTSFPs). These are a new type of descriptor that encodes molecular bioactivity profiles and can be applied in virtual screening, iterative screening, and target deconvolution. However, studies combining HTSFPs with machine learning have so far focused mainly on predicting the outcome of molecules in single high-throughput assays. Modeling the biochemical assay activities of compounds toward a panel of target proteins has not yet been reported. In this article, we compare how our in-house HTSFPs perform at this task when combined with multitask deep learning versus the single-task support vector machine (SVM) method, in terms of both hit identification and scaffold hopping potential. The performance of the two HTSFP models was reported relative to that of multitask deep learning and SVM models built with the structural descriptor ECFP. Moreover, we investigated the effect of HTS false positives and false negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance but diverse hits with very little overlap, demonstrating the orthogonality of bioactivity profile-based descriptors to structural descriptors. Therefore, modeling compound activity data with ECFPs together with HTSFPs increases the scaffold hopping potential of the predictive models.
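The abstract does not describe how the in-house HTSFPs are constructed. As an illustration only, the sketch below assumes a simple recipe: each assay's readouts are z-score normalized across the compounds tested in it, compounds not tested in an assay receive 0.0 (the assay mean in z-score units), and two bioactivity profiles are compared with Pearson correlation. A second helper computes the Jaccard overlap of two hit lists, one hypothetical way to quantify the "very little overlap" between ECFP-based and HTSFP-based hits. All function names, the zero-fill convention, and the toy data are assumptions, not the authors' pipeline.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_column(values):
    """Z-score one assay's readouts; None marks a compound not tested."""
    tested = [v for v in values if v is not None]
    if not tested:
        return [0.0] * len(values)
    mu, sd = mean(tested), pstdev(tested)
    # Assumption: untested entries (and constant assays) are set to 0.0.
    return [0.0 if v is None or sd == 0 else (v - mu) / sd for v in values]


def build_htsfp(activity):
    """activity: dict compound -> list of readouts, one per assay.
    Returns dict compound -> list of per-assay z-scores (the fingerprint)."""
    compounds = list(activity)
    n_assays = len(activity[compounds[0]])
    columns = [zscore_column([activity[c][j] for c in compounds])
               for j in range(n_assays)]
    return {c: [columns[j][i] for j in range(n_assays)]
            for i, c in enumerate(compounds)}


def pearson(a, b):
    """Pearson correlation between two bioactivity profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    na = sqrt(sum((x - ma) ** 2 for x in a))
    nb = sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if na == 0 or nb == 0 else cov / (na * nb)


def hit_overlap(hits_a, hits_b):
    """Jaccard overlap of two hit lists (hypothetical diversity measure)."""
    a, b = set(hits_a), set(hits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy readouts for three compounds in three assays (invented data).
    activity = {
        "cpd1": [10.0, 50.0, None],
        "cpd2": [20.0, 40.0, 5.0],
        "cpd3": [30.0, 30.0, 15.0],
    }
    fp = build_htsfp(activity)
    print(fp["cpd1"], pearson(fp["cpd1"], fp["cpd3"]))
    print(hit_overlap(["cpd1", "cpd2"], ["cpd2", "cpd3"]))
```

In this toy example, cpd1 and cpd3 respond in opposite directions in the first two assays, so their profiles correlate negatively. The two hit lists share one compound out of three, which gives an overlap of one third. A real HTSFP would span many more assays, and its normalization and missing-value handling may differ from the choices made here.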