ci6b00260_si_001.txt (14.09 kB)
Automated Protocol for Large-Scale Modeling of Gene Expression Data
Dataset posted on 2016-10-31, 00:00, authored by Michelle Lynn Hall, David Calkins, and Woody Sherman
With the continued rise of phenotypic- and genotypic-based screening projects, computational methods to analyze, process, and ultimately make predictions in this field are becoming increasingly important. Here we show how automated machine learning workflows can produce models that predict differential gene expression as a function of compound structure, using data from A673 cells as a proof of principle. In particular, we present predictive models with an average accuracy of greater than 70% across a highly diverse expression profile of ∼1000 genes. In contrast to the usual in silico design paradigm, in which one interrogates a particular target-based response, this work opens the opportunity for virtual screening and lead optimization toward desired multitarget gene expression profiles.
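The abstract does not specify the descriptors, learning algorithm, or validation scheme. As a minimal sketch of the general idea only (one model per gene, compound structure as input, accuracy averaged over the whole profile), the code below assumes each compound is encoded as a set of "on" fingerprint bits and each gene has a categorical differential-expression label ("up", "down", "none"). It uses a 1-nearest-neighbour classifier on Tanimoto similarity with leave-one-out validation. These choices, the gene names, and the toy data are hypothetical illustrations and are not the authors' workflow.

```python
from statistics import mean

# Hypothetical encoding: a compound is a frozenset of "on" fingerprint bit indices.
Fingerprint = frozenset[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def predict_1nn(query: Fingerprint, train: list[tuple[Fingerprint, str]]) -> str:
    """Predict the label of the most similar training compound (1-nearest neighbour)."""
    _, label = max(train, key=lambda item: tanimoto(query, item[0]))
    return label


def loo_accuracy(fps: list[Fingerprint], labels: list[str]) -> float:
    """Leave-one-out accuracy of the 1-NN model for a single gene."""
    correct = 0
    for i, (fp, label) in enumerate(zip(fps, labels)):
        # Train on every compound except the held-out one.
        train = [(f, l) for j, (f, l) in enumerate(zip(fps, labels)) if j != i]
        if predict_1nn(fp, train) == label:
            correct += 1
    return correct / len(fps)


def average_profile_accuracy(fps, profile):
    """Build one model per gene and average the per-gene accuracies."""
    per_gene = {gene: loo_accuracy(fps, labels) for gene, labels in profile.items()}
    return mean(per_gene.values()), per_gene


# Toy data (hypothetical): four compounds, two genes.
fps = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({7, 8, 9}),
    frozenset({7, 8, 10}),
]
profile = {
    "GENE_A": ["up", "up", "down", "down"],
    "GENE_B": ["up", "up", "none", "up"],
}

avg, per_gene = average_profile_accuracy(fps, profile)
print(per_gene)  # {'GENE_A': 1.0, 'GENE_B': 0.5}
print(avg)       # 0.75
```

Training an independent model per gene keeps each prediction task simple and lets the per-gene accuracies be reported separately before averaging; a real workflow over ∼1000 genes would likely use richer descriptors and a stronger learner than 1-NN.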