Posted on 2018-11-28, 00:00. Authored by Li Xue, Bin Tang, Wei Chen, Jiesi Luo.
The CRISPR-Cas9 system, derived from adaptive immunity in bacteria
and archaea, has been developed into a powerful tool for genome engineering
with wide-ranging applications. Optimizing single-guide RNA (sgRNA)
design to improve efficiency of target cleavage is a key step for
successful gene editing using the CRISPR-Cas9 system. Because not
all sgRNAs cognate to a given target gene are equally effective,
computational tools have been developed based on experimental data
to increase the likelihood of selecting effective sgRNAs. Despite
considerable efforts to date, accurately predicting functional sgRNAs
directly from large-scale sequence data remains a major challenge.
We propose DeepCas9, a deep-learning framework based on the
convolutional neural network (CNN), to automatically learn the sequence
determinants and further enable the identification of functional sgRNAs
for the CRISPR-Cas9 system. We show that the CNN method outperforms
previous methods in both (i) the ability to correctly identify highly
active sgRNAs in experiments not used in the training and (ii) the
ability to accurately predict the target efficacies of sgRNAs in different
organisms. We also visualize the convolutional kernels
and show that the identified sequence signatures match known nucleotide
preferences. Finally, we demonstrate the application of our method
to the design of next-generation genome-scale CRISPRi and CRISPRa
libraries targeting human and mouse genomes. We expect that DeepCas9
will help reduce the number of sgRNAs that must be experimentally
validated, enabling more effective and efficient genetic screens and
genome engineering. DeepCas9 can be freely accessed via the Internet
at https://github.com/lje00006/DeepCas9.
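
As a rough illustration of how a CNN can read sequence determinants from an sgRNA target site, the sketch below one-hot encodes a DNA sequence and slides a single convolutional kernel along it. It is not the DeepCas9 implementation: the abstract does not specify the input encoding, sequence length, or architecture, so the one-hot encoding, the 23-nt example site (20-nt protospacer plus an NGG PAM), the hand-set kernel, and the function names `one_hot` and `conv_scan` are all assumptions made for illustration. In a trained network the kernel weights would be learned from data rather than set by hand.

```python
import numpy as np

BASES = "ACGT"  # row order of the one-hot matrix (an assumed convention)


def one_hot(seq):
    """Encode a DNA sequence as a (4, L) binary matrix, rows ordered A, C, G, T."""
    seq = seq.upper()
    mat = np.zeros((len(BASES), len(seq)), dtype=np.float32)
    for pos, base in enumerate(seq):
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r} at position {pos}")
        mat[BASES.index(base), pos] = 1.0
    return mat


def conv_scan(encoded, kernel):
    """Slide a (4, k) kernel along the sequence (stride 1, no padding), then apply ReLU."""
    k = kernel.shape[1]
    n_windows = encoded.shape[1] - k + 1
    scores = np.array(
        [np.sum(encoded[:, i:i + k] * kernel) for i in range(n_windows)]
    )
    return np.maximum(scores, 0.0)


# Hypothetical target site: 20-nt protospacer followed by a TGG PAM.
site = "GACGTTAGCCATGCTAGCTA" + "TGG"
encoded = one_hot(site)

# Hand-set width-2 kernel that responds to guanine in both positions ("GG").
kernel = np.zeros((4, 2))
kernel[BASES.index("G"), :] = 1.0

activations = conv_scan(encoded, kernel)
best = int(np.argmax(activations))
print(f"strongest response at position {best}: {site[best:best + 2]}")
```

Visualizing a learned kernel amounts to inspecting a weight matrix like `kernel` above: rows with large weights at a given column indicate which nucleotide the filter prefers at that offset.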