LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning
Posted on 2024-12-24 - 21:13
Sequence-function data provides valuable information about the
protein functional landscape but is rarely obtained during directed
evolution campaigns. Here, we present Long-read every variant Sequencing
(LevSeq), a pipeline that combines a dual barcoding strategy with
nanopore sequencing to rapidly generate sequence-function data for
entire protein-coding genes. LevSeq integrates into existing protein
engineering workflows and comes with open-source software for data
analysis and visualization. The pipeline facilitates data-driven protein
engineering by consolidating sequence-function data to inform directed
evolution and provide the requisite data for machine learning-guided
protein engineering (MLPE). LevSeq enables quality control of mutagenesis
libraries prior to screening, which reduces time and resource costs.
Simulation studies demonstrate LevSeq’s ability to accurately
detect variants under various experimental conditions. Finally, we
show LevSeq’s utility in engineering protoglobins for new-to-nature
chemistry. Widespread adoption of LevSeq and sharing of the data will
enhance our understanding of protein sequence-function landscapes
and empower data-driven directed evolution.
CITE THIS COLLECTION
Long, Yueming; Mora, Ariane; Li, Francesca-Zhoufan; Gürsoy, Emre; Johnston, Kadina E.; Arnold, Frances H. (2024). LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning. ACS Publications. Collection. https://doi.org/10.1021/acssynbio.4c00625