Systematically Scrutinizing the Impact of Substitution Sites on Thermostability and Detergent Tolerance for Bacillus subtilis Lipase A
Dataset posted on 16.01.2020 by Christina Nutschel, Alexander Fulton, Olav Zimmermann, Ulrich Schwaneberg, Karl-Erich Jaeger, Holger Gohlke
Improving an enzyme’s (thermo)stability or its tolerance toward solvents and detergents is highly relevant in protein engineering and biotechnology. Recent developments have tended toward data-driven approaches, in which available knowledge about the protein is used to identify substitution sites with high potential to yield variants with improved stability. Substitutions are then introduced at these sites by site-directed or site-saturation mutagenesis (SSM). However, the development and validation of algorithms for data-driven approaches have been hampered by a lack of large-scale data that are measured in a uniform way and unbiased with respect to substitution types and locations. Here, we extend our knowledge of guidelines for data-driven protein engineering by scrutinizing, at very large scale, the impact of substitution sites on thermostability and/or detergent tolerance of Bacillus subtilis lipase A (BsLipA). We systematically analyze a complete experimental SSM library of BsLipA containing all 3439 possible single variants, each evaluated for thermostability and for tolerance toward four detergents under uniform conditions. Our results provide systematic and unbiased reference data at unprecedented scale for a biotechnologically important protein, identify consistently defined hot spot types for evaluating the performance of data-driven protein-engineering approaches, and show that Constraint Network Analysis, an ensemble-based approach grounded in rigidity theory, yields hot spot predictions with up to a ninefold gain in precision over random classification.
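Two quantities in the abstract can be made concrete with a little arithmetic. A complete SSM library contains, for every position, the 19 non-wild-type amino acids; assuming a mature BsLipA of 181 residues (an assumption not stated in this record), that gives 181 × 19 = 3439 single variants. A "gain in precision over random classification" is read here as the predictor's precision divided by the precision a random classifier would achieve, which equals the fraction of sites that are true hot spots. The sketch below illustrates both under these assumptions; the function names are hypothetical and this is not the authors' evaluation code.

```python
from typing import Sequence

# The 20 canonical amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ssm_library_size(n_residues: int) -> int:
    """Hypothetical helper: number of single variants in a complete SSM
    library, i.e. each position substituted by the 19 non-wild-type residues."""
    return n_residues * (len(AMINO_ACIDS) - 1)


def precision_gain_over_random(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Hypothetical helper: precision of a hot-spot predictor divided by the
    expected precision of random classification (= hot-spot prevalence)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have equal length")
    true_pos = sum(p and a for p, a in zip(predicted, actual))
    n_pred = sum(predicted)  # sites flagged as hot spots by the predictor
    n_hot = sum(actual)      # sites that are experimentally hot spots
    if n_pred == 0 or n_hot == 0:
        raise ValueError("need at least one predicted and one actual hot spot")
    precision = true_pos / n_pred
    random_precision = n_hot / len(actual)
    return precision / random_precision


if __name__ == "__main__":
    # Assumes a 181-residue mature protein.
    print(ssm_library_size(181))
    # Toy, made-up example: 10 sites, 1 true hot spot, predictor flags 2 sites
    # including the hot spot -> precision 0.5 vs. random 0.1 (fivefold gain).
    actual = [True] + [False] * 9
    predicted = [True, True] + [False] * 8
    print(precision_gain_over_random(predicted, actual))
```

The toy data are invented purely to show the calculation; they are not values from the BsLipA dataset.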