pr8b00949_si_002.xlsx (34.93 MB)
PTML Model of Enzyme Subclasses for Mining the Proteome of Biofuel Producing Microorganisms
Dataset posted on 2019-05-13, 00:00, authored by Riccardo Concu, M. Natália D. S. Cordeiro, Cristian R. Munteanu, Humbert González-Díaz
Predicting enzyme function and enzyme subclasses is a key objective in fields such as biotechnology, biochemistry, medicinal chemistry, and physiology. The Protein Data Bank (PDB) is the largest information archive of biological macromolecular structures, with more than 150 000 entries for proteins, nucleic acids, and complex assemblies. Among these entries are more than 4000 proteins whose functions remain unknown because no detectable homology to proteins of known function has been found. The problem is that our ability to isolate proteins and identify their sequences far exceeds our ability to assign them a defined function. As a result, interest in this topic is growing, and several computational approaches have been developed to predict protein function. In this work, we applied perturbation theory to an original data set of 19 187 enzymes representing all 59 enzyme subclasses present in the PDB. In addition, we developed a series of artificial neural network models able to predict enzyme–enzyme pairs of query-template sequences with accuracy, specificity, and sensitivity greater than 90% in both the training and validation series. As a possible application of this methodology, and to further validate our approach, we used the new model to predict a set of enzymes from the yeast Pichia stipitis. This yeast has been widely studied because it is common in nature and gives a high ethanol yield by converting lignocellulosic biomass into bioethanol through the enzyme xylose reductase. On this basis, we tested our model on 222 enzymes, including xylose reductase, the enzyme responsible for converting biomass into bioethanol.
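The abstract reports accuracy, specificity, and sensitivity for a classifier that scores query-template enzyme pairs. The sketch below illustrates, under stated assumptions, how pair inputs and those three metrics could be computed. It assumes a single numeric sequence descriptor per enzyme and models the perturbation-theory term as the deviation of the query's descriptor from the mean descriptor of the template's subclass; this form is common in PTML work but is not confirmed by this page. The function names `subclass_means`, `pt_pair_feature`, and `confusion_metrics`, the EC-style labels, and all data values are hypothetical, and the neural network itself is omitted.

```python
from statistics import mean


def subclass_means(descriptors, labels):
    """Average descriptor value per enzyme subclass (assumed 'expected value')."""
    groups = {}
    for value, subclass in zip(descriptors, labels):
        groups.setdefault(subclass, []).append(value)
    return {subclass: mean(values) for subclass, values in groups.items()}


def pt_pair_feature(query_value, template_value, template_subclass, means):
    """Hypothetical PT input for one query-template pair.

    Returns the template descriptor together with the perturbation of the
    query away from the mean of the template's subclass.
    """
    return template_value, query_value - means[template_subclass]


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (TP rate) and specificity (TN rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Toy data: four enzymes in two illustrative EC subclasses.
    means = subclass_means([1.0, 3.0, 10.0, 12.0], ["1.1", "1.1", "2.7", "2.7"])
    print(means)                                     # {'1.1': 2.0, '2.7': 11.0}
    print(pt_pair_feature(3.0, 10.0, "2.7", means))  # (10.0, -8.0)

    # Toy labels: 1 = query and template share a subclass, 0 = they do not.
    print(confusion_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                            [1, 1, 0, 0, 0, 1, 1, 0]))
    # {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
```

In a real PTML pipeline each enzyme would carry many descriptors, and the perturbation terms for all of them would feed the trained network. The metric definitions, however, are the standard confusion-matrix ones.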