Posted on 2019-05-13, 00:00. Authored by Riccardo Concu, M. Natália D. S. Cordeiro, Cristian R. Munteanu, Humbert González-Díaz
Predicting enzyme
function and enzyme subclasses is a key
objective in fields such as biotechnology, biochemistry, medicinal
chemistry, and physiology. The Protein Data Bank (PDB) is the
largest information archive of biological macromolecular structures,
with more than 150 000 entries for proteins, nucleic acids,
and complex assemblies. Among these entries, there are more than 4000
proteins whose functions remain unknown because no detectable homology
to proteins whose functions are known has been found. The problem
is that our ability to isolate proteins and identify their sequences
far exceeds our ability to assign them a defined function. As a result,
there is growing interest in this topic, and several innovative methods
have been developed to predict protein function. In this work, we
have applied perturbation theory to an
original data set consisting of 19 187 enzymes representing
all 59 subclasses present in the PDB. In addition, we
developed a series of artificial neural network models able to predict
enzyme–enzyme pairs of query-template sequences with accuracy,
specificity, and sensitivity greater than 90% in both training and
validation series. As a possible application of this methodology and
to further validate our approach, we used the new model to predict
a set of enzymes belonging to the yeast Pichia stipitis. This yeast has been widely studied because it is commonly present
in nature and produces a high ethanol yield by converting lignocellulosic
biomass into bioethanol through the xylose reductase enzyme. On
this basis, we tested our model on 222 enzymes, including xylose
reductase, the enzyme responsible for the conversion of biomass
into bioethanol.
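Accuracy, specificity, and sensitivity are standard confusion-matrix metrics for a binary classifier. The minimal Python sketch below shows how they could be computed for predictions on query-template pairs. It assumes a 1/0 label encoding (1 = positive class), and the names `Confusion`, `confusion_from_labels`, and `classification_metrics`, as well as the example counts, are hypothetical illustrations, not the study's code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (hypothetical helper)."""
    tp: int  # positives correctly predicted as positive
    tn: int  # negatives correctly predicted as negative
    fp: int  # negatives wrongly predicted as positive
    fn: int  # positives wrongly predicted as negative


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Tally a confusion matrix from 1/0 labels (1 = positive class, an assumption)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return Confusion(tp, tn, fp, fn)


def classification_metrics(c: Confusion) -> dict[str, float]:
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    total = c.tp + c.tn + c.fp + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


if __name__ == "__main__":
    # Illustrative counts only; not results from the study.
    example = Confusion(tp=90, tn=95, fp=5, fn=10)
    print(classification_metrics(example))
    # prints {'accuracy': 0.925, 'sensitivity': 0.9, 'specificity': 0.95}
```

Sensitivity and specificity are undefined when a series contains no positive or no negative pairs, respectively; in that case the sketch raises `ZeroDivisionError` rather than guessing a value.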