
Using Functional Domain Composition To Predict Enzyme Family Classes

journal contribution
posted on 14.02.2005, 00:00 by Yu-Dong Cai, Kuo-Chen Chou
According to their main EC (Enzyme Commission) numbers, enzymes are classified into six main classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. A new method has been developed to predict the enzymatic attribute of proteins by using the functional domain composition to represent a given protein sequence. The advantage of doing so is that both sequence-order-related features and function-related features are naturally incorporated into the predictor. As a demonstration, a jackknife cross-validation test was performed on a dataset in which every pair of proteins shares less than 20% sequence identity, so as to remove homology bias. The overall success rate thus obtained was 85% in identifying the enzyme family classes (including the identification of nonenzyme protein sequences). This success rate is significantly higher than those obtained by other methods on such a stringent dataset. This indicates that using the functional domain composition to represent protein samples for statistical prediction is very promising and will become a powerful tool in bioinformatics and proteomics.

Keywords: classification of enzyme commission • enzymatic attribute • functional domain composition • 20% threshold cutoff • nearest neighbor predictor • bioinformatics • proteomics
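
The approach summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the domain database, the vector encoding, or the similarity measure, so the binary encoding, the cosine similarity, the domain identifiers (`D1` to `D6`), and the toy samples below are all assumptions made for illustration only. The sketch encodes each protein as a vector over a fixed list of functional domains, predicts the class of a query from its nearest neighbor, and estimates the success rate with a jackknife (leave-one-out) test.

```python
import math

def domain_composition(protein_domains, domain_index):
    """Encode a protein as a binary vector: 1 if it contains the domain, else 0.
    (Assumed encoding; the abstract does not give the exact definition.)"""
    vec = [0] * len(domain_index)
    for d in protein_domains:
        if d in domain_index:
            vec[domain_index[d]] = 1
    return vec

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def nearest_neighbor_predict(query, training):
    """Return the label of the training sample most similar to the query."""
    best_label, best_sim = None, -1.0
    for vec, label in training:
        sim = cosine_similarity(query, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

def jackknife_success_rate(samples):
    """Leave each sample out in turn, predict it from the rest, and count hits."""
    correct = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_predict(vec, rest) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical domain identifiers and toy proteins, for illustration only.
domain_index = {name: i for i, name in enumerate(["D1", "D2", "D3", "D4", "D5", "D6"])}
raw_samples = [
    ({"D1", "D2"}, "oxidoreductase"),
    ({"D1"}, "oxidoreductase"),
    ({"D3", "D4"}, "hydrolase"),
    ({"D3"}, "hydrolase"),
    ({"D5"}, "nonenzyme"),
    ({"D5", "D6"}, "nonenzyme"),
]
samples = [(domain_composition(doms, domain_index), label) for doms, label in raw_samples]

rate = jackknife_success_rate(samples)
print(f"jackknife success rate on toy data: {rate:.2f}")
```

On real data, the samples would first be filtered so that no pair exceeds the 20% sequence identity cutoff; that filtering step is outside the scope of this sketch.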