pr060404b_si_002.pdf (422.04 kB)
Large-Scale Predictions of Gram-Negative Bacterial Protein Subcellular Locations
journal contribution
posted on 2006-12-01, 00:00 authored by Kuo-Chen Chou, Hong-Bin Shen
Many species of Gram-negative bacteria are pathogenic bacteria that can cause disease in a host
organism. This pathogenic capability is usually associated with certain components in Gram-negative
cells. Therefore, developing an automated method for fast and reliable prediction of Gram-negative
protein subcellular location will allow us not only to annotate gene products in a timely manner, but also to screen
candidates for drug discovery. However, protein subcellular location prediction is a very difficult
problem, particularly when more location sites need to be covered and when unknown query proteins
do not have significant homology to proteins of known subcellular locations. PSORT-B, a recently
updated version of PSORT, widely used for predicting Gram-negative protein subcellular location, only
covers five location sites. Also, the data set used to train PSORT-B contains many proteins with high
degrees of sequence identity within the same location group and, hence, may bear a strong homology bias.
To overcome these problems, a new predictor, called “Gneg-PLoc”, has been developed. It fuses
many basic classifiers, each trained on a stringent data set in which proteins in the same location group share strictly less
than 25% sequence identity with one another. The new predictor covers
eight subcellular locations; that is, cytoplasm, extracellular space, fimbrium, flagellum, inner membrane,
nucleoid, outer membrane, and periplasm. In comparison with PSORT-B, the new predictor not only
covers more subcellular locations, but also yields remarkably higher success rates. Gneg-PLoc is
available as a Web server at http://202.120.37.186/bioinf/Gneg. To meet the needs of people working
in the relevant areas, a downloadable file is provided at the same Web site, listing the results obtained
by Gneg-PLoc for 49 907 Gram-negative protein entries in the Swiss-Prot database that have no
subcellular location annotations or are annotated with uncertain terms. The large-scale results will be
updated twice a year to cover new entries of Gram-negative bacterial proteins and to reflect new
developments in Gneg-PLoc.
Keywords: Gram-negative • Subcellular compartment • Gene ontology • Amphiphilic pseudo amino acid composition
• Fusion • K-nearest neighbor rule
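
The abstract and keywords name two ingredients, the K-nearest neighbor rule and the fusion of many basic classifiers, without giving details on this page. The following is only a minimal sketch of that general idea, under stated assumptions: the basic classifiers are taken to be KNN classifiers that differ in K, the fusion rule is assumed to be a simple majority vote, and the 2-D feature vectors are hypothetical toy values rather than the gene ontology or amphiphilic pseudo amino acid composition descriptors used by Gneg-PLoc. The names `knn_predict`, `fused_predict`, and `TRAIN` are illustrative only.

```python
from collections import Counter
import math


def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def fused_predict(train, query, ks=(1, 3, 5)):
    """Fuse several KNN classifiers (one per k) by majority vote.

    Assumption: majority voting stands in for whatever fusion rule
    the actual predictor uses.
    """
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two of the eight location labels, 2-D features.
TRAIN = [
    ((0.10, 0.20), "cytoplasm"),
    ((0.20, 0.10), "cytoplasm"),
    ((0.15, 0.25), "cytoplasm"),
    ((0.90, 0.80), "periplasm"),
    ((0.80, 0.90), "periplasm"),
    ((0.85, 0.75), "periplasm"),
]

if __name__ == "__main__":
    print(fused_predict(TRAIN, (0.12, 0.20)))
    print(fused_predict(TRAIN, (0.88, 0.80)))
```

Voting across several values of K makes the final call less sensitive to any single choice of K, which is one common motivation for fusing basic classifiers.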