Posted on 2014-01-03, 00:00. Authored by Mohammad
T. Islam, Gagan Garg, William S. Hancock, Brian
A. Risk, Mark S. Baker, Shoba Ranganathan
The chromosome-centric human proteome
project (C-HPP) aims to define
the complete set of proteins encoded in each human chromosome. The
neXtProt database (September 2013) lists 20 128 proteins for
the human proteome, of which 3831 human proteins (∼19%) are
considered “missing” according to the standard metrics
table (released September 27, 2013). In support of the C-HPP initiative,
we have extended the annotation strategy developed for human chromosome
7 “missing” proteins into a semiautomated pipeline to
functionally annotate the “missing” human proteome.
This pipeline integrates a suite of bioinformatics analysis and annotation
software tools to identify homologues and map putative functional
signatures, gene ontology, and biochemical pathways. From sequential
BLAST searches, we have primarily identified homologues from reviewed
nonhuman mammalian proteins with protein evidence for 1271 (33.2%)
“missing” proteins, followed by 703 (18.4%) homologues
from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%)
homologues from reviewed human proteins. Functional annotations for
1945 (50.8%) “missing” proteins were also determined.
To accelerate the identification of “missing” proteins
from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE
proteogenomic data resulted in proteomic evidence for 107 (2.8%) of
the 3831 “missing” proteins, while evidence from a recent membrane
proteomic study supported the existence of another 15 “missing”
proteins. The chromosome-wise functional annotation of all “missing”
proteins is freely available to the scientific community through our
web server (http://biolinfo.org/protannotator).
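
The in silico generation of proteotypic peptides mentioned above can be illustrated with a minimal sketch. The abstract does not say which digestion rules or filters the pipeline uses, so everything below is an assumption made for illustration: trypsin-like cleavage (after K or R, but not before P), a peptide length window of 7 to 25 residues, and "proteotypic" taken to mean simply that a peptide occurs in only one protein of the input set. The function names `tryptic_digest` and `proteotypic_candidates` and the toy sequences are hypothetical and are not part of the authors' pipeline.

```python
from collections import defaultdict


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    This is the common simplified trypsin rule, assumed here for
    illustration; no missed cleavages are modelled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def proteotypic_candidates(proteins, min_len=7, max_len=25):
    """Return, per protein ID, the peptides found in that protein only.

    `proteins` maps a protein ID to its amino acid sequence. The length
    window is an assumed filter, not a value taken from the source.
    """
    owners = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            if min_len <= len(peptide) <= max_len:
                owners[peptide].add(protein_id)

    candidates = {protein_id: [] for protein_id in proteins}
    for peptide, protein_ids in owners.items():
        if len(protein_ids) == 1:
            candidates[next(iter(protein_ids))].append(peptide)
    return {pid: sorted(peps) for pid, peps in candidates.items()}


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    toy = {
        "P1": "MAGTKPLLRAAAAAAAK",
        "P2": "SEQWENCEKAAAAAAAK",
    }
    print(proteotypic_candidates(toy))
```

In this toy input both proteins share the peptide `AAAAAAAK`, so it is discarded, and each protein keeps one unique peptide. A real analysis would digest the whole reference proteome so that uniqueness is judged against all known proteins, and would then match the surviving peptides against observed spectra or proteogenomic data.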