pr9b00537_si_001.pdf (4.97 MB)
Blinded Testing of Function Annotation for uPE1 Proteins by I‑TASSER/COFACTOR Pipeline Using the 2018–2019 Additions to neXtProt and the CAFA3 Challenge
journal contribution
posted on 2019-10-18, 15:03 authored by Chengxin Zhang, Lydie Lane, Gilbert S. Omenn, Yang Zhang
In 2018, we reported
a hybrid pipeline that predicts protein structures
with I-TASSER and function with COFACTOR. I-TASSER/COFACTOR achieved
high Gene Ontology (GO) prediction accuracies of Fmax = 0.69 and 0.57
for molecular function (MF) and biological process (BP), respectively,
on 100 comprehensively annotated proteins. Now we report blinded analyses
of newly annotated proteins in the third Critical Assessment of Function
Annotation (CAFA3) function prediction challenge and in neXtProt.
For CAFA3 results released in May 2019, our predictions on 267 and
912 human proteins with newly annotated MF and BP terms achieved Fmax
= 0.50 and 0.42, respectively, on “No Knowledge” proteins,
and 0.51 and 0.74, respectively, on “Limited Knowledge”
proteins. While COFACTOR consistently outperforms simple homology-based
analysis, its accuracy still depends on template availability. Meanwhile,
in neXtProt 2019–01, 25 proteins acquired new function annotation
through literature curation at UniProt/Swiss-Prot. Before the release
of these curated results, we submitted to neXtProt blinded predictions
of free-text function annotation based on predicted GO terms. For
10 of the 25, a good match of free-text or GO term annotation was
obtained. These blind tests represent rigorous assessments of I-TASSER/COFACTOR.
neXtProt now provides links to precomputed I-TASSER/COFACTOR predictions
for proteins without function annotation to facilitate experimental
planning on “dark proteins”.
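
The Fmax values quoted above are protein-centric F-measures maximized over a prediction-score threshold. The following is a minimal sketch of that metric as it is commonly defined for CAFA-style evaluation, not the authors' evaluation code. It assumes each protein has at least one true term, that predicted scores lie in (0, 1], and that GO terms have already been propagated to their ancestors. The function name `fmax`, the protein IDs, and the term labels are hypothetical.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric Fmax: the best F1 over score thresholds.

    predictions maps protein -> {term: confidence score}.
    truth maps protein -> set of true terms (assumed non-empty).
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []   # only proteins with at least one prediction at t
        recall_sum = 0.0  # summed over all proteins in the truth set
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has any prediction at this threshold
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Hypothetical toy data; the labels are placeholders, not real GO terms.
predictions = {
    "P1": {"term_a": 0.9, "term_b": 0.4},
    "P2": {"term_c": 0.8, "term_d": 0.7},
}
truth = {
    "P1": {"term_a"},
    "P2": {"term_c", "term_e"},
}

print(round(fmax(predictions, truth), 3))  # prints 0.857
```

In this toy case the best threshold lies between 0.7 and 0.8, where precision is 1.0 and recall is 0.75, giving F = 6/7. Precision is averaged only over proteins that receive at least one prediction at the threshold, while recall is averaged over all benchmark proteins, so a method that abstains on difficult proteins is penalized through recall.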