ct1c00129_si_001.pdf (637.52 kB)
TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning
journal contribution
posted on 2021-06-23, 22:03 authored by Daniel Mulnaes, Pegah Golchin, Filip Koenig, Holger Gohlke

Protein domains are independent,
functional, and stable structural
units of proteins. Accurate protein domain boundary prediction plays
an important role in understanding protein structure and evolution,
as well as for protein structure prediction. Current domain boundary
prediction methods differ in terms of boundary definition, methodology,
and training databases, resulting in disparate performance for different
proteins. We developed TopDomain, an exhaustive metapredictor that
uses deep neural networks to combine multisource information from
sequence- and homology-based features of over 50 primary predictors.
For this purpose, we developed a new domain boundary data set termed
the TopDomain data set, in which the true annotations are informed
by SCOPe annotations, structural domain parsers, human inspection,
and deep learning. We benchmark TopDomain against 2484 targets with
3354 boundaries from the TopDomain test set and achieve F1 scores
of 78.4% and 73.8% for multidomain boundary prediction within ±20
residues and ±10 residues of the true boundary, respectively.
When examined on targets from the CASP11-13 competitions, TopDomain achieves
F1 scores of 47.5% and 42.8% for multidomain proteins. TopDomain significantly
outperforms 15 widely used, state-of-the-art ab initio and homology-based domain boundary predictors. Finally, we implemented
TopDomainTMC, which accurately predicts whether domain
parsing is necessary for the target protein.
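
The F1 scores quoted above count a predicted boundary as correct when it falls within a tolerance window (±20 or ±10 residues) of a true boundary. The abstract does not spell out the matching protocol, so the following is only a minimal sketch under stated assumptions: the function name `boundary_f1` is hypothetical, and the greedy one-to-one matching of predictions to true boundaries is an assumption, not necessarily the procedure used by the authors.

```python
from typing import Sequence, Tuple


def boundary_f1(
    predicted: Sequence[int],
    true: Sequence[int],
    tolerance: int = 20,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for domain boundary positions.

    A prediction counts as a true positive if it lies within
    +/- `tolerance` residues of a not-yet-matched true boundary.
    Assumption: greedy one-to-one matching, nearest boundary first.
    """
    unmatched_true = sorted(true)
    true_positives = 0
    for p in sorted(predicted):
        # Collect true boundaries that are still free and inside the window.
        candidates = [t for t in unmatched_true if abs(t - p) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda t: abs(t - p))
            unmatched_true.remove(closest)  # each true boundary matches once
            true_positives += 1

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `boundary_f1([100, 250], [110, 300], tolerance=20)` matches only the first prediction, so precision, recall, and F1 are all 0.5. Tightening the tolerance can only keep or lower the number of matches, which is consistent with the ±10 scores reported above being lower than the ±20 scores.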
Keywords
multidomain boundary prediction; TopDomain TMC; Exhaustive Protein Domain Boundary ...; CASP 11-13 competitions; homology-based features; benchmark TopDomain; Current domain boundary prediction ...; F 1 scores; domain boundary data; TopDomain test; Deep Learning Protein domains; ab initio; 3354 boundaries; Accurate protein domain boundary pr...; TopDomain data; Multisource Information; domain parsers; SCOPe annotations; 2484 targets; homology-based domain boundary pred...; protein structure prediction; multisource information; boundary definition; training databases; understanding protein structure; multidomain proteins; target protein