pr9b00566_si_001.xlsx (44.14 kB)
EPIFANY: A Method for Efficient High-Confidence Protein Inference
dataset
posted on 2020-02-13, 18:04 authored by Julianus Pfeuffer, Timo Sachsenberg, Tjeerd M. H. Dijkstra, Oliver Serang, Knut Reinert, Oliver Kohlbacher
Accurate protein inference in the presence of shared peptides is
still one of the key problems in bottom-up proteomics. Most protein
inference tools employing simple heuristic inference strategies are
efficient but exhibit reduced accuracy. More advanced probabilistic
methods often exhibit better inference quality but tend to be too
slow for large data sets. Here, we present a novel protein inference
method, EPIFANY, combining a loopy belief propagation algorithm with
convolution trees for efficient processing of Bayesian networks. We
demonstrate that EPIFANY combines the reliable protein inference of
Bayesian methods with significantly shorter runtimes. On the 2016
iPRG protein inference benchmark data, EPIFANY is the only tested
method that finds all true-positive proteins at a 5% protein false
discovery rate (FDR) without strict prefiltering on the peptide-spectrum
match (PSM) level, yielding an increase in identification performance
(+10% in the number of true positives and +14% in partial AUC) compared
to previous approaches. Even very large data sets with hundreds of
thousands of spectra (which are intractable with other Bayesian and
some non-Bayesian tools) can be processed with EPIFANY within minutes.
Including shared peptides in the inference improves its quality
and thus the robustness of the biological hypotheses generated.
EPIFANY is available as open-source
software for all major platforms at https://OpenMS.de/epifany.
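
As a rough illustration of the convolution-tree idea mentioned above, the following minimal sketch computes the distribution of the number of present proteins from independent presence probabilities by merging probability mass functions pairwise in a balanced tree. It is not EPIFANY's implementation: the function names `convolve` and `count_distribution` and the example probabilities are hypothetical, the proteins are assumed independent, and a plain quadratic convolution stands in for the faster convolution routines a real tool would use.

```python
def convolve(a, b):
    """Discrete convolution of two probability mass functions (PMFs)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def count_distribution(probs):
    """PMF of how many proteins are present, assuming independence.

    Index k of the returned list holds P(exactly k proteins present).
    """
    # Leaves of the tree: one Bernoulli PMF [P(absent), P(present)] per protein.
    layer = [[1.0 - p, p] for p in probs]
    if not layer:
        return [1.0]  # with no proteins, the count is 0 with certainty
    # Merge neighbouring PMFs pairwise until only the root PMF remains.
    while len(layer) > 1:
        merged = [convolve(layer[i], layer[i + 1])
                  for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:
            merged.append(layer[-1])  # odd element moves up unchanged
        layer = merged
    return layer[0]


# Hypothetical presence probabilities for three proteins.
pmf = count_distribution([0.9, 0.5, 0.2])
print([round(x, 2) for x in pmf])  # [0.04, 0.41, 0.46, 0.09]
```

The pairwise merging keeps the tree depth logarithmic in the number of proteins, which is what makes this kind of sum over many binary variables tractable inside a larger message-passing scheme.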