Semisupervised Model-Based Validation of Peptide Identifications in Mass Spectrometry-Based Proteomics
journal contributionposted on 04.01.2008, 00:00 by Hyungwon Choi, Alexey I. Nesvizhskii
Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.