pr100594k_si_001.pdf (91.1 kB)
Efficient Marginalization to Compute Protein Posterior Probabilities from Shotgun Mass Spectrometry Data
journal contribution
posted on 2010-10-01, 00:00 authored by Oliver Serang, Michael J. MacCoss, William Stafford Noble
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or estimated the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors.
Keywords
probability, Protein Posterior Probabilities, data sets, peptide degeneracy problem, tandem mass spectrometry, Shotgun Mass Spectrometry Data, The problem, Previous approaches, identification procedure, protein posteriors, shotgun proteomics experiment, protein identification, sampling methods, Efficient Marginalization, peptide degeneracy, protein probabilities