Version 2 2019-02-07, 16:25
Version 1 2019-01-25, 22:15
dataset
posted on 2019-01-14, 00:00, authored by Qiang Kou, Zhe Wang, Rachele A. Lubeckyj, Si Wu, Liangliang Sun, Xiaowen Liu
Top-down mass spectrometry is capable of identifying whole proteoform
sequences with multiple post-translational modifications because it
generates tandem mass spectra directly from intact proteoforms. Many
software tools, such as ProSightPC, MSPathFinder, and TopMG, have
been proposed for identifying proteoforms with modifications. In these
tools, various methods are employed to estimate the statistical significance
of identifications. However, most existing methods are designed for
proteoform identifications without modifications, and accurately
estimating the statistical significance of proteoform identifications
with modifications remains a challenge. Here we propose TopMCMC,
a method that combines a Markov chain random walk algorithm and a
greedy algorithm for assigning statistical significance to matches
between spectra and protein sequences with variable modifications.
Experimental results showed that TopMCMC achieved high accuracy in
estimating E-values and false discovery rates of
identifications in top-down mass spectrometry. Coupled with TopMG,
TopMCMC identified more spectra than the generating function method
from an MCF-7 top-down mass spectrometry data set.
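
For readers unfamiliar with the two quantities named above, the following minimal sketch shows how an E-value and a false discovery rate are commonly computed from match scores. It is not the TopMCMC algorithm: it assumes the textbook definitions (E-value as p-value times the number of candidates, FDR estimated with a target-decoy count), and the function names, parameters, and example scores are hypothetical.

```python
from typing import List, Tuple


def e_value(p_value: float, num_candidates: int) -> float:
    """Expected number of random candidates scoring at least as well.

    Assumption: the common definition E = p * N, where N is the number
    of candidates searched. This is not taken from TopMCMC itself.
    """
    return p_value * num_candidates


def target_decoy_fdr(matches: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (#decoy matches) / (#target matches) at a score cutoff.

    Each match is (score, is_decoy); higher scores are assumed to be better.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical scores for illustration only.
example_matches = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (3.0, True)]
example_fdr = target_decoy_fdr(example_matches, threshold=7.0)
```

With the hypothetical scores above and a cutoff of 7.0, three target matches and one decoy match pass, so the estimated FDR is one third.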