pr400802z_si_008.pdf (176.49 kB)
Cleaved and Missed Sites for Trypsin, Lys-C, and Lys‑N Can Be Predicted with High Confidence on the Basis of Sequence Context
journal contribution
posted on 2014-02-07, 00:00 authored by Paul D. Gershon
Trypsin, Lys-C, and Lys-N are the most broadly used enzymes in
proteomics. Here, on the basis of large-scale peptide mass spectrometry
(MS) data sets, an approach is described to confidently identify missed
cleavage sites in either phosphorylated or unmodified substrates for
these three proteases, or any protease, on the basis of side chain
species present within 15 residues of the cleavage-specificity residue.
Previously known effects of proline, negatively charged side chains,
and phospho-modified residues have been quantified, and additional
side chain effects were noted. By applying a set of quantitative side
chain rules established for each of the three proteases, scissile
and nonscissile sites could be established, on the basis of protein
sequence alone, with near certainty for Lys-C, and with a high degree
of confidence for trypsin or Lys-N. These rules were applicable to
orthogonal peptide data sets, including the two largest in the PeptideAtlas
database. The approach described here facilitates the comprehensive
modeling of substrate recognition in proteolysis.
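
As a rough illustration of the "sequence context" idea in the abstract, the sketch below collects the residues flanking each cleavage-specificity residue in a protein sequence. It is not the author's method and does not implement the quantitative side chain rules, which are not given here. The function name `context_windows`, the `SPECIFICITY` table, the `-` padding character, and the reading of "within 15 residues" as 15 residues on each side are all assumptions made for this example. The specificity residues used are the standard ones: K and R for trypsin, K for Lys-C and Lys-N.

```python
from typing import List, Tuple

# Cleavage-specificity residues per protease (standard specificities;
# this lookup table is a hypothetical helper for the sketch).
SPECIFICITY = {
    "trypsin": "KR",
    "lys-c": "K",
    "lys-n": "K",
}


def context_windows(
    sequence: str, protease: str, flank: int = 15
) -> List[Tuple[int, str]]:
    """Return (position, window) pairs for each candidate cleavage site.

    The window holds `flank` residues on each side of the specificity
    residue (an assumed reading of "within 15 residues"). Positions are
    0-based. Windows running past a terminus are padded with '-'.
    """
    residues = SPECIFICITY[protease.lower()]
    seq = sequence.upper()
    pad = "-" * flank
    padded = pad + seq + pad
    windows = []
    for i, aa in enumerate(seq):
        if aa in residues:
            # Index i in seq is index i + flank in padded, so the window
            # centered on it spans padded[i : i + 2 * flank + 1].
            windows.append((i, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Short toy sequence with a small flank so the windows are readable.
    for pos, window in context_windows("AAKPDR", "trypsin", flank=2):
        print(pos, window)
```

Windows like these would be the input to any rule set that scores a site as scissile or nonscissile from nearby side chains, for example proline, negatively charged residues, or phospho-modified residues.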