Cleaved and Missed Sites for Trypsin, Lys-C, and Lys-N Can Be Predicted with High Confidence on the Basis of Sequence Context
2014-02-07
Trypsin, Lys-C, and Lys-N are the most broadly used enzymes in proteomics. Here, an approach based on large-scale peptide mass spectrometry (MS) data sets is described for confidently identifying missed cleavage sites in phosphorylated or unmodified substrates of these three proteases, or of any protease, from the side chains present within 15 residues of the cleavage-specificity residue. Previously known effects of proline, negatively charged side chains, and phosphorylated residues were quantified, and additional side chain effects were noted. By applying a set of quantitative side chain rules established for each of the three proteases, scissile and nonscissile sites could be predicted from protein sequence alone, with near certainty for Lys-C and with a high degree of confidence for trypsin and Lys-N. These rules also held for orthogonal peptide data sets, including the two largest in the PeptideAtlas database. The approach described here facilitates comprehensive modeling of substrate recognition in proteolysis.
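The abstract does not give the quantitative rules themselves, so the sketch below illustrates only the sequence-context step the approach relies on. It enumerates candidate sites using the standard specificities (trypsin after K/R, Lys-C after K, Lys-N before K) and extracts the residues within 15 positions of each specificity residue. The names `candidate_sites`, `context_features`, and `CandidateSite`, and the small feature set, are hypothetical illustrations. No rule weights from the study are reproduced, and phosphorylation state would have to be supplied as a separate input because it is not visible in the plain sequence.

```python
from dataclasses import dataclass

# Standard specificities: trypsin cuts after K/R, Lys-C after K, Lys-N before K.
SPECIFICITY = {
    "trypsin": ("KR", "after"),
    "lys-c": ("K", "after"),
    "lys-n": ("K", "before"),
}
WINDOW = 15  # residues inspected on each side of the specificity residue
ACIDIC = set("DE")  # negatively charged side chains


@dataclass
class CandidateSite:
    index: int    # 0-based index of the specificity residue (K or R)
    bond: int     # scissile bond lies between seq[bond - 1] and seq[bond]
    context: str  # WINDOW residues on each side of the specificity residue, '-' padded


def candidate_sites(seq: str, enzyme: str) -> list[CandidateSite]:
    """Enumerate every peptide bond the enzyme could, in principle, cleave."""
    residues, side = SPECIFICITY[enzyme]
    sites = []
    for i, aa in enumerate(seq):
        if aa not in residues:
            continue
        bond = i + 1 if side == "after" else i
        if bond == 0 or bond == len(seq):
            continue  # no peptide bond at either terminus
        left = seq[max(0, i - WINDOW):i].rjust(WINDOW, "-")
        right = seq[i + 1:i + 1 + WINDOW].ljust(WINDOW, "-")
        sites.append(CandidateSite(i, bond, left + aa + right))
    return sites


def context_features(seq: str, site: CandidateSite) -> dict:
    """Hypothetical features a rule set could weight; not the study's actual rules."""
    p1, p1_prime = seq[site.bond - 1], seq[site.bond]
    return {
        "P1": p1,
        "P1'": p1_prime,
        "proline_at_P1'": p1_prime == "P",
        "acidic_in_window": sum(aa in ACIDIC for aa in site.context),
    }


if __name__ == "__main__":
    seq = "MKPLDEKRAGK"
    for site in candidate_sites(seq, "trypsin"):
        print(site.index, context_features(seq, site))
```

For the toy sequence, trypsin yields three candidate bonds (the C-terminal K has no following bond); the first has proline at P1', one of the previously known effects the abstract mentions. A real predictor would then combine such window features with the enzyme-specific weights learned from MS data.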