ac051127f_si_003.xls (749.5 kB)
Improving Sensitivity in Shotgun Proteomics Using a Peptide-Centric Database with Reduced Complexity: Protease Cleavage and SCX Elution Rules from Data Mining of MS/MS Spectra
dataset
posted on 2006-02-15, 00:00 authored by Chia-Yu Yen, Steve Russell, Alex M. Mendoza, Karen Meyer-Arendt, Shaojun Sun, Krzysztof J. Cios, Natalie G. Ahn, Katheryn A. Resing
Correct identification of a peptide sequence from MS/MS
data is still a challenging research problem, particularly
in proteomic analyses of higher eukaryotes where protein
databases are large. The scoring methods of search
programs often generate cases where incorrect peptide
sequences score higher than correct peptide sequences
(referred to as distraction). Because smaller databases
yield less distraction and better discrimination between
correct and incorrect assignments, we developed a method
for editing a peptide-centric database (PC-DB) to remove
unlikely sequences and strategies for enabling search
programs to utilize this peptide database. Rules for
unlikely missed cleavage and nontryptic proteolysis products were identified by data mining 11 849 high-confidence peptide assignments. We also evaluated ion exchange chromatographic behavior as an editing criterion
to generate subset databases. When the PC-DBs were used to search a well-annotated test data set of MS/MS spectra, we found no
loss of critical information, validating the
methods for generating and searching against the databases. On the other hand, improved confidence in peptide
assignments was achieved for tryptic peptides, measured
by changes in ΔCN and RSP. Decreased distraction was
also achieved, consistent with the 3–9-fold decrease in
database size. Data mining identified a major class of
common nonspecific proteolytic products corresponding
to leucine aminopeptidase (LAP) cleavages. Large improvements in identifying LAP products were achieved
using the PC-DB approach when compared with conventional searches against protein databases. These results
demonstrate that peptide properties can be used to reduce
database size, yielding improved accuracy and information
capture due to reduced distraction, but with little loss of
information compared to conventional protein database
searches.
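
The idea of a peptide-centric database can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not state the mined cleavage rules, so the sketch assumes the commonly used trypsin convention (cleave C-terminal to K or R unless the next residue is P), a fixed cap on missed cleavages, and a simplified model of aminopeptidase products in which a few residues are trimmed from the N-terminus. All function names, parameters, and the example sequence are hypothetical.

```python
import re


def tryptic_sites(protein):
    """Cut positions under the assumed rule: after K or R, unless followed by P."""
    return [m.end() for m in re.finditer(r"[KR](?!P)", protein)]


def digest(protein, max_missed=1, min_len=6):
    """Yield peptides containing at most max_missed internal cleavage sites."""
    internal = [s for s in tryptic_sites(protein) if s < len(protein)]
    bounds = [0] + internal + [len(protein)]
    for i in range(len(bounds) - 1):
        last = min(i + 2 + max_missed, len(bounds))
        for j in range(i + 1, last):
            peptide = protein[bounds[i]:bounds[j]]
            if len(peptide) >= min_len:
                yield peptide


def n_terminal_ladder(peptide, max_removed=2, min_len=6):
    """Yield products with 1..max_removed residues trimmed from the N-terminus.

    Simplified stand-in for aminopeptidase-like products (assumption).
    """
    for k in range(1, max_removed + 1):
        if len(peptide) - k >= min_len:
            yield peptide[k:]


def build_peptide_db(proteins, max_missed=1, max_removed=2, min_len=6):
    """Collect unique peptides from all proteins into one sorted list."""
    db = set()
    for seq in proteins:
        for peptide in digest(seq, max_missed, min_len):
            db.add(peptide)
            db.update(n_terminal_ladder(peptide, max_removed, min_len))
    return sorted(db)


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the functions.
    for peptide in build_peptide_db(["MKTAYIAKQRPLEVK"], max_missed=0, max_removed=1):
        print(peptide)
```

Because the entries are stored as a set of peptides rather than as proteins, a peptide shared by several proteins appears once, and editing rules (for example, dropping unlikely missed-cleavage products) can be applied per peptide before the search.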