Posted on 2005-04-15, 00:00. Authored by David L. Tabb, Chandrasegaran Narasimhan, Michael Brad Strader, Robert L. Hettich
Database search identification algorithms, such as Sequest and Mascot, are powerful enablers of proteomic tandem mass spectrometry. We introduce DBDigger, an algorithm that reorganizes the database identification process to remove a problematic bottleneck. Typically, such algorithms determine which candidate sequences can be compared to each spectrum. Instead,
DBDigger determines which spectra can be compared to
each candidate sequence, enabling the software to generate candidate sequences only once for each HPLC separation rather than for each spectrum. This reorganization
also reduces the number of times a spectrum must be
predicted for a particular candidate sequence and charge
state. As a result, DBDigger can accelerate some database
searches by more than an order of magnitude. In addition,
the software offers features to reduce the performance
degradation introduced by posttranslational modification
(PTM) searching. DBDigger allows researchers to specify
the sequence context in which each PTM is possible. In
the case of CNBr digests, for example, modified methionine residues can be limited to occur only at the C-termini
of peptides. Use of “context-dependent” PTM searching
reduces the performance penalty relative to traditional
PTM searching. We characterize the performance possible
with DBDigger, showcasing MASPIC, a new statistical
scorer. We describe the implementation of these innovations in the hope that other researchers will employ them
for rapid and highly flexible proteomic database search.
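
The reorganization described above can be illustrated with a minimal sketch. This is not DBDigger's implementation: the function names, the simplified tryptic digestion (cleavage after K or R, no missed cleavages), the abbreviated residue-mass table, and the 0.5 Da tolerance are all assumptions made for illustration. The sketch sorts the spectra by precursor mass once, generates each candidate peptide once, and uses a binary search to find every spectrum that the candidate could be compared to.

```python
from bisect import bisect_left, bisect_right

# Monoisotopic residue masses in Da (abbreviated table, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "R": 156.10111,
}
WATER = 18.01056  # mass of H2O added to the residue sum for a neutral peptide


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide sequence."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_peptides(protein):
    """Yield peptides from a simplified tryptic digest (cleave after K or R)."""
    start = 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            yield protein[start:i + 1]
            start = i + 1
    if start < len(protein):
        yield protein[start:]


def candidate_centric_search(proteins, spectra, tolerance=0.5):
    """Map each spectrum id to the candidate peptides within the mass tolerance.

    `spectra` is a list of (spectrum_id, neutral_precursor_mass) pairs
    (a hypothetical input format). Candidates are generated once for the
    whole run rather than once per spectrum.
    """
    ordered = sorted(spectra, key=lambda s: s[1])
    masses = [mass for _, mass in ordered]
    matches = {spectrum_id: [] for spectrum_id, _ in ordered}
    for protein in proteins:
        for peptide in tryptic_peptides(protein):
            mass = peptide_mass(peptide)
            # Binary search for the window of spectra this candidate can match.
            lo = bisect_left(masses, mass - tolerance)
            hi = bisect_right(masses, mass + tolerance)
            for spectrum_id, _ in ordered[lo:hi]:
                # A real search would score the predicted spectrum here.
                matches[spectrum_id].append(peptide)
    return matches


if __name__ == "__main__":
    example_spectra = [("s1", 361.2), ("s2", 344.2), ("s3", 500.0)]
    print(candidate_centric_search(["GASKAVR"], example_spectra))
```

In this arrangement a fragment spectrum predicted for a candidate could be reused across every spectrum in the matching window, which is the saving the abstract attributes to comparing spectra to candidates rather than candidates to spectra. Scoring is deliberately left out of the sketch.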