Posted on 2017-08-21, 00:00. Authored by Fengchao Yu, Ning Li, Weichuan Yu
Chemical cross-linking coupled to mass spectrometry is a powerful
tool to study protein–protein interactions and protein conformations.
Two linked peptides are ionized and fragmented to produce a tandem
mass spectrum. In such an experiment, a tandem mass spectrum contains
ions from two peptides. The peptide identification problem becomes
a peptide–peptide pair identification problem. Currently, most
tools do not search all possible pairs because the number of pairs grows quadratically with the number of peptides.
Consequently, some true matches are inevitably missed. In our previous work,
we developed a tool named ECL to search all pairs of peptides exhaustively.
Unfortunately, it is very slow due to the quadratic computational
complexity, especially when the database is large. Furthermore, ECL
uses a score function without statistical calibration, although researchers have argued that it is inappropriate to directly compare uncalibrated
scores because different spectra have different random score distributions.
Here we propose an advanced version of ECL, named ECL2. It achieves
a linear time and space complexity by taking advantage of the additive
property of a score function. It can search a data set containing
tens of thousands of spectra against a database containing thousands
of proteins in a few hours. A comparison with five other state-of-the-art
tools shows that ECL2 is much faster than pLink, StavroX, ProteinProspector,
and ECL. Kojak is the only one that is faster than ECL2, but Kojak
does not exhaustively search all possible peptide pairs. The comparison
shows that ECL2 has the highest sensitivity among the state-of-the-art
tools. An experiment on a large-scale in vivo cross-linking data
set shows that ECL2 is the only tool that finds peptide-spectrum
matches (PSMs) passing the false discovery rate (q-value) threshold. This result illustrates that exhaustive search
and a well-calibrated score function are useful for finding PSMs in
a huge search space.
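To illustrate why an additive score can remove the quadratic cost, the following is a minimal sketch and not the ECL2 implementation. It assumes that the score of a pair is the sum of two per-peptide scores and that the two peptide masses plus the linker mass must equal the precursor mass. The function name `best_pair`, the `(name, mass, score)` tuple layout, and the `bin_width` parameter are all hypothetical and chosen only for this example.

```python
def best_pair(peptides, precursor_mass, linker_mass, bin_width=0.01):
    """Return (total_score, name_a, name_b) for the best pair, or None.

    peptides: list of (name, mass, score) tuples (hypothetical layout).
    Assumes pair score = score_a + score_b (additive), and
    mass_a + mass_b + linker_mass ~= precursor_mass.
    """
    # Pass 1 (linear): keep only the best-scoring peptide in each mass bin.
    best_in_bin = {}
    for name, mass, score in peptides:
        b = round(mass / bin_width)
        if b not in best_in_bin or score > best_in_bin[b][0]:
            best_in_bin[b] = (score, name)

    # Pass 2 (linear): for each peptide, look up the complementary mass bin.
    # Because the score is additive, the best partner for a peptide is simply
    # the best-scoring peptide in that bin, so no pair enumeration is needed.
    best = None
    for name, mass, score in peptides:
        target_bin = round((precursor_mass - linker_mass - mass) / bin_width)
        # Checking the neighbouring bins gives an approximate mass tolerance
        # of about one bin width.
        for nb in (target_bin - 1, target_bin, target_bin + 1):
            hit = best_in_bin.get(nb)
            if hit is None:
                continue
            total = score + hit[0]
            if best is None or total > best[0]:
                best = (total, name, hit[1])
    return best


example = [("A", 500.0, 3.0), ("B", 700.0, 5.0), ("C", 700.0, 2.0), ("D", 900.0, 9.0)]
result = best_pair(example, precursor_mass=1338.0, linker_mass=138.0)
```

In this toy input only peptides of mass 500 and 700 can combine to the required total of 1200, so the high-scoring peptide "D" is never paired. Each peptide is visited a constant number of times, so the work grows linearly with the number of peptides rather than with the number of pairs. A peptide may be paired with itself in this sketch; whether that is desirable depends on the experiment.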