pr9b00555_si_003.xlsx (17.2 kB)
Repeat-Preserving Decoy Database for False Discovery Rate Estimation in Peptide Identification
dataset
posted on 2020-02-21, 13:05 authored by Johra Muhammad Moosa, Shenheng Guan, Michael F. Moran, Bin Ma
The sequence database searching method is widely used in proteomics
for peptide identification. To control the false discovery rate (FDR)
of the searching results, the target–decoy method generates
and searches a decoy database together with the target database. A
known problem is that the target protein sequence database may contain
numerous repeated peptides. The structures of these repeats are not
preserved by most existing decoy generation algorithms. Previous studies
suggest that such discrepancy between the target and decoy databases
may lead to an inaccurate FDR estimation. Based on the de Bruijn graph
model, we propose a new repeat-preserving algorithm to generate decoy
databases. We prove that this algorithm preserves the structures of
the repeats in the target database to a great extent. The de Bruijn
method was compared with several other commonly used methods and
showed more accurate FDR estimation and a larger number
of peptide identifications.
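
To illustrate the target–decoy idea described above, here is a minimal sketch of how an FDR might be estimated from a combined target and decoy search. It assumes the common estimator (decoy hits divided by target hits at or above a score threshold); the function name `estimate_fdr`, the `(score, is_decoy)` tuple layout, and the example scores are hypothetical and are not taken from the dataset or the paper.

```python
from typing import List, Tuple

# Hypothetical peptide-spectrum matches: (search score, is_decoy flag).
psms: List[Tuple[float, bool]] = [
    (9.1, False),
    (8.7, False),
    (8.2, True),
    (7.9, False),
    (7.5, False),
    (6.0, True),
]


def estimate_fdr(matches: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (#decoy hits) / (#target hits) at or above threshold.

    This is one commonly used estimator, assumed here for illustration only.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No target hits accepted, so there is nothing to be falsely discovered.
        return 0.0
    return decoys / targets


if __name__ == "__main__":
    # One decoy and four targets score at least 7.0.
    print(estimate_fdr(psms, 7.0))
```

The estimate is only as good as the decoy database: if the decoys do not mirror properties of the target database, such as its repeated peptides, the decoy count may misrepresent the number of false target hits.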