Examining Troughs in the Mass Distribution of All Theoretically Possible Tryptic Peptides
Dataset posted on 02.09.2011, 00:00 by Alexey V. Nefedov, Indranil Mitra, Allan R. Brasier, Rovshan G. Sadygov
This work describes the mass distribution of all theoretically possible tryptic peptides composed of the 20 amino acids, up to a mass of 3 kDa, at a resolution of 0.001 Da. We characterize the regions between the peaks of the distribution, including gaps (forbidden zones) and sparsely populated areas (quiet zones). We show how the gaps shrink over the mass range and when they disappear completely. We demonstrate that peptide compositions in quiet zones are less diverse than those in the peaks of the distribution, and that eliminating certain types of unrealistic compositions may increase the gaps in the distribution. The mass distribution is generated using a parallel implementation of a recursive procedure that enumerates all amino acid compositions. This implementation enumerates all compositions of tryptic peptides below 3 kDa in 48 min on a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores). The results of this work can be used to facilitate protein identification and mass defect labeling in mass spectrometry-based proteomics experiments.
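The recursive enumeration mentioned above can be illustrated with a minimal, serial Python sketch. It is not the authors' parallel implementation, and it rests on several assumptions that the abstract does not state: a tryptic peptide is modeled as a composition with exactly one C-terminal K or R and no other K or R, Leu and Ile are counted as separate residues despite having equal mass, and masses are neutral monoisotopic values (residue masses plus one water). The names `enumerate_compositions` and `mass_histogram` are hypothetical.

```python
from collections import Counter

# Standard monoisotopic residue masses in Da. K and R are kept apart
# because this sketch ASSUMES each peptide has exactly one of them,
# at the C-terminus.
RESIDUES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "Y": 163.06333, "W": 186.07931,
}
C_TERMINAL = {"K": 128.09496, "R": 156.10111}
WATER = 18.01056  # added once per peptide


def enumerate_compositions(max_mass):
    """Yield (composition, mass) for every assumed-tryptic composition
    whose neutral monoisotopic mass does not exceed max_mass."""
    names = list(RESIDUES)
    masses = [RESIDUES[n] for n in names]
    lightest_end = min(C_TERMINAL.values()) + WATER

    def recurse(start, counts, mass):
        # Close the current residue multiset with each allowed C-terminus.
        for term, term_mass in C_TERMINAL.items():
            total = mass + term_mass + WATER
            if total <= max_mass:
                yield dict(counts, **{term: 1}), total
        # Extend with residues at index >= start, so that each multiset
        # is generated exactly once.
        for i in range(start, len(names)):
            new_mass = mass + masses[i]
            if new_mass + lightest_end > max_mass:
                continue  # no valid peptide can contain this extension
            counts[names[i]] = counts.get(names[i], 0) + 1
            yield from recurse(i, counts, new_mass)
            counts[names[i]] -= 1
            if counts[names[i]] == 0:
                del counts[names[i]]

    yield from recurse(0, {}, 0.0)


def mass_histogram(max_mass, resolution=0.001):
    """Count compositions per mass bin; keys are integer bin indices
    (mass divided by resolution, rounded)."""
    hist = Counter()
    for _, mass in enumerate_compositions(max_mass):
        hist[round(mass / resolution)] += 1
    return hist


if __name__ == "__main__":
    hist = mass_histogram(400.0)  # small limit keeps the run short
    print(sum(hist.values()), "compositions in", len(hist), "occupied bins")
```

Empty bins between occupied ones correspond to the gaps described above. Pure Python is far too slow for the full 3 kDa range, which is consistent with the cluster-based parallel approach the authors report.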