Posted on 2023-11-21, 13:20. Authored by Klaas J. van Wijk, Tami Leppert, Zhi Sun, Alyssa Kearly, Margaret Li, Luis Mendoza, Isabell Guzchenko, Erica Debley, Georgia Sauermann, Pratyush Routray, Sagunya Malhotra, Andrew Nelson, Qi Sun, Eric W. Deutsch
This study describes a new release of the Arabidopsis
thaliana PeptideAtlas proteomics resource (build 2023–10), providing
protein sequence coverage, matched mass spectrometry (MS) spectra,
selected post-translational modifications (PTMs), and metadata. A total of
70 million MS/MS spectra were matched to the Araport11 annotation, identifying
∼0.6 million unique peptides, 18,267 proteins at the highest
confidence level, and 3396 lower-confidence proteins, together representing
78.6% of the predicted proteome. Additional identified proteins not
predicted in Araport11 should be considered for the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins,
668 ubiquitinated proteins, 3050 N-terminally acetylated proteins,
and 864 lysine-acetylated proteins and mapped their PTM sites. MS
support was lacking for 21.4% (5896 proteins) of the predicted Araport11
proteome: the “dark” proteome. This dark proteome is
highly enriched for E3 ligases, transcription factors, and certain
signaling peptide families (e.g., CLE, IDA, PSY) but not others
(e.g., THIONIN, CAP). A machine learning model trained on RNA expression
data and protein properties predicts the probability that proteins
will be detected. The model aids in the discovery of proteins with short
half-lives (e.g., SIG1,3 and ERF-VII TFs) and in developing strategies
to identify the missing proteins. PeptideAtlas is linked to TAIR,
tracks in JBrowse, and several other community proteomics resources.
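
The coverage figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the predicted Araport11 proteome equals the sum of the identified proteins (both confidence levels) and the dark proteome. That total is inferred here from the reported counts rather than stated in the abstract, and the function name `coverage_summary` is hypothetical.

```python
# Protein counts as reported in the abstract (build 2023-10).
HIGH_CONFIDENCE = 18_267   # proteins at the highest confidence level
LOWER_CONFIDENCE = 3_396   # lower-confidence proteins
DARK = 5_896               # predicted proteins lacking MS support


def coverage_summary(high, lower, dark):
    """Return identified/dark shares of the predicted proteome.

    Assumption: predicted proteome = identified + dark. This total is
    inferred from the reported counts, not quoted from the source.
    """
    identified = high + lower
    total = identified + dark
    return {
        "identified": identified,
        "total_predicted": total,
        "identified_pct": round(100 * identified / total, 1),
        "dark_pct": round(100 * dark / total, 1),
    }


summary = coverage_summary(HIGH_CONFIDENCE, LOWER_CONFIDENCE, DARK)
print(summary)
```

Under that assumption, the computed shares agree with the 78.6% and 21.4% reported in the abstract.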