Human Spermatozoa as a Model for Detecting Missing Proteins in the Context of the Chromosome-Centric Human Proteome Project

The Chromosome-Centric Human Proteome Project (C-HPP) aims to catalog the proteins encoded by the human genome in a chromosome-centric manner. The existence of products of about 82% of the genes has been confirmed at the protein level. However, the number of so-called “missing proteins” remains significant. It was recently suggested that the expression of proteins that have been systematically missed might be restricted to particular organs or cell types, for example, the testis. Testicular function, and spermatogenesis in particular, depends on the successive activation or repression of thousands of genes and proteins, including numerous germ cell- and testis-specific products. Both the testis and postmeiotic germ cells are thus promising sites at which to search for missing proteins, and ejaculated spermatozoa are a potential source of proteins whose expression is restricted to the germ cell lineage. A trans-chromosome-based data analysis was performed to catalog missing proteins in total protein extracts from isolated human spermatozoa. We identified and manually validated peptide matches to 89 missing proteins in human spermatozoa. In addition, we carefully validated three proteins that were scored as uncertain in the latest neXtProt release (09.19.2014). We then focused on the 12 missing proteins encoded on chromosomes 2 and 14, some of which may play roles in ciliation and flagellar mechanics. The expression patterns of C2orf57 and TEX37 were confirmed in the adult testis by immunohistochemistry. On the basis of transcript expression during human spermatogenesis, we further consider the potential for discovering additional missing proteins in the testicular postmeiotic germ cell lineage and in ejaculated spermatozoa. This project was conducted as part of the C-HPP initiatives on chromosomes 14 (France) and 2 (Switzerland).
The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium under the data set identifier PXD002367.