Chromosome-Based Proteomic Study for Identifying Novel Protein Variants from Human Hippocampal Tissue Using Customized neXtProt and GENCODE Databases
Dataset posted on 2015-12-04, authored by Heeyoun Hwang, Gun Wook Park, Kwang Hoe Kim, Ju Yeon Lee, Hyun Kyoung Lee, Eun Sun Ji, Sung-Kyu Robin Park, Tao Xu, John R. Yates, Kyung-Hoon Kwon, Young Mok Park, Hyoung-Joo Lee, Young-Ki Paik, Jin Young Kim, Jong Shin Yoo
The goal of the Chromosome-Centric Human Proteome Project (C-HPP) is to provide comprehensive proteomic information for each human chromosome, including novel proteoforms such as novel protein-coding variants expressed from noncoding genomic regions, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). Using 144 LC-MS/MS raw files from hippocampal tissue of control, epilepsy, and Alzheimer's disease samples, we identified novel proteoforms with a workflow that combines an integrated proteomic pipeline with three search engines: MASCOT, SEQUEST, and MS-GF+. Searching customized GENCODE lncRNA databases at a protein-level false discovery rate (FDR) below 1%, we detected 11 peptides that mapped to four translated long noncoding RNA (lncRNA) variants, which also mapped to coding proteins at different chromosomal sites. We also identified four novel ASVs by searching customized GENCODE transcript databases. Target peptides from these variants were validated by comparing their tandem MS fragmentation patterns with those of the corresponding synthetic peptides. In addition, 128 SAAVs, each paired with its wild-type peptide, were identified at a peptide-level FDR below 1% using a customized neXtProt database that incorporates nonsynonymous single nucleotide polymorphism (nsSNP) information. Among these results, several novel variants related to neurodegenerative disease were identified with this workflow, which may be applicable to other C-HPP studies. All raw files used in this study were deposited in ProteomeXchange (PXD000395).
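As a rough illustration of what "SAAVs paired with their wild-type peptides" can mean when building an nsSNP-aware search database, the sketch below applies one amino acid substitution to a protein sequence and returns the tryptic peptide covering that site in both the wild-type and the variant form. Everything here is an assumption for illustration: the hypothetical `Substitution` record, the helper names, the fully tryptic digest with no missed cleavages, and the toy sequence. It is not the authors' pipeline and does not model neXtProt's actual variant annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    """Hypothetical nsSNP-derived record: 1-based residue position,
    reference amino acid, and alternate amino acid."""
    position: int
    ref: str
    alt: str


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Return the protein sequence with one residue replaced,
    checking that the reference residue matches first."""
    idx = sub.position - 1
    if sequence[idx] != sub.ref:
        raise ValueError(
            f"reference mismatch at {sub.position}: {sequence[idx]} != {sub.ref}"
        )
    return sequence[:idx] + sub.alt + sequence[idx + 1:]


def tryptic_peptides(sequence: str) -> list[tuple[int, str]]:
    """Assumed digest rule: cleave after K or R unless the next residue is P,
    with no missed cleavages. Returns (0-based start, peptide) pairs."""
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        if aa in "KR" and (i + 1 == len(sequence) or sequence[i + 1] != "P"):
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, sequence[start:]))
    return peptides


def peptide_covering(sequence: str, idx: int) -> str:
    """Return the tryptic peptide that contains the 0-based residue index."""
    for start, pep in tryptic_peptides(sequence):
        if start <= idx < start + len(pep):
            return pep
    raise ValueError(f"index {idx} outside sequence")


def saav_pair(sequence: str, sub: Substitution) -> tuple[str, str]:
    """Pair the wild-type peptide with its variant counterpart. The variant
    sequence is digested separately, because a substitution to or from K/R
    can change the cleavage sites around the variant residue."""
    idx = sub.position - 1
    variant = apply_substitution(sequence, sub)
    return peptide_covering(sequence, idx), peptide_covering(variant, idx)


if __name__ == "__main__":
    # Toy sequence and substitution, invented for illustration only.
    print(saav_pair("MKTAYIAKQRQISFVK", Substitution(5, "Y", "H")))
```

Running the script prints the pair `('TAYIAK', 'TAHIAK')`. Digesting the variant independently, rather than editing the wild-type peptide in place, is a design choice that keeps the pairing correct when the substitution creates or removes a cleavage site.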